3 Replies Latest reply: Nov 1, 2018 9:10 AM by Andrey Krylov RSS

    Join all Excel files in folder

    Anton Korobkov

      Hello.

       

      I have a following problem.

      Suppose I have two data sources, which are folders with Excel files. There are multiple spreadsheets in both folders, in which there are different columns, which share common key in a following manner:

       

      Folder1:

      Dataset_one_1.xlsx

       

      key | value

      1       7

      2       14

      3       9

      4       2

      5       6

       

      Folder2:

      Dataset_two_1.xlsx

       

      key| text

      1      one

      2      two

       

      Dataset_two_2.xlsx

       

      key| text

      3      three

      4      four

      5      five

       

      I want to join these two data sources, and attempt to achieve so looks like this:

       

       

      first_source:
      LOAD
          key,
          value
      FROM [lib://my_source_one/Dataset_one*.xlsx]
      (ooxml, embedded labels, table is Sheet1);
      
      
      
      
      
      
      Left Join (first_source)
      second_source:
      LOAD
          key,
          "text"
      FROM [lib://my_source_two/dataset_two*.xlsx]
      (ooxml, embedded labels, table is Sheet1);
      

       

      So far so good.

       

      When I try to investigate loaded data and create a table to validate load procedure, I see following picture:

       

      test_join.png

      For some reason, rows from second file are not present. How do I fix that? This is a test example, the structure of an actual dataset is far more complicated and may contain tens of Excel files in both directories.

       

      Thanks in advance!