23 Replies Latest reply: Oct 31, 2018 3:22 PM by Nandhakumar P RSS

    Incremental multiple csv files load

    Nandhakumar P

      Hi There

      Please help to achieve the below,

       

      I need to load multiple .CSV files on Incremental basis and folder location is same but file name will differ by date suffix and based on the new or updated record in csv files should updated into base qvd (Insert New record, Update existing record). Next time if a new file comes it should be iterated.

       

      Please help to provide the script.

       

      Thanks

      Nandhakumar

        • Re: Incremental multiple csv files load
          Juraj Misina

          There are quite a few unclear things in your description:

          - will all processed files remain in the folder?

          - is there any reliable record identification in those files?

          - do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?

          Juraj

            • Re: Incremental multiple csv files load
              Nandhakumar P

              Hi Juraj

               

              First, Thanks for your reply. Let Me explain you further.

              - will all processed files remain in the folder?

              Answer: No, It will be moved to Archive folder

              - is there any reliable record identification in those files?

              Answer: ContactID & Email address will be unique identification in each files

              - do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?

              Answer:

              1. For a specific file, i need to update & Delete the specific records only from the new files not the entire qvd and other records should remains there.

              2.Also in another qvd file i should keep updating the new records only and no delete required there.

               

              I hope i made things clear and i ready to explain further if required.

              Thanks again

              Nandhakumar

                • Re: Incremental multiple csv files load
                  Dilip Ranjith

                  How are you identifying if a record has changed and needs updation?

                    • Re: Incremental multiple csv files load
                      Nandhakumar P

                      Hi Dilip

                       

                      No identification will be there inside the file but if the record is updated in source and then file with the updated records will be place in the specific folder. We have to consider the file as updates and concatenate the same in QVD.

                       

                      I hope my answer is clear. Please let me know if any clarifications.

                       

                      Thanks

                      Nandhakumar

                        • Re: Incremental multiple csv files load
                          Dilip Ranjith

                          My question was regarding individual records do you have a updated/changed date or flag or something similar.

                          if yes look at link I shared earlier on incremental load.

                           

                          If not you may need to use hash functions to check if a row has changed. I.e. always create a hash of the columns concantenated together use that as a check if row has changed

                           

                           

                           

                          Thank You

                          Dilip Ranjith

                           

                          Sent via mobile

                            • Re: Incremental multiple csv files load
                              Dilip Ranjith

                              Expanding on above

                              Create a hash key from all the fields (or at least all non-key fields). You can check for each row identifiers/keys if the calculated hash has changed. If the hash value is different at least one field has a new value. And you need to update

                               

                              See below for some additional details

                              www.qlikfix.com/2014/03/11/hash-functions-collisions/

                                • Re: Incremental multiple csv files load
                                  Nandhakumar P

                                  dilipranjith

                                  Here is the script i'm using now,

                                   

                                  set vRoot = 'LIB://Conn';

                                  set vFirstLoad=0; //set 1 for first load

                                   

                                  FOR Each File in filelist ('$(vRoot)'&'\*Contact*.csv')

                                  CurrentFileList:

                                  Load '$(File)' as File

                                  autogenerate 1;

                                  next File

                                  if(vFirstLoad=1) then

                                  MyTable:

                                  LOAD *,Date(Right(Left(FileBaseName(),37),10),'MM/DD/YYYY') as [UpdateDate]

                                  FROM [LIB://Conn\*Contact*.csv]

                                      (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

                                  else

                                   

                                  OldFileList:

                                  LOAD File as History

                                  FROM [LIB://Conn\FileList.qvd] (qvd);

                                   

                                  NoConcatenate

                                  NewFiles:

                                  Load File as NewFile

                                      resident CurrentFileList

                                      Where not exists(History, File);

                                   

                                  MyTable:

                                      LOAD *

                                  FROM [LIB://Conn\MyTable.qvd] (qvd);

                                       

                                  FOR EACH File in FieldValueList('NewFile')

                                      Concatenate(MyTable)

                                      LOAD *,Date(Right(Left(FileBaseName(),37),10),'MM/DD/YYYY') as [UpdateDate]

                                  FROM [$(File)] (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

                                   

                                      next;

                                     

                                  END IF

                                   

                                  STORE CurrentFileList into [LIB://Conn\FileList.qvd] (qvd);

                                   

                                  STORE MyTable into [LIB://Conn\MyTable.qvd] (qvd);

                                   

                                  drop table CurrentFileList;

                                  --------------------------------------------------------------------------

                                  The above script loads everything on daily basis. I need another resident table or qvd which should have only unique contact records with all updates applied. Can you help on this?

                                   

                                  Thanks

                                  Nandhakumar

                          • Re: Incremental multiple csv files load
                            Juraj Misina

                            Hi,

                            this one sounds tricky. You say there's no update indication on the record, so I assume this:

                            A file xyz_20181026.csv contains following records:

                            ContactID, Email, Value
                            1, a@a.a, 10
                            2, a@a.a, 20
                            3, b@b.b, 30
                            4, c@c.c, 40
                            

                            The next day, record 2 gets updated and record 4 gets deleted, so you'll get a new version of the file xyz_20181027.csv with following content:

                            ContactID, Email, Value
                            1, a@a.a, 10
                            2, d@d.d, 25
                            3, b@b.b, 30
                            

                            IF my assumption is correct, then you need something like this (not tested, but you'll get the idea)

                            NewData:
                            LOAD
                            FileBaseName() as SourceFileFull,
                            SubField(FileBaseName(), '_', 1) as SourceFileKey, //just a file name without date info
                            SubField(FileBaseName(), '_', 1)&'|'&ContactID&'|'&Email as RecordKey,
                            *
                            From [lib://SourceDir/*.csv]
                            (txt, embedded labels, delimiter is ;);
                            
                            If(FileSize('lib://SourceDir/HistoryQVD.qvd')>0) then
                            Concatenate(NewData)
                            LOAD
                            *
                            From [lib://SourceDir/HistoryQVD.qvd](qvd)
                            Where
                            Not(Exists(SourceFileKey)
                            ;
                            EndIf
                            
                            Store NewData into [lib://SourceDir/HistoryQVD.qvd](qvd);
                            

                             

                            If only updated records are exported in the file again, then your increment key would be RecordKey field (so you'd use it in he Exists function instead of SourceFileKey field). But, in such case there's no way to find out wich records were deleted.

                            Hope this helps.

                            Juraj

                        • Re: Incremental multiple csv files load
                          Dilip Ranjith

                          As mentioned by Juraj

                          there are a lot of unknowns here.

                          Assuming that the csv files all remain in same folder and that you want to do incremental (updates and possibly deletes)

                          2 steps.

                          Step 1 - Set up the csv file load (i.e. load only the new files

                          Basically for the first load (based on a flag) load the file names and data into a qvds

                          Subsequent loads

                          - get list of files names

                          - load old list of file names from qvd

                          - create another table with latest filenames only

                          - load data from qvd

                          - loop through new file names and load which gets autoconcatenated

                          - drop unnecessary tables (havent done this in the app to prove it works)

                          - save qvds

                          see attached script


                          Step 2 - implement incremental load (update and/or) delete on the latest loads. check below

                          https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/

                          • Re: Incremental multiple csv files load
                            Chris Allen

                            It is actually quit simple, but the explanation is kind of complex so I will try my best to break it down.

                             

                            1. Make sure that in all of your csv files the column names are consistent in naming and location.

                            2. Create a loop to cycle through the directory/folder where your csv files are located. After the first cycle you will need to concatenate.

                            3. Create a key field. This is important in the incremental load portion as you will not bring in those items from the qvd that are existing in your csv load. This will act as an insert/update.

                             

                            Example:

                             

                             

                            DIRECTORY ["Location of the files"];

                            FOR EACH File in FileList ('*.csv')

                            temp:
                            LOAD "Your columns and other derived fields needed"
                            FROM $(File)
                            (
                            txt, codepage is 1252, no labels, delimiter is spaces, msq, header is 1 lines);


                            SET Concatenate = concatenate;

                            next File

                             

                             

                            LOAD *
                            Resident ACHFile;

                            Concatenate

                            LOAD *
                            FROM "Path to qvd file";

                            store temp into "Path to qvd"(qvd);

                            drop table temp;

                             

                              • Re: Incremental multiple csv files load
                                Nandhakumar P

                                Hi Chris

                                 

                                Thanks for your inputs but i tried but still it is not concatenating. Also in above script yo mentioned as "ACHFile", please clarify what is for? And it is still not concatenating the records.

                                 

                                Please review my code based on your help.

                                 

                                Thanks

                                Nandhakumar

                                  • Re: Incremental multiple csv files load
                                    Chris Allen

                                    Apologies, I should of replaced ACHFile with temp, and my code base is for QlikView.

                                    • Re: Incremental multiple csv files load
                                      Chris Allen

                                      set vRoot = 'LIB://Conn';


                                      FOR Each File in filelist ('$(vRoot)'&'\*.csv')
                                      CurrentFileList:
                                      Load '$(File)' as File
                                      (txt, codepage is 28591, embedded labels, delimiter is ',', csv)
                                      autogenerate 1;

                                      SET Concatenate = concatenate;  
                                      next File

                                       

                                      if(NOT ISNULL(QvdCreationTime('Path to qvd'))) -- check if qvd exists
                                      RESULTS:
                                      LOAD *
                                      RESIDENT CurrentFileList;

                                      Concatenate

                                      LOAD *
                                      FROM [LIB://Conn\MyTable.qvd] (qvd)
                                           Where not exists(ContactID);

                                       

                                      STORE RESULTS INTO into [LIB://Conn\FileList.qvd] (qvd);
                                      DROP TABLE RESULTS;
                                      DROP TABLE CurrentFileList;
                                      ELSE
                                      STORE CurrentFileList; INTO into [LIB://Conn\FileList.qvd] (qvd);
                                      DROP TABLE CurrentFileList;
                                      END IF

                                        • Re: Incremental multiple csv files load
                                          Nandhakumar P

                                          Hi Chris

                                           

                                          Here is my updated code and i'm working on Qlik sense desktop version.

                                          set vRoot = 'LIB://Conn';

                                           

                                          FOR Each File in filelist ('$(vRoot)'&'\*contact*.csv')

                                          CurrentFileList:

                                          Load '$(File)' as File //(txt, codepage is 28591, embedded labels, delimiter is ',', txt);

                                          autogenerate 1;

                                          SET Concatenate = concatenate; 

                                          next File

                                           

                                           

                                          if Not ISNULL(QvdCreateTime('LIB://Conn\MyFile.qvd')) then // check if qvd exists

                                          RESULTS:

                                          LOAD *

                                          RESIDENT CurrentFileList;

                                          Concatenate

                                           

                                          LOAD *

                                          FROM [LIB://Conn\MyTable.qvd] (qvd)

                                          Where not exists(ContactID);

                                           

                                          STORE RESULTS INTO into [LIB://Conn\FileList.qvd] (qvd);

                                          DROP TABLE RESULTS;

                                          DROP TABLE CurrentFileList;

                                          ELSE

                                          STORE CurrentFileList into [LIB://Conn\FileList.qvd] (qvd);

                                          DROP TABLE CurrentFileList;

                                          END IF

                                           

                                          Temp:

                                          LOAD *

                                          FROM [LIB://Conn\MyTable.qvd] (qvd);

                                           

                                          Still Concatenate is not working. Also attached the sample data file FYR.

                                           

                                          Assume attached 3 files will be loaded on daily basis. (i.e.) each file will be loaded on consecutive days.

                                          For initial Load use: contact_eloqua_full_export2018-10-29

                                          Day 2 load: contact_eloqua_full_export2018-10-30 (For Update example: Record no: 1 is updated with FirstName, LastName, Address1)

                                          Day3 load: contact_eloqua_full_export2018-10-31  (Example to Add New Record)

                                           

                                          Finally i should have unique record (i.e.) No Duplicate Records. I hope things are clear.

                                           

                                          Please help to fix this.

                                           

                                          Thanks

                                          Nandhakumar