  • Can I achieve this using chunks of some kind?

    Hi!

    I'm currently RTFM'ing, but so far I haven't seen anything that helps in my case.

    Case:
    I have a .csv file that I want to process. The file contains lots of records, and there may be several records per "actor" in my system. I want to process each actor as one chunk, not just each record.
    Is there any way of doing this other than setting the commitInterval to something like 99999999 and sorting the records myself?
    Setting the commitInterval to a lower value won't help, since the number of records per actor may vary from 1 to around 20.

    Any help will be appreciated.

  • #2
    There are two samples (multilineJob and multiLineOrderJob) that have most of the features and patterns people use to read files where records span multiple lines. As long as your actor data are contiguous, you should be able to use something along those lines.
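
    A minimal sketch of that aggregation pattern (not taken from the samples; the Record class, its accessors, and the wiring here are assumptions for illustration): wrap the line-level reader in a SingleItemPeekableItemReader and emit one item per actor, so the step commits once per actor even with a commit interval of 1.

    import java.util.ArrayList;
    import java.util.List;

    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.support.SingleItemPeekableItemReader;

    // Minimal stand-in for a parsed csv line (field names are assumptions).
    class Record {
        private final String name;
        private final int actorId;
        private final int contractId;

        Record(String name, int actorId, int contractId) {
            this.name = name;
            this.actorId = actorId;
            this.contractId = contractId;
        }

        int getActorId() { return actorId; }
        int getContractId() { return contractId; }
    }

    // Groups contiguous records for the same actor into one item.
    public class ActorGroupReader implements ItemReader<List<Record>> {

        // Peekable wrapper around e.g. a FlatFileItemReader<Record>
        private final SingleItemPeekableItemReader<Record> delegate;

        public ActorGroupReader(SingleItemPeekableItemReader<Record> delegate) {
            this.delegate = delegate;
        }

        @Override
        public List<Record> read() throws Exception {
            Record first = delegate.read();
            if (first == null) {
                return null; // end of input ends the step
            }
            List<Record> group = new ArrayList<Record>();
            group.add(first);
            // Peek ahead and keep consuming while the actor stays the same.
            for (Record next = delegate.peek();
                 next != null && next.getActorId() == first.getActorId();
                 next = delegate.peek()) {
                group.add(delegate.read());
            }
            return group;
        }
    }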

    • #3
      Hi! Thanks for your reply, but I don't see how I'd use that to my advantage (please tell me to look again if I'm wrong). I'll write more about what I'm trying to achieve.

      Example csv file:

      some;row;name;actorId;contractId;product;fromdate;todate
      foo;bar;some;25;3332;TV;12.05.2009;
      FOO;BAR;stupid;25;3332;TV;24.07.2009;
      fOo;bAr;reference;30;225;TV;10.05.2008;12.05.2009
      FoO;BaR;without;30;225;TV;12.05.2009;
      foo;bar;any;25;3464;TV;01.07.2009;28.02.2010
      FOO;BAR;significance;25;3464;TV;01.10.2009;
      fOo;bAr;for;25;3464;TV;01.10.2009;
      FoO;BaR;us;25;3332;TV;01.10.2009;

      I want the reader to read the file and pass it to the writer as chunks of Records like this (as you may notice, the chunks are grouped by actorId and contractId):

      foo;bar;some;25;3332;TV;12.05.2009;
      FOO;BAR;stupid;25;3332;TV;24.07.2009;
      FoO;BaR;us;25;3332;TV;01.10.2009;

      fOo;bAr;reference;30;225;TV;10.05.2008;12.05.2009
      FoO;BaR;without;30;225;TV;12.05.2009;

      foo;bar;any;25;3464;TV;01.07.2009;28.02.2010
      FOO;BAR;significance;25;3464;TV;01.10.2009;
      fOo;bAr;for;25;3464;TV;01.10.2009;

      The writer will then process the list of records it receives as input, and commit when it's finished processing that specific set of records.

      • #4
        If you need to sort your file, then you'll probably have to load the file directly into a staging table and then do a query that has an ORDER BY clause.
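
        A minimal sketch of reading the sorted rows back out of such a staging table (the table name, column names, and the Record class from the earlier sketch are assumptions for illustration):

        import javax.sql.DataSource;

        import org.springframework.batch.item.database.JdbcCursorItemReader;

        // Reads staged rows in actor/contract order, so records for one
        // actor/contract pair come out contiguously and can be grouped.
        public class StagingReaderFactory {

            public static JdbcCursorItemReader<Record> stagingReader(DataSource dataSource) {
                JdbcCursorItemReader<Record> reader = new JdbcCursorItemReader<Record>();
                reader.setDataSource(dataSource);
                reader.setSql("SELECT name, actor_id, contract_id"
                        + " FROM staging_records ORDER BY actor_id, contract_id");
                // Maps each row onto the Record sketched earlier.
                reader.setRowMapper((rs, rowNum) -> new Record(
                        rs.getString("name"),
                        rs.getInt("actor_id"),
                        rs.getInt("contract_id")));
                return reader;
            }
        }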

        • #5
          Well... a database workaround sounds like a bad thing performance-wise. Anyway, thanks for the tip.

          Instead, I made myself a custom FlatFileItemReader that, on the first call to doRead(), reads the whole file, sorts it, and returns the first chunk. On each subsequent call it returns the next one, and so on.

          This way, I managed to get it working the way I wanted... for now.
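
          A rough sketch of that approach (again reusing the assumed Record class from above; note that the whole file is held in memory):

          import java.util.ArrayList;
          import java.util.Comparator;
          import java.util.Iterator;
          import java.util.LinkedHashMap;
          import java.util.List;
          import java.util.Map;

          import org.springframework.batch.item.ItemReader;

          // Drains the whole file on the first read(), sorts it, then hands
          // out one actorId/contractId group per call.
          public class SortingGroupReader implements ItemReader<List<Record>> {

              private final ItemReader<Record> delegate; // e.g. a FlatFileItemReader
              private Iterator<List<Record>> groups;

              public SortingGroupReader(ItemReader<Record> delegate) {
                  this.delegate = delegate;
              }

              @Override
              public List<Record> read() throws Exception {
                  if (groups == null) {
                      List<Record> all = new ArrayList<Record>();
                      for (Record r = delegate.read(); r != null; r = delegate.read()) {
                          all.add(r);
                      }
                      all.sort(Comparator.comparingInt(Record::getActorId)
                              .thenComparingInt(Record::getContractId));
                      // Bucket the sorted records by (actorId, contractId).
                      Map<String, List<Record>> grouped = new LinkedHashMap<>();
                      for (Record r : all) {
                          String key = r.getActorId() + ":" + r.getContractId();
                          grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
                      }
                      groups = grouped.values().iterator();
                  }
                  return groups.hasNext() ? groups.next() : null; // null ends the step
              }
          }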

          • #6
            I wouldn't really consider the staging table approach to be a workaround. While your approach works for small files, you should be wary of running out of memory if you try to load a file that's very large.

            The database approach also has benefits for restartability. If you load the file into the database, you can use standard restart approaches (like a process indicator flag or saving the cursor state). With your approach you'd have to load the entire file back into memory (ensuring that the order is the same) and then skip down to the record where you left off.
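
            As an illustration of the process-indicator idea (table and column names are assumptions, and the items are the per-actor groups from the earlier sketches): the reader's query selects only rows WHERE processed = 'N', and the writer flags rows inside the chunk transaction, so a restart picks up only the unfinished work:

            import java.util.List;

            import org.springframework.batch.item.ItemWriter;
            import org.springframework.jdbc.core.JdbcTemplate;

            // Flags each group's staged rows as processed inside the chunk
            // transaction; the reader's companion query ("... WHERE processed
            // = 'N' ORDER BY ...") then skips them after a restart.
            public class MarkProcessedWriter implements ItemWriter<List<Record>> {

                private final JdbcTemplate jdbcTemplate;

                public MarkProcessedWriter(JdbcTemplate jdbcTemplate) {
                    this.jdbcTemplate = jdbcTemplate;
                }

                @Override
                public void write(List<? extends List<Record>> groups) {
                    for (List<Record> group : groups) {
                        // ... the real processing of the group happens first ...
                        Record first = group.get(0);
                        jdbcTemplate.update(
                                "UPDATE staging_records SET processed = 'Y'"
                                        + " WHERE actor_id = ? AND contract_id = ?",
                                first.getActorId(), first.getContractId());
                    }
                }
            }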

            • #7
              Yes, you're right... I'll have to test this with a much heavier load than anticipated, to make sure memory won't be a problem. Thanks for your input!
