
  • Multiple Files For Input

    Our file-based batch jobs usually involve several files. For example, if we are getting order feeds, it might look like

    order-2008-03-21-08.xml (8am file)
    order-2008-03-21-09.xml (9am file)
    order-2008-03-21-10.xml (10am file)

    I'd like to configure my batch job to process as many of these order files as are present in the drop directory (and move them to an archive dir after it's done).

    Is there any way to do this short of configuring the job to run over and over again?

    We thought about wrapping the file item reader with something that monitors the list of files, but the issues around mark(), reset(), etc. are kind of nasty.

    Any thoughts on a good way to do this?

  • #2
    This might be a good use case for a ResourceItemReader (there's a JIRA for that already, I think, but I can't check the id at the moment). The idea there was to have an ItemReader that returns each file as a Resource from a directory. It won't be in 1.0, but there's the idea - it wouldn't take much to implement if you needed it in the meantime.
    Last edited by Dave Syer; Mar 21st, 2008, 03:40 PM. Reason: spelling
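    The ResourceItemReader described above isn't in 1.0, so here is a minimal sketch of the idea in plain Java (the class name and shape are hypothetical, not the actual Spring Batch API): list the matching files in the drop directory once, sort them so the hourly feeds come out in order, and hand one back per read() call.

    ```java
    import java.io.File;
    import java.util.Arrays;

    // Hypothetical sketch of the ResourceItemReader idea: return each
    // matching file in a drop directory as an item, one read() at a time.
    public class FileListReader {
        private final File[] files;
        private int index = 0;

        public FileListReader(File dir, String prefix) {
            // Collect matching files and sort so hourly feeds are read in order.
            File[] found = dir.listFiles((d, name) -> name.startsWith(prefix));
            this.files = (found == null) ? new File[0] : found;
            Arrays.sort(this.files);
        }

        /** Returns the next file, or null when the directory is exhausted. */
        public File read() {
            return index < files.length ? files[index++] : null;
        }
    }
    ```

    A job could loop on read() until it returns null, processing each file as its own unit and archiving it afterwards.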


    • #3
      I understand what you're saying, if a chunk spans two files it could get kind of nasty. I would think about buffering it for mark and reset support. At least that's the only way I can think to do it now. That way, you wouldn't need to try and go back to another file in case of a reset.
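    The buffering idea above can be sketched generically (this is an illustration of the technique being discussed, not Spring Batch code; all names are made up): keep every item read since the last mark() in a list, and have reset() replay that list before pulling anything new, so a rollback never needs to reopen an earlier file.

    ```java
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical buffering wrapper: remember items read since the last
    // mark() so reset() can replay them without going back to the source.
    public class BufferingReader<T> {
        private final Iterator<T> delegate;
        private final List<T> buffer = new ArrayList<>();
        private int replayIndex = 0;

        public BufferingReader(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        public T read() {
            if (replayIndex < buffer.size()) {
                return buffer.get(replayIndex++); // replaying after a reset()
            }
            if (!delegate.hasNext()) {
                return null;
            }
            T item = delegate.next();
            buffer.add(item);
            replayIndex = buffer.size();
            return item;
        }

        /** Commit: items read so far will never be replayed again. */
        public void mark() {
            buffer.clear();
            replayIndex = 0;
        }

        /** Rollback: subsequent read() calls replay items since the last mark(). */
        public void reset() {
            replayIndex = 0;
        }
    }
    ```

    The cost is holding one chunk's worth of items in memory, which is usually acceptable and sidesteps the cross-file reset problem entirely.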


      • #4
        I think we can assume that such files *could* be processed one at a time and it would make sense (so there can't be any issues with chunk boundaries)?

        Anyway, there is a JIRA issue for the ResourceItemReader. You could use that to consolidate the files into one, and then process that in a subsequent step. Partial failure and restart would be handled as normal.
        Last edited by Dave Syer; Mar 22nd, 2008, 11:57 AM. Reason: spelling
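        The consolidation step suggested above could look something like this sketch in plain Java NIO (class and method names are hypothetical): gather the matching files, sort them so the hourly feeds stay in order, and append them all to a single working file that a later step reads as one resource.

        ```java
        import java.io.IOException;
        import java.io.OutputStream;
        import java.nio.file.DirectoryStream;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.StandardOpenOption;
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;

        // Hypothetical consolidation step: concatenate every matching input
        // file into one target file for a subsequent processing step.
        public class FileConsolidator {
            public static Path consolidate(Path dropDir, String prefix, Path target)
                    throws IOException {
                List<Path> inputs = new ArrayList<>();
                try (DirectoryStream<Path> ds =
                        Files.newDirectoryStream(dropDir, prefix + "*")) {
                    for (Path p : ds) {
                        inputs.add(p);
                    }
                }
                Collections.sort(inputs); // keep the hourly files in order
                try (OutputStream out = Files.newOutputStream(target,
                        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
                    for (Path p : inputs) {
                        Files.copy(p, out); // append each input to the target
                    }
                }
                return target;
            }
        }
        ```

        With the inputs merged into one file, restart and partial-failure semantics fall back to the normal single-resource case.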


        • #5
          It really depends on the size of the files. If you have 3 files that are 150 MB each, you would have to worry about chunk boundaries because they're too big to read in one at a time. Consolidating them in one step and reading the resulting file might not be a bad idea, unless they're around 1 GB or so apiece, in which case that starts to get time consuming as well. I suppose it depends on how big the files will be.


          • #6
            We implemented this and contributed on the JIRA.


            Let us know what you think!