How To Tackle Common Situations With Spring Batch

  • How To Tackle Common Situations With Spring Batch

    Hello and thank you for taking the time to read.

    I have a couple of questions about situations that I have to believe are common, but I am having a hard time finding solid answers to them. I have already implemented these in Spring Batch (SB), but my solutions feel clunky. I'm hoping to collect the answers in one place for other people.

    Here is a simplified setup of one of my current SB rigs:

    I have a legacy, external process that continuously generates XML files into a directory.

    My task is to:

    1) For each hour in the day
    2) Get a directory listing of all the XML files in the directory
    3) Read each file and validate it against an XML schema
    4) Load values from the XML files into a database table
    5) Compress all the files that were processed into an archive (ZIP, TGZ, etc.) file
    6) Delete the files
    7) For each file that fails validation, move the file to a designated directory for manual review

    Here is what I would do, but I'll leave out some design decisions so that I can ask about their implementation later.

    1) Use Quartz or some other scheduler to launch this job every hour
    2) Create an ItemStreamReader. In the open() method, call File.listFiles() and use its results to populate a Queue. Each call to read() should remove the next File in the queue and return it.
    3) Create an ItemProcessor that accepts a File and returns a File/Document object. The processor reads the File from disk and passes it through an XML validator. If the validation fails, return null to indicate that the item was filtered; otherwise, return the File/Document object.
    4) Create an ItemWriter that accepts a list of File/Document objects. For each Document, extract the necessary values and write them to a new row in the database. Use the JdbcBatchItemWriter.
    5) Create an ItemWriter that accepts a list of File/Document objects. For each File, write the file's contents to an archive file.
    6) Create a CompositeItemWriter to combine the two writers
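
    As a rough illustration of the reader in step 2, here is a minimal sketch that uses only the JDK. The class name DirectoryQueueReader and the ".xml" filter are assumptions made for this example; in a real job the class would implement Spring Batch's ItemStreamReader<File>, and its open() would also receive an ExecutionContext.

    ```java
    import java.io.File;
    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Queue;

    // Hypothetical stand-in for the reader described in step 2 (JDK only).
    class DirectoryQueueReader {
        private final File directory;
        private final Queue<File> pending = new ArrayDeque<>();

        DirectoryQueueReader(File directory) {
            this.directory = directory;
        }

        // Mirrors open(): snapshot the directory once, so files that arrive
        // later are left for the next run.
        void open() {
            pending.clear();
            File[] found = directory.listFiles((dir, name) -> name.endsWith(".xml"));
            if (found != null) {      // listFiles() returns null if the path is not a directory
                Arrays.sort(found);   // deterministic order
                pending.addAll(Arrays.asList(found));
            }
        }

        // Mirrors read(): return the next file, or null once the snapshot is exhausted.
        File read() {
            return pending.poll();
        }
    }
    ```

    Returning null from read() matches the convention the framework uses to signal the end of input.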

    So here are my questions. These are just my initial impressions, but I could be misunderstanding the SB framework.

    1) At what point can I delete the files?
    1.a) I cannot delete them as part of this step: if a database commit fails for some reason, the entire chunk is rolled back, but the files in that chunk have already been deleted. This leads to data loss, because the files no longer exist on disk and their contents have been removed from the database by the rollback.
    1.b) I cannot delete them as part of a separate step because I do not have a way to communicate the list of files that was processed in the first step. I cannot delete the entire folder because new files may have appeared while processing the first file.
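
    One possible way to hand the list of processed files to a later step is a manifest file. The sketch below is only an illustration under that assumption: ProcessedManifest, record() and deleteRecorded() are hypothetical names, the code uses only the JDK, and it is up to the caller to invoke record() only once a chunk has actually been committed.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper: remembers which files were processed so that a
    // later step can delete exactly those files and nothing else.
    class ProcessedManifest {
        private final Path manifest;

        ProcessedManifest(Path manifest) {
            this.manifest = manifest;
        }

        // Append the files of one committed chunk, one absolute path per line.
        void record(List<Path> committedChunk) throws IOException {
            List<String> lines = new ArrayList<>();
            for (Path file : committedChunk) {
                lines.add(file.toAbsolutePath().toString());
            }
            Files.write(manifest, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        // Later step: delete only the listed files; returns how many were deleted.
        int deleteRecorded() throws IOException {
            if (!Files.exists(manifest)) {
                return 0;
            }
            int deleted = 0;
            for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
                if (!line.isEmpty() && Files.deleteIfExists(Paths.get(line))) {
                    deleted++;
                }
            }
            return deleted;
        }
    }
    ```

    Files that appear in the directory after the first step has taken its listing are never written to the manifest, so they survive the cleanup and are picked up by the next run.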

    2) How do I archive the files?
    2.a) Again, I would like to build a list of all the files that were successfully processed and write them all to a single archive. I can see that it would be trivial to write a single chunk to a file, but how do I write all chunks to a single archive? For example, if I have 100 XML files to process and SB is configured with a chunk size of ten, how do I put all 100 files into the same archive? Do I have to create 10 separate archives?
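
    To make the question concrete, here is a minimal sketch of a writer that keeps one ZIP stream open across all chunks. It uses only the JDK; SingleArchiveWriter is a hypothetical name, and its open(), write() and close() methods only loosely correspond to the ItemStream and ItemWriter callbacks in Spring Batch.

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    // Hypothetical writer: the archive is opened once, receives every chunk,
    // and is closed once at the end of the step.
    class SingleArchiveWriter implements AutoCloseable {
        private final Path archive;
        private ZipOutputStream zip;

        SingleArchiveWriter(Path archive) {
            this.archive = archive;
        }

        // Called once before the first chunk.
        void open() throws IOException {
            zip = new ZipOutputStream(Files.newOutputStream(archive));
        }

        // Called once per chunk; each file becomes one entry named after the file.
        void write(List<Path> chunk) throws IOException {
            for (Path file : chunk) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }

        // Called once after the last chunk; finishes the ZIP central directory.
        @Override
        public void close() throws IOException {
            if (zip != null) {
                zip.close();
            }
        }
    }
    ```

    With 100 files and a chunk size of ten, write() would run ten times against the same stream. Note that a ZIP stream is not transactional: entries written for a chunk that is later rolled back stay in the archive, so this sketch does not by itself solve the rollback concern from question 1.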

    3) How do I move files that fail XML validation to a designated directory?
    3.a) I suppose I could simply move the files to the directory immediately after they fail. However, where would that take place? In the processor itself or in a listener? I wanted to use an ItemProcessListener to monitor when an item had been filtered out, report the reason for the filtering, and then move the file to the directory. Moving the file to the directory was easy, but I could not figure out a way to log the reason for the filtering. I therefore had to ditch the ItemProcessListener approach and move all the logic into the processor. That just felt clunky.
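
    For reference, the validate-then-move logic could look roughly like the sketch below, which keeps the failure reason available to the caller. It uses only the JDK's javax.xml.validation API; RejectingValidator and validateOrReject() are hypothetical names, and passing the schema as a String is just a convenience for the example.

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.StringReader;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import org.xml.sax.SAXException;

    // Hypothetical helper: validates a file against an XSD and moves it to a
    // review directory when validation fails.
    class RejectingValidator {
        private final Schema schema;
        private final Path rejectDir;

        RejectingValidator(String xsd, Path rejectDir) throws SAXException {
            this.schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            this.rejectDir = rejectDir;
        }

        // Returns null if the file is valid; otherwise moves the file into
        // rejectDir and returns the validator's error message as the reason.
        String validateOrReject(Path file) throws IOException {
            String reason = null;
            try (InputStream in = Files.newInputStream(file)) {
                schema.newValidator().validate(new StreamSource(in));
            } catch (SAXException e) {
                reason = e.getMessage() != null ? e.getMessage() : e.toString();
            }
            if (reason != null) {
                // The stream is already closed here, so the move is safe.
                Files.move(file, rejectDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            return reason;
        }
    }
    ```

    If the reason should reach a listener rather than stay in the processor, one option that may fit is to throw the validation exception from the processor and configure it as skippable, since SkipListener.onSkipInProcess receives both the item and the Throwable.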

  • #2
    Sorry for not getting to this sooner. We are in the process of moving to StackOverflow for our forums.
    This question is probably a better candidate for StackOverflow, perhaps against the #spring-batch tag. If you do post it there, please reply here with the link. Thanks!