Async processing during an ItemProcessor step - is that ok?

  • Async processing during an ItemProcessor step - is that ok?

    Hi folks

    Over the last day or so I have been evaluating Spring Batch as a candidate to replace our current batch framework, which does not support some additional use cases we now need. Overall I like what I see, but I'd like to hear whether anybody has comments on the suggested use below - specifically the ItemProcessor part.

    Suggested batch process:
    - Read files from filesystem
    - For each file found:

    1. Open Batch
    2. Pick up file from file system
    3. ItemReader - read file contents and parse XML elements using StAX
    4. As elements are parsed out, pass each one through a series of ItemProcessor steps (maybe using the remote chunking pattern for each ItemProcessor, as these will be longer-running):
        a. validate element
        b. transform element into a SOAP message
        c. send element via 2-way SOAP over JMS (asynchronous) to a service, where it is consumed and a response is sent back
        d. receive asynchronous response
        e. parse out response element and convert to custom DTO
    5. ItemWriter - write out (append) response element to the output file (one for the entire batch)
    6. ItemProcessor - additional post-processing step may be required here to update db table or send another JMS message.
    7. Close Batch

    My concern is the asynchronous nature of the middle ItemProcessor step, which will be a longer-running process. Could Spring Batch have any problems with it? Is this feasible?
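    To make steps 4a-4e concrete, here is a minimal plain-Java sketch of the idea, not Spring Batch or JMS code. It assumes the asynchronous exchange can be hidden behind a blocking call that correlates each reply to its request by ID. The class name `RequestReplyProcessor`, the in-memory "service" thread pool, the 5-second timeout, and the string formats are all hypothetical stand-ins.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: a blocking facade over an asynchronous request/reply exchange. */
class RequestReplyProcessor {
    // Requests awaiting a reply, keyed by correlation ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Stand-in for the remote service; a real setup would use JMS queues instead.
    private final ExecutorService fakeService = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // do not keep the JVM alive for this demo pool
        return t;
    });

    /** Steps a-e for one element: validate, transform, send, await reply, convert. */
    String process(String element) {
        if (element == null || element.trim().isEmpty()) {           // a. validate
            throw new IllegalArgumentException("invalid element");
        }
        String request = "<soap>" + element + "</soap>";              // b. transform
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);                            // register before sending
        try {
            send(correlationId, request);                             // c. send asynchronously
            String response = reply.get(5, TimeUnit.SECONDS);         // d. wait for the reply
            return "DTO[" + response + "]";                           // e. convert to a DTO
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pending.remove(correlationId);
        }
    }

    // Simulates the service consuming the request and replying on another thread.
    private void send(String correlationId, String request) {
        fakeService.submit(() -> onReply(correlationId, "ok:" + request));
    }

    // Reply listener: completes the future that matches the correlation ID, if any.
    void onReply(String correlationId, String body) {
        CompletableFuture<String> waiting = pending.get(correlationId);
        if (waiting != null) {
            waiting.complete(body);
        }
    }

    void shutdown() {
        fakeService.shutdown();
    }
}
```

    Calling `new RequestReplyProcessor().process("item1")` blocks until the simulated reply arrives, so from the caller's point of view the step looks synchronous even though the exchange underneath is not.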

  • #2
    Nothing wrong with background processing, especially if it is backed by durable middleware (the biggest problem is always knowing whether a piece of data has been processed before on a restart, so durability helps a lot there).
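    As a rough illustration of the restart problem, here is a plain-Java sketch that records the ID of each processed item in an append-only file, so that a restarted run can skip work it has already done. It is an assumption-laden stand-in for durable middleware: the class name `ProcessedLog` and the idea that each item has a stable string ID are hypothetical, and the sketch does not force writes to disk, so it is not strictly crash-safe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: remembers processed item IDs across restarts. */
class ProcessedLog {
    private final Path file;
    private final Set<String> done = new HashSet<>();

    // Loads any IDs recorded by a previous run.
    ProcessedLog(Path file) {
        this.file = file;
        try {
            if (Files.exists(file)) {
                done.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean alreadyProcessed(String id) {
        return done.contains(id);
    }

    // Appends the ID to the log file the first time it is seen.
    void markProcessed(String id) {
        if (!done.add(id)) {
            return; // already recorded
        }
        try {
            Files.write(file, Collections.singletonList(id), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    On restart, a processing loop would check `alreadyProcessed(id)` before sending an item and call `markProcessed(id)` once its response has been written.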

    The bottleneck in your case is the input file. You might find that processing many small files in parallel is much quicker than a single large file, or if all you have is a single large file it might help to stage it into a database before business processing starts.
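    A minimal plain-Java sketch of the "many small files in parallel" idea, using a fixed thread pool rather than any Spring Batch partitioning feature. The class name `ParallelFiles`, the method `processAll`, and the per-file `work` function are hypothetical; whether this is actually quicker depends on where the time goes in your job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: run the same per-file work on several inputs concurrently. */
class ParallelFiles {
    // Applies 'work' to every input on a pool of 'threads' workers.
    // Results come back in the same order as the inputs.
    static <T, R> List<R> processAll(List<T> inputs, Function<T, R> work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (T input : inputs) {
                tasks.add(() -> work.apply(input));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

    For example, `ParallelFiles.processAll(fileNames, name -> handleFile(name), 4)` would process up to four files at a time, where `handleFile` is whatever per-file read/process/write routine you already have.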