
  • File processing in sequence

    I have the following use case:

    1. An external report arrives in parts at an FTP location, at unpredictable time intervals but in the correct sequence.
    2. Each part is one file named like Something_2_10, where 2 is the current part number and 10 is the total number of parts.
    3. I know that all parts have been transferred when I see Something_10_10 on the FTP server.
    4. I need to concatenate the rows of all parts of the same report, preserving part/row order, in some queue (backed by Redis, JMS, or anything else you'd suggest).
    5. Split that list of rows into batches: a list of N rows should become X batches of M rows each.
    6. Write the batches to Elasticsearch from multiple threads.
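    Step 5 is plain list partitioning and doesn't need anything framework-specific. A minimal sketch in Java (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a list of rows into consecutive batches of at most batchSize rows,
    // preserving the original order. The final batch may be smaller.
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }
}
```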

    I think I know how to do step 6 if the service activator that completes the pipeline receives a payload of type List.
    Or maybe I should write a custom Elasticsearch outbound adapter.
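    One way to fan the batches out across threads, independent of which Elasticsearch client ends up doing the writing, is a fixed thread pool. In this sketch `indexBatch` is a hypothetical callback standing in for the actual bulk-index call:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchWriter {
    // Submit each batch to a fixed-size thread pool. `indexBatch` is a
    // placeholder for the real Elasticsearch bulk write.
    public static <T> void writeAll(List<List<T>> batches,
                                    Consumer<List<T>> indexBatch,
                                    int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<T> batch : batches) {
            pool.submit(() -> indexBatch.accept(batch));
        }
        pool.shutdown();
        // Block until every submitted batch has been written (or we time out).
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

    Note that parallel writes do not preserve inter-batch ordering in Elasticsearch; if write order matters, a single writer thread (or one pool per report) is safer.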

    For every other step, any hint is much appreciated.

    Thanks in advance,

  • #2
    ftp inbound channel adapter -> header enricher -> resequencer -> custom splitter.

    The header enricher should populate the correlationId, sequenceNumber, and sequenceSize headers based on the file name. If the files aren't too big, you can add a file-to-string-transformer before the splitter; otherwise you'd need to read the rows in your custom splitter.
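    Deriving those three headers from a name like Something_2_10 is a small parsing job the enricher can delegate to. A sketch, assuming the `<name>_<part>_<total>` convention from the question (the class name and header map are illustrative; in Spring Integration you would put these values into the message headers):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportFileHeaders {
    // Expected shape: <reportName>_<partNumber>_<totalParts>, e.g. Something_2_10
    private static final Pattern NAME = Pattern.compile("(.+)_(\\d+)_(\\d+)");

    // Map a file name to the three headers the resequencer needs.
    public static Map<String, Object> headersFor(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return Map.of(
            "correlationId", m.group(1),                    // groups parts of one report
            "sequenceNumber", Integer.parseInt(m.group(2)), // position within the report
            "sequenceSize", Integer.parseInt(m.group(3)));  // total number of parts
    }
}
```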