Multithreading while reading a flat file

  • Multithreading while reading a flat file

    Hi,

    I am pretty new to Spring Batch. My batch application has to read from an enormous flat file, process the data, and write it into various tables of the database. I want to partition the flat file and let one thread process each partition. Is this possible? If not, what would be the optimal solution for achieving good concurrency while reading flat files? Please help.

  • #2
    The TaskExecutorPartitionHandler might work for you. The biggest problem might be knowing how to partition the file into line ranges - you need to know how many lines there are before you start, which could be tough for large files. Some projects have used native DB loaders to load the file into a database table (one column for the whole line) and found that this was quicker than trying to partition the file itself.

    If you partition the file, you can limit the number of lines read by an individual instance of FlatFileItemReader using execution context entries <reader_name>.item.count, <reader_name>.item.count.max (where reader_name is the name you gave the component). This feature is not very clearly documented (and I think the "max" property was added since RC1), so I'd be interested to hear what your experience is.
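    For what it's worth, the line-range computation itself needs nothing beyond plain Java I/O. Below is a minimal sketch (the class and method names are hypothetical, not part of Spring Batch): it counts the lines once, then splits them into contiguous ranges whose bounds a custom Partitioner could push into each partition's execution context under the reader's item-count keys mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: counts lines and splits them into roughly
// equal contiguous ranges, one per partition.
public class LineRangePartitioner {

    /** A half-open line range [start, end) within the file. */
    public static class Range {
        public final int start;
        public final int end;
        public Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Counts the lines available from the given reader. */
    public static int countLines(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        int lines = 0;
        while (br.readLine() != null) {
            lines++;
        }
        return lines;
    }

    /** Splits lineCount lines into partitionCount contiguous ranges. */
    public static List<Range> split(int lineCount, int partitionCount) {
        List<Range> ranges = new ArrayList<Range>();
        int base = lineCount / partitionCount;
        int remainder = lineCount % partitionCount;
        int start = 0;
        for (int i = 0; i < partitionCount; i++) {
            // Spread the remainder over the first few partitions.
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(start, start + size));
            start += size;
        }
        return ranges;
    }
}
```

    Note the trade-off Dave mentions still applies: counting lines means one full pass over the file before any partition starts reading.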

    • #3
      Thanks for the reply. We currently use staging tables and SQL*Loader to load the columns. But I feel the hits to the staging table are redundant and expensive too. Plus we have to depend on DB procs and write a new proc every time the DB changes. On the other hand, the staging-table pattern is a tried and tested batch approach, so it is a trade-off between performance and complexity. I think this discussion is heading towards design decisions, so I'll stop here. I'll try the approach you suggested and see if the performance gains (if there are any) outweigh the complexity of managing the transaction state for a flat file.

      • #4
        Originally posted by Dave Syer
        The TaskExecutorPartitionHandler might work for you. [...]

        Dave, can you please give a small example of using TaskExecutorPartitionHandler with StepExecutionSplitter to set up partitioning? That would be helpful. Thanks.
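        For later readers, here is a rough sketch of the wiring being asked about, assuming Spring Batch 2.0-style XML bean configuration. The bean names (workerStep, rangePartitioner) and the gridSize value are placeholders, and the constructor/property shapes should be checked against your Spring Batch version; this is not a tested configuration.

```xml
<!-- Splits the master step execution into partition executions using a
     custom Partitioner (rangePartitioner is a placeholder for your bean). -->
<bean id="stepExecutionSplitter"
      class="org.springframework.batch.core.partition.support.SimpleStepExecutionSplitter">
    <constructor-arg ref="jobRepository"/>
    <constructor-arg ref="workerStep"/>
    <constructor-arg ref="rangePartitioner"/>
</bean>

<!-- Runs each partition's step execution on a thread from the task executor. -->
<bean id="partitionHandler"
      class="org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler">
    <property name="taskExecutor">
        <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
    </property>
    <property name="step" ref="workerStep"/>
    <property name="gridSize" value="4"/>
</bean>

<!-- Master step that delegates to the splitter and the handler. -->
<bean id="masterStep"
      class="org.springframework.batch.core.partition.support.PartitionStep">
    <property name="stepExecutionSplitter" ref="stepExecutionSplitter"/>
    <property name="partitionHandler" ref="partitionHandler"/>
    <property name="jobRepository" ref="jobRepository"/>
</bean>
```

        Each partition's step execution context would carry the line-range bounds produced by the Partitioner, which the worker step's FlatFileItemReader can pick up via the item-count entries described in #2.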
