  • input data partitioning for parallel run

    After reading the Spring Batch use cases on the web site, I understand that Spring Batch is supposed to support parallel processing of chunks. I guess the natural way to do this is to set a TaskExecutorRepeatTemplate as the stepOperations of the SimpleStepExecutor (instead of the default RepeatTemplate) and to configure this template with an appropriate TaskExecutor.
    All of this is provided by the Spring Batch infrastructure. But it is not sufficient for parallel execution: I'm missing input data partitioning. How should I partition the input data so that each chunk processes only its own part of it (and does not overlap other chunks' data)? Does the framework offer any help to achieve this cleanly? This is mentioned on the use case page as an implementation issue, but I see neither samples nor framework classes that handle it.
    Or did I miss something?
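    To make the partitioning question concrete, here is a minimal sketch in plain Java of range partitioning: the input is split into contiguous, non-overlapping index ranges, and each range is handed to its own worker, so no two concurrent chunks ever see the same item. It is not Spring Batch code. It uses a java.util.concurrent ExecutorService instead of TaskExecutorRepeatTemplate, and the names RangePartitioner, partition, processInParallel and gridSize are hypothetical. It also assumes an in-memory, random-access input; with a flat file or SQL source, each worker would instead need its own reader positioned at the start of its range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Spring Batch: splits the input by index range
// so that concurrently running workers never process the same item.
final class RangePartitioner {

    /** Half-open range [start, end) of input indexes owned by one worker. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [0, totalItems) into gridSize contiguous, non-overlapping ranges. */
    static List<Range> partition(int totalItems, int gridSize) {
        if (totalItems < 0 || gridSize <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and gridSize > 0");
        }
        List<Range> ranges = new ArrayList<>();
        int base = totalItems / gridSize;
        int remainder = totalItems % gridSize;
        int start = 0;
        for (int i = 0; i < gridSize; i++) {
            // The first `remainder` ranges take one extra item, so sizes differ by at most one.
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(start, start + size));
            start += size;
        }
        return ranges;
    }

    /** Runs one task per range on a thread pool; each task touches only its own slice. */
    static List<List<String>> processInParallel(List<String> input, int gridSize) {
        ExecutorService executor = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Range range : partition(input.size(), gridSize)) {
                futures.add(executor.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (int i = range.start; i < range.end; i++) {
                        // Stand-in for real item processing.
                        out.add(input.get(i).toUpperCase(Locale.ROOT));
                    }
                    return out;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // collected in partition order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(processInParallel(input, 3));
        // prints [[A, B, C], [D, E], [F, G]]
    }
}
```

    Because the ranges are computed up front and never overlap, the workers need no shared cursor or locking; the price is that the total item count must be known before the run starts.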

    I guess this depends on the type of input data (flat file, SQL), so I like the suggested approach of proxying the ItemProvider with a partitioning interceptor (or perhaps proxying the InputSource in the case of an InputSourceItemProvider). This would keep the tasklet unaware of parallel execution, which is the ultimate goal; however, it would also force the tasklet to be an ItemProviderProcessorTasklet and/or the ItemProvider to be an InputSourceItemProvider (one can assume this is mandatory for parallel execution).
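    The interceptor idea could look roughly like the sketch below, which uses a JDK dynamic proxy (java.lang.reflect.Proxy) to wrap a reader so that each partition only sees its own items. The ItemProvider interface here is a hypothetical local stand-in loosely modelled on the one named above (next() returning null at end of input), not the real Spring Batch type, and PartitioningInterceptor, ListItemProvider and PartitionedReadDemo are illustrative names. Unlike the range split, it uses round-robin assignment by position (item p belongs to partition p % partitionCount), and it assumes each partition wraps its own underlying reader instance; sharing one reader between partitions would break the skipping logic.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical reader contract: next() returns null at end of input. */
interface ItemProvider {
    Object next();
}

/** Hypothetical in-memory provider, used only for the demo. */
final class ListItemProvider implements ItemProvider {
    private final Iterator<?> iterator;

    ListItemProvider(List<?> items) {
        this.iterator = items.iterator();
    }

    public Object next() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

/** Intercepts next() and silently skips items that belong to other partitions. */
final class PartitioningInterceptor implements InvocationHandler {
    private final ItemProvider target;
    private final int partitionIndex;
    private final int partitionCount;
    private long position = 0; // position of the next item the target will return

    private PartitioningInterceptor(ItemProvider target, int partitionIndex, int partitionCount) {
        this.target = target;
        this.partitionIndex = partitionIndex;
        this.partitionCount = partitionCount;
    }

    /** Returns a proxy that only yields items at positions p with p % partitionCount == partitionIndex. */
    static ItemProvider wrap(ItemProvider target, int partitionIndex, int partitionCount) {
        if (partitionCount <= 0 || partitionIndex < 0 || partitionIndex >= partitionCount) {
            throw new IllegalArgumentException("invalid partition " + partitionIndex + "/" + partitionCount);
        }
        return (ItemProvider) Proxy.newProxyInstance(
                ItemProvider.class.getClassLoader(),
                new Class<?>[] {ItemProvider.class},
                new PartitioningInterceptor(target, partitionIndex, partitionCount));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (!method.getName().equals("next") || method.getParameterCount() != 0) {
            return method.invoke(target, args); // pass other calls (toString, ...) through
        }
        while (true) {
            Object item = target.next();
            if (item == null) {
                return null; // end of input
            }
            long p = position++;
            if (p % partitionCount == partitionIndex) {
                return item; // this partition owns the item
            }
            // Otherwise another partition owns it: skip and keep reading.
        }
    }
}

final class PartitionedReadDemo {
    /** Reads a provider until it returns null. */
    static List<Object> drain(ItemProvider provider) {
        List<Object> items = new ArrayList<>();
        for (Object item = provider.next(); item != null; item = provider.next()) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        for (int k = 0; k < 2; k++) {
            // Each partition wraps its own reader over the same input.
            ItemProvider provider = PartitioningInterceptor.wrap(new ListItemProvider(input), k, 2);
            System.out.println("partition " + k + ": " + drain(provider));
        }
        // prints "partition 0: [a, c, e]" then "partition 1: [b, d]"
    }
}
```

    The code that consumes the proxy stays unaware of the partitioning, which matches the goal stated above. The trade-off is that every partition still reads the whole input and discards what it does not own; for an SQL source it would likely be cheaper to push the partition condition into the query itself.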

    Thanks in advance.

  • #2
    You did not miss anything, and your analysis is sound (see also …). There are opportunities to use the infrastructure to set up parallel processing, but you will have to do a small amount of coding, and it will be limited to special situations (such as middleware- or ESB-mediated input and/or output channels). Making it more generic will require some domain model changes and (as you point out) some special features for the file and SQL channels. This kind of support using the Spring Batch Core API will not be included in 1.0.