Can I scale a batch job by using distributed steps?

  • Can I scale a batch job by using distributed steps?

    Hi there,
    My use case is a batch job that consists of 3 steps. The first step reads input items and, for each item, saves the downloaded result to a file.
    Step 2 reads the items from those files, processes them, and writes the results to the database.

    Now I want to scale it. However, the scaling solution I found, remote chunking, only allows the chunk processing of a single step to be delegated to remote batch slaves, which is a problem in my case. If the chunk processing for the first step is done on the slaves, the downloaded files are saved on different slaves. And if I use the remote chunking pattern for step 2 as well, it won't be able to pick up these files reliably, because the processing of an item might be delegated to a different slave than the one holding its file.

    Is there any way I can use remote chunking not only for a chunk processor but also for a series of steps? For example, could a slave, after processing its chunk of items, start the next steps of the job as well? Theoretically I can programmatically trigger a job with these steps at the slave end, but then it will create a separate job entry.

    For now, the only solution I can think of is to write the data to a message queue instead of local files. But I would prefer a solution that doesn't require changing the program much.

    Thank you for your help.

  • #2
    Can you mount the location of the files in a common place? I've typically seen this handled by storing the files to be processed on a network share somewhere...
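To make the shared-location idea concrete, here is a minimal sketch in plain Java (no Spring Batch classes). It assumes every slave mounts the same directory, and that a file's path can be derived from the job execution id and the item id alone. The class name `SharedDownloadStore`, the `job-<id>/<item>.dat` layout, and the method names are all hypothetical, not part of any framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: maps an item to a file under a directory mounted on every slave. */
final class SharedDownloadStore {
    private final Path sharedRoot;

    SharedDownloadStore(Path sharedRoot) {
        this.sharedRoot = sharedRoot;
    }

    /**
     * The path depends only on the ids, never on the host.
     * Assumes itemId contains no path separators.
     */
    Path fileFor(long jobExecutionId, String itemId) {
        return sharedRoot.resolve("job-" + jobExecutionId).resolve(itemId + ".dat");
    }

    /** What step 1 would do on whichever slave handles the item. */
    void save(long jobExecutionId, String itemId, String content) {
        Path target = fileFor(jobExecutionId, itemId);
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What step 2 would do, possibly on a different slave. */
    String load(long jobExecutionId, String itemId) {
        try {
            byte[] bytes = Files.readAllBytes(fileFor(jobExecutionId, itemId));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the path is computed rather than remembered, it no longer matters which slave wrote the file in step 1 and which one reads it in step 2, as long as both see the same mount.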

    If that isn't an option, we have seen JMS used to remotely execute jobs, which is basically what you're looking to do (the only way to execute multiple steps the way you're asking is to execute them as a job).
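As a rough illustration of message-driven remote job execution, the sketch below uses an in-memory `BlockingQueue` as a stand-in for a JMS queue. Everything in it is hypothetical: the `LaunchRequest` payload, the `JobRunner` hook, and the `parentExecutionId` and `inputSlice` parameter names. Spring Batch Integration ships its own `JobLaunchRequest` and `JobLaunchingGateway` for this kind of setup; this sketch does not use them and only shows the shape of the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;

final class RemoteJobLaunchSketch {

    /** Hypothetical message payload: which job to run and with which parameters. */
    static final class LaunchRequest {
        final String jobName;
        final Map<String, String> parameters;

        LaunchRequest(String jobName, Map<String, String> parameters) {
            this.jobName = jobName;
            this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
        }
    }

    /** Hypothetical hook for whatever actually launches the job on a slave. */
    interface JobRunner {
        String run(String jobName, Map<String, String> parameters);
    }

    /** Master side: enqueue one launch request per slice of the input. */
    static void submit(BlockingQueue<LaunchRequest> queue, String jobName,
                       long parentExecutionId, List<String> slices) {
        for (String slice : slices) {
            Map<String, String> params = new LinkedHashMap<>();
            // Assumed convention: carry the master's execution id so the
            // separate job entries can be correlated afterwards.
            params.put("parentExecutionId", Long.toString(parentExecutionId));
            params.put("inputSlice", slice);
            queue.add(new LaunchRequest(jobName, params));
        }
    }

    /** Slave side: take requests until the queue is empty and run each as a whole job. */
    static List<String> drain(BlockingQueue<LaunchRequest> queue, JobRunner runner) {
        List<String> results = new ArrayList<>();
        LaunchRequest request;
        while ((request = queue.poll()) != null) {
            results.add(runner.run(request.jobName, request.parameters));
        }
        return results;
    }
}
```

Since each request runs all steps as one job on a single slave, the files written by the first step stay local to the slave that later reads them. The cost is one job entry per request, which is why the sketch passes a shared parent id along.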


    • #3
      Thanks Minella,
      I think I will go with the JMS way. It is possible to move these steps into a separate job, but then I won't have aggregated execution information anymore.