Run Spring Batch jobs on grids and local clusters?

  • Run Spring Batch jobs on grids and local clusters?


    I would like to know if Spring Batch can be used to run jobs on compute grids and local clusters.

    We run third-party Perl scripts and C binaries to perform CPU-intensive analyses on protein sequences. The inputs to these programs are two flat files (protein sequences and mathematical models), and the output is a flat file containing the results (predictions of protein domains and families). The programs are wrapped by Java methods that execute the scripts and binaries, and parse the result files. We have additional Java code to process and persist the results. Most of the tasks are embarrassingly parallel (there are no dependencies between the tasks) so are ideally suited to parallel processing. We run calculations several times per day, requiring thousands of CPU hours per month.
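
    The wrapper pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `AnalysisWrapper`, the `Prediction` record, and the tab-separated result format are all hypothetical, since the post does not say how the programs are invoked or what their output looks like. It also assumes the program writes its results to standard output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AnalysisWrapper {

    /** One prediction parsed from the result file (hypothetical format). */
    public record Prediction(String proteinId, String family) {}

    /** Runs the external program, captures its output in outputFile, and fails on a non-zero exit. */
    public static void run(List<String> command, Path outputFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)            // merge stderr into stdout
                .redirectOutput(outputFile.toFile())  // assumption: results arrive on stdout
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode);
        }
    }

    /** Parses lines of the assumed form "proteinId<TAB>family"; skips blank lines and '#' comments. */
    public static List<Prediction> parse(List<String> lines) {
        List<Prediction> predictions = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Malformed result line: " + line);
            }
            predictions.add(new Prediction(fields[0], fields[1]));
        }
        return predictions;
    }

    /** Convenience method: reads the result file and parses it. */
    public static List<Prediction> parseFile(Path resultFile) throws IOException {
        return parse(Files.readAllLines(resultFile));
    }
}
```

    Keeping `run` separate from `parse` means the parsing logic can be tested without the third-party binaries installed, and the same parser works whether the program ran locally or on a cluster node.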

    We have a Linux cluster of 60 8-core machines (480 cores in total). We share these machines with other departments in the organisation, and use Platform LSF to schedule compute jobs based on job priority, resource availability and so on. Our software is downloaded and installed by third parties, some of whom use Sun Grid Engine (SGE) or OpenPBS instead of LSF to schedule jobs. Other users are interested in running our software on compute grids via Globus, gLite and Condor, and on clouds such as Amazon EC2.

    The key point is that we do not have exclusive use of the Linux cluster. It is therefore vital that Spring Batch only runs jobs on cluster nodes that have been allocated to it by LSF. The same is true for those people who download our software and run it on SGE and OpenPBS. In this important respect, ProActive Parallel Suite is the only job scheduling framework we have found that can submit and monitor jobs on LSF, SGE, OpenPBS, grids and clouds. We are interested in using Spring Batch to manage our workflows, possibly in combination with ProActive Parallel Suite.

    I would appreciate any advice on using Spring Batch with local job scheduling systems, compute grids and clouds. A possible solution is to combine Spring Batch with ProActive Parallel Suite, but please feel free to suggest alternatives.



  • #2
    Originally posted by aquinn
    I would like to know if Spring Batch can be used to run jobs on compute grids and local clusters.

    The scheduling and provisioning constraints that you mention would have to be handled by the grid provider. I assume the ProActive product can do this, but would be interested to hear about your experience. I suggest you look into implementing the PartitionHandler for ProActive Parallel Suite if that is your choice (I haven't tried that one myself, but the approach does work for a variety of alternatives).
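
    To illustrate the idea behind a partition handler, here is a minimal sketch of the pattern: split the work into partitions, hand each one to a remote worker, wait for all of them, and collect the results. It deliberately does not use Spring Batch's `PartitionHandler` interface or any ProActive API. The class `PartitionSketch` and its methods are hypothetical, and a local thread pool stands in for the grid scheduler (LSF, SGE, and so on), which a real implementation would submit to instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PartitionSketch {

    /** Splits items into at most gridSize contiguous partitions. */
    public static <T> List<List<T>> split(List<T> items, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        List<List<T>> partitions = new ArrayList<>();
        int n = items.size();
        int chunk = (n + gridSize - 1) / gridSize; // ceiling division
        for (int start = 0; start < n; start += chunk) {
            partitions.add(new ArrayList<>(items.subList(start, Math.min(start + chunk, n))));
        }
        return partitions;
    }

    /**
     * Runs the worker once per partition and returns the results in partition order.
     * The thread pool is a local stand-in for submitting jobs to a grid scheduler.
     */
    public static <T, R> List<R> handle(List<T> items, int gridSize, Function<List<T>, R> worker)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> partition : split(items, gridSize)) {
                Callable<R> task = () -> worker.apply(partition);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that partition has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

    For example, `PartitionSketch.handle(sequences, 8, batch -> analyse(batch))` would run a hypothetical `analyse` method over up to eight partitions. In a grid setting, the body of the task would submit the partition to the scheduler and poll for completion, so that the work only runs on nodes the scheduler has allocated.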