
  • Announcement: Spring Integration Prototype

    I committed some prototype code using Spring Integration and Spring Batch to show how to use the Enterprise Integration Patterns to scale up batch applications. There are six main use cases implemented, which should be familiar to anyone who has seen me giving a presentation recently (if you haven't then there is a Webinar on Wednesday next week - check the SpringSource website for registration details).

    The old spring-batch-integration project has been moved to spring-batch-infrastructure-tests (because that's what it is), and the new stuff is in the spring-batch-integration slot. So the SVN co-ordinates are:

    Each use case has its own package:

    1. Automatic declarative transactional retry and repeat of a MessageHandler (unit tests only in package "retry" using existing features in Spring Integration).

    2. Chunk-oriented middleware (package "chunk"). Uses a special ItemWriter to dispatch messages to a channel, which should be durable for restartable production use, and a listener on the worker process that wraps a business ItemWriter.

    3. Complex item-processing using a message flow (package "item"). Business logic for item processing can be arbitrarily complex using content-based routing to steer the item through a set of business process stages.

    4. Message trigger for a batch job (package "launch"). Spring Integration can be used to bridge between any external trigger (e.g. Quartz, file directory polling, web service, etc.) and a JobLauncher.

    5. Non-linear Job flow (package "job"). Provides a Step implementation that acts as a MessageEndpoint, so that instead of a linear series of steps, a job can be composed of an arbitrary message flow.

    6. Idempotent file processing (package "file"). A special purpose Job that processes all the files in a directory, treating each one as a source of messages. Spring Batch restartability means that no file will ever be processed more than once, even if the job fails, or is stopped.
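    As a rough illustration of use case 4 (the message-trigger-to-JobLauncher bridge), here is a toy sketch in plain Java. The types and names below (Message, JobLauncher, RecordingLauncher, onMessage) are simplified stand-ins invented for the example, not the actual Spring Batch or Spring Integration API:

```java
// Toy sketch of use case 4: an external trigger (file poller, JMS, web
// service, ...) produces a message, and a message endpoint turns that
// message into a job launch. All types here are simplified stand-ins.
import java.util.ArrayList;
import java.util.List;

public class LaunchSketch {

    /** Stand-in for a Spring Integration message. */
    interface Message { String payload(); }

    /** Stand-in for the Spring Batch JobLauncher. */
    interface JobLauncher { String run(String jobName, String params); }

    /** A launcher that records what it was asked to run, for demonstration. */
    static class RecordingLauncher implements JobLauncher {
        final List<String> launched = new ArrayList<>();
        public String run(String jobName, String params) {
            String execution = jobName + "[" + params + "]";
            launched.add(execution);
            return execution;
        }
    }

    /** The "message endpoint": converts an incoming message into a job launch. */
    static String onMessage(Message m, JobLauncher launcher) {
        // e.g. a directory poller produced this message; the file name
        // becomes a job parameter so each file launches a distinct job run
        return launcher.run("importJob", "input.file=" + m.payload());
    }

    public static void main(String[] args) {
        RecordingLauncher launcher = new RecordingLauncher();
        onMessage(() -> "orders-2008-06-13.csv", launcher);
        System.out.println(launcher.launched);
    }
}
```

    In the real prototype the trigger and the endpoint would be wired up in Spring configuration rather than called directly, but the shape of the bridge is the same.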

    There is a seventh use case to be added (Asynchronous Aggregator), as well as much work to be done polishing and finishing off the existing stuff. If anyone has any more suggestions, feel free to ask.

    Please note that this code will be distributed and published in the usual repositories, but will not be part of the "official" 1.1 release (mainly because it requires Java 5, and we are not going to go to Java 5 until Spring Batch 2.0). It is work in progress, and subject to heavy refactoring until we get to 2.0 proper. We are publishing it now owing to high demand for samples of how to use Spring Batch in a concurrent and massively parallel setting.

    Finally, I would like to thank Jonas Partner for his work writing some of the code.
    Last edited by Dave Syer; Jun 13th, 2008, 08:22 AM. Reason: Add SVN co-ordinates

  • #2
    Thanks Dave. This should be really helpful for someone who is just starting with Spring Batch and has a complex business requirement.

    However, I am getting this error when trying to download the source using TortoiseSVN:

    Error: PROPFIND request failed on '/svnroot/springframework/spring-batch'  
    Error: PROPFIND of '/svnroot/springframework/spring-batch': Could not resolve hostname `': The requested name is valid and was found in the database, but   
    Error: it does not have the correct associated data being resolved for.


    • #3
      Looks like you have a connectivity problem with sourceforge. I suspect that would be local, since I can access it easily, and no-one else has complained. Can you see it in a web browser?


      • #4
        Yes, I can see it in the browser.

        Looks like some other issue... I'll try it and let you know.


        • #5
          Thanks for posting this so early. There is a lot to explore here (it seems mostly about integration). But could you outline a bit how you would use these components to do the 'massively parallel processing' across multiple JVMs?

          A concrete use case may be:

          1. On day 1, the batch job runs on a single machine in a single JVM and works fine.
          2. On day 2, the load has grown to the point where the processing time is no longer acceptable. The job processing logic is already written and is complex, so we want to partition the input records and execute them across multiple JVMs.

          When running it distributed, we also need to consider how to handle restart / failover / monitoring of the sub-jobs.

          Could you outline your preferred approach to handling this with the posted classes?


          • #6
            Your "day 2" could be implemented using my use case #2 (assuming that the bottleneck was in the processing - this is usually the case for any complex job). Thus the relevant classes are the ones in the "chunk" package. You split your step up into an input and chunk dispatch (master), using your original ItemReader plugged into a ChunkMessageChannelItemWriter, and a set of listeners (workers) using an ItemWriterChunkHandler and your original ItemWriter. You have to connect the master to the workers with durable channels (JMS) to be sure of correct transactional and restart behaviour. Restart and failures are handled in the master process. I have tried this on a single VM to test that it works, but (as expected if the processing is mainly CPU bound) in that case the performance is not significantly improved over the single-threaded case.
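            To make the shape of that master/worker split concrete, here is a toy sketch in plain Java. A BlockingQueue stands in for the durable JMS channel, and the two methods stand in for the post's ChunkMessageChannelItemWriter (master side) and the worker's chunk handler; all names and types below are simplified stand-ins, not the prototype classes:

```java
// Toy sketch of the "chunk" use case: the master reads items and dispatches
// fixed-size chunks to a channel; workers drain the channel and apply the
// business ItemWriter to each item. A BlockingQueue stands in for the
// durable JMS channel; everything here is a simplified stand-in.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ChunkSketch {

    /** Master side: split the input into chunks and put them on the channel. */
    static int dispatch(List<String> items, int chunkSize,
                        BlockingQueue<List<String>> channel) throws InterruptedException {
        int chunks = 0;
        for (int i = 0; i < items.size(); i += chunkSize) {
            channel.put(new ArrayList<>(
                    items.subList(i, Math.min(i + chunkSize, items.size()))));
            chunks++;
        }
        return chunks;
    }

    /** Worker side: take chunks off the channel and apply the business writer. */
    static List<String> work(BlockingQueue<List<String>> channel,
                             int expectedChunks) throws InterruptedException {
        List<String> written = new ArrayList<>();
        for (int c = 0; c < expectedChunks; c++) {
            for (String item : channel.take()) {
                written.add(item.toUpperCase()); // the "business ItemWriter"
            }
        }
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<List<String>> channel = new LinkedBlockingQueue<>();
        int chunks = dispatch(List.of("a", "b", "c", "d", "e"), 2, channel);
        System.out.println(chunks + " chunks -> " + work(channel, chunks));
    }
}
```

            In the real setup the channel would be a durable JMS destination shared between JVMs, which is what gives the transactional and restart guarantees described above; an in-memory queue only shows the data flow.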


            • #7
              Thanks for your pointer. To clarify, I have some follow-up queries:

              1. On the master side, a Spring Batch job is running, with a step that dispatches chunks to workers.

              2. On the worker side, is it another Spring Batch job, or can it technically be any program? Is the only criterion that the program must be able to reply to the master with the chunk processing result? Please clarify or elaborate on this.

              3. You mentioned you have tried this on a single VM. Would it be difficult to try multiple VMs, or even multiple machines? If so, could you mention the issues you think you would need to tackle, or what blocked you from proceeding easily? I believe your concerns will definitely be our concerns as well.


              • #8
                1: correct

                2: I guess, in principle, "any program". In practice it will be a listener of some sort (e.g. MessageContainerListener), the sort depending on the middleware.

                3: Nothing difficult about it. I just haven't had time to set it up.


                • #9
                  The proposed implementation still seems to require developers to do a lot of the 'coordination' work for parallel processing, which they should not need to care about. Also, the logic seems to cover only the basics of parallel processing; there seems to be more to consider, e.g. handling of worker failure, failover, high availability, load balancing, etc. Will the next version of Spring Batch (i.e. 2.0) cater for all these issues, which are essential for batch?

                  In parallel, we are studying the use of GridGain with Spring Batch to handle the parallelism. From our research on paper so far, GridGain seems able to do the parallelism neatly and to handle all the things I mentioned above. How do you position the Spring Batch parallel processing strategy against the GridGain solution? Are they mutually exclusive, or is one better than the other in particular areas?


                  • #10
                    It's only a prototype you know! I'm pleased if it generates feedback, so please try it out and let us improve it. I have yet to try this stuff out in any realistic scenarios. I hope other people will have more time to do so and provide some commentary, so we can see where to take it next.

                    I don't think GridGain and Spring Batch are mutually exclusive at all. But as far as I know their model does not deal with transactional message processing. Maybe a combination of JMS and GridGain to spawn the worker processes would meet your requirements.

                    I actually prefer GigaSpaces as a grid provider because they provide JMS out of the box. It's a different model from GridGain's, and more closely aligned with the data than with the processing, hence it is easier to imagine transactional concerns being adequately covered.

                    If I had to stick my neck out, I might say that the robustness of the worker pool is unlikely to be a direct concern of Spring Batch. I wouldn't rule it out completely, but it seems to me that such concerns are more to do with the runtime than the programming model.


                    • #11
                      > I actually prefer GigaSpaces as a grid provider because they provide JMS out of the box.

                      My understanding differs a bit. It seems that GridGain supports JMS for its Discovery and Communication SPIs:

                      > It's a different model from GridGain's, and more closely aligned with the data than with the processing, hence it is easier to imagine transactional concerns being adequately covered.

                      GridGain and GigaSpaces are positioned differently: GridGain focuses on the Compute Grid, while GigaSpaces focuses on the Data Grid. In fact, they are not mutually exclusive; GridGain actually provides a GigaSpaces checkpoint SPI implementation out of the box.


                      • #12

                        The latest from SVN won't compile for me:

                        $ mvn compile
                        [INFO] Scanning for projects...
                        [INFO] ------------------------------------------------------------------------
                        [ERROR] FATAL ERROR
                        [INFO] ------------------------------------------------------------------------
                        [INFO] Failed to resolve artifact.
                        GroupId: org.springframework.batch
                        ArtifactId: spring-batch
                        Version: 2.0.0.CI-SNAPSHOT
                        Reason: Unable to download the artifact from any repository
                        from the specified remote repositories:
                          central (,
                          springsource-external (,
                          springsource-snapshot (,
                          springsource-release (,
                          springsource-milestone (
                        [INFO] ------------------------------------------------------------------------
                        [INFO] Trace
                        org.apache.maven.reactor.MavenExecutionException: Cannot find parent: org.springframework.batch:spring-batch for project: null:spring-batch-integration:jar:null for project null:spring-batch-integrat


                        • #13
                          Where did you grab the code from? It looks like a recent snapshot, which isn't going to be in any maven repository right now. Do you have all the projects downloaded? And if so, did you do an install on all of them?


                          • #14
                            Ah yes. I had just done an svn checkout (from Dave's first post in this thread) and tried mvn compile. I'm not a heavy Maven user and had forgotten that it can't auto-magically grab every dependency under the sun. I've done a checkout one level up from spring-batch/trunk, and all is well.

                            Thanks and sorry for the noise.


                            • #15
                              You need the parent pom as well - it's a chicken-and-egg situation (the parent pom defines the repository from which it can be downloaded). The easiest thing would be to check out the whole source tree (one level above where you started).
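                              For anyone hitting the same error, the fix sketched above looks roughly like this (the repository URL is left as a placeholder, since the actual SVN co-ordinates are in the first post):

```shell
# Check out one level above spring-batch/trunk so that the parent pom
# comes along with all the modules (URL placeholder is illustrative).
svn co <repository-url>/spring-batch/trunk spring-batch
cd spring-batch

# Install the parent pom and every module into the local Maven
# repository first; after that, building spring-batch-integration
# on its own will resolve its parent and siblings.
mvn install
```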