
  • General Questions

    Hi All,

    I've been evaluating Spring-Batch - got as far as reading available docs, downloading source, running samples and trying out my own PoC.

    All very good, but I have a few questions (please forgive the fact that I haven't read all the source):

    - What threading is implemented in the execution process? Is it safe to run within a J2EE application server? I noticed a dependency on the CommonJ WorkManager package: is it used by Spring-Batch?

    - What is the ideal work scope of the tasklet? It seems better design to have lots of small 'steps', but what if the business code is quite complex and requires lots of data processing? Should that be encapsulated in a separate 'service'?

    - Has there been any work on integrating third-party rules processing engines (e.g. Drools, JRules)? Is that best placed within the tasklet/processor code?

    - Has there been any work on integrating third-party data cache solutions (e.g. Tangosol)? Again, is that best placed within the tasklet/processor code?

    - Is there a method of passing a business data model in memory between steps?

    - (I think this question has been asked on this list before but it was a while ago so apologies for asking it again!) Is there a method of passing runtime information from the job script to the input/output sources? I'm thinking particularly of how to specify what files to process or date constraints etc.

    Many thanks for any response,

    Chris

  • #2
    Originally posted by rotis23
    - What threading is implemented in the execution process? Is it safe to run within a J2EE application server? I noticed a dependency on the CommonJ WorkManager package: is it used by Spring-Batch?
    It is safe to run out of the box because no threading is done. Where is the dependency on WorkManager by the way? I assume it is only in JavaDocs because we delegate all threading concerns to the Spring TaskExecutor. There are issues with restartability when you use concurrent execution, but some jobs do not require that. After 1.0 we plan to address that more generally.
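
    A minimal sketch of what delegating threading to an executor means in practice. It uses the JDK's java.util.concurrent.Executor as a stand-in for Spring's TaskExecutor, and the class and method names (StepThreadingSketch, runChunks) are hypothetical, not part of Spring Batch. With a synchronous executor every chunk runs on the calling thread, so no new threads appear unless a different executor is plugged in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executor;

public class StepThreadingSketch {

    /**
     * Hypothetical step runner: hands each chunk to whatever Executor it is given
     * and records which thread processed it. It never creates a thread itself.
     */
    static List<String> runChunks(Executor executor, List<String> chunks) {
        List<String> processedBy = new ArrayList<>();
        for (String chunk : chunks) {
            executor.execute(() ->
                    processedBy.add(chunk + "@" + Thread.currentThread().getName()));
        }
        return processedBy;
    }

    public static void main(String[] args) {
        // Synchronous executor: the task runs immediately on the caller's thread.
        Executor callerThread = Runnable::run;
        List<String> result = runChunks(callerThread, Arrays.asList("chunk-1", "chunk-2"));
        // Every entry carries the name of the calling thread.
        System.out.println(result);
    }
}
```

    Swapping in a pooled or container-managed executor would change where the work runs without touching runChunks, although the plain ArrayList used here would then need to be replaced with a thread-safe collection.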

    - What is the ideal work scope of the tasklet? It seems better design to have lots of small 'steps', but what if the business code is quite complex and requires lots of data processing? Should that be encapsulated in a separate 'service'?
    I think of it as a controller or action in a presentation tier - i.e. above the service layer. The business delegate pattern is definitely recommended when implementing ItemProcessor.
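
    To illustrate the business delegate idea, here is a rough sketch in which the processor only parses a record and forwards it to a service that owns the business logic. All of the types (RecordProcessor, InvoiceService, InvoiceRecordProcessor) are hypothetical stand-ins invented for this example; RecordProcessor is not the Spring Batch ItemProcessor interface.

```java
public class BusinessDelegateSketch {

    /** Hypothetical stand-in for a batch item processor (not the Spring Batch interface). */
    interface RecordProcessor {
        void process(String record);
    }

    /** Hypothetical service-layer API that holds the actual business logic. */
    interface InvoiceService {
        void bookInvoice(String customerId, long amountInCents);
    }

    /** Thin processor: parses the raw record, then delegates to the service. */
    static class InvoiceRecordProcessor implements RecordProcessor {
        private final InvoiceService service;

        InvoiceRecordProcessor(InvoiceService service) {
            this.service = service;
        }

        @Override
        public void process(String record) {
            // Expected record layout (an assumption for this sketch): "customerId, amountInCents"
            String[] fields = record.split(",");
            service.bookInvoice(fields[0].trim(), Long.parseLong(fields[1].trim()));
        }
    }

    public static void main(String[] args) {
        InvoiceService printing = (customerId, amount) ->
                System.out.println(customerId + " owes " + amount + " cents");
        new InvoiceRecordProcessor(printing).process("ACME, 1250"); // prints "ACME owes 1250 cents"
    }
}
```

    Because the processor holds no business rules, the service can be tested and reused outside the batch job, and a rules engine or data cache could sit behind the same service interface.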

    - Has there been any work on integrating third-party rules processing engines (e.g. Drools, JRules)? Is that best placed within the tasklet/processor code?
    No, and yes, in that order. A rules processing engine would be trivial to integrate with Spring Modules, so we might not do anything concrete in Spring Batch. Let us know if there is anything awkward that you think ought to be provided natively.

    - Has there been any work on integrating third-party data cache solutions (e.g. Tangosol)? Again, is that best placed within the tasklet/processor code?
    Same answer! But actually the details depend pretty heavily on what you wanted to do with the data cache.

    - Is there a method of passing a business data model in memory between steps?
    Not right now - we had that feature in an early unreleased version actually, and we decided it wasn't adding enough value. If you pass state between steps in memory you also need to persist it in case there is a failure and a restart, so actually there isn't much benefit without some cleverness. It might be worth the effort to look at it again at some point.

    - (I think this question has been asked on this list before but it was a while ago so apologies for asking it again!) Is there a method of passing runtime information from the job script to the input/output sources? I'm thinking particularly of how to specify what files to process or date constraints etc.
    BatchResourceFactoryBean is the standard way of doing that. It is something we are discussing internally a lot at the minute, so if you have any observations or suggestions we'd be quite pleased to get some input. I am particularly keen to retain the traceability of the JobIdentifier - you shouldn't be able to simply run a job twice with different parameters and then be unable to tell the difference later in some sort of persistent storage (audit trails). See also http://opensource.atlassian.com/proj...rowse/BATCH-84.
    Last edited by Dave Syer; Nov 12th, 2007, 02:05 PM. Reason: spelling



    • #3
      Thanks for your response Dave - looking at spring-batch as a controller makes sense.

      Where is the dependency on WorkManager by the way?
      Umm - I thought I saw it when Maven was resolving dependencies, but it's not in the repository. I guess, if Spring is starting new threads, it would have to use WorkManagers within a J2EE container.

      If you pass state between steps in memory you also need to persist it in case there is a failure and a restart, so actually there isn't much benefit without some cleverness.
      I was thinking of optimising throughput, but I guess that depends on how coupled the steps are...which leads me to the questions below.

      It is something we are discussing internally a lot at the minute so if you have any observations or suggestions we'd be quite pleased to get some input.
      All I can suggest is that the InputSource resource could be defined via a database lookup that returns one or more filenames. Not sure if that can be done right now, but it would solve one of my problems.

      A couple of further questions, just to clarify exactly where spring-batch sits architecturally:

      - Does/will spring-batch support FSM type functionality whereby the next steps will be invoked based on controller context and state? Is this something you're trying to avoid?

      - What about simple Control-M type control whereby dependencies (next steps) can be called based on an exit condition? I think this is done already for retry functionality?

      - Also, the docs mention SEDA and potential work with Mule; will we see event-driven tasklets/processors over an ESB? If so, will those events contain context and state info? Or is it intended that that technology will only extend the input/output source services?



      • #4
        Originally posted by rotis23
        All I can suggest is that the InputSource resource could be defined via a database lookup that returns one or more filenames. Not sure if that can be done right now, but it would solve one of my problems.
        You can write your own ResourceFactoryBean, or just a label generator for the BatchResourceFactoryBean. The key question is "how does the query get its parameters?" What do you need to know to locate a filename (wherever it comes from)? The general idea is that it should be nothing that you can't find from the JobIdentifier - otherwise the JobIdentifier doesn't uniquely identify the job. If you agree on that point all we need is more detail on your use case and we can figure out the right strategy.
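
        A small sketch of the principle that the input resource should be derivable from the job's identity alone. JobKey and inputFileFor are hypothetical names invented for this example; they are not the real JobIdentifier or the label generator API, and the fields and file naming pattern are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ResourceNameSketch {

    /** Hypothetical job key: everything that distinguishes one job run from another. */
    static final class JobKey {
        final String jobName;
        final LocalDate scheduleDate;
        final int stream;

        JobKey(String jobName, LocalDate scheduleDate, int stream) {
            this.jobName = jobName;
            this.scheduleDate = scheduleDate;
            this.stream = stream;
        }
    }

    /**
     * Derives the input file name purely from the key, so two runs with the same
     * key always read the same file and the key alone is enough for an audit trail.
     */
    static String inputFileFor(JobKey key) {
        String date = key.scheduleDate.format(DateTimeFormatter.BASIC_ISO_DATE);
        return String.format("%s-%s-%d.csv", key.jobName, date, key.stream);
    }

    public static void main(String[] args) {
        JobKey key = new JobKey("trades", LocalDate.of(2007, 11, 12), 0);
        System.out.println(inputFileFor(key)); // prints "trades-20071112-0.csv"
    }
}
```

        A database lookup would fit the same shape as long as its query parameters come only from the key.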

        - Does/will spring-batch support FSM type functionality whereby the next steps will be invoked based on controller context and state? Is this something you're trying to avoid?
        We de-scoped that for now, but a few people have asked on the forum about integrating rules/workflow engines. I think it has to be inside a step for now. Later we might add something closer to FSM as an alternative to a sequence of steps.

        - What about simple Control-M type control whereby dependencies (next steps) can be called based on an exit condition? I think this is done already for retry functionality?
        Retry/restart only happens after failure, so if that's one of the features you are thinking of then yes. But I think conditional execution of a step based on the exit code of the previous step sounds like a sensible use case. Currently you would have to work it out for yourself inside the step logic. If you can describe a simple way to implement something better we will definitely consider it (limited bandwidth for thinking at the moment).
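
        One possible shape for conditional execution based on exit codes, purely as a sketch for discussion. Nothing here is Spring Batch API: ConditionalFlowSketch, step, on and run are hypothetical, and a step is modelled as a Supplier that returns its exit code as a String.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class ConditionalFlowSketch {

    // Step name -> step body; the body returns the step's exit code.
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    // "stepName/exitCode" -> name of the step to run next.
    private final Map<String, String> transitions = new HashMap<>();

    void step(String name, Supplier<String> body) {
        steps.put(name, body);
    }

    void on(String from, String exitCode, String next) {
        transitions.put(from + "/" + exitCode, next);
    }

    /** Runs steps starting at 'first' until no transition matches; returns the steps executed. */
    List<String> run(String first) {
        List<String> executed = new ArrayList<>();
        String current = first;
        while (current != null) {
            executed.add(current);
            String exitCode = steps.get(current).get();
            current = transitions.get(current + "/" + exitCode);
        }
        return executed;
    }

    public static void main(String[] args) {
        ConditionalFlowSketch flow = new ConditionalFlowSketch();
        flow.step("load", () -> "NO_DATA");
        flow.step("process", () -> "COMPLETED");
        flow.step("notify", () -> "COMPLETED");
        flow.on("load", "COMPLETED", "process");
        flow.on("load", "NO_DATA", "notify");
        System.out.println(flow.run("load")); // prints "[load, notify]"
    }
}
```

        The sketch ignores restartability and does not guard against cyclic transitions; a real implementation would also have to persist which step was reached.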

        - Also, the docs mention SEDA and potential work with Mule; will we see event-driven tasklets/processors over an ESB? If so, will those events contain context and state info? Or is it intended that that technology will only extend the input/output source services?
        Yes you will see event-driven tasklets (not in 1.0, but we aim to be sure that we can easily do it later). The experiments and prototypes we have do not yet contain context/state, but I have been thinking that it might be necessary in the long term. Shouldn't be a problem.



        • #5
          Thanks for your help Dave.



          • #6
            May I add a comment...

            Regarding Control-M, as far as I know it will not execute steps based on conditions, but Jobs based on conditions. So this use case still applies to Spring Batch.

            Still, I think it could definitely be useful to have some kind of condition checked before starting a step. In the earlier stages of the Batch there was an interface that let you decide whether to start a given Step or not, but that was later removed.

            It does not seem overly complex, and as the framework gets more users it will be a frequent requirement.

            Regarding the threading question, the framework will not create any threads unless you use a specific parallel execution strategy for a Step (which I think is not even documented yet). The only thread the framework itself creates is the one started when you use a Launcher.

            Still, the "do not start threads in an App Server" rule is not being honored in lots of other places (just take a look at the JMS support in Spring, or ehCache, just to name the ones I remember). BEA seems to cope with it just fine, and most servers do as well.

            Also take into consideration what triggers a Job when you run the Batch inside an App Server. For example, Quartz is not part of the framework, so strictly speaking it is not Spring Batch that creates the threads. You could also have some external trigger launch jobs, in which case the app server will be in charge of controlling threads. Or you could use the Timer MBean to start jobs (simpler scheduling, but...), where again the app server is the one starting the jobs. The same goes for JMS or WS. In other words, I don't see the framework creating threads for Jobs.

            You could also have "workers" installed in App Servers and a controller that sends messages to them to execute jobs (i.e. distributed execution), but this is the same case as before. I think the Spring Batch team is working on a distributed solution as well. (Dave?)

            Again, the only place I see the framework creating threads is for parallel processing in steps; in that particular case a specific "in container" strategy could be useful.

            Just my 2 cents
            Regards
            AB
