
  • Processing streams...

    Hiyas,

    I have two questions with regard to processing items:

    1.) Is there a way to pass data from one step to the next without writing it to a repository of some sort - whether it be a file, db, or jms?

    2.) Is it possible to start a step processing as soon as input is available for it - say if the output of one step is the input to the next?

    The repeat templates appear as though they may enable such functionality, but there seem to be a few caveats which make it ambiguous:

    1.) The fact that JDBC writers are not multi-threadable (at least I think so; I am not sure whether section 6.12, "Preventing State Persistence", will work in my scenario).

    2.) I am not running a repeated operation per item... but rather want to repeat the operations for each item.

    Essentially what I am trying to figure out is the best way to create a pipeline that allows me to make consecutive services calls for data. (i.e. step 1: collect it, step 2: validate it, step 3: modify it)

    Are batch steps the right tool for the job or do I need to use integration techniques - i.e. Spring Integration?

    Thanks.

    Keith

  • #2
    1.) Is there a way to pass data from one step to the next without writing it to a repository of some sort - whether it be a file, db, or jms?
    Usually you *want* to write it to the repository in case the second step fails. If step 1 puts data in the context and step 2 reads it but fails, then when the job is restarted on step 2, you still need that information.

    2.) Is it possible to start a step processing as soon as input is available for it - say if the output of one step is the input to the next?
    The next step won't start until the current step is done. The kind of pipeline you are describing would have to be simulated within the step.

    Essentially what I am trying to figure out is the best way to create a pipeline that allows me to make consecutive services calls for data. (i.e. step 1: collect it, step 2: validate it, step 3: modify it)
    Isn't this just ItemReader + ItemProcessor + ItemWriter? Or ItemReader + Composite ItemWriter?
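
    A minimal sketch of that single-step shape in plain Java. This is not actual Spring Batch code: `Reader`, `Writer`, `chain`, `run` and `demo` are hypothetical stand-ins defined here to mimic a read/process/write loop, the chunk size of 2 is arbitrary, and "a null result drops the item" is a convention of this sketch. Validate and modify are chained into one processor, so every item passes through all stages inside one step.

    ```java
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.function.Function;

    final class SingleStepPipeline {

        /** Hypothetical stand-in for a reader: returns the next item, or null when input is exhausted. */
        interface Reader<T> { T read(); }

        /** Hypothetical stand-in for a writer: receives one chunk of processed items. */
        interface Writer<T> { void write(List<T> chunk); }

        /** Chains two stages; a null from the first stage drops the item (convention of this sketch). */
        static <A, B, C> Function<A, C> chain(Function<A, B> first, Function<B, C> second) {
            return a -> {
                B b = first.apply(a);
                return b == null ? null : second.apply(b);
            };
        }

        /** Reads until the reader returns null, writing processed items in chunks. Returns the number written. */
        static <I, O> int run(Reader<I> reader, Function<I, O> processor, Writer<O> writer, int chunkSize) {
            int written = 0;
            List<O> chunk = new ArrayList<>();
            I item;
            while ((item = reader.read()) != null) {
                O result = processor.apply(item);
                if (result != null) {
                    chunk.add(result);
                }
                if (chunk.size() == chunkSize) {
                    writer.write(new ArrayList<>(chunk)); // hand over a copy, then start a new chunk
                    written += chunk.size();
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {                       // flush the final, partially filled chunk
                writer.write(new ArrayList<>(chunk));
                written += chunk.size();
            }
            return written;
        }

        /** Collect -> validate -> modify -> write, all within one "step". */
        static List<String> demo() {
            Iterator<String> source = List.of(" a ", "", "b", " c").iterator();
            Reader<String> collect = () -> source.hasNext() ? source.next() : null;
            Function<String, String> validate = s -> s.trim().isEmpty() ? null : s.trim();
            Function<String, String> modify = s -> s.toUpperCase();
            List<String> sink = new ArrayList<>();
            Writer<String> writer = chunk -> sink.addAll(chunk);
            run(collect, chain(validate, modify), writer, 2);
            return sink;
        }

        public static void main(String[] args) {
            System.out.println(demo());
        }
    }
    ```

    In a real job the three roles would map onto ItemReader, ItemProcessor and ItemWriter; the sketch only illustrates how several logical stages can live in one step.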



    • #3
      Originally posted by DHGarrette
      Usually you *want* to write it to the repository in case the second step fails. If step 1 puts data in the context and step 2 reads it but fails, then when the job is restarted on step 2, you still need that information.
      Good point... I failed to think of that.


      Originally posted by DHGarrette
      The next step won't start until the current step is done. The kind of pipeline you are describing would have to be simulated within the step.
      Hrm... sounds like it would be possible with the RepeatTemplate? It would boil the whole job down to a single step though...

      Originally posted by DHGarrette
      Isn't this just ItemReader + ItemProcessor + ItemWriter? Or ItemReader + Composite ItemWriter?
      Yes... but the steps I mentioned were high-level steps for the purposes of the example... in reality, each of those steps detailed would be composed of several steps and what I am trying to achieve is a successful pipeline that will not wait for one item to get all the way through before the next item launches.

      The intent behind the questions was to find the best way to create my pipeline within the batch framework. After a bit of research I have come to realize that I am actually looking for a combination of tools. The best course of action seems to be to use BPEL and an ESB for hosting my service steps. I still need an invocation method, though, which I would prefer to work in parallel sequences. I am hoping to be able to use Spring Batch for that.

      The one way that I am thinking it might work is if I configure parallel steps in batch but the question I have is what happens if the 2nd step in the process attempts to read (say from a JMS queue) and the 1st step has not written an output yet... will the process quit or wait for input? How is the end of the chain determined?

      If it fails, I may have to just use batch to launch the job and have standalone processes which run, listen to the queue, and process anything in it. The downside is that they would have to run constantly instead of only when the job executes.

      Thoughts/Comments?

      Thanks.

      Keith



      • #4
        My personal preference would be for Spring Integration if you have complex, data-driven or dynamic paths for your processing. If the path is fixed there's nothing wrong with ItemProcessor/ItemWriter. If you like BPEL, knock yourself out.

        As far as the "end of the chain" goes, the contract for Spring Batch with existing implementations of Step is easy: if you use an ItemReader the step finishes when it returns null; if you use a vanilla Tasklet the step finishes when the tasklet says it is complete.

        If you are reading from JMS you can easily get a null message if your timeout is short, so you would have to make sure that you know when to expect the items to be ready before reading them. It's not hard to do.
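
        One way to picture the timeout problem, sketched in plain Java with a `BlockingQueue` standing in for the JMS queue. Everything here is an assumption for illustration, not Spring Batch or JMS API: the `PollingReader` class, the `END_OF_INPUT` marker that the upstream stage would have to send, and the retry count. The reader treats a timed-out poll as "upstream is slow" and only returns null (ending the step) when it sees the marker or runs out of retries.

        ```java
        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;
        import java.util.concurrent.TimeUnit;

        final class PollingReader {

            /** Hypothetical marker the upstream stage sends after its last item. */
            static final String END_OF_INPUT = "__END__";

            private final BlockingQueue<String> queue;
            private final long timeoutMillis;
            private final int maxEmptyPolls;
            private boolean finished = false;

            PollingReader(BlockingQueue<String> queue, long timeoutMillis, int maxEmptyPolls) {
                this.queue = queue;
                this.timeoutMillis = timeoutMillis;
                this.maxEmptyPolls = maxEmptyPolls;
            }

            /** Returns the next item, or null once input has ended or upstream stayed silent too long. */
            String read() {
                if (finished) {
                    return null;
                }
                try {
                    for (int attempt = 0; attempt < maxEmptyPolls; attempt++) {
                        String item = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                        if (item == null) {
                            continue;              // timed out: upstream may simply not be ready yet
                        }
                        if (END_OF_INPUT.equals(item)) {
                            finished = true;       // explicit end of the chain
                            return null;
                        }
                        return item;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                }
                finished = true;                   // gave up waiting, or was interrupted
                return null;
            }

            public static void main(String[] args) {
                BlockingQueue<String> queue = new LinkedBlockingQueue<>();
                queue.add("first");
                queue.add("second");
                queue.add(END_OF_INPUT);
                PollingReader reader = new PollingReader(queue, 50, 3);
                String item;
                while ((item = reader.read()) != null) {
                    System.out.println(item);
                }
            }
        }
        ```

        A single short poll would end the step the first time the queue happened to be empty; the marker plus retries is just one way of knowing when the items are really finished.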



        • #5
          Originally posted by Dave Syer
          My personal preference would be for Spring Integration if you have complex, data-driven or dynamic paths for your processing. If the path is fixed there's nothing wrong with ItemProcessor/ItemWriter. If you like BPEL, knock yourself out.
          Yes... I am siding with the BPEL approach right now because JBI containers seem to help abstract and standardize how integration patterns are applied. I can use Spring Integration, but then I have to manually wire all the components myself instead of taking advantage of the resources a container offers me.

          Originally posted by Dave Syer
          As far as the "end of the chain" goes, the contract for Spring Batch with existing implementations of Step is easy: if you use an ItemReader the step finishes when it returns null; if you use a vanilla Tasklet the step finishes when the tasklet says it is complete.

          If you are reading from JMS you can easily get a null message if your timeout is short, so you would have to make sure that you know when to expect the items to be ready before reading them. It's not hard to do.
          Thanks for the clarification, but the more I thought about it... I think I will probably just use Spring Batch as the initiator and not the step manager. The reasoning is that should I decide to invoke this process by another means, I do not want to have to rewire the steps together. It is also looser coupling to pass from one system to the next only once instead of weaving them together. Since I am opting for the process management approach, I think it would behoove me to build it as a process.

          Thanks.

          Keith
