Problem resuming a job at restart position provided at "stop on"

  • Problem resuming a job at restart position provided at "stop on"

    I have a problem restarting a job that was stopped via "<stop on...".
    All steps are tasklet steps.
    The job is stopped correctly on the specified condition, but when I restart the job, the "restart" attribute of the <stop on..> state is ignored.
    If I leave the step in FAILED or STOPPED, it is executed again. If I leave it at COMPLETED, the job complains that it has nothing to do with the step.
    In either case the restart step is never run.
    Here is the part of my job definition containing the restart declaration:

        <job id="simpleRestart" parent="simpleJob">
            <step id="first" parent="firstStep" next="second"/>
            <step id="second" parent="secondStep" next="third"/>
            <step id="third">
                <tasklet ref="thirdStepWF" allow-start-if-complete="false" transaction-manager="transactionManager">
                    <listener ref="step3Listener"/>
                </tasklet>
                <end on="COMPLETED"/>
                <stop on="PROTOKOLL" restart="restartThird"/>
            </step>
            <step id="restartThird" parent="thirdStepRestart"/>
        </job>
    The listener sets the exit status to PROTOKOLL. The parent "simpleJob" uses the provided SimpleJob class with restartable=true.

    By the way, the end and stop transitions only seem to work if the default namespace is set to Batch. As soon as I specify them as "<batch:stop" they are not recognized, but the repeat mechanism of the tasklet will move on to restart if the step fails.

    Did I misunderstand this functionality? Any hint would be appreciated.


  • #2
    In general, the job will only re-execute a Step that didn't complete successfully on the last try. The two exceptions are:

    1. it has status COMPLETED, but has asked to be re-executed using the allowStartIfComplete flag, in which case it is re-executed (not what you want).

    2. it has status ABANDONED, in which case it is not re-executed.

    Maybe you can use your listener to set the status to ABANDONED or COMPLETED before exiting? Not sure if that will work, but give it a try.
    Last edited by Dave Syer; Jul 7th, 2009, 11:06 AM. Reason: added comment
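
The restart rule described in #2 can be written down as a small decision function. This is only a minimal sketch of the rule as stated in that post, not the framework's own code: the class `RestartDecisionSketch`, the enum `LastStatus` and the method `shouldReexecute` are hypothetical names made up for illustration.

```java
public class RestartDecisionSketch {

    /** Simplified stand-in for the status of a step's last execution (hypothetical type). */
    public enum LastStatus { NONE, COMPLETED, FAILED, STOPPED, ABANDONED }

    /**
     * Returns true if the step would be executed again on restart,
     * following the rule and the two exceptions listed in post #2.
     */
    public static boolean shouldReexecute(LastStatus last, boolean allowStartIfComplete) {
        switch (last) {
            case COMPLETED:
                // Exception 1: a completed step runs again only if it asked for it.
                return allowStartIfComplete;
            case ABANDONED:
                // Exception 2: an abandoned step is never run again.
                return false;
            default:
                // Never run, FAILED or STOPPED: the step did not complete, so it runs.
                return true;
        }
    }

    public static void main(String[] args) {
        for (LastStatus status : LastStatus.values()) {
            System.out.println(status + ", allowStartIfComplete=false -> "
                    + shouldReexecute(status, false));
        }
        System.out.println("COMPLETED, allowStartIfComplete=true -> "
                + shouldReexecute(LastStatus.COMPLETED, true));
    }
}
```

Under this reading, a step left in FAILED or STOPPED is picked up again on restart, which matches what the original poster observed for the step named "third".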


    • #3
      I set the status of the step to ABANDONED. I get the same message ("Step already complete or not restartable, so no action to execute") as if it is completed, but the specified restart step is not called.

      Under which conditions will the restart step be activated?

      I looked at the unit test in FlowJobTests and it behaves quite differently. Within the unit test the end states are processed like steps, but if I set a breakpoint in EndState within my workflow, it is never reached.

      Does anybody have a running job using this feature?



      • #4
        There are plenty of integration tests in org.springframework.batch.core.configuration.xml. But none of them use a FAILED step. If you are OK with breakpoints, set one in AbstractJob#shouldStart() and see how the decision is being made with your step on restart.


        • #5
          I added a new integration test for a failed step with resume on another step. That shows that ABANDONED should already be the status of your stopped step execution. There are actually a couple of similar tests, but this one seems closest to your use case. It works.

          Maybe the problem is that you don't have a transition on="*"?
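
To illustrate the point about a missing on="*" transition, here is a minimal sketch of how an exit code might be matched against a step's transitions. It is a simplified model, not the framework's matching code: the class `TransitionMatchSketch`, the method `resolve` and the target labels are hypothetical, and the ordering rule (exact match first, then the first matching wildcard) is an assumption made to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

public class TransitionMatchSketch {

    /** Translates a pattern with '*' and '?' wildcards into a regular expression. */
    static Pattern toRegex(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");          // any run of characters
            } else if (c == '?') {
                regex.append('.');           // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Resolves the target for an exit code: exact match first, then first matching wildcard. */
    public static Optional<String> resolve(Map<String, String> transitions, String exitCode) {
        if (transitions.containsKey(exitCode)) {
            return Optional.of(transitions.get(exitCode));
        }
        for (Map.Entry<String, String> entry : transitions.entrySet()) {
            if (toRegex(entry.getKey()).matcher(exitCode).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();             // no transition covers this exit code
    }

    public static void main(String[] args) {
        // Transitions of step "third" from the first post; target labels are illustrative.
        Map<String, String> transitions = new LinkedHashMap<>();
        transitions.put("COMPLETED", "end");
        transitions.put("PROTOKOLL", "stop, restart=restartThird");

        System.out.println(resolve(transitions, "PROTOKOLL"));
        System.out.println(resolve(transitions, "FAILED"));   // nothing matches yet

        transitions.put("*", "catch-all");                    // hypothetical fallback
        System.out.println(resolve(transitions, "FAILED"));
    }
}
```

In this model, any exit code other than COMPLETED or PROTOKOLL has no target until a "*" entry is added, which is the gap the question above is asking about.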


          • #6
            Thank you, I will take a closer look at your integration test and compare its behaviour with that of my job.


            • #7
              Thanks Dave,
              with the integration test you provided I could identify the problem.
              If the first step of a job has the property "allowStartIfComplete" set to true the resume processing does not work. Instead the EndState will be processed as STOPPED and the whole job will be stopped again.
              If I set it to false - so this step is not restarted - the resume works fine.

              I was not able to solve this, but I think this might be a bug in the framework.
              I can reproduce this error, if I amend your integration test as follows:

                  <job id="job">
                      <step id="s0" parent="step0" next="s1"/>
                      <step id="s1" parent="step1">
                          <stop on="*" restart="s2"/>
                      </step>
                      <step id="s2" parent="step2"/>
                  </job>
                  <beans:bean id="step0" parent="step1">
                      <beans:property name="allowStartIfComplete" value="true" />
                  </beans:bean>
              Within the test, the asserts on the size of stepNameList have to be set to 2. The test will now fail because s2 is never reached.


              • #8
                OK, I see the problem. I raised an issue for it: