Disable step restart after subsequent step has completed

  • Disable step restart after subsequent step has completed

    [edit] Just to clarify: the title wasn't precise. It should probably be "restart job only from first failed step, ignoring preceding completed but restartable steps".

    The job flow I have is as follows

    StepA -> StepB -> Decider -> StepC.

    The decider determines whether there is more input for StepB to process: if so, the flow goes back to StepB; otherwise it proceeds to StepC.

    For that reason, StepB has been made restartable via allow-restart-if-complete=true. However, when the job fails at StepC and I restart the job, StepB gets executed again.

    Is there a way to restart the job only from the first failed step, ignoring any preceding steps that are completed but set to be restartable when complete?
    Last edited by Dacheng; Feb 10th, 2010, 01:28 AM. Reason: clarify title
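
To make the problem above concrete, here is a minimal sketch in plain Java of the restart check as the post describes it. It is not Spring Batch code: `RestartModel`, `StepRecord` and `restartIfComplete` are hypothetical names, and the rule itself (a completed step is skipped on restart unless it is flagged as restartable) is an assumption based on the behaviour reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a restart check. This is NOT Spring Batch code;
// every name here is hypothetical and exists only for illustration.
class RestartModel {

    enum Status { COMPLETED, FAILED }

    // How one step ended in the previous (failed) job execution.
    static final class StepRecord {
        final String name;
        final Status lastStatus;
        final boolean restartIfComplete; // stands in for allow-restart-if-complete

        StepRecord(String name, Status lastStatus, boolean restartIfComplete) {
            this.name = name;
            this.lastStatus = lastStatus;
            this.restartIfComplete = restartIfComplete;
        }
    }

    // Assumed rule: a COMPLETED step is skipped on restart unless it is
    // flagged as restartable; a FAILED step always runs again.
    static boolean shouldStart(StepRecord step) {
        if (step.lastStatus == Status.COMPLETED) {
            return step.restartIfComplete;
        }
        return true;
    }

    // Names of the steps that would execute when the job is restarted.
    static List<String> stepsRunOnRestart(List<StepRecord> previousRun) {
        List<String> result = new ArrayList<>();
        for (StepRecord step : previousRun) {
            if (shouldStart(step)) {
                result.add(step.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<StepRecord> previousRun = List.of(
                new StepRecord("StepA", Status.COMPLETED, false),
                new StepRecord("StepB", Status.COMPLETED, true),
                new StepRecord("StepC", Status.FAILED, false));
        System.out.println(stepsRunOnRestart(previousRun)); // prints [StepB, StepC]
    }
}
```

Under this model the flag that lets the decider loop back to StepB is the same flag that makes StepB run again on restart, which is exactly the conflict described in the post.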

  • #2
    It sounds like allow-restart-if-complete=false (the default)? What are we missing?

    • #3
      Originally posted by Dave Syer
      It sounds like allow-restart-if-complete=false (the default)? What are we missing?

      Hi Dave,

      Thanks for the quick reply.

      If it's set to false, then when the decider determines there is more work for StepB to do, StepB can't be started again:

      <Step already complete or not restartable, so no action to execute>

      • #4
        What version of Batch is this? I think if you upgrade to 2.1 then a step is always repeatable in the context of a single JobExecution.

        • #5
          We have a similar problem. We have a multi-step job with the ability to skip items in order to process them later.

          Step1 -> Step2 -> Step3 -> Step4

          The following scenarios need to be supported:

          * If at least one item was skipped at Step2, and provided we have not reached the limit, we go on to Step3 but without processing the skipped items. The same applies to Step4, of course.
          * When such a job completes, it can be restarted and only the items that were skipped are processed again (so, ideally, Step1 does not run, Step2 processes only the skipped items and completes this time, and Step3 and Step4 are also executed with those skipped items).

          There's one problem with this approach: it seems I can't plug in custom logic to decide whether or not a step has to restart. Since I have to set the exit status of the step as COMPLETED (otherwise the whole job stops at the first execution), every step needs to be fully restartable (allow-restart-if-complete=true) no matter what, and the reader then extracts the actual skipped items to process.

          It would be very nice if we could:

          * Mark a step as neither completed nor failed, but as containing items that were skipped
          * Use this in-between status to decide whether or not the step needs to restart

          Is there a full description of the batch state machine somewhere? It seems to me that the "skip item" functionality is not fully integrated into it. If one can configure that a given exception should not fail the batch but just skip the related item, the engine should provide a coherent mechanism to handle the restart of such a job. The most obvious case is an admin UI where an operator can look at the error, fix the underlying issue and restart the job to reprocess the previously failed items.

          • #6
            I don't think this is quite the same scenario as the original post, since you plan to restart the whole job to get the step to process successfully, where in the original there is a decision within the job that drives the retry of a step automatically (i.e. no need for manual intervention).

            "I have to set the exit status of the step as COMPLETED"
            Not true I think. You need the step to fail in order that it is executed on restart, so it has to have BatchStatus.FAILED, and you need the whole job to fail at the end so it can be restarted. But there's no reason why a failed step cannot be followed by a successful one, and there is also no reason why a job cannot fail even if the last step was successful.

            And its ExitStatus can be anything - and you can use that to drive the decisions in a state machine (if you need to).
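
As a rough illustration of exit codes driving the flow, here is a toy transition table in plain Java. It is not Spring Batch code: `ExitCodeFlow`, the `checkSkips` decision state, the `HAD_SKIPS` and `CLEAN` codes, and the `FAIL_JOB` and `END_JOB` markers are all hypothetical names chosen for this sketch. The only value taken from the thread is the custom "COMPLETED WITH ERROR" exit code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy transition table keyed on exit codes. NOT Spring Batch code; the
// class, the state names and the FAIL_JOB / END_JOB markers are hypothetical.
class ExitCodeFlow {

    private final Map<String, String> transitions = new HashMap<>();

    // Registers: when `state` finishes with `exitCode`, go to `next`.
    ExitCodeFlow on(String state, String exitCode, String next) {
        transitions.put(state + "|" + exitCode, next);
        return this;
    }

    // Looks up the next state, failing loudly on an unmapped exit code.
    String next(String state, String exitCode) {
        String target = transitions.get(state + "|" + exitCode);
        if (target == null) {
            throw new IllegalStateException(
                    "No transition from " + state + " on " + exitCode);
        }
        return target;
    }

    public static void main(String[] args) {
        ExitCodeFlow flow = new ExitCodeFlow()
                .on("step2", "COMPLETED", "step3")
                .on("step2", "COMPLETED WITH ERROR", "step3") // carry on despite skips
                .on("step4", "COMPLETED", "checkSkips")       // decision after the last step
                .on("checkSkips", "HAD_SKIPS", "FAIL_JOB")    // job fails so it can be restarted
                .on("checkSkips", "CLEAN", "END_JOB");

        System.out.println(flow.next("step2", "COMPLETED WITH ERROR")); // prints step3
        System.out.println(flow.next("checkSkips", "HAD_SKIPS"));       // prints FAIL_JOB
    }
}
```

The point of the sketch is only that a step with skipped items can hand over to the next step, while the job as a whole can still end in a failed state after a successful last step.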

            • #7
              Originally posted by Dave Syer
              I don't think this is quite the same scenario as the original post, since you plan to restart the whole job to get the step to process successfully, where in the original there is a decision within the job that drives the retry of a step automatically (i.e. no need for manual intervention).

              Not true I think. You need the step to fail in order that it is executed on restart, so it has to have BatchStatus.FAILED, and you need the whole job to fail at the end so it can be restarted. But there's no reason why a failed step cannot be followed by a successful one, and there is also no reason why a job cannot fail even if the last step was successful.

              And its ExitStatus can be anything - and you can use that to drive the decisions in a state machine (if you need to).

              How do I set it to failed if it contains only skipped items? Its batch status is COMPLETED and we change the exit status to COMPLETED WITH ERROR but I don't see a way to change the actual status in a listener. Any idea?

              • #8
                Originally posted by snicoll
                How do I set it to failed if it contains only skipped items? Its batch status is COMPLETED and we change the exit status to COMPLETED WITH ERROR but I don't see a way to change the actual status in a listener. Any idea?

                Never mind, found it.

                I have the following code in my step listener

                Code:
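                // Assumes imports of org.springframework.batch.core.BatchStatus, ExitStatus and StepExecution.
                // AdvancedExitStatus is an application class, not part of Spring Batch.
                public ExitStatus afterStep(StepExecution stepExecution) {
                    // The step completed, but some items were skipped: mark it as failed.
                    if (stepExecution.getSkipCount() > 0
                            && stepExecution.getStatus() == BatchStatus.COMPLETED) {
                        stepExecution.setStatus(BatchStatus.FAILED);
                        return AdvancedExitStatus.COMPLETED_WITH_ERROR;
                    }
                    // Returning null leaves the existing exit status unchanged.
                    return null;
                }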
                I also have a job listener that sets the status of the job to failed if any of the steps failed (probably unnecessary now).

                When I try to restart the job so that the step marked as FAILED / COMPLETED WITH ERROR runs again, I get the following exit status:

                exitCode=NOOP;exitDescription=All steps already completed or no steps configured for this job

                When Spring Batch checks the step, I can see that the status was set to ABANDONED, but I have no idea what set that value:

                Code:
                2010-02-16 10:44:25 [main] SimpleJobLauncher [INFO] Job: [FlowJob: [name=completedWithErrorJob]] launched with the following parameters: [{date=1266313465212, sourceFile=classpath:/com/bsb/sf/incubator/batch/simple/CompletedWithErrorJobTest.csv}]
                2010-02-16 10:44:25 [main] AbstractJob [DEBUG] Job execution starting: JobExecution: id=2, startTime=null, endTime=null, lastUpdated=Tue Feb 16 10:44:25 CET 2010, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=1, JobParameters=[{date=1266313465212, sourceFile=classpath:/com/bsb/sf/incubator/batch/simple/CompletedWithErrorJobTest.csv}], Job=[completedWithErrorJob]]
                2010-02-16 10:44:25 [main] SimpleFlow [DEBUG] Resuming state=completedWithErrorJob.step1 with status=UNKNOWN
                2010-02-16 10:44:25 [main] SimpleFlow [DEBUG] Handling state=completedWithErrorJob.step1
                2010-02-16 10:44:25 [main] SimpleStepHandler [INFO] Step already complete or not restartable, so no action to execute: StepExecution: id=1, name=step1, status=ABANDONED, exitStatus=COMPLETED WITH ERROR, readCount=10, filterCount=0, writeCount=7 readSkipCount=0, writeSkipCount=0, processSkipCount=3, commitCount=11, rollbackCount=3, exitDescription=
                Why is it abandoned now?

                • #9
                  Any ideas here?

                  • #10
                    That's a detail I forgot to mention before, and I can see now that it is awkward for you. If a step fails and the job does not fail immediately, the framework assumes that you have recovered from the failure and marks the step execution as ABANDONED, so that it can be skipped on restart. So if you play with the step execution status, you also need to abide by that rule, or find a workaround.
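
The rule described above can be sketched in plain Java as follows. This is a simplified model, not the framework's implementation: `AbandonedRuleModel` and its method names are hypothetical, and the two rules are assumptions taken from this post and from the "Step already complete or not restartable" log line earlier in the thread.

```java
// Simplified model of the ABANDONED rule described in the thread.
// NOT Spring Batch code; all names are hypothetical.
class AbandonedRuleModel {

    enum Status { COMPLETED, FAILED, ABANDONED }

    // Assumed rule 1: a step that ends FAILED while the job carries on to
    // another step is treated as recovered and recorded as ABANDONED.
    static Status recordedStatus(Status stepStatus, boolean jobContinued) {
        if (stepStatus == Status.FAILED && jobContinued) {
            return Status.ABANDONED;
        }
        return stepStatus;
    }

    // Assumed rule 2: on restart, ABANDONED steps are skipped, COMPLETED
    // steps run only if flagged as restartable, FAILED steps run again.
    static boolean runsOnRestart(Status recorded, boolean restartIfComplete) {
        if (recorded == Status.ABANDONED) {
            return false;
        }
        if (recorded == Status.COMPLETED) {
            return restartIfComplete;
        }
        return true;
    }

    public static void main(String[] args) {
        // A listener forces step1 to FAILED, but the job moves on to the next step.
        Status recorded = recordedStatus(Status.FAILED, true);
        System.out.println(recorded);                       // prints ABANDONED
        System.out.println(runsOnRestart(recorded, false)); // prints false
    }
}
```

In this model, forcing the status to FAILED in a listener is not enough for the step to run again: the job also has to stop at that step, otherwise the step ends up ABANDONED and is skipped on restart, as in the log above.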
