How to reset a stale running job / restart a completed job?

  • How to reset a stale running job / restart a completed job?

    Hi everyone,

    Sometimes my application server fails in such a way that the executing job is not marked as FAILED (e.g. the application server is shut down). In this case the DB is not updated, and I can't re-launch the job. So I've implemented a simple method that marks such job executions as STOPPED so that they can be re-run:

    Code:
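    public boolean resetStaleJob(long jobExecutionId) {
    	JobExecution jobExecution = jobExplorer.getJobExecution(Long.valueOf(jobExecutionId));
    
    	if (jobExecution == null) {
    		// No execution with this id
    		return false;
    	}
    
    	final BatchStatus status = jobExecution.getStatus();
    
    	// Only executions still recorded as running are stale. Test the status
    	// explicitly: COMPLETED sorts before STARTED in BatchStatus, so an
    	// isGreaterThan(STARTED) check would let completed executions through.
    	if (status != BatchStatus.STARTING && status != BatchStatus.STARTED) {
    		return false;
    	}
    
    	// Mark the execution as STOPPED so that the job instance can be restarted
    	jobExecution.setStatus(BatchStatus.STOPPED);
    	jobExecution.setEndTime(new Date());
    	jobRepository.update(jobExecution);
    
    	return true;
    }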
    I am not sure whether a hack like this is a good approach.

    Also, I can't restart a completed job, because JobInstanceAlreadyCompleteException is thrown. I have set <batch:tasklet allow-start-if-complete="true">, but that does not help. Is there any legitimate way to restart the job, other than introducing a dummy job parameter (e.g. equal to the execution date)?

    Thanks.

  • #2
    Also, I can't restart a completed job, because JobInstanceAlreadyCompleteException is thrown. I have set <batch:tasklet allow-start-if-complete="true">, but that does not help. Is there any legitimate way to restart the job, other than introducing a dummy job parameter (e.g. equal to the execution date)?
    There's no way to restart a completed job instance. The allow-start-if-complete flag applies to tasklets: it re-executes an already completed tasklet on a restart.



    • #3
      Originally posted by arno:
      There's no way to restart a completed job instance. The allow-start-if-complete flag applies to tasklets: it re-executes an already completed tasklet on a restart.
      arno, thanks for the reply. I still do not understand the relationship between <job restartable="xxx"> and <tasklet allow-start-if-complete="...">. Do you agree that the documentation should perhaps be clearer about this?

      And in my particular situation: can the framework be extended to support the above (restarting a completed job and fixing stale jobs), or is that against the framework's philosophy (so that I should basically do it myself)? In the latter case, what is the best way to restart a completed job and to fix stale jobs?



      • #4
        I still do not understand the relationship between <job restartable="xxx"> and <tasklet allow-start-if-complete="...">.
        For <job restartable="true/false" />: the default is true, so you can restart any instance that is neither COMPLETED nor ABANDONED. Set it to false if a failed job instance shouldn't be restarted (because the job doesn't handle restart and could process the same data twice, which is bad).

        For <tasklet allow-start-if-complete="true/false">, let's take an example: a job has three steps (step1, step2, step3). A first execution runs and fails during step3. On a restart, Spring Batch goes directly to step3 and tries to finish the execution, because allow-start-if-complete defaults to false. Now take the same scenario, except with allow-start-if-complete="true" on the step1 tasklet. On a restart (after a failure in step3), Spring Batch re-executes step1, skips step2, and re-executes step3. Clearer now? :-)
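
        The rule in this example can be written down as a small toy model. This is only a sketch in plain Java: RestartModel, StepState and scenario are made-up names and are not Spring Batch classes; the model only mirrors the step1/step2/step3 scenario described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical and are NOT Spring Batch classes.
class RestartModel {

    /** A step's state as recorded by the previous (failed) execution. */
    static final class StepState {
        final boolean completed;            // did the step finish in the previous execution?
        final boolean allowStartIfComplete; // value of the allow-start-if-complete flag

        StepState(boolean completed, boolean allowStartIfComplete) {
            this.completed = completed;
            this.allowStartIfComplete = allowStartIfComplete;
        }
    }

    /** Returns the names of the steps that run again on a restart, in job order. */
    static List<String> stepsToRunOnRestart(Map<String, StepState> steps) {
        List<String> toRun = new ArrayList<>();
        for (Map.Entry<String, StepState> entry : steps.entrySet()) {
            StepState state = entry.getValue();
            // A completed step is skipped unless its flag asks for re-execution.
            if (!state.completed || state.allowStartIfComplete) {
                toRun.add(entry.getKey());
            }
        }
        return toRun;
    }

    /** Builds the three-step scenario: step1 and step2 completed, step3 failed. */
    static Map<String, StepState> scenario(boolean step1AllowStartIfComplete) {
        Map<String, StepState> steps = new LinkedHashMap<>(); // keeps step order
        steps.put("step1", new StepState(true, step1AllowStartIfComplete));
        steps.put("step2", new StepState(true, false));
        steps.put("step3", new StepState(false, false));
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsToRunOnRestart(scenario(false))); // [step3]
        System.out.println(stepsToRunOnRestart(scenario(true)));  // [step1, step3]
    }
}
```

        In both cases the flag only matters on a restart of a failed instance; it does not make a COMPLETED instance restartable.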

        I think the documentation is pretty clear on this part: http://static.springsource.org/sprin...tartIfComplete

        Can the framework be extended to support the above (restarting a completed job), or is that against the framework's philosophy?
        I think Spring Batch's behavior makes sense here: why would you want to restart something already finished? There could be some flag to set, but what would the semantics be: the instance is done, so where should it restart from? Can't you use the STOPPED status? There's plenty of support in Spring Batch for choosing the final status of a job instance (and avoiding the COMPLETED status if necessary); you should perhaps take a look at that (you'll still need to know exactly where the job should resume).

        Can the framework be extended to support the above (fixing stale jobs), or is that against the framework's philosophy?
        You could perhaps raise a JIRA issue for that.



        • #5
          For <tasklet allow-start-if-complete="true/false">, let's take an example: a job has three steps (step1, step2, step3). A first execution runs and fails during step3. On a restart, Spring Batch goes directly to step3 and tries to finish the execution, because allow-start-if-complete defaults to false. Now take the same scenario, except with allow-start-if-complete="true" on the step1 tasklet. On a restart (after a failure in step3), Spring Batch re-executes step1, skips step2, and re-executes step3. Clearer now? :-)
          I think the documentation is pretty clear on this part: http://static.springsource.org/sprin...tartIfComplete
          Thanks, absolutely clear. I was looking at the wrong chapter of the documentation.

          I think Spring Batch's behavior makes sense here: why would you want to restart something already finished? There could be some flag to set, but what would the semantics be: the instance is done, so where should it restart from?
          In my case, new data has been added to the DB, and I want the job to start from the beginning (so it can hardly be called a "restart"; "start from scratch" is closer).

          Can't you use the STOPPED status? There's plenty of support in Spring Batch for choosing the final status of a job instance (and avoiding the COMPLETED status if necessary); you should perhaps take a look at that (you'll still need to know exactly where the job should resume).
          The STOPPED status is useful when a problem is detected: after fixing the problem, the job can continue from the point where it was interrupted. I just want to start the job again on a periodic basis (say, run from the beginning every Monday).



          • #6
            The STOPPED status is useful when a problem is detected: after fixing the problem, the job can continue from the point where it was interrupted. I just want to start the job again on a periodic basis (say, run from the beginning every Monday).
            It looks to me like the concept of a new job instance matches what you want. Each instance would have a date job parameter if it works on the same input data (table, file). I can't see why you need to restart an already completed job, as if you wanted only one "eternal" job instance for a particular job. Can you tell us more about what you want to achieve?



            • #7
              Originally posted by arno:
              It looks to me like the concept of a new job instance matches what you want. Each instance would have a date job parameter if it works on the same input data (table, file).
              I think that is what I asked in my first post: do I need to introduce a dummy date job parameter (set to the execution date) to start a new job?

              I can't see why you need to restart an already completed job, as if you wanted only one "eternal" job instance for a particular job. Can you tell us more about what you want to achieve?
              My fault here: I now understand better what the term "restart" means in Batch. Indeed, I do not need to restart a completed job: I want to start a new one. Let the completed job be archived, as it should be. On the other hand, I would like the framework to check that a job with this name is not currently being executed (regardless of parameters). So, in brief, the steps should look like this:

              Code:
              if (job is COMPLETED)
              {
                start_new_job()
              }
              else if (job has FAILED)
              {
                restart_old_job()
              }
              else
              {
                exception: "The job is currently running"
              }
              How do I implement start_new_job()?



              • #8
                If your job runs daily, you should indeed use a job parameter for the day. If the concepts of job instances, job executions, etc. aren't clear to you, take a look at the documentation: http://static.springsource.org/sprin...html#domainJob.
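
                A minimal sketch of deriving such a parameter value, using only the JDK. RunKeys, dailyKey and weeklyKey are made-up names, not part of Spring Batch; the resulting string would be passed to the launcher as a job parameter under a name of your choosing. Launching again with the same value addresses the same job instance (so a failed run can be restarted), while a new value gives a new instance.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper, not part of Spring Batch.
class RunKeys {

    /** One key per calendar day, e.g. "2024-01-10". */
    static String dailyKey(LocalDate date) {
        return date.toString(); // ISO-8601 format, yyyy-MM-dd
    }

    /** One key per week: the date of the Monday on or before the given date. */
    static String weeklyKey(LocalDate date) {
        return date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY)).toString();
    }

    public static void main(String[] args) {
        LocalDate wednesday = LocalDate.of(2024, 1, 10);
        System.out.println(dailyKey(wednesday));  // 2024-01-10
        System.out.println(weeklyKey(wednesday)); // 2024-01-08
    }
}
```

                The weekly variant fits a job that should start from the beginning every Monday: any re-launch within the same week maps to the same key.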

                You can use the JobExplorer interface (for example, its findRunningJobExecutions(jobName) method) to check whether there's a running execution of a job.



                • #9
                  I think there is a part of the original question that was not answered and is also a problem for me. How do you resume job executions that have started but are neither completed, failed, nor stopped, because the server went down for some reason? How is it possible to resume in these cases and have the job pick up where it left off?



                  • #10
                    Originally posted by arno:
                    If your job runs daily, you should indeed use a job parameter for the day.
                    OK, clear. I need to add one parameter that depends on the start day.

                    I wish the framework provided a method like restartOrStartNewJob(Job job), along these lines:

                    Code:
                    void restartOrStartNewJob(Job job) {
                    	JobParameters jobParameters = null;
                    
                    	List<JobInstance> lastInstances = jobExplorer.getJobInstances(job.getName(), 0, 1);
                    
                    	if (!CollectionUtils.isEmpty(lastInstances)) {
                    		// Try to restart the last execution:
                    		jobParameters = lastInstances.get(0).getJobParameters();
                    	}
                    
                    	if (jobParameters == null) {
                    		// Try to start a new instance:
                    		jobParameters = job.getJobParametersIncrementer().getNext(createDefaultJobParameters());
                    	}
                    
                    	try {
                    		logger.info("Attempting to re-launch job with parameters " + jobParameters);
                    
                    		Long executionId;
                    
                    		try {
                    			executionId = jobLauncher.run(job, jobParameters).getId();
                    		}
                    		catch (JobInstanceAlreadyCompleteException e) {
                    			jobParameters = job.getJobParametersIncrementer().getNext(jobParameters);
                    
                    			logger.info("Attempting to start new job with parameters " + jobParameters);
                    
                    			executionId = jobLauncher.run(job, jobParameters).getId();
                    		}
                    
                    		...
                    	}
                    	catch (JobInstanceAlreadyCompleteException e) {
                    		// This situation should never happen, as we have taken steps to increment the parameters:
                    		...
                    	}
                    	catch (JobExecutionAlreadyRunningException e) {
                    		...
                    	}
                    	catch (JobRestartException e) {
                    		...
                    	}
                    	catch (JobParametersInvalidException e) {
                    		...
                    	}
                    }
                    If the concepts of job instances, job executions, etc. aren't clear to you, take a look at the documentation: http://static.springsource.org/sprin...html#domainJob.
                    Thanks for the link. After reading the documentation again, I understand the basics better.

                    To eparchas:

                    Originally posted by eparchas:
                    How do you resume job executions that have started but are neither completed, failed, nor stopped, because the server went down for some reason? How is it possible to resume in these cases and have the job pick up where it left off?
                    I have a rather ugly solution, which I put in the initial post. I would like to hear from the Batch core developers whether resetStaleJob() above looks good. eparchas, you can use it as is; it works absolutely fine for me.



                    • #11
                      Thanks, I have tried using it, but it seems that the step executions do not resume from where they left off; instead, a new one is created. I am using a FlatFileItemReader and would like the reading of the file to resume from the last commit. I would assume this is done automatically, since the reader implements ItemStream, but I haven't managed to get it working.

                      I too would like to hear the Spring Batch team's opinion on recovering jobs from server failures.



                      • #12
                        I too would like to hear the Spring Batch team's opinion on recovering jobs from server failures.



                        • #13
                          Originally posted by eparchas:
                          I have tried using it, but it seems that the step executions do not resume from where they left off; instead, a new one is created. I am using a FlatFileItemReader and would like the reading of the file to resume from the last commit. I would assume this is done automatically, since the reader implements ItemStream, but I haven't managed to get it working.
                          I suppose that what is happening in your case is that the JobParameters you pass to JobLauncher.run() differ somehow (e.g. a JobParametersIncrementer is involved). In that case a new job instance is created. If I am wrong, try debugging JobLauncher.run(); maybe you need to set allow-start-if-complete="true" for your tasklet...

