  • skipping a block of code in the Partitioner code

    Hi,

    I'm working on a Spring Batch application that runs a job and partitions it into a number of Tasklet steps using a PartitionStep. In the Partitioner that I feed into the StepExecutionSplitter, I need to be able to skip the block of code that actually figures out how the steps will be partitioned, so that when one of the Tasklet steps fails and the job is restarted, only the unsuccessful steps are rerun and the splitting code itself is not executed again. Is there a way for the Partitioner to get access to an ExecutionContext so it can check for the existence of a variable and decide whether to skip the code that splits the job into small steps? I'm looking for something similar to the code snippet in section 3.3 of the Spring Batch documentation:
    Code:
    if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
        log.debug("Initializing for restart. Restart data is: " + executionContext);
    
        long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));
    
        LineReader reader = getReader();
    
        Object record = "";
        while (reader.getPosition() < lineCount && record != null) {
            record = readLine();
        }
    }
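
    Something along these lines is what I have in mind. It's only a sketch: RestartAwarePartitioner, the "partition.count" key and expensiveSplit() are my own placeholder names, and I'm assuming the partitioner can also be registered as a StepExecutionListener on the master step so it gets hold of that step's ExecutionContext before partition() is called.
    Code:
    import java.util.HashMap;
    import java.util.Map;

    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.StepExecutionListener;
    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.batch.item.ExecutionContext;

    public class RestartAwarePartitioner implements Partitioner, StepExecutionListener {

        private static final String PARTITION_COUNT = "partition.count";

        private ExecutionContext masterStepContext;

        public void beforeStep(StepExecution stepExecution) {
            // Keep a handle on the master step's ExecutionContext; on a restart it is
            // restored from the previous execution, so the count stored below survives.
            this.masterStepContext = stepExecution.getExecutionContext();
        }

        public ExitStatus afterStep(StepExecution stepExecution) {
            return null; // leave the exit status alone
        }

        public Map<String, ExecutionContext> partition(int gridSize) {
            Map<String, ExecutionContext> partitions = new HashMap<String, ExecutionContext>();
            if (masterStepContext != null && masterStepContext.containsKey(PARTITION_COUNT)) {
                // Restart: skip the expensive analysis and just recreate the same partition
                // names with empty contexts, assuming the framework restores the previous
                // contexts for the partitions that failed.
                int count = masterStepContext.getInt(PARTITION_COUNT);
                for (int i = 0; i < count; i++) {
                    partitions.put("partition" + i, new ExecutionContext());
                }
                return partitions;
            }
            partitions = expensiveSplit(gridSize);
            if (masterStepContext != null) {
                masterStepContext.putInt(PARTITION_COUNT, partitions.size());
            }
            return partitions;
        }

        private Map<String, ExecutionContext> expensiveSplit(int gridSize) {
            // Placeholder for the costly logic that figures out how to split the job.
            Map<String, ExecutionContext> partitions = new HashMap<String, ExecutionContext>();
            for (int i = 0; i < gridSize; i++) {
                partitions.put("partition" + i, new ExecutionContext());
            }
            return partitions;
        }
    }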
    Thanks.

  • #2
    I actually have the same issue. We use the progress indicator pattern, where we've implemented a complete staging capability with error management. The first step of our job reads the data to process and stores it in a staging table. One row in that table corresponds to one item to process. Metadata such as status, stepName and message are available.

    Each subsequent step of that job is a partitioned step that splits the data to process into partitions. Each partition uses the staging table, so the partitions do not need to save any state: the staging table reflects the status of each individual item (not processed, completed, skipped, error). We have noticed that if we want to restart such a step, the partitioner is called again, which is something I don't really understand. Worse, if we try to create new partitions, which is possible for us since we know exactly what needs to be reprocessed, Spring Batch overrides the new execution context created by the partitioner with the old one if the partition did not complete successfully.
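
    Roughly, a minimal sketch of such a staging-table-driven partitioner (StagingTablePartitioner, the table and column names and the "itemIds" key are simplified placeholders, not our real schema):
    Code:
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import javax.sql.DataSource;

    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class StagingTablePartitioner implements Partitioner {

        private final JdbcTemplate jdbcTemplate;

        public StagingTablePartitioner(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        public Map<String, ExecutionContext> partition(int gridSize) {
            // Only items still to be processed become work; completed items are excluded,
            // so a restart naturally picks up just what is left in the staging table.
            List<Long> pendingIds = jdbcTemplate.queryForList(
                    "select ITEM_ID from BATCH_STAGING where STATUS in ('PENDING', 'ERROR')",
                    Long.class);

            // Round-robin the item ids over gridSize partitions; each slave step reads its
            // own id list back from its ExecutionContext.
            Map<String, ExecutionContext> partitions = new HashMap<String, ExecutionContext>();
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.put("itemIds", new ArrayList<Long>());
                partitions.put("partition" + i, context);
            }
            int index = 0;
            for (Long id : pendingIds) {
                ExecutionContext context = partitions.get("partition" + (index++ % gridSize));
                @SuppressWarnings("unchecked")
                List<Long> ids = (List<Long>) context.get("itemIds");
                ids.add(id);
            }
            return partitions;
        }
    }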

    Any idea about this use case? Are we doing something wrong? I would say that the default behavior on restart should be to restart only the failed partitions, shouldn't it?

    • #3
      It's good to know that I'm not the only one facing this problem. :-) I guess, snicoll, we'll just have to wait for the Spring Batch guys to tell us whether there is a workaround for this problem.

      • #4
        Stephane: thanks for the detail. I guess if the framework just checked for allowStartIfComplete and always regenerated the partitions in that case, that would work for you, maybe? I'm still not sure though why you need allowStartIfComplete=true - the semantics are really for steps that undo previous failed work (e.g. deleting output files), or send informational messages that always need to go for each execution.

        gerson721: Do you have allowStartIfComplete=true (see discussion here http://forum.springsource.org/showthread.php?t=84722)? If so, is it intended or necessary for that step? The default behaviour is to only re-process the partitions that failed, and to give them the "same" input data as the failed execution. This makes perfect sense to me (and there is a redundant code block where the partitions are regenerated anyway, but that shouldn't matter).

        • #5
          Originally posted by Dave Syer
          Stephane: thanks for the detail. I guess if the framework just checked for allowStartIfComplete and always regenerated the partitions in that case, that would work for you, maybe? I'm still not sure though why you need allowStartIfComplete=true - the semantics are really for steps that undo previous failed work (e.g. deleting output files), or send informational messages that always need to go for each execution.

          gerson721: Do you have allowStartIfComplete=true (see discussion here http://forum.springsource.org/showthread.php?t=84722)? If so, is it intended or necessary for that step? The default behaviour is to only re-process the partitions that failed, and to give them the "same" input data as the failed execution. This makes perfect sense to me (and there is a redundant code block where the partitions are regenerated anyway, but that shouldn't matter).
          Dave,

          Can you post a sample/skeleton partition config that would be coherent with this scenario? We need allowStartIfComplete because we have a multi-step job, so if something fails in step1 and is fixed before a restart of the job, we want to execute step2, step3, ... even if they didn't have any error at all. If this is not clear, I can explain a bit more. Just let me know.

          • #6
            It's not really clear yet, no. Why would you want to execute a step again if it already completed successfully for this input data set?

            • #7
              OK, let me try to explain again.

              Our job uses the progress indicator pattern, where we first stage the data to process in a staging table holding a status (pending, completed, error, skipped). Say our job has 4 steps: step1 stages the data, and the other steps use the staging table to process the data with a partition.

              Here's a typical flow where we hit the problem:

              * Step1 stages 10 items
              * Step2 processes 8 items and skips 2 items

              What we want at this point is the following:

              * Step2 has a different exit code so that we know items were skipped
              * Step3 is executed (because the error was skippable and not fatal)
              * Step3 ONLY PROCESSES the 8 items that step2 processed (see the reader sketch after this list)
              * Assuming no error occurred, step4 does the same (only processes the 8 items)
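
              To make that concrete, the reader for step3 (and step4) is restricted to what step2 completed, along these lines (StagedItemReaderFactory and the table, column and status names are simplified placeholders):
              Code:
              import javax.sql.DataSource;

              import org.springframework.batch.item.database.JdbcCursorItemReader;
              import org.springframework.jdbc.core.SingleColumnRowMapper;

              public class StagedItemReaderFactory {

                  // Builds a reader that only sees the items step2 managed to complete; the
                  // skipped items stay behind in the staging table until step2 is rerun.
                  public static JdbcCursorItemReader<Long> step3Reader(DataSource dataSource) {
                      JdbcCursorItemReader<Long> reader = new JdbcCursorItemReader<Long>();
                      reader.setDataSource(dataSource);
                      reader.setSql("select ITEM_ID from BATCH_STAGING where STATUS = 'STEP2_COMPLETED'");
                      reader.setRowMapper(new SingleColumnRowMapper<Long>(Long.class));
                      return reader;
                  }
              }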

              When our job ends, step2 needs to be restarted, but even though step3 and step4 went fine, they need to be restarted as well in order to process the 2 remaining items. A typical scenario is an intervention that fixes the underlying issue, then someone restarts the batch. Step2 starts again, processes only the 2 skipped items, and then step3 and step4 process those as well.

              Implementing this scenario with what Spring Batch offers was not hugely difficult, but we are not satisfied with the result. For instance, there is no "INCOMPLETE" status in Spring Batch that states that the step completed but skipped some items. The other problem is that we get wrong statistics. While step2 states that it read 10 items, processed 8 items and skipped 2 items, step3 and step4 state that 8 items were read and processed. It would of course be better if we could state that 2 items were skipped there as well, but we haven't found a way to do that yet.

              Anyway, back to the point: can you post a multi-step job using partitioning that would fit our use case? Recomputing the partitions makes no sense to me. Maybe we're putting the allow-start-if-complete flag in the wrong spot.

              • #8
                I suppose you could go with allowStartIfComplete=true for step2. If there are skips but no unrecoverable errors, then all partitions end with status=COMPLETED. Maybe this scenario is what you are trying to implement already and it doesn't work? We could maybe fix the framework to allow that by checking the flag and always regenerating the partitions (as I proposed). If you patch the StepExecutionSplitter and it works for you, post the patch in a JIRA and I'll see if that makes it clearer.

                Otherwise we can look at what happens if step2 is allowStartIfComplete=false (the default). Then you have two choices:

                1) Treat the re-processing of skipped items as a new job instance (so allow the initial job to complete normally). In this case step3, step4 are also allowStartIfComplete=false.

                2) Treat the re-processing of skipped items as a restart of the same job instance. Then you need to ensure that any partition with skips in step2 exits with status=FAILED, and then when it restarts it only processes items which are not already marked complete (presumably this is part of the input query anyway). The whole job execution also has to finish with status=FAILED. In this case step3, step4 are allowStartIfComplete=true and also contain a filter in the input to ignore processed items.

                Your hypothetical status=INCOMPLETE is really already realized, just not explicitly enumerated: it is a standard interpretation of status=FAILED and exitStatus.exitCode=COMPLETE_WITH_SKIPS (or whatever you want to call it).
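
                To put the exit code half of that in code, a listener on step2 along these lines could flag the skips (the class name is just an example, and arranging for the step/job to end up FAILED is a separate configuration concern):
                Code:
                import org.springframework.batch.core.ExitStatus;
                import org.springframework.batch.core.StepExecution;
                import org.springframework.batch.core.listener.StepExecutionListenerSupport;

                public class CompleteWithSkipsListener extends StepExecutionListenerSupport {

                    @Override
                    public ExitStatus afterStep(StepExecution stepExecution) {
                        if (stepExecution.getSkipCount() > 0) {
                            // Surface "completed, but some items were skipped" as its own exit
                            // code so the job flow (or an operator) can react to it.
                            return new ExitStatus("COMPLETE_WITH_SKIPS");
                        }
                        return null; // null leaves the step's exit status unchanged
                    }
                }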

                • #9
                  Originally posted by Dave Syer
                  gerson721: Do you have allowStartIfComplete=true (see discussion here http://forum.springsource.org/showthread.php?t=84722)? If so, is it intended or necessary for that step? The default behaviour is to only re-process the partitions that failed, and to give them the "same" input data as the failed execution. This makes perfect sense to me (and there is a redundant code block where the partitions are regenerated anyway, but that shouldn't matter).
                  Thanks for your response Dave,

                  I haven't actually set allowStartIfComplete to true; that's why whenever I restart the failed job, Spring Batch just keeps restarting the failed steps. What I was trying to skip was the execution of the redundant code, which keeps regenerating the partitions even though its output is not used anyway.

                  In our case, the execution of this redundant code is expensive, and I would prefer not to run it a second or third time if its output is not going to be used on the second or third job retry anyway.

                  A fix for this problem might be to have that redundant code run as a separate job step outside of the PartitionStep and store its output in the job ExecutionContext. I'll have a go at that and let you know if it works.

                  Thanks again,
                  Gerson

                  • #10
                    The framework could skip that block if allowStartIfComplete=false and we are in a restart. Wouldn't that be better? There's no need to pollute the JobExecution context with data that is already available at step level. (But if it works as a workaround for you, that's great.)

                    • #11
                      Originally posted by Dave Syer
                      The framework could skip that block if allowStartIfComplete=false and we are in a restart. Wouldn't that be better?
                      Hi Dave,

                      Reading the API docs, it sounded like setting allowStartIfComplete to false would only make my application skip the steps that have finished successfully and rerun the failed ones. The API also states that the default value of allowStartIfComplete is false, so I didn't set it in my code. I had another go at it and explicitly set
                      Code:
                      allow-start-if-complete="false"
                      in my tasklet, but the block of code that does the partitioning still runs every time the job is restarted.

                      • #12
                        I assume you meant true. That's exactly my problem. IMO, the partitioner should not run again but just reuse the StepExecutions that were created previously.

                        • #13
                          Originally posted by gerson721
                          the block of code that does the partitioning still runs every time the job is restarted.
                          I know. I understood the last post already. I offered a couple of workarounds and asked a question about whether a potential solution would work in your case. I'll ask it again: if the framework skipped that block only when allowStartIfComplete=false and the partition step is restarting, would that help? (No promises about the actual implementation or which future version, but open a JIRA and it might make it into 2.1.1.)

                          • #14
                            As I understand it, that means: with allowStartIfComplete=false on the step defining the partition, and the underlying step restarting, no partitions would be computed again. Yes, that would help us.

                            The current behavior is broken anyway: if the partitioner decides to create optimized partitions for the restart, Spring Batch overrides the newly created execution context with the previous one if a given partition has failed (and does not if the partition has succeeded!).

                            • #15
                              Originally posted by snicoll
                              I assume you meant true. That's exactly my problem. IMO, the partitioner should not run again but just reuse the StepExecutions that were created previously.
                              Actually, I meant false. I think the default behavior for the partitioned steps (the slave steps) is to rerun only the failed steps when allowStartIfComplete is set to false. The PartitionStep (the master step), however, still runs the block of code that does the partitioning every time you restart the job, even if allowStartIfComplete is set to false.

                              I managed to solve my problem by moving that partitioning bit of code into its own Step and saving its output to the job ExecutionContext. I don't want the partitioner to run this block of code every time I restart the job, as it sometimes takes a few minutes to finish executing.
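
                              Roughly, the idea is a planning step along these lines, plus the standard ExecutionContextPromotionListener to copy the result up to the job ExecutionContext (PartitionPlanningTasklet and the "partition.plan" key are simplified placeholders):
                              Code:
                              import java.io.Serializable;
                              import java.util.ArrayList;

                              import org.springframework.batch.core.StepContribution;
                              import org.springframework.batch.core.scope.context.ChunkContext;
                              import org.springframework.batch.core.step.tasklet.Tasklet;
                              import org.springframework.batch.item.ExecutionContext;
                              import org.springframework.batch.repeat.RepeatStatus;

                              public class PartitionPlanningTasklet implements Tasklet {

                                  public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
                                      ExecutionContext stepContext = chunkContext.getStepContext()
                                              .getStepExecution().getExecutionContext();
                                      // The expensive analysis runs here, in its own step; once this step
                                      // has completed, a restart of the job does not execute it again.
                                      stepContext.put("partition.plan", computePlan());
                                      return RepeatStatus.FINISHED;
                                  }

                                  private Serializable computePlan() {
                                      // Placeholder for the costly splitting logic; the result has to be
                                      // serializable so it can be stored in the execution context.
                                      return new ArrayList<String>();
                                  }
                              }
                              With an ExecutionContextPromotionListener configured with keys={"partition.plan"} on that planning step, the (step-scoped) partitioner can then pick the plan up via #{jobExecutionContext['partition.plan']} instead of recomputing it on every run.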
