DeadlockLoserDataAccessException

  • #16
    It looks like the retry operations interceptor is not working.

    As a test, I explicitly threw an exception in the processor. Based on the logs, I don't see any retries.

    Please find the log attached.


    • #17
      I've lost track now of what your configuration looks like. The exception in the processor will only be retried if you have set up retry config for the step.


      • #18
        Sorry about that.

        Having the interceptor over Step methods doesn't work. Below is the configuration for that; the logs are attached in the previous post.

        Code:
        <aop:config>
            	<aop:pointcut id="stepPointCut" expression="execution(* org.springframework.batch.core.Step.*(..))"/>
            	<aop:advisor pointcut-ref="stepPointCut" advice-ref="retryAdvice" order="-1"/>
        </aop:config>
        
        <b:bean id="retryAdvice" class="org.springframework.batch.retry.interceptor.RetryOperationsInterceptor"/>

        The step retry config from the reference guide (5.1.6) works.

        We have several steps and it's a tedious process to add the retry config to every step. Is there an easier way to have one config that applies across all the steps? (I was hoping the interceptor around Step would do it.)


        • #19
          Originally posted by Nitty:
          "Having the interceptor over Step methods doesn't work"
          I never said it would.

          "The step retry config from the reference guide (5.1.6) works."
          That is the correct way to add retry to a step.

          "We have several steps and it's a tedious process to add the retry config to every step. Is there an easier way to have one config that applies across all the steps? (I was hoping the interceptor around Step would do it.)"
          You can use a parent step as a template for all the steps that use the same configuration.
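
          A minimal sketch of the parent-step idea, assuming the Spring Batch 2.1 XML namespace is the default namespace (as in the configuration posted earlier in this thread). The ids `retryParentStep` and `loadStep`, the reader and writer bean names, and the commit interval and retry limit are placeholders for illustration, not values taken from this thread:

          ```xml
          <!-- Abstract template: holds the retry settings once -->
          <step id="retryParentStep" abstract="true">
              <tasklet>
                  <chunk commit-interval="10" retry-limit="3">
                      <retryable-exception-classes>
                          <include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
                      </retryable-exception-classes>
                  </chunk>
              </tasklet>
          </step>

          <!-- Concrete step: inherits the retry settings and only adds what differs -->
          <step id="loadStep" parent="retryParentStep">
              <tasklet>
                  <chunk reader="itemReader" writer="itemWriter"/>
              </tasklet>
          </step>
          ```

          Each further chunk-oriented step would then only need `parent="retryParentStep"` instead of repeating the retry elements.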


          • #20
            This would work for the steps which have chunks.

            We also have steps which are tasklets, either invoking a stored procedure or doing some other housekeeping work.

            How can we retry these in a deadlock situation on the repository?


            • #21
              How about a retry interceptor on the tasklet execute()?
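
              One way this might be wired, as a rough sketch that reuses the `retryAdvice` bean style from the earlier posts. The pointcut id `taskletPointcut` is a placeholder, and the advice only takes effect for tasklets that are defined as Spring beans and can therefore be proxied. Note that `execute()` is called inside the step's transaction, so whether a retry at this point recovers from a deadlock is an open assumption to test:

              ```xml
              <aop:config>
                  <!-- Matches execute(..) on any implementation of the Tasklet interface -->
                  <aop:pointcut id="taskletPointcut"
                                expression="execution(* org.springframework.batch.core.step.tasklet.Tasklet+.execute(..))"/>
                  <aop:advisor pointcut-ref="taskletPointcut" advice-ref="retryAdvice"/>
              </aop:config>

              <!-- Default retry settings; customize via the retryOperations property if needed -->
              <b:bean id="retryAdvice" class="org.springframework.batch.retry.interceptor.RetryOperationsInterceptor"/>
              ```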


              • #22
                Dave,

                The jobs continue to fail due to deadlocks.
                After moving the retry to step/chunk, the deadlocks are happening outside of the chunk.

                It happens at various places:
                1. UPDATE BATCH_STEP_EXECUTION (after all the chunks are exhausted)
                2. UPDATE BATCH_STEP_EXECUTION_CONTEXT....
                3. INSERT into BATCH_JOB_EXECUTION......
                4. SELECT STEP_EXECUTION_ID, STEP_NAME..... (querying job repository)

                I can't think of a clean way to handle this other than to catch the exception and restart the entire job, which unfortunately is not efficient.


                • #23
                  If you get a deadlock on "INSERT into BATCH_JOB_EXECUTION" that is probably a good thing, though, and retrying the job could be the right approach. The others are unavoidable, but I don't know why you can't retry them. This thread is so long that I'm never sure what state your configuration is in. Maybe you could create a GIST with a project that we can share (although I don't know what we would do for a database)?

                  N.B. you can always reduce contention in the database by running fewer concurrent jobs and increasing the chunk sizes in steps that support it.


                  • #24
                    "The others are unavoidable, but I don't know why you can't retry them"
                    As per your previous suggestion, the retry configuration is in step/chunk (as in the reference guide, 5.1.6).

                    I think the deadlock is occurring outside of step/chunk (e.g. BATCH_STEP_EXECUTION_CONTEXT, BATCH_JOB_EXECUTION etc., including BATCH_STEP_EXECUTION before the chunking starts).

                    Since the retry interceptor on the repository didn't work, I am thinking we are left with no choice other than to restart the entire job.

                    "N.B. you can always reduce contention in the database by running fewer concurrent jobs and increasing the chunk sizes in steps that support it."
                    During peak time we will have around 200 jobs waiting to be processed; currently we are only running 4 jobs in parallel during this time.
                    Each flow has 2 or more steps: one for loading data into the staging area, where we control the chunk sizes, and the others for loading data from staging to the target area. In those the chunk size is always 1 and is controlled by "sql group by" business logic.

                    So we have very little to no room for improvement in configuration.


                    • #25
                      Originally posted by Nitty:
                      "As per your previous suggestion, the retry configuration is in step/chunk (as in the reference guide, 5.1.6)."
                      Good, but that won't stop deadlocks in the JobRepository for operations outside the chunk transaction. You need both approaches to retry all deadlocks. If you could share a project it would be a lot easier to see what you are doing.


                      • #26
                        Dave,

                        Can you tell me if this configuration is right?

                        This is meant to include all the methods on the repository except for updateStepExecution (which is taken care of by the step/chunk retry).

                        Code:
                        <aop:config>
                        	<aop:pointcut id="repositoryPointcut" expression="execution(* org.springframework.batch.core..*Repository+.*(..)) 
                        					      &amp;&amp; !execution(* org.springframework.batch.core..*Repository+.updateStepExecution*(..))"/>
                        	<aop:advisor pointcut-ref="repositoryPointcut" advice-ref="retryAdvice"/>
                        </aop:config>
                        
                        <b:bean id="retryAdvice" class="org.springframework.batch.retry.interceptor.RetryOperationsInterceptor"/>


                        • #27
                          Close, but the pointcut looks wrong because there is no method in JobRepository that matches the second expression (you can see this in the interface for yourself). The methods you need to exclude are update(StepExecution) and updateExecutionContext(StepExecution). Remember to use the fully qualified class name for StepExecution in XML config.
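
                          A sketch of what the corrected pointcut might look like, keeping the `repositoryPointcut` and `retryAdvice` names from the previous post. It excludes the two StepExecution methods named above, with the parameter type written as a fully qualified class name; treat it as an untested starting point rather than a verified configuration:

                          ```xml
                          <aop:config>
                              <aop:pointcut id="repositoryPointcut"
                                  expression="execution(* org.springframework.batch.core.repository.JobRepository+.*(..))
                                      &amp;&amp; !execution(* org.springframework.batch.core.repository.JobRepository+.update(org.springframework.batch.core.StepExecution))
                                      &amp;&amp; !execution(* org.springframework.batch.core.repository.JobRepository+.updateExecutionContext(org.springframework.batch.core.StepExecution))"/>
                              <aop:advisor pointcut-ref="repositoryPointcut" advice-ref="retryAdvice"/>
                          </aop:config>
                          ```

                          Because the excluded signatures name the StepExecution parameter type explicitly, the JobExecution overloads of update and updateExecutionContext still match the pointcut.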


                          • #28
                            Thanks Dave,

                            How will the deadlocks on updates to stepExecution and stepExecutionContext be taken care of in AbstractStep before and after doExecute() (updating step startTime/STARTED, COMPLETED/endTime...)?


                            • #29
                              Good point. I guess we need to control the retry according to whether or not there is an existing transaction. I was thinking about this and realised we also probably need to ensure that the retry interceptor gets applied outside the transaction interceptor that lives with the JobRepository (as created by the factory bean) - maybe it already is, but you can check. If it is then you need a retry policy that wraps the existing one and only retries if there is no existing transaction (as determined by the TransactionSynchronizationManager).
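
                              A rough sketch of how such a wrapping policy could be plugged into the interceptor. The class `com.example.batch.NoTransactionRetryPolicy` and its `delegate` property are hypothetical: they stand for a custom RetryPolicy that you would have to write yourself, which delegates to the wrapped policy but refuses to retry while TransactionSynchronizationManager reports an active transaction:

                              ```xml
                              <b:bean id="retryAdvice" class="org.springframework.batch.retry.interceptor.RetryOperationsInterceptor">
                                  <b:property name="retryOperations">
                                      <b:bean class="org.springframework.batch.retry.support.RetryTemplate">
                                          <b:property name="retryPolicy">
                                              <!-- Hypothetical custom policy, not part of Spring Batch -->
                                              <b:bean class="com.example.batch.NoTransactionRetryPolicy">
                                                  <!-- The existing policy being wrapped -->
                                                  <b:property name="delegate">
                                                      <b:bean class="org.springframework.batch.retry.policy.SimpleRetryPolicy"/>
                                                  </b:property>
                                              </b:bean>
                                          </b:property>
                                      </b:bean>
                                  </b:property>
                              </b:bean>
                              ```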


                              • #30
                                This approach sounds good.

                                "ensure that the retry interceptor gets applied outside the transaction interceptor that lives with the JobRepository (as created by the factory bean) - maybe it already is, but you can check"
                                I'm not sure where to check this. (I see the tx interceptor created in the factory bean when the object gets initialized. Since the retry interceptor is around the JobRepository, doesn't that mean the retry interceptor is applied before/outside the transaction interceptor?)

                                "If it is then you need a retry policy that wraps the existing one and only retries if there is no existing transaction (as determined by the TransactionSynchronizationManager)."
                                So the wrapper would check TransactionSynchronizationManager.isActualTransactionActive(): if true, proceed; otherwise hand it to the retry interceptor.
