Parallel Job Execution Leads to deadlock

  • Parallel Job Execution Leads to deadlock

    There are two batch jobs that share the same metadata database.

    They deadlock when run simultaneously. It happens when a new step is created, i.e. while generating the batch_step_execution_id, and in a few cases when creating the batch_job_execution, i.e. while generating the batch_job_execution_id. One of the jobs is killed with a message saying that its process was selected as the deadlock victim and that the transaction should be rerun.

    I tried all four transaction isolation levels, i.e. SERIALIZABLE, REPEATABLE READ, READ COMMITTED and READ UNCOMMITTED. None of them solved the problem.

    The database is MS SQL Server. I also tried Derby, with the same result.
    Last edited by springilu; Apr 13th, 2010, 04:55 PM. Reason: I added sample code that causes the exceptions.

  • #2
    If you have a test case it would be useful. However, there is no way to prevent deadlocks in general when sharing a database (they are telling you something useful from the point of view of the server). The best you can do is retry the job launch: you can automate that with RetryOperationsInterceptor (declarative) or RetryTemplate (imperative) from Spring Batch.
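    Dave's suggestion is to use Spring Batch's RetryTemplate or RetryOperationsInterceptor. To illustrate the same idea without any Spring dependency, here is a minimal plain-Java sketch, assuming the job launch is wrapped in a Callable. `SimpleRetry` and its parameters are hypothetical names, not part of Spring Batch or Spring Retry: the helper re-runs the action while a caller-supplied predicate says the failure is retryable, pausing a little longer after each attempt.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Hypothetical helper, not part of Spring Batch: re-runs an action (for example
 * a job launch) while it fails with an exception the caller considers retryable.
 */
final class SimpleRetry {
    private SimpleRetry() {}

    static <T> T run(Callable<T> action, Predicate<Throwable> retryable,
                     int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception failure) {
                // Give up when the error is not retryable or the attempts are used up.
                if (attempt >= maxAttempts || !retryable.test(failure)) {
                    throw failure;
                }
                // Wait a little longer after each failure so the competing
                // transaction has time to finish before the next attempt.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

    One caveat for this sketch: a launcher may record a failed job in the execution it returns rather than throw, in which case the Callable has to inspect the outcome and throw itself so that the retry loop sees the failure.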

    • #3
      Thank you Dave.
      I don't have a test case, but I pasted the error stack trace below. I also highlighted in red the part of the exception that may have caused this deadlock, i.e.
      org.springframework.batch.core.job.flow.FlowJob.

      2010-04-06 17:27:10,171 ERROR atch.core.job.AbstractJob.execute :274 - Encountered fatal error executing job
      org.springframework.batch.core.JobExecutionException: Flow execution ended unexpectedly
      at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:110)
      at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:250)
      at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:110)
      at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
      at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:105)
      at com.ussco.springbatch.util.WmsJobLauncher.start(WmsJobLauncher.java:262)
      at com.ussco.springbatch.util.WmsJobLauncher.main(WmsJobLauncher.java:315)
      Caused by: org.springframework.batch.core.job.flow.FlowExecutionException: Ended flow=wmctnint at state=moveInputFile with exception
      at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:148)
      at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
      at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:105)
      ... 6 more
      Caused by: org.springframework.dao.DataAccessResourceFailureException: Could not obtain IDENTITY value; nested exception is java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
      Lock : ROW, BATCH_STEP_EXECUTION_SEQ, (2,179)
      Waiting XID : {9761, S} , SAI, select IDENTITY_VAL_LOCAL() from BATCH_STEP_EXECUTION_SEQ
      Granted XID : {9764, X}
      Lock : ROW, BATCH_STEP_EXECUTION_SEQ, (2,178)
      Waiting XID : {9764, S} , SAI, select IDENTITY_VAL_LOCAL() from BATCH_STEP_EXECUTION_SEQ
      Granted XID : {9761, X}
      . The selected victim is XID : 9761.
      at org.springframework.jdbc.support.incrementer.DerbyMaxValueIncrementer.getNextKey(DerbyMaxValueIncrementer.java:160)
      at org.springframework.jdbc.support.incrementer.AbstractDataFieldMaxValueIncrementer.nextLongValue(AbstractDataFieldMaxValueIncrementer.java:125)
      at org.springframework.batch.core.repository.dao.JdbcStepExecutionDao.saveStepExecution(JdbcStepExecutionDao.java:117)
      at org.springframework.batch.core.repository.support.SimpleJobRepository.add(SimpleJobRepository.java:158)
      at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:310)
      at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
      at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
      at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
      at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
      at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
      at $Proxy0.add(Unknown Source)
      at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:345)
      at org.springframework.batch.core.job.flow.FlowJob.access$100(FlowJob.java:43)
      at org.springframework.batch.core.job.flow.FlowJob$JobFlowExecutor.executeStep(FlowJob.java:137)
      at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
      at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
      ... 8 more
      Caused by: java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
      Lock : ROW, BATCH_STEP_EXECUTION_SEQ, (2,179)
      Waiting XID : {9761, S} , SAI, select IDENTITY_VAL_LOCAL() from BATCH_STEP_EXECUTION_SEQ
      Granted XID : {9764, X}
      Lock : ROW, BATCH_STEP_EXECUTION_SEQ, (2,178)
      Waiting XID : {9764, S} , SAI, select IDENTITY_VAL_LOCAL() from BATCH_STEP_EXECUTION_SEQ
      Granted XID : {9761, X}
      . The selected victim is XID : 9761.
      at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(Unknown Source)
      at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source)
      at org.apache.derby.client.am.Statement.executeQuery(Unknown Source)
      at org.enhydra.jdbc.core.CoreStatement.executeQuery(CoreStatement.java:107)
      at org.springframework.jdbc.support.incrementer.DerbyMaxValueIncrementer.getNextKey(DerbyMaxValueIncrementer.java:145)
      ... 26 more
      Caused by: org.apache.derby.client.am.SqlException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
      Lock : ROW, BATCH_STEP_EXECUTION_SEQ, (2,179)
      Waiting XID : {9761, S} , SAI, select IDENTITY_VAL_LOCAL() from BATCH_STEP_EXECUTION_SEQ
      Granted XID : {9764, X}
      Lock : ROW, BATCH_STEP_EXECUTION_SEQ, (2,178)
      Waiting XID : {9764, S} , SAI, select IDENTITY_VAL_LOCAL() from BATCH_STEP_EXECUTION_SEQ
      Granted XID : {9761, X}
      . The selected victim is XID : 9761.
      at org.apache.derby.client.am.Statement.completeSqlca(Unknown Source)
      at org.apache.derby.client.net.NetStatementReply.parseOpenQueryError(Unknown Source)
      at org.apache.derby.client.net.NetStatementReply.parseOPNQRYreply(Unknown Source)
      at org.apache.derby.client.net.NetStatementReply.readOpenQuery(Unknown Source)
      at org.apache.derby.client.net.StatementReply.readOpenQuery(Unknown Source)
      at org.apache.derby.client.net.NetStatement.readOpenQuery_(Unknown Source)
      at org.apache.derby.client.am.Statement.readOpenQuery(Unknown Source)
      at org.apache.derby.client.am.Statement.flowExecute(Unknown Source)
      at org.apache.derby.client.am.Statement.executeQueryX(Unknown Source)
      ... 29 more

      • #4
        It's a database lock that is broken, not an application or framework lock. My conclusion is the same: it is unavoidable (some RDBMS platforms are less likely to lock than others, and some can be tuned by careful definition of the table meta data), but you can work around it by retrying the Job execution in your WmsJobLauncher.
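        Before retrying in WmsJobLauncher, the launcher has to tell a deadlock apart from other failures. In the trace above the SQL exception is buried several levels deep (JobExecutionException, FlowExecutionException, DataAccessResourceFailureException, then SQLTransactionRollbackException), so a check has to walk the cause chain. The following is a minimal sketch; `DeadlockDetector` is a hypothetical name, not a Spring or JDBC API. It treats JDBC's SQLTransactionRollbackException, or any SQLException whose SQLState is in class "40" (transaction rollback, e.g. 40001), as retryable. That Derby's client driver reports the deadlock this way is visible in the trace; that a given SQL Server driver maps its deadlock error the same way is an assumption worth checking.

```java
import java.sql.SQLException;
import java.sql.SQLTransactionRollbackException;

/**
 * Hypothetical helper (not a Spring or JDBC API): decides whether a failure was
 * caused, somewhere down its cause chain, by a transaction rollback such as the
 * deadlock victim in the trace above.
 */
final class DeadlockDetector {
    // Guard against pathological cause cycles (A -> B -> A).
    private static final int MAX_DEPTH = 64;

    private DeadlockDetector() {}

    static boolean isDeadlock(Throwable failure) {
        Throwable t = failure;
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++) {
            // JDBC 4 drivers (such as Derby's client driver in the trace) may throw this subclass.
            if (t instanceof SQLTransactionRollbackException) {
                return true;
            }
            // SQLState class "40" means "transaction rollback"; 40001 is the
            // serialization failure code. Assumed, not verified, for every driver.
            if (t instanceof SQLException) {
                String state = ((SQLException) t).getSQLState();
                if (state != null && state.startsWith("40")) {
                    return true;
                }
            }
            t = t.getCause();
        }
        return false;
    }
}
```

        Used together with a retry loop, a check like this lets the launcher rerun only deadlock victims and fail fast on everything else.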

        • #5
          Added the sample config that causes the exception

          Hello,

          I added a simple job configuration that causes the exception. Please let me know if there is anything wrong in the configuration.

          <beans xmlns="http://www.springframework.org/schema/beans"
              xmlns:j="http://www.springframework.org/schema/batch"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.springframework.org/schema/batch http://www.springframework.org/schem...-batch-2.1.xsd
                  http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

            <j:job id="loopFlowSample" xmlns="http://www.springframework.org/schema/batch">
              <j:decision id="limitDecision" decider="limitDecider">
                <j:next on="CONTINUE" to="step1" />
                <j:end on="COMPLETED" />
              </j:decision>
              <j:step id="step1" next="step2">
                <j:tasklet ref="step1Bean" allow-start-if-complete="true"/>
              </j:step>
              <j:step id="step2" next="limitDecision">
                <j:tasklet ref="step2Bean" allow-start-if-complete="true"/>
              </j:step>
            </j:job>

            <bean id="limitDecider" class="com.test.MyDecider">
              <property name="limit" value="9" />
            </bean>
            <bean id="step1Bean"
                class="org.springframework.batch.core.step.tasklet.MethodInvokingTaskletAdapter">
              <property name="targetObject">
                <bean class="com.test.TestStep">
                </bean>
              </property>
              <property name="targetMethod" value="doIt" />
            </bean>
            <bean id="step2Bean"
                class="org.springframework.batch.core.step.tasklet.MethodInvokingTaskletAdapter">
              <property name="targetObject">
                <bean class="com.test.TestStep">
                </bean>
              </property>
              <property name="targetMethod" value="doIt" />
            </bean>
            <bean id="jobLauncher"
                class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
              <property name="jobRepository" ref="jobRepository" />
            </bean>

            <bean id="BatchDataSource" class="org.enhydra.jdbc.pool.StandardXAPoolDataSource" destroy-method="shutdown">
              <property name="dataSource">
                <bean id="actualDataSource" class="org.enhydra.jdbc.standard.StandardXADataSource" destroy-method="shutdown">
                  <property name="transactionManager" ref="jotm"/>
                  <property name="driverName" value="org.apache.derby.jdbc.ClientDriver"/>
                  <property name="url" value="jdbc:derby://localhost:1527/seconddb;user=sai;password=sai;"/>
                </bean>
              </property>
            </bean>
            <j:job-repository id="jobRepository" data-source="BatchDataSource"
                transaction-manager="transactionManager"></j:job-repository>
            <bean id="jotm" class="org.springframework.transaction.jta.JotmFactoryBean"/>

            <bean id="transactionManager" class="org.springframework.transaction.jta.JtaTransactionManager">
              <property name="userTransaction" ref="jotm"/>
              <property name="allowCustomIsolationLevels" value="true"/>
            </bean>
          </beans>

          • #6
            Nothing wrong with that, but you haven't really said how you are launching these jobs concurrently. Since no-one else experiences this issue, I would guess that you are doing something unusual.

            One thing that might help is to set the ID generators up to cache ranges of ID values. You would need to inject a custom incrementer factory into the JobRepositoryFactoryBean (so use that directly instead of the <batch:job-repository/> namespace shortcut).
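            Dave's idea of caching ranges of ID values can be illustrated without Spring: a generator reserves a block of IDs from the shared sequence table in one round trip and then hands them out from memory, so concurrent launches contend for the sequence row far less often. This is a minimal sketch under stated assumptions: `BlockCachingIdGenerator` is a hypothetical class, not a Spring type; the `reserveBlock` supplier stands in for whatever database call atomically advances the sequence by `blockSize` and returns the first reserved value; wiring it into JobRepositoryFactoryBean through a custom incrementer factory is not shown.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not a Spring class: hands out IDs from a block cached in
 * memory so that only one call in every blockSize touches the shared sequence.
 */
final class BlockCachingIdGenerator {
    // Assumed to atomically advance the sequence by blockSize in the database
    // (in its own short transaction) and return the first ID of the reserved block.
    private final LongSupplier reserveBlock;
    private final int blockSize;
    private long next;   // next ID to hand out
    private long limit;  // exclusive end of the current block; next == limit forces a reservation

    BlockCachingIdGenerator(LongSupplier reserveBlock, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be at least 1");
        }
        this.reserveBlock = reserveBlock;
        this.blockSize = blockSize;
    }

    /** Synchronized so that threads in one JVM never receive the same ID. */
    synchronized long nextLongValue() {
        if (next >= limit) {
            long first = reserveBlock.getAsLong();
            next = first;
            limit = first + blockSize;
        }
        return next++;
    }
}
```

            The trade-off is that IDs still unused in a block when the process exits are never handed out, so the numbering can have gaps; in exchange, the sequence row is locked once per block instead of once per ID.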

            • #7
              Added the DOS scripts that start the jobs

              Thank you Dave.

              I have included below the DOS scripts that start the batch jobs.

              There are two files.

              1)
              --------------------test.cmd--------------------------
              setlocal
              @echo off

              @rem 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

              FOR %%x IN (1 2 3 4 5) DO start testbatch %0%%x

              echo %Errorlevel%

              endlocal
              -------------------------------------------------------

              2)
              ------------------------testbatch.cmd-----------------
              setLocal
              @echo off

              java -cp %batchclasspath% org.springframework.batch.core.launch.support.CommandLineJobRunner sample.xml loopFlowSample arg1=%1
              exit

              endlocal
              -------------------------------------------------------

              • #8
                I don't understand what you think is wrong. You launch several no-op jobs simultaneously, and their steps complete faster than your transaction manager can write its logs. It's not very realistic, and it's hardly surprising that the database can't cope with it.

                • #9
                  I'm facing the same problem. In our case, the job repository is on a SQL Server 2000 database server, and we get the same database error as above. We are launching multiple jobs at the same time using the Computer Associates AutoSys scheduling tool. Here are our definitions:
