Deadlocks when job instances/executions/steps are being created

  • Deadlocks when job instances/executions/steps are being created

    Hi,
    We are getting deadlocks when job instances/executions/steps are being created. This occurs when multiple jobs are scheduled and started at the same time. We have jobs that run every 2 minutes. This causes each transaction to take longer, providing more opportunity for deadlock. Has anyone encountered this issue? What might be the cause?

    This is one of the exceptions we have got:
    Code:
    org.springframework.dao.DeadlockLoserDataAccessException: PreparedStatementCallback; SQL [INSERT into BATCH_JOB_INSTANCE(JOB_INSTANCE_ID, JOB_NAME, JOB_KEY, VERSION) values (?, ?, ?, ?)]; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.translate(SQLErrorCodeSQLExceptionTranslator.java:300)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:606)
    at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:791)
    at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:849)
    at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:853)
    at org.springframework.batch.core.repository.dao.JdbcJobInstanceDao.createJobInstance(JdbcJobInstanceDao.java:68)
    at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:180)
    at sun.reflect.GeneratedMethodAccessor445.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:310)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy120.createJobExecution(Unknown Source)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:79)
    at com.om.dh.batch.core.ProxyJobBean.runJob(ProxyJobBean.java:124)
    at com.om.dh.batch.core.ProxyJobBean.executeInternal(ProxyJobBean.java:88)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
    Caused by: com.mysql.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1042)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:957)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3376)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3308)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1837)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2543)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1737)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2022)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1940)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1925)
    at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeUpdate(CachedPreparedStatement.java:95)
    at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.executeUpdate(WrappedPreparedStatement.java:251)
    at org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:797)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:590)
    ... 23 more
    I think we are facing this issue only in 1.0.1. We haven't tested completely in 1.0.0. We used to have a lot of jobs even in m4, but we never faced this issue then. But I might be wrong.

    regards,
    Ramkris

  • #2
    How many different jobs are you launching every two minutes? You would have to be launching quite a few for the instance insert to deadlock.



    • #3
      Originally posted by lucasward View Post
      How many different jobs are you launching every two minutes? You would have to be launching quite a few for the instance insert to deadlock.
      We have only two jobs, which run every 2 minutes. They don't do much; they just update a flag.



      • #4
        What database are you using? When you create a new JobInstance it runs the query you listed in your issue, but only when creating a new instance. Even while the job is running, the job instance table shouldn't be touched at all. Unless you have something else hitting the instance table, there shouldn't be a deadlock issue.



        • #5
          Originally posted by lucasward View Post
          What database are you using? When you create a new JobInstance it runs the query you listed in your issue, but only when creating a new instance. Even while the job is running, the job instance table shouldn't be touched at all. Unless you have something else hitting the instance table, there shouldn't be a deadlock issue.
          Thanks for your reply. We are using MySQL 5.1.23. What I am saying is that we get this exception when multiple jobs are scheduled. Some jobs take 30-60 minutes to complete, and we also have jobs that run every 2 minutes. During this time, deadlocks occur. It happens not just while creating job instances, but also while creating job executions and step executions.



          • #6
            Yes, but inserts create new records in the database, so there shouldn't be any locking issues unless the database is escalating row-level locks to table locks.



            • #7
              The deadlock might just be a sign that you are trying to launch the same job *simultaneously* (i.e. not 2 minutes apart) with the same JobParameters (that's the big deal about the isolation level on the create* methods in JobRepository - it actually protects you from something bad happening).
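              For reference, that isolation level can be tuned when wiring the repository. Below is a minimal sketch, not taken from this thread: it assumes Spring Batch's JobRepositoryFactoryBean exposes an isolationLevelForCreate property (check that your version does), and the dataSource/transactionManager names are placeholders:

```java
// Wiring sketch only: assumes Spring Batch on the classpath and
// pre-existing dataSource / transactionManager beans (hypothetical names).
JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
factory.setDataSource(dataSource);
factory.setTransactionManager(transactionManager);
// The default isolation for createJobExecution() is SERIALIZABLE;
// a weaker level may be appropriate on platforms where that deadlocks.
factory.setIsolationLevelForCreate("ISOLATION_REPEATABLE_READ");
factory.afterPropertiesSet();
JobRepository jobRepository = (JobRepository) factory.getObject();
```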



              • #8
                The same problem

                Hi all,

                I have a very similar problem, but with the in-memory map DAOs. Here is the exception I get:
                Code:
                java.util.ConcurrentModificationException
                	at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
                	at java.util.AbstractList$Itr.next(AbstractList.java:343)
                	at org.springframework.batch.core.repository.dao.MapJobInstanceDao.getJobInstance(MapJobInstanceDao.java:39)
                	at org.springframework.batch.core.repository.dao.MapJobInstanceDao.createJobInstance(MapJobInstanceDao.java:27)
                	at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:180)
                	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:79)
                	at com.mycompany.server.batch.job.JobBootstrapper.startJob(JobBootstrapper.java:170)
                	at com.mycompany.server.batch.job.JobBootstrapper.blockingStartJob(JobBootstrapper.java:122)
                	at com.mycompany.server.batch.job.JobEndListener.afterJob(JobEndListener.java:55)
                	at org.springframework.batch.core.listener.CompositeExecutionJobListener.afterJob(CompositeExecutionJobListener.java:57)
                	at org.springframework.batch.core.job.SimpleJob.execute(SimpleJob.java:132)
                	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:86)
                	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
                	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:81)
                	at com.mycompany.server.batch.job.JobBootstrapper.startJob(JobBootstrapper.java:170)
                	at com.mycompany.server.batch.job.JobBootstrapper.blockingStartJob(JobBootstrapper.java:122)
                	at com.mycompany.server.batch.job.BaseJobStarter$1.call(BaseJobStarter.java:25)
                	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
                	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
                	at java.lang.Thread.run(Thread.java:619)
                and
                Code:
                java.util.ConcurrentModificationException
                	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
                	at java.util.HashMap$ValueIterator.next(HashMap.java:822)
                	at org.springframework.batch.core.repository.dao.MapJobExecutionDao.findJobExecutions(MapJobExecutionDao.java:49)
                	at org.springframework.batch.core.repository.support.SimpleJobRepository.getLastStepExecution(SimpleJobRepository.java:256)
                	at org.springframework.batch.core.job.SimpleJob.shouldStart(SimpleJob.java:180)
                	at org.springframework.batch.core.job.SimpleJob.execute(SimpleJob.java:107)
                	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:86)
                	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
                	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:81)
                	at com.mycompany.server.batch.job.JobBootstrapper.startJob(JobBootstrapper.java:170)
                	at com.mycompany.server.batch.job.JobBootstrapper.blockingStartJob(JobBootstrapper.java:122)
                	at com.mycompany.server.batch.job.BaseJobStarter$1.call(BaseJobStarter.java:25)
                	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
                	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
                	at java.lang.Thread.run(Thread.java:619)
                I have 14 jobs running concurrently (they are started and stopped randomly according to the data received in blocking queues, one queue per job). They are bootstrapped via the Spring @Component annotation. After a job finishes its execution it is restarted via the afterJob() method in my JobListener implementation. A job waits for data to appear in its blocking queue, and when data is received the job is created and started.
                Everything seemed to work correctly until at some point I saw a race condition, and the above exception started to be thrown for different jobs until some of them stopped restarting.
                I upgraded spring-batch to 1.0.1.RELEASE; in 1.0.0.FINAL I didn't get such exceptions.

                Can anybody explain to me what is wrong with restarting a job from JobListener.afterJob() (when restarting I change a start date parameter and the file name, so the job configuration is different)? Or is it an issue in the new spring-batch version?

                Thanks in advance,
                Alex



                • #9
                  Do you have a separate repository for each job you are running? Or are they all hitting the same one?



                  • #10
                    The Map*Daos were never designed to be thread safe. To be honest, we only ever thought they would be used for testing. They can be made thread safe if you open an issue in JIRA. (N.B. this is not related to the original topic of this thread.)
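                    To see why the unsynchronized map DAOs blow up this way, here is a minimal single-threaded Java sketch of the same failure mode (CmeDemo is a made-up name, not Spring Batch code): mutating a plain ArrayList while it is being iterated trips the collection's fail-fast check, which is what concurrent jobs do to the shared map-backed DAOs.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Returns true if mutating the list mid-iteration triggers the
    // fail-fast ConcurrentModificationException, mirroring what the
    // Map*Dao stack traces above show happening across threads.
    static boolean triggersCme() {
        List<String> instances = new ArrayList<String>();
        instances.add("job1");
        instances.add("job2");
        try {
            for (String name : instances) {
                // Simulates another thread registering a new JobInstance
                // while getJobInstance() is still iterating.
                instances.add(name + "-copy");
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME triggered: " + triggersCme());
    }
}
```

                    In the map DAOs the mutation comes from another thread, so the exception is intermittent. Until the DAOs are made thread safe, one practical workaround is to serialize access to the repository externally, or to use the JDBC DAOs.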



                    • #11
                      I have simulated the select/insert sequence for a new job instance that produces a deadlock. Here is the setup (in MySQL):

                      1) I created a table to mimic BATCH_JOB_INSTANCE:

                      create table A
                      (
                      JI bigint(20) not null,
                      JV bigint(20) null,
                      JN varchar(100) not null,
                      JP varchar(2500) null,
                      primary key using btree ( JI )
                      );

                      2) Start a mysql session and enter the following command to ensure that serializable transactions are used:

                      set @@global.tx_isolation = 'SERIALIZABLE';

                      3) Quit the mysql session. I found that I needed to restart my mysql command line app in order for session tx_isolation to behave consistently.

                      4) Start two mysql sessions. I used the command line mysql app.

                      5) Ensure that the session level txn isolation level is serializable at both sql prompts:

                      select @@session.tx_isolation;

                      6) In the first command prompt, issue these commands:

                      start transaction;
                      select JI from A where JN = 'job1' and JP = 'jp1';

                      *** no results should appear

                      7) In the second command prompt, issue these commands:

                      start transaction;
                      select JI from A where JN = 'job2' and JP = 'jp2';
                      insert into A values ( 1, 0, 'job2', 'jp2' );

                      *** The select should return no results and the insert should block.

                      8) In the first command prompt, issue the following command:

                      insert into A values ( 1, 0, 'job1', 'jp1' );

                      *** A deadlock error should be displayed in one of the two command prompts.


                      As a solution, it would be nice to be able to inject my own SQL into the JDBC DAOs. At first glance, I would use SELECT ... FOR UPDATE for these queries. There may be other dependencies that make this unwise, but I haven't looked that far yet.



                      • #12
                        I can see what you are doing. You are simulating two different job instances starting simultaneously, except that the primary keys are equal (I hope Spring Batch is not trying to do that). I can't see any need to block there, except for the overlapping primary key, but I guess maybe there are platform-specific variations in the interpretation of the isolation level.

                        Just to clarify: if the JN and JP were the same, the exception would be expected and necessary, to prevent a duplicate job execution from being created.

                        What happens if you use a different key? What happens if you use REPEATABLE_READ instead of SERIALIZABLE?

                        We don't use SELECT ... FOR UPDATE because it isn't supported on some platforms. The isolation level seemed to be an easy way to gracefully downgrade in the case that the platform fell in that category.



                        • #13
                          Deadlocks can occur on indexes

                          Originally posted by Dave Syer View Post
                          The deadlock might just be a sign that you are trying to launch the same job *simultaneously* (i.e. not 2 minutes apart) with the same JobParameters (that's the big deal about the isolation level on the create* methods in JobRepository - it actually protects you from something bad happening).
                          I've seen problems in the past where the deadlocks occurred on the indexes, especially when the tables were small, meaning very few records had accumulated yet. The only way I know to find these is with the help of the DBAs. I'm not sure how this is done with MySQL, as I've never used it in a system-test or production environment, but you should at least be able to run something like a show plan and see how it's accessing the indexes. Running concurrent batch jobs is one of the scenarios where we've seen deadlocks in the past on legacy batch architectures.



                          • #14
                            I apologize ...

                            I had a typo in my post. The primary keys should have been different. I have retested with different primary keys and reproduced the deadlock.

                             I tried using REPEATABLE_READ and it worked. I don't see any problem with using REPEATABLE_READ based on what I see in the code. The samples were using SERIALIZABLE, so I want to make sure no one else has an issue with using REPEATABLE_READ. Does anyone have any thoughts on this?

                             Still, it would be nice to be able to inject my own SQL. I'm trying to extend the JDBC DAOs to use FOR UPDATE, just to see what happens. I agree that it would be bad to use FOR UPDATE as a default.



                            • #15
                              REPEATABLE_READ is fine for all the platforms I ever tried, but the list wasn't exhaustive. I'll put a comment in the docs to say that it might be a better choice for some platforms and use some of the comments from this post to illustrate the point.
