
  • ItemWriter does not save the data to database until Job completes

    Hi Batchers,

    I am looking for help with the following problem. Any help is appreciated.

    I have an application that runs on Spring Batch, and I have all the required steps defined as the framework prescribes. However, my ItemWriter does not commit to the database until the Job completes. This is a problem for me because I run multiple jobs at the same time, and I want the ItemWriter to commit as soon as it saves to the DB.
    I am using Spring Data to save my data to a MySQL database. I tried step scope, but it did not help much.

    Any other thoughts on this?
    We are about to go to production, and we want to confirm this before we actually go.

    Please help.

    Thanks,
    Triguna
    Spring Batch Developer.

  • #2
    Have you tried using the out-of-the-box JPA ItemWriter? I doubt Spring Data JPA is aware of the commit-interval value.
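
    For reference, Spring Batch ships a `JpaItemWriter` (`org.springframework.batch.item.database.JpaItemWriter`) that flushes the `EntityManager` at each chunk boundary. A minimal wiring sketch in the XML style used later in this thread, assuming the `entityManagerFactory` bean shown in post #9:

    ```xml
    <!-- Sketch only: flushes merged entities at each chunk commit -->
    <bean id="jpaItemWriter" class="org.springframework.batch.item.database.JpaItemWriter">
        <property name="entityManagerFactory" ref="entityManagerFactory" />
    </bean>
    ```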

    • #3
      Wow... that's the answer I was looking for. Thanks! I will try that and let you know if it works for my scenario.

      • #4
        Hey TJ.Cutajar, I tried the JpaItemWriter. The catch is that we cannot use JpaItemWriter directly, because we have to do a lot more in the writer than just write to the database (especially writing multiple values into the DB). So I copied the code from JpaItemWriter (especially the code below), but it did not turn out to be the solution I was looking for. Duplicate entries are still stored in the DB instead of one record, because of the multiple threads.

        The copied code:

        EntityManager entityManager =
                EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);
        if (entityManager == null) {
            throw new DataAccessResourceFailureException("Unable to obtain a transactional EntityManager");
        }

        // Use the entityManager to commit (merge)

        // in finally block
        entityManager.flush();

        It still results in a double commit instead of only one.

        • #5
          The problem is that multiple threads are executing the same code.

          Here is the snippet:

          String objectCode = mainObject.getObjectCode();
          Object object = objCodeJpaRepository.findByObjectCode(objectCode);
          if (object == null) {
              // Get new object - basic instantiation of the new entity object
              object = new Object(objectCode);

              // Commit the object
              entityManager.merge(object);
          }

          entityManager.flush();

          Since the above code is executed by multiple threads (jobs), a concurrent thread finds that the object does not exist and creates a new object to insert into the database. We tried putting this in a synchronized block, but the code above does not commit the actual value to the DB until the job completes, which defeats the purpose of fetching the same object from the DB for the duplicate check.

          Job1 -> finds the Object does not exist and inserts a new record; the Object is only inserted after this job completes.
          Job2 -> also finds the object does not exist, because Job1 has not completed yet, and inserts yet another record (the same as Job1's).

          It continues likewise for every job we spawn.

          Any idea why that is?
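
          One way to make this check-then-insert race safe at the database level (not suggested in the thread itself, but applicable since MySQL is in use) is a unique constraint on the business key, so the database rejects the duplicate no matter which job wins. The table and column names below are assumptions, not from the actual schema:

          ```sql
          -- Assumed table/column names; adjust to the real schema.
          ALTER TABLE object_table
              ADD CONSTRAINT uq_object_code UNIQUE (object_code);

          -- MySQL-specific idempotent insert: a second concurrent job's
          -- insert becomes a no-op instead of creating a duplicate row.
          INSERT INTO object_table (object_code)
          VALUES ('A123')
          ON DUPLICATE KEY UPDATE object_code = object_code;
          ```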

          • #6
            Are you using an EntityManager that participates in the Spring-based chunk transactions?

            • #7
              This is how we obtain the EntityManager in the changed code, which did not produce what we expected in our scenario:
              EntityManager entityManager = EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);

              In our earlier code, we used the entityManager declared in applicationContext.xml. I think Spring Batch internally uses this entityManager?

              Is there any problem with the above code? Please explain.

              • #8
                EntityManagerFactoryUtils is not going to get you a valid EntityManager. Instead, you should use one of the factory beans (LocalContainerEntityManagerFactoryBean or LocalEntityManagerFactoryBean), assuming you are not using JNDI to look up a container-managed EntityManager. You can read more about setting up the correct factory beans here: http://static.springsource.org/sprin...m.html#orm-jpa
                Last edited by mminella; Jun 11th, 2013, 08:39 AM.

                • #9
                  Hey, I forgot to mention that I have the following configuration in my applicationContext.xml:

                  Code:
                  <!-- JPA Entity Manager Factory Definition -->
                  <bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
                  		<property name="dataSource" ref="dataSource" />
                  		<property name="jpaVendorAdapter">
                  			<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
                  				<property name="showSql" value="true" />
                  				<property name="database" value="MYSQL" />
                  			</bean>
                  		</property>
                  		<property name="persistenceUnitName" value="jpa.pg" />
                  </bean>
                  persistence.xml

                  Code:
                  <persistence-unit name="jpa.pg" transaction-type="RESOURCE_LOCAL">
                  
                      <!-- Contains all the classes which are used as entities -->
                  
                  </persistence-unit>
                  I am using LocalContainerEntityManagerFactoryBean. Since the first suggestion was to use JpaItemWriter, I tried using EntityManagerFactoryUtils; however, I have always used LCEMFB (short for LocalContainerEntityManagerFactoryBean).

                  I am still facing the same issue.
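
                  For what it's worth, the configuration shown above defines only the EntityManagerFactory. For chunk commits to bind the EntityManager to the step's transaction, a JpaTransactionManager wired to the same factory is also needed. A minimal sketch, assuming the bean ids used earlier in this thread:

                  ```xml
                  <!-- Binds a thread-bound EntityManager to each chunk transaction -->
                  <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
                      <property name="entityManagerFactory" ref="entityManagerFactory" />
                  </bean>
                  ```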

                  • #10
                    Hello,

                    We have the same problem.

                    Two concurrent jobs running in parallel check for the existence of the object to save.
                    Both find no entry in the database, and eventually both save, causing duplicate entries.

                    Does anybody have a solution?

                    Please help...

                    Regards,
                    Madhu
                    Spring developer

                    • #11
                      Can somebody please answer the above-mentioned query?

                      • #12
                        Parallel Jobs

                        (In reply to madhukeshdg's question in #10 above.)
                        First, ItemWriters are not aligned with jobs for writing to their targets; it's a function of the step configuration. However, JPA/Hibernate are challenging because they optimize for performance by holding a transaction-scoped cache, and by releasing objects to an object cache if you have configured one. The first issue is workflow design. The scenario you describe is what is supposed to happen in the JPA/Hibernate world. If you have a hierarchy of object relationships, you should design your jobs to process the leaf objects first and then process the more complex objects. For example, say you have a Customer associated with a Case (like welfare or child support) that has Participants of types CustodialParent, CaseWorker, Addresses, PhoneNumbers, etc. You need to plan your jobs to process the Addresses and PhoneNumbers first, so that they are in the database, ready to be associated with the higher-level objects that share them.

                        Second, with an object graph it is typically just not possible to get a high commit-interval. Try setting it to 1, and make sure you flush on each pass. Even this will only work if you have fixed the shared-object access problem.
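
                        The commit-interval suggestion maps onto the chunk definition in the job XML. A minimal sketch, where the step and bean names are illustrative, not from the poster's configuration:

                        ```xml
                        <step id="loadObjectsStep">
                            <tasklet>
                                <!-- commit-interval="1" commits after every item, so each
                                     object graph is flushed and becomes visible sooner -->
                                <chunk reader="objectReader" writer="objectWriter" commit-interval="1" />
                            </tasklet>
                        </step>
                        ```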

                        Hope that helps,

                        • #13
                          Triguna posted on LinkedIn: Thanks Wayne. I understand that we need to set the commit-interval to 1 and turn off Hibernate flush. We have already done these, for the same reasons you mention.
                          Somehow we still cannot get the functionality working even after doing the above. As of now we are storing the data in a cache and then saving it all at once (in the writer), but a better solution would help.
                          Is there any work going on towards this in Spring Batch that you know of?
                          ===================

                          Do you have a sample project that you could place in a GitHub location and share as an example of the problem? I would need to understand your job.xml and whether your readers are flat files or Hibernate item readers. Then a picture of the object graph, or a pruned version large enough to demonstrate the problem, would be helpful. If that's not possible, I would read this post from Morten Anderson-Gott very carefully, as he exposes all of the gotchas with Hibernate. By the way, his approach was basically the approach we took to solve our Hibernate issues. The URL is here: http://www.andersen-gott.com/2011/11...hibernate.html.
                          Last edited by wxlund; Jun 27th, 2013, 01:04 PM.

                          • #14
                            wxlund,

                            Our problem is that two threads are accessing the entity manager, but how do we communicate between these threads? Since it is the same entity manager we are using (assuming two threads use the same EM, because that is how Spring Batch presents it to the application?), we expect the second thread to know about the data saved by the first one. However, that's not the case with Spring Batch: until the Job completes (the writer is done), the actual data values are neither committed to the database, nor does the entity manager know about the updates from thread 1 or thread 2.

                            This raises the question of how we let the entity manager know about these threads, and also tell the EM that these threads' access to the same EM is valid. Right now we achieve this by caching our data in local objects, checking these objects from either thread (in a synchronized way), and committing only if the value does not already exist. This is currently working.

                            However, we are interested to know whether Spring Batch provides a better way to solve this. If it does, we can improve our application's performance (we would not need to cache anything in the application explicitly, since Spring Batch's JPA support already does it for us through entity managers).

                            I believe the problem is that the EntityManager, or JPA's database cache, does not know there are two threads, and the data is not stored in the database until the Job completes. If either of those were addressed, we would have a solution to this problem.
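
                            The local-cache workaround described above can be sketched as a thread-safe map keyed by the business code. `ObjectCodeCache` and `markIfFirst` are hypothetical names, not from the actual application; the point is that `ConcurrentHashMap.putIfAbsent` is atomic, so no explicit synchronized block is needed:

                            ```java
                            import java.util.concurrent.ConcurrentHashMap;
                            import java.util.concurrent.ConcurrentMap;

                            // Hypothetical sketch of the application-level dedup cache described above:
                            // concurrent jobs consult a shared map before inserting, so only the first
                            // thread for a given objectCode performs the database write.
                            public class ObjectCodeCache {
                                private final ConcurrentMap<String, Boolean> seen = new ConcurrentHashMap<>();

                                /** Returns true only for the first caller with this objectCode. */
                                public boolean markIfFirst(String objectCode) {
                                    // putIfAbsent returns null only when the key was not present,
                                    // and it is atomic across threads.
                                    return seen.putIfAbsent(objectCode, Boolean.TRUE) == null;
                                }
                            }
                            ```

                            Only the thread that wins the `putIfAbsent` race would then go on to call `entityManager.merge`; every other thread skips the insert.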

                            What's your thought on this?

                            Thanks,
                            Triguna

                            • #15
                              "Since its the same entity manager we are using (Assuming that two threads uses the same EM because spring-batch tells to application that way?), We are expecting that the second thread would know of the data saved from the first one. However that's not the case with spring-batch as until Job completes (Writer is done), the actual data values are not committed to database nor the entity manager knows the data updates from thread 1 or thread 2."

                              Two threads do not share the same EM. The second thread does not know about the data saved by the first one. Also, I meant to correct this earlier: it's not about the job completing. Commits occur on chunk boundaries, so if you have your commit-interval set to 1, each object graph gets its own commit. This is where I suggested you make sure you flush: flush after each commit-interval to minimize the objects that would be in your shared graph. From our docs: "PlatformTransactionManager implementation for a single JPA EntityManagerFactory. Binds a JPA EntityManager from the specified factory to the thread, potentially allowing for one thread-bound EntityManager per factory. SharedEntityManagerCreator and JpaTemplate are aware of thread-bound entity managers and participate in such transactions automatically. Using either is required for JPA access code supporting this transaction management mechanism.

                              This transaction manager is appropriate for applications that use a single JPA EntityManagerFactory for transactional data access. JTA (usually through JtaTransactionManager) is necessary for accessing multiple transactional resources within the same transaction. Note that you need to configure your JPA provider accordingly in order to make it participate in JTA transactions." - http://static.springsource.org/sprin...onManager.html.

                              The issue can be corrected by refactoring the batch architecture to process leaf objects first, so that shared objects already exist when they are associated in parallel threads. Hope that makes sense. It's not about Spring Batch; it's about how JPA/Hibernate manages the transactional session to optimize access performance for clients. Hopefully Morten's blog will help.
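
                              The leaf-first refactoring can be expressed as sequential steps in the job definition. A minimal sketch in the thread's XML style, where all step and bean names are illustrative:

                              ```xml
                              <job id="importJob">
                                  <!-- Leaf entities (e.g. addresses, phone numbers) are written first,
                                       so later steps find them in the database instead of re-creating them -->
                                  <step id="loadLeafObjects" next="loadParentObjects">
                                      <tasklet>
                                          <chunk reader="leafReader" writer="leafWriter" commit-interval="1" />
                                      </tasklet>
                                  </step>
                                  <step id="loadParentObjects">
                                      <tasklet>
                                          <chunk reader="parentReader" writer="parentWriter" commit-interval="1" />
                                      </tasklet>
                                  </step>
                              </job>
                              ```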
