
  • fastest way to batch insert w/ hibernate

    Hi

    I have a situation where I need to create and insert a List of, say, 10000 objects. What's the best way to do this? Does Spring have something for this?

    From the Hibernate reference documentation, 13.1. Batch inserts:

    When making new objects persistent, you must flush() and then clear() the session regularly, to control the size of the first-level cache.

    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();

    for ( int i = 0; i < 100000; i++ ) {
        Customer customer = new Customer(.....);
        session.save(customer);
        if ( i % 20 == 0 ) { // 20, same as the JDBC batch size
            // flush a batch of inserts and release memory:
            session.flush();
            session.clear();
        }
    }

    tx.commit();
    session.close();
    Now I'm wondering if Spring does the flushing and clearing for me. Notice the tx.commit(); I surely don't need that with Spring-managed transactions. Just wondering if I'm wasting time calling flush and clear, and what Spring can help me with, if anything, for batch processing.

    thx

  • #2
    I think the best way of answering this is simply to measure the performance. Flushing too often can cause just as many problems as not doing it often enough.

    HibernateTemplate.saveOrUpdateAll(..) lets you insert lots of entities at once, but it doesn't perform any flushing. If you want control over that, I would recommend HibernateTemplate.execute(HibernateCallback); you can then perform the flush and clear exactly as you want to (rough sketch below). As for the tx.commit(), if you are using Spring's TransactionTemplate or declarative transactions, you don't need to worry about it.
    http://www.springframework.org/docs/...il.Collection)
    http://www.springframework.org/docs/...rnateCallback)
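
    For illustration, here is a rough, untested sketch of the HibernateCallback approach (the hibernateTemplate field, the Customer class and the batch-size constant are just placeholders borrowed from the example in your first post):

    Code:
    import java.sql.SQLException;
    import java.util.List;

    import org.hibernate.HibernateException;
    import org.hibernate.Session;
    import org.springframework.orm.hibernate3.HibernateCallback;
    import org.springframework.orm.hibernate3.HibernateTemplate;

    public class CustomerBatchDao {

        private static final int BATCH_SIZE = 20; // keep in sync with hibernate.jdbc.batch_size

        private HibernateTemplate hibernateTemplate;

        public void setHibernateTemplate(HibernateTemplate hibernateTemplate) {
            this.hibernateTemplate = hibernateTemplate;
        }

        public void saveAll(final List<Customer> customers) {
            hibernateTemplate.execute(new HibernateCallback() {
                public Object doInHibernate(Session session) throws HibernateException, SQLException {
                    int i = 0;
                    for (Customer customer : customers) {
                        session.save(customer);
                        if (++i % BATCH_SIZE == 0) {
                            // flush a batch of inserts and release first-level cache memory
                            session.flush();
                            session.clear();
                        }
                    }
                    return null;
                }
            });
        }
    }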



    • #3
      Yeah, I probably should have mentioned that I am using OSIV and all the Spring transaction management, etc. This is very nice stuff! It would be really cool if I could figure out how to not use OSIV and use a Hibernate callback instead, but nothing worked. Maybe I'll try that again next week; if I can avoid OSIV it will make my testing context a little simpler.

      Just wondering if my explicit call to

      Code:
      private void flushAndClear()  {
          if (getSession().isDirty())  {
              getSession().flush();
              getSession().clear();
          }
      }
      is actually doing anything. I would think that because Spring's Hibernate infrastructure is so good, I would just get in its way by doing lower-level stuff like that myself.

      I guess I'll try a few experiments with a large dataset; we'll see what happens and I'll post back.



      • #4
        It's very easy not to use OSIV; I'm not using it in my current project. If you post a thread for your OSIV issue, I'm sure someone will help. Your call to flush will be doing something, but personally I can't remember the last time I had to call flush; the Spring transaction will do this for you on commit. Large datasets, however, might be one example of where you would want to call it yourself.



        • #5
          OK, well, I ran some experiments here. Unless I am doing something wrong (very possible), it looks like the answer is to 1) not mess with the batch size at all, 2) insert objects one by one, and 3) not flush and clear.

          I'm quite surprised, but the results speak for themselves (3700 rows with about 80 columns being read from a CSV file; if anyone needs more details on the experiment I'm happy to provide them, but I'm 90% sure these are good results):

          Code:
          17:05:12,968  INFO Experiment:25 - create individually no batch size - 17.848 seconds.
          
          17:05:31,273  INFO Experiment:32 - batch create with 8  51.575 seconds.
          17:06:23,084  INFO Experiment:32 - batch create with 16  64.398 seconds.
          17:07:27,688  INFO Experiment:32 - batch create with 32 70.037 seconds.
          17:08:37,930  INFO Experiment:32 - batch create with 64  146.881 seconds.
          17:11:05,063  INFO Experiment:32 - batch create with 128 107.341 seconds.
          17:12:52,577  INFO Experiment:32 - batch create with 256 181.478 seconds.
          Does anyone else find it odd that it is actually slower to do what Hibernate recommends in my first post? Maybe I'm doing something wrong, or maybe Spring does a great job by itself? I'm pretty sure the batch size is getting set correctly (a configuration sketch for this follows the code below). Maybe it's because I'm iterating through the data twice to do the batch create (read from CSV, build the list)? That is bad, but it still wouldn't explain why setting the batch size to 8 or 32 is that much worse; it should be at most 2x the individual create. Any ideas?

          Here are the relevant portions of the code:

          Code:
          public int createIndividually()  {
              while (csv.readRecord())  {
                  MyObject myObject = readRowFromCsvFile(csv);
                  if (myObject != null)  {
                      getDAO().create(myObject);
                      ++newRows;
                  }
              }
              return newRows;
          }


          Code:
          public int batchCreate()  {
              List<MyObject> myObjectList = new ArrayList<MyObject>();
              while (csv.readRecord())  {
                  MyObject myObject = readRowFromCsvFile(csv);
                  if (myObject != null)  {
                      myObjectList.add(myObject);
                      ++newRows;
                  }
              }
              return getDAO().batchCreate(myObjectList);
          }

          Code:
          public int batchCreate(final List<Entity> entityList)  { // in the DAO
              int insertedCount = 0; // int rather than Long, so it matches the return type
              for (int i = 0; i < entityList.size(); ++i) {
                  create(entityList.get(i));
                  if (++insertedCount % batchSize == 0) {
                      flushAndClear();
                  }
              }
              flushAndClear();
              return insertedCount;
          }
          
          
          protected void flushAndClear()  {
              if (getSession().isDirty()) {
                  getSession().flush();
                  getSession().clear();
              }
          }
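
          On the "batch size is getting set correctly" point: the JDBC batch size is normally set as a Hibernate property on the session factory, along the lines of the sketch below (assuming a programmatically configured LocalSessionFactoryBean; names are illustrative). Also worth double-checking: the Hibernate reference notes that insert batching at the JDBC level is disabled transparently if the entity uses an identity identifier generator, which can make the batch-size setting appear to do nothing.

          Code:
          import java.util.Properties;

          import org.springframework.orm.hibernate3.LocalSessionFactoryBean;

          public class SessionFactoryConfig {

              // Sketch only: applying hibernate.jdbc.batch_size when the
              // LocalSessionFactoryBean is configured in code rather than XML.
              public LocalSessionFactoryBean sessionFactoryBean() {
                  LocalSessionFactoryBean sessionFactoryBean = new LocalSessionFactoryBean();
                  Properties hibernateProperties = new Properties();
                  hibernateProperties.setProperty("hibernate.jdbc.batch_size", "20");
                  sessionFactoryBean.setHibernateProperties(hibernateProperties);
                  // dataSource, mapping resources, etc. would also be set here
                  return sessionFactoryBean;
              }
          }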



          • #6
            I'm not surprised that the performance wasn't as clear-cut as assumed; these things tend to work like that. It would be good to build the list first and then run the tests; there's not much point unless the comparison is fair. Another question is where the transaction boundary is here. I've found flushing the Session to be something I tend to stay away from; I just let Hibernate manage it.
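
            Something along these lines would keep the comparison fair (a rough sketch only; readAllRowsFromCsv and log are stand-in names, and the table would need clearing between the two runs):

            Code:
            public void runComparison() throws Exception {
                // Parse the CSV once, outside the timed sections, so both strategies
                // insert exactly the same in-memory objects.
                List<MyObject> rows = readAllRowsFromCsv(csv);

                long start = System.currentTimeMillis();
                for (MyObject row : rows) {
                    getDAO().create(row);               // one-by-one strategy
                }
                long individualMillis = System.currentTimeMillis() - start;

                // Ideally each timed block also runs in its own transaction, so the
                // commit cost is included in both measurements.
                start = System.currentTimeMillis();
                getDAO().batchCreate(rows);             // flush/clear-every-N strategy
                long batchMillis = System.currentTimeMillis() - start;

                log.info("individual: " + individualMillis + " ms, batch: " + batchMillis + " ms");
            }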



            • #7
              Originally posted by lloyd.mcclendon View Post
              OK, well, I ran some experiments here. Unless I am doing something wrong (very possible), it looks like the answer is to 1) not mess with the batch size at all, 2) insert objects one by one, and 3) not flush and clear. [...]
              I am doing a similar thing with batch processing. I am basically adding elements to a Collection in a Swing client, then passing this collection by way of HttpInvoker to a service-layer bean, which then reads the elements, saves each one, checks the designated batch size, and if it is met, commits the transaction.

              This works fine if one instance of the process is being executed. However, when one or more of the same processes run concurrently, I see the dreaded "multiple sessions attempted to access collection" exception. I have even tried changing the scope of the service bean to "prototype", thinking that each getBean() call would deliver a unique bean from the Spring container and hence a separate collection, but no luck. I have also tried various combinations of AOP/interceptor-controlled transaction demarcation as well as programmatically demarcated code (not HibernateTemplate), using SessionFactory.openSession() for the programmatic style and SessionFactory.getCurrentSession() for AOP, and I get the same exception.

              Is there a way to successfully get concurrent batch inserts working when the process is initiated from the Client and not on the Server?



              • #8
                "multiple sessions attempted to access collection" exception means you have an object A referencing a collection C and object B refencing the same collection C. Then you try to bind the object A with session1 and object B with session2. To which session should Hibernate bind the collection C?



                • #9
                  Originally posted by dejanp View Post
                  "multiple sessions attempted to access collection" exception means you have an object A referencing a collection C and object B refencing the same collection C. Then you try to bind the object A with session1 and object B with session2. To which session should Hibernate bind the collection C?
                  I understand that part. However, I should be able to get this to work by setting the scope of the bean which handles the collection to prototype. For example, here is the snippet from my Spring app context, which runs within my distributed web app:
                  Code:
                  <bean id="importXTFService" class="com.xrite.ind.backcheck.service.imports.ImportXTFServiceImpl" scope="prototype">
                      <property name="sessionFactory" ref="backcheckSessionFactory" />
                  </bean>
                  According to the Spring docs, prototype scope results in the "creation of a new bean instance every time a request for that specific bean is made (that is, it is injected into another bean or it is requested via a programmatic getBean() method call on the container)".
                  This would imply that I should have a unique instance of my ImportXTFServiceImpl bean every time my client requests the operation. However, the client requests the operation by way of Spring Remoting/HTTP Invoker, so I am wondering if the proxy has something to do with the reason why I am not getting a unique instance of my bean for each call.



                  • #10
                    Nope, that will not work. Hibernate doesn't care about instances of your service class; it cares about instances of Hibernate persistent classes referenced in multiple Hibernate sessions.



                    • #11
                      Originally posted by dejanp View Post
                      Nope, that will not work. Hibernate doesn't care about instances of your service class; it cares about instances of Hibernate persistent classes referenced in multiple Hibernate sessions.
                      I don't completely understand what you mean here . . . there should be a way to allow multiple Spring Remoting clients to access a bean exposed through the Spring org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean.
                      If in fact scope=prototype can be applied to the proxied object, then there would be two separate beans and hence two distinct collection objects, each tied to a distinct Hibernate Session, correct?

                      Here is the method in the Service Bean exposed through the proxy and called from the Rich Client app through HTTP Invoker:

                      Code:
                      import java.util.Collection;

                      import org.hibernate.CacheMode;
                      import org.hibernate.Session;
                      import org.hibernate.SessionFactory;
                      import org.hibernate.Transaction;

                      public class ImportXTFServiceImpl implements ImportXTFService {

                          private static int saveCount = 0;
                          private SessionFactory sessionFactory = null;
                          private Session session = null;
                          private Transaction transaction = null;

                          /** Creates a new instance of ImportXTFServiceImpl */
                          public ImportXTFServiceImpl() {
                          }

                          public void setSessionFactory(SessionFactory sessionFactory) {
                              this.sessionFactory = sessionFactory;
                          }

                          public void importBatch(Collection objs) {

                              session = this.sessionFactory.openSession();
                              session.setCacheMode(CacheMode.IGNORE);
                              session.getTransaction().begin();

                              for (Object o : objs) {
                                  session.save(o);
                                  if (++saveCount % objs.size() == 0) {
                                      session.getTransaction().commit();
                                      session.clear();
                                      saveCount = 0;
                                  }
                              }

                              session.close();
                          }
                      }
                      Is there a property on the org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean that needs to be added to control the scope of the bean exposed by the proxy, similar to the scope=prototype setting on the bean definition in the app context running on the server?



                      • #12
                        OK, this is interesting . . .

                        I just ran another test, only this time the test ran over a local app context started in the same JVM as the unit test, as opposed to a separate JVM for the unit test and the web app containing the Spring container. Two concurrent tests run at the same time did NOT result in the "Illegal attempt to associate a collection with two open sessions" exception when run locally in the same JVM, even when the scope on the ImportXTFServiceImpl bean was left at the default (singleton).

                        So it sounds like this is the result of separate JVMs, and perhaps an equality issue of some sort? The exception here is a bit misleading, I think.

                        Why would running this in a separate JVM cause the problem? Is there a way to accommodate this through the proxied object somehow?



                        • #13
                          Problem &quot;SolvEDD&quot;

                          Looks like I have a solution. Using TransactionTemplate did the trick, but I am still not certain exactly why. It may have something to do with the way the following snippet of code behaves in the callback versus the plain-old-Hibernate approach:
                          Code:
                          public void setSessionFactory(SessionFactory sessionFactory) {
                              this.sessionFactory = sessionFactory;
                          }

                          public SessionFactory getSessionFactory() {
                              return this.sessionFactory;
                          }

                          . . .

                          getSessionFactory().getCurrentSession().save(object);

                          . . .
                          Here is the refactored importBatch method:

                          Code:
                          public void importBatch(final Collection objs) {
                              this.transactionTemplate.execute(new TransactionCallbackWithoutResult() {
                                  protected void doInTransactionWithoutResult(TransactionStatus status) {
                                      for (Object object : objs) {
                                          getSessionFactory().getCurrentSession().save(object);
                                      }
                                  }
                              });
                          }
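
                          For reference, a minimal sketch of how a TransactionTemplate like the one above can be wired, assuming a HibernateTransactionManager built on the same SessionFactory (names are illustrative). Presumably this is also why getCurrentSession() behaves here: the transaction manager binds a Session to the transaction and flushes and commits it when the callback returns.

                          Code:
                          import org.hibernate.SessionFactory;
                          import org.springframework.orm.hibernate3.HibernateTransactionManager;
                          import org.springframework.transaction.support.TransactionTemplate;

                          public class TransactionTemplateWiring {

                              // Sketch: build the TransactionTemplate around a HibernateTransactionManager
                              // for the same SessionFactory used by getSessionFactory() above.
                              public TransactionTemplate createTransactionTemplate(SessionFactory sessionFactory) {
                                  HibernateTransactionManager txManager = new HibernateTransactionManager(sessionFactory);
                                  return new TransactionTemplate(txManager);
                              }
                          }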



                          • #14
                            It's not the collection that you pass as a parameter that Hibernate complains about (Hibernate sees the individual instances in that collection and doesn't care at all whether you pack them into a collection at some point or not). The error comes from the fact that some of the objects in that collection internally share a reference to some (other) collection.



                            • #15
                              However, there is still something going on in the relationship between Hibernate and Spring that has caused this new behaviour. The use of TransactionTemplate, for one, has allowed concurrent requests to work where the "plain old Hibernate" approach of managing the session myself did not. I understand what you are saying, and that is a pure Hibernate issue. So what has resulted in this working successfully now? The way TransactionTemplate manages getCurrentSession()?

                              Again, the root problem as I see it is that an object in the collection (or one of its own objects, which may itself be contained in a collection) was already associated with an open session. Using the plain Hibernate approach, wiring up the bean with an AOP interceptor and a tx-advice with propagation=NEVER and calling getCurrentSession() resulted in the error, whereas the use of TransactionTemplate and getCurrentSession() works.
