  • Problem with import huge amount of data using hibernate

    Hi,

    I've got a problem with importing data from a file into the database.
    I'm using a standard CSV file, but this file has over 1,000,000 rows.
    One row represents one object, so I wrote a basic import function with this scenario:

    Code:
    -load csv file
    
    -start loop
    	-get row
    	-create object and set data from row
    	-save object using hibernate (I'm using HibernateCallback())
    -end loop
    
    -close file
    The problem is performance: this import takes a very long time to execute. I searched
    for a better way to import a huge amount of data into the database using Hibernate, but found nothing useful.
    I think it's a common problem for most applications, so if anybody knows a solution it would really help me.

    Thank you for every answer.

    Best regards
    Martin
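    The scenario above, saving one row at a time through a HibernateCallback, would look roughly like this sketch (the CSV parsing, the Spring HibernateTemplate wiring, and all class and method names are my assumptions, not taken from the thread):

    ```java
    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.hibernate.Session;
    import org.springframework.orm.hibernate3.HibernateCallback;
    import org.springframework.orm.hibernate3.HibernateTemplate;

    // Rough sketch of the per-row import described above; names are illustrative.
    public class NaiveCsvImport {

        public static void importFile(HibernateTemplate template, String path) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(path)); // -load csv file
            String line;
            while ((line = in.readLine()) != null) {                      // -get row
                final String[] fields = line.split(",");
                template.execute(new HibernateCallback() {                // one callback per row:
                    public Object doInHibernate(Session session) {        // this is the bottleneck
                        session.save(buildEntity(fields));                // -create object, -save
                        return null;
                    }
                });
            }
            in.close();                                                   // -close file
        }

        static Object buildEntity(String[] fields) { /* assumed mapping */ return null; }
    }
    ```

    Each row pays the full cost of a callback, a session flush, and a network round trip, which is why this pattern is slow at a million rows.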

  • #2
    I would suggest reading a good book on Hibernate, like "Java Persistence with Hibernate"; it covers this problem in some detail.
    You can also look at chapter 13 of the Hibernate Reference ("Batch processing").

    To put it very briefly:
    1. Do not flush or commit every record; do it in reasonable chunks, a few hundred or (if memory permits) a few thousand records per chunk.
    2. Even better, consider using a StatelessSession.
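    A minimal sketch of the chunked flush/clear pattern from the "Batch processing" chapter (the SessionFactory setup, the entity mapping, and the chunk size of 500 are assumptions for illustration):

    ```java
    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // Sketch only: mapRow() and the entity class are assumed, not shown in the thread.
    public class ChunkedImport {
        private static final int CHUNK_SIZE = 500; // illustrative; tune to available memory

        public static void importRows(SessionFactory sf, List<String[]> rows) {
            Session session = sf.openSession();
            Transaction tx = session.beginTransaction();
            int i = 0;
            for (String[] row : rows) {
                session.save(mapRow(row));
                if (++i % CHUNK_SIZE == 0) {
                    session.flush(); // execute the pending inserts as a batch
                    session.clear(); // evict saved objects from the session cache
                }
            }
            tx.commit();             // flushes the final partial chunk
            session.close();
        }

        private static Object mapRow(String[] row) { /* build entity from CSV fields */ return null; }
    }
    ```

    Setting `hibernate.jdbc.batch_size` to a similar value in the Hibernate configuration lets the JDBC driver batch the actual INSERT statements as well.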

    Regards,
    Oleksandr


    • #3
      Hi Oleksandr,

      thank you for the help. I tried batch processing and the result is 50 thousand rows in 6-7 minutes. That's not optimal; maybe I need better hardware.



      • #4
        If you really want performance...

        ...don't use Hibernate.

        The extra overhead of object creation (not to mention the network overhead of JDBC) makes bulk importing of data this way inherently slow.

        If your database has any bulk import tools of its own, use them instead - it will be MUCH faster.

        Alternatively look at third party ETL tools.

        Oleksandr's suggestions will get the best out of Hibernate, but that is about as fast as it will get.



        • #5
          hi,

          thank you for the post,

          but I need to work with objects. After I get the data from the file, I need to create an object, set the data from the file, set my own data (some extra attributes), and then save it.



          • #6
            What requirement do you have for processing speed?

            Which DBMS are you using?

            Is each record in your file mapped to one object only, or to many objects?

            Are records interrelated (e.g. does some parent/child relationship exist between records, or something similar)?

            Regards,
            Oleksandr


            • #7
              I'm using Oracle 8.3. One file is one table, and one row of the file is one object (a row in the table). I've got some relations between tables (foreign keys).

              This import will be executed every week at an exactly defined time. So I need to import the data in a way that doesn't affect my application much, if possible.

              Regards,
              Martin




              • #8
                If you are able to reveal some more details, then a better solution may (or may not) be suggested.

                Anyway, Hibernate, if used properly, is not that slow in this scenario compared to pure JDBC (a matter of percent, not multiples). And a StatelessSession may improve it even more (as it has no session cache and the associated overhead).
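                As a fragment-level sketch, the StatelessSession variant of the import loop could look like this (a `sessionFactory`, a `rows` collection, and a `mapRow()` helper are assumed to be in scope; the names are illustrative):

                ```java
                import org.hibernate.StatelessSession;
                import org.hibernate.Transaction;

                // Sketch: a StatelessSession bypasses the first-level cache entirely,
                // so no periodic flush()/clear() is needed.
                StatelessSession ss = sessionFactory.openStatelessSession();
                Transaction tx = ss.beginTransaction();
                for (String[] row : rows) {
                    ss.insert(mapRow(row)); // executes the INSERT directly, no session caching
                }
                tx.commit();
                ss.close();
                ```

                Note that a StatelessSession does not cascade to associations and fires no interceptors, which is part of why it is faster.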

                If you need a manifold improvement, your only bet is the database's built-in bulk-load tool (if your DB includes one) or a tool that can use the bulk-load (AKA direct-load) API of your database (if any).

                Note that such tools quite often provide some kind of data validation/transformation, including calculation of additional attributes.

                E.g. in the case of Oracle you may define your CSV file as an "external table" and then use it in any SELECT statement, joining it with other tables and so on.
                This opens almost endless possibilities.
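                A sketch of the external-table approach (the directory, table, and column names here are invented for the example, not taken from the thread):

                ```sql
                -- Assumes Oracle 9i or later; all identifiers are illustrative.
                CREATE DIRECTORY import_dir AS '/data/import';

                CREATE TABLE items_ext (
                  item_id NUMBER,
                  name    VARCHAR2(100),
                  price   NUMBER
                )
                ORGANIZATION EXTERNAL (
                  TYPE ORACLE_LOADER
                  DEFAULT DIRECTORY import_dir
                  ACCESS PARAMETERS (
                    RECORDS DELIMITED BY NEWLINE
                    FIELDS TERMINATED BY ','
                  )
                  LOCATION ('items.csv')
                );

                -- The CSV can now be queried, joined, and transformed like any table,
                -- including computing extra attributes during the load:
                INSERT INTO items (item_id, name, price, loaded_at)
                SELECT item_id, name, price, SYSDATE FROM items_ext;
                ```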

                BTW, you have written "but I need work with objects." - this statement is inherently wrong (unless you have a specific order from your boss which you cannot contradict). Objects are not the goal; they are only a means to achieve the goal - creating fast, reliable, maintainable software. If they do not help but impede, then get rid of them.



                • #9
                  First of all, you are not using Oracle 8.3, because such a version simply does not exist.

                  The last Oracle versions that had a 3 in the second place were 7.3.x - way too old.
                  It may be 8.0.3 (very old, no longer supported, and full of bugs), but I doubt that as well.

                  The last version of 8.x was 8.1.7.4, of 9.x - 9.2.0.8, and of 10.x - 10.2.something. So, could you verify this version once more? Just connect to the DB with SQL*Plus and you will obtain something similar to
                  Code:
                  SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jun 19 12:44:24 2008
                  
                  Copyright (c) 1982, 2005, Oracle.  All rights reserved.
                  
                  
                  Connected to:
                  Oracle9i Release 9.2.0.8.0 - Production
                  JServer Release 9.2.0.8.0 - Production
                  OK, anyway, your case looks like a textbook case for the usage of SQL*Loader with direct=yes, unless your tables have relations based on surrogate (system-generated) keys. But in that case you may still get by with external tables, if your Oracle version supports them (introduced in 9.0).
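                  A sketch of what the SQL*Loader setup could look like (the table, column, and file names are invented for illustration):

                  ```
                  -- items.ctl: minimal SQL*Loader control file
                  LOAD DATA
                  INFILE 'items.csv'
                  APPEND
                  INTO TABLE items
                  FIELDS TERMINATED BY ','
                  (item_id, name, price)
                  ```

                  invoked from the command line with something like `sqlldr userid=... control=items.ctl direct=true`.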

                  Regards,
                  Oleksandr



                  • #10
                    SQL*Loader FAQs

                    http://www.orafaq.com/wiki/SQL*Loader_FAQ

                    is as good a place to start as any.
                    You can schedule your load as a db job - Shouldn't interfere with our web app at all.



                    • #11
                      Typos

                      You can schedule your load as a dbms job - Shouldn't interfere with your web app at all.



                      • #12
                        Really, the best starting point is SQL*Loader's own documentation.

                        Concerning the DB job - it may be problematic to use a DB job to start an external program (namely, SQL*Loader). The OS scheduler seems to be a better alternative, especially if the files to load are not colocated with the DB.

                        Regards,
                        Oleksandr

