JdbcCursorItemReader / DB2 - limited rows read before out of memory

  • JdbcCursorItemReader / DB2 - limited rows read before out of memory


    I'm using the JdbcCursorItemReader for some benchmarking exercises against a very large table in DB2 (type 4, z/OS connect).

    While the benchmarks are fast, I'm having an issue with memory....

    I'm only able to process about 64,000 rows before I get an out-of-memory error. It's not from the framework, but I'm wondering if anyone has seen this and hopefully has a workaround? This is Java 1.6.0_03.

    ( just has a while loop doing JdbcCursorItemReader read()s)


    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(
    at org.springframework.batch.item.database.JdbcCursorItemReader.doRead(
    at Driver.main(

  • #2
    I also ran into a similar problem, using MySQL. When I start a step that uses JdbcCursorItemReader to read a very large table, it waits a very long time before it really starts to read, and if I don't set the fetchSize I hit the Java heap space error. Did you set the fetchSize? Hope this helps.


    • #3
      The rows actually start returning quickly for processing.

      I've played with the fetchSize, making it larger and smaller, but it doesn't seem to have any effect: it always terminates with the out-of-memory error after the same number of rows are returned for processing.

      I'm guessing it is something in the DB2 driver.

      It's disappointing since this defeats the purpose of using JdbcCursorItemReader to begin with. We've started looking at the driving query based readers.

      I have MySQL but haven't tested there. I may get to it, just to confirm it's a driver issue. I'm guessing people use this every day with far more than 65,000 rows for batch processing.


      • #4
        I know a few other projects that have had issues with cursors in DB2 on massive datasets, so I'm not surprised. In fact, it was the original reason for the creation of the DrivingQuery readers. You may also want to look at the new paging readers that Thomas has committed in M2. They're functionally very similar to the DrivingQueryItemReader (although not limited to just keys), but also page, allowing much larger datasets to be used.
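As a rough sketch of what paging buys you (table and column names below are hypothetical, and this is not the actual Spring Batch implementation): instead of holding one cursor open across the whole table, each page re-runs a short query that starts after the last key seen, so only one page of rows is ever in memory:

```java
// Illustrative keyset-paging sketch; table and column names are hypothetical.
// Each page is a fresh, short-lived query, so no cursor spans the whole table.
public class PagingSketch {

    // Build the SQL for one page, starting after the last key of the previous page.
    static String pageSql(String table, String keyColumn, long lastKey, int pageSize) {
        return "SELECT * FROM " + table
                + " WHERE " + keyColumn + " > " + lastKey
                + " ORDER BY " + keyColumn
                + " FETCH FIRST " + pageSize + " ROWS ONLY"; // DB2-style limit clause
    }

    public static void main(String[] args) {
        // First page starts after key 0; each next page starts after the last key read.
        System.out.println(pageSql("BIG_TABLE", "ID", 0, 1000));
        System.out.println(pageSql("BIG_TABLE", "ID", 1000, 1000));
    }
}
```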


        • #5
          Thanks Lucas.

          We'll take a look. The main concern with DrivingQueries is the key, which we don't have on a number of our largest tables. The paging readers may be just the thing we need.


          • #6
            What is the real problem?

            I'm a bit curious about this problem.

            An ordinary batch job always works directly against the tables: open a cursor, fetch a row, do processing (maybe updating via the cursor), loop around the fetch until SQLCODE +100, and finally close the cursor. In this case there will never be more data in the program than what a single fetch can bring into it. Furthermore, the data is worked on directly, and with a reasonable commit frequency (we use a maximum of 5 seconds) the batch job can run in parallel with other processing (like online transactions).

            If we have a medium sized table with 100 million rows and an average rowsize of 10,000 bytes (a row containing some ordinary columns and an XML document), the memory needed to run the batch against the 100 million row table will never be more than 10,000 bytes.

            In your example, it seems like your batch is reading the complete result set into memory to subsequently process it. Getting out-of-memory after 64,000 rows means that with a rowsize of 10,000 bytes you've spent about 640 MB of memory. In the above example (with a 100 million rows), this would have required 1 terabyte of memory. Furthermore, a lot of updates might have been done against the real table while your batch is processing the copy it has got in its own memory - which translates into erroneous results.
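The figures above check out (decimal units, as used in the post):

```java
// Back-of-the-envelope check of the memory figures quoted above.
public class MemoryEstimate {

    static long bytesFor(long rows, long rowSizeBytes) {
        return rows * rowSizeBytes;
    }

    public static void main(String[] args) {
        // 64,000 rows at ~10,000 bytes each: about 640 MB
        System.out.println(bytesFor(64_000L, 10_000L) / 1_000_000L + " MB");
        // 100 million rows at the same row size: about 1 TB
        System.out.println(bytesFor(100_000_000L, 10_000L) / 1_000_000_000_000L + " TB");
    }
}
```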

            So, what is wrong here? Is it the way Spring Batch runs against the database or is it an error in the jdbc driver? BTW, what is IBMs reaction if you report a defect on the jdbc driver?



            • #7
              I think there is a memory leak in the JDBC driver. On a hunch, I tried the DB2 JDBC driver from DataDirect. Unfortunately, it doesn't support the prepareStatement(String, int, int, int) method which Spring uses, so I wasn't able to run it. I don't think the issue is with Spring Batch. I still want to run the same test against MySQL (for example) just to prove the point that it's the driver, not the framework.

              IBM is unaware of any problem. We have opened a PMR but frankly our results have been mixed when dealing with JDBC driver problems in the past with them. I'll post any updates on my testing or from IBM.


              • #8
                I'm wondering if it is the ResultSet.HOLD_CURSORS_OVER_COMMIT setting that causes the driver to hold on to some objects in memory. Could you run a plain JDBC cursor test with this setting turned on and off to see if this is the case? The best way to verify this would probably be to use a profiler to monitor the memory use. Sun's NetBeans IDE has a good one built in, and it's free.
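A minimal sketch of such a test (connection URL, credentials, and table name are placeholders; this needs a live database and a profiler attached to be useful):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HoldabilityTest {

    // Iterate a forward-only cursor opened with the given holdability, counting rows.
    // Run once per holdability constant while watching heap use in a profiler.
    static int countRows(Connection conn, String sql, int holdability) throws Exception {
        Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY, holdability);
        ResultSet rs = st.executeQuery(sql);
        int n = 0;
        while (rs.next()) {
            n++;
        }
        rs.close();
        st.close();
        return n;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection details; substitute your own.
        Connection conn = DriverManager.getConnection("jdbc:db2://host:50000/TESTDB", "user", "pass");
        String sql = "SELECT * FROM BIG_TABLE";
        System.out.println(countRows(conn, sql, ResultSet.HOLD_CURSORS_OVER_COMMIT));
        System.out.println(countRows(conn, sql, ResultSet.CLOSE_CURSORS_AT_COMMIT));
        conn.close();
    }
}
```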


                • #9
                  Out of memory exception

                  I also get this problem on a PostgreSQL database, retrieving a large result set (1M records).

                  I'm currently investigating if the OOME also happens using plain JDBC statements like suggested by Thomas.

                  When I know the results I will post them here.


                  • #10
                    Out of memory exception


                    Here are the results. I get the same OOME using straightforward JDBC statements. It does not even matter whether I specify the ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY options or not. Specifying the fetchSize also has no influence on the OOME.

                    The OOME happens when retrieving the ResultSet, not when iterating over it.

                    Here is the code

                    public static void main (String[] args) {
                    	try {
                    		Class.forName("org.postgresql.Driver");
                    		System.out.println("Driver Loaded.");
                    		String sql = "select * from strand_props limit 200000";
                    		Connection conn = DriverManager.getConnection("jdbc:postgresql://db1.test.sandd.hdc/sandd", "sandd", "[email protected]");
                    		System.out.println("Connection obtained.");
                    		PreparedStatement preparedStatement = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
                    //		PreparedStatement preparedStatement = conn.prepareStatement(sql); //1
                    		System.out.println("Prepared statement created.");
                    		ResultSet rs = preparedStatement.executeQuery();
                    		System.out.println("ResultSet Retrieved. Starting iterating.");
                    		int cnt = 0;
                    		while (rs.next()) {
                    			cnt++;
                    			if ((cnt % 1000) == 0) {
                    				System.out.println("Retrieved " + cnt + " records from resultset.");
                    			}
                    		}
                    		System.out.println("Retrieved " + cnt + " records from resultset.");
                    	} catch (ClassNotFoundException e) {
                    		System.out.println("No Driver Found.");
                    	} catch (SQLException e) {
                    		System.out.println("SQL Problem.");
                    	}
                    }
                    The exception happens before "ResultSet Retrieved. Starting iterating." is printed on the console.

                    The jdbc-driver I used is postgresql-8.3-603.jdbc3, the latest version. There is one more version I can try (postgresql-8.3-603.jdbc4).

                    I'll keep you posted.


                    Patrick Vanbrabant


                    • #11
                      Good news!!

                      I did a trace with IBM and they came back with the following settings for the JVM:

                      -Xmx1024M -XX:PermSize=256M -XX:MaxPermSize=256M

                      I can't say I *yet* understand why this works, but so far it does! My test is well past 10 million rows.... I've asked IBM for reference material on this. I'll try to figure out why this works, wanted to pass this along immediately though for others that may be dealing with this issue.

                      Thanks for all the help, hope it helps others.


                      • #12

                        Have you tried a smaller fetch size - like setFetchSize(100)? That could reduce the memory usage.


                        • #13
                          I have tried with no fetchsize at all. The same problem.
                          FetchSize 100 -> Same problem.
                          FetchSize 10 -> Same problem

                          All of these tests were done using the small sample application (plain JDBC) from my previous post.

                          It looks like executing the query on PostgreSQL simply returns the complete result set, and this seems to be causing the OOME.

                          In the final (production) case, I will need to retrieve more than 8M records from this same table. I will try to experiment with my JVM memory settings and see what the results are.
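For what it's worth, the PostgreSQL JDBC driver documents that it only streams rows through a server-side cursor under specific conditions; otherwise it materializes the whole result set in memory, which matches the OOME described above. The helper below just encodes those documented conditions for illustration; it is not part of the driver API:

```java
import java.sql.ResultSet;

public class PgCursorCheck {

    // True when the PostgreSQL driver will fetch rows in batches via a cursor:
    // autocommit must be off, fetchSize positive, and the ResultSet forward-only.
    // (The query must also be a single SQL statement.)
    static boolean willStream(boolean autoCommit, int fetchSize, int resultSetType) {
        return !autoCommit
                && fetchSize > 0
                && resultSetType == ResultSet.TYPE_FORWARD_ONLY;
    }

    public static void main(String[] args) {
        // Autocommit on (the JDBC default): the driver fetches everything at once.
        System.out.println(willStream(true, 100, ResultSet.TYPE_FORWARD_ONLY));
        // Autocommit off with a positive fetch size: rows stream in batches.
        System.out.println(willStream(false, 100, ResultSet.TYPE_FORWARD_ONLY));
    }
}
```

In the sample program posted earlier, calling conn.setAutoCommit(false); before creating the statement, together with a positive fetch size, is the usual fix.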


                          Patrick Vanbrabant