JdbcCursorInputSource and OutOfMemoryError

  • #16
    That's a really great post. I've known about issues such as this for a while, which is why I also created the DrivingQueryItemReader. It's the approach used at another client to solve this issue. The problem is that there's a limit to the number of keys you can store in memory within a JVM. At this client we solved it by 'partitioning' the driving query with a maximum key limit in mind, something like 100,000 keys per partition. You could then split the data up with a query such as: select count(ID) from T_FOOS order by ID. (The order by would be important here.) Then you could kick off N jobs, where N equals the number of partitions. The JobParameters would then contain the begin and end ID for that partition (giving you separate instances). There's a little work to be done to make sure the KeyCollector implementations can pull and set up a prepared statement from the JobParameters, which I'll be working through today and tomorrow. If you look at the latest trunk, I've put something forward that works for JdbcCursorItemReader, but I agree there's some work that needs to be done there too. (I may move some implementation pieces into samples before the release.)
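
The partitioning idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: keys are numeric and arrive in ascending order, and the class `KeyRangePartitioner`, its `KeyRange` type, and the `partition` method are hypothetical names, not part of Spring Batch. Only the boundaries of each partition are kept, so the full key set never has to sit in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not a Spring Batch class.
public class KeyRangePartitioner {

    /** Inclusive begin and end key of one partition. */
    public static final class KeyRange {
        public final long beginId;
        public final long endId;

        KeyRange(long beginId, long endId) {
            this.beginId = beginId;
            this.endId = endId;
        }
    }

    /**
     * Walks keys in ascending order and records only the first and last key
     * of each group of at most maxKeysPerPartition keys.
     */
    public static List<KeyRange> partition(Iterator<Long> sortedKeys, int maxKeysPerPartition) {
        if (maxKeysPerPartition <= 0) {
            throw new IllegalArgumentException("maxKeysPerPartition must be positive");
        }
        List<KeyRange> ranges = new ArrayList<KeyRange>();
        long begin = 0;
        long last = 0;
        int count = 0;
        while (sortedKeys.hasNext()) {
            long key = sortedKeys.next();
            if (count == 0) {
                begin = key; // first key of a new partition
            }
            last = key;
            count++;
            if (count == maxKeysPerPartition) {
                ranges.add(new KeyRange(begin, last));
                count = 0;
            }
        }
        if (count > 0) {
            ranges.add(new KeyRange(begin, last)); // final, possibly smaller partition
        }
        return ranges;
    }
}
```

Each resulting range would then supply the begin and end ID placed in the JobParameters of one job instance.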

    One long term solution to the cursor problem might be to make it forward only and buffer up until mark. It would require holding what the RowMapper returned in a buffer until the next mark, in which case the buffer contents can be thrown away. The only reason the buffer is needed is so that rollback can be supported. I don't have time to implement that right now, but it's an enhancement I'm looking to make in the future.
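
A rough sketch of the buffer-until-mark idea, under assumptions: a plain `Iterator` stands in for the forward-only cursor plus RowMapper, and `MarkBufferingReader` with its `read`, `mark`, and `reset` methods is a hypothetical class, not the actual Spring Batch implementation. Items read since the last mark are held so a rollback can replay them; a mark throws the consumed items away.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration, not the Spring Batch reader.
public class MarkBufferingReader<T> {

    private final Iterator<T> source;                  // stands in for cursor + RowMapper
    private final List<T> buffer = new ArrayList<T>(); // items read since the last mark
    private int position = 0;                          // next buffer index to hand out

    public MarkBufferingReader(Iterator<T> source) {
        this.source = source;
    }

    /** Returns the next item, replaying buffered items first; null when exhausted. */
    public T read() {
        if (position < buffer.size()) {
            return buffer.get(position++);
        }
        if (!source.hasNext()) {
            return null;
        }
        T item = source.next();
        buffer.add(item);
        position++;
        return item;
    }

    /** Commit point: items consumed so far can no longer be rolled back, so drop them. */
    public void mark() {
        buffer.subList(0, position).clear();
        position = 0;
    }

    /** Rollback: replay everything read since the last mark. */
    public void reset() {
        position = 0;
    }

    /** Number of items currently held for a possible rollback. */
    public int bufferedCount() {
        return buffer.size();
    }
}
```

With this shape the memory held is bounded by the number of items read between two marks (the commit interval), rather than by the size of the whole result set.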


    • #17
      Thanks Lucas!
      You fixed this issue with 1.0.0-FINAL.

      (I was testing my own buffered implementation; I'll throw it away.)


      • #18
        Yeah, sorry about that, I forgot to mention in this thread that I fixed it in final. I wasn't going to touch it until 1.1, but when I dug into it, I really started to think of it as a bug. It didn't take too long to fix, and I'm glad it works for you.


        • #19
          I'm glad we fixed the issue. As a further improvement, why couldn't we use the RDBMS to do the scrolling? (It would be platform dependent SQL.) DrivingQueryItemReader would benefit from this as well.


          • #20
            I'm not sure I fully understand, Dave. It seems that to do that you're talking about some form of pagination? Isn't that quite a bit different from opening a database cursor? I think you're right that it might hold some promise for the driving query reader, but I think it would be fundamentally different from a cursor reader, unless I'm completely misunderstanding.


            • #21
              As a further improvement, why couldn't we use the RDBMS to do the scrolling?
              Something like the following for Oracle?
              Taken from http://www.oracle.com/technology/ora...o56asktom.html, section 'Pagination with ROWNUM':
              Code:
              select *
                from ( select /*+ FIRST_ROWS(n) */
                              a.*, ROWNUM rnum
                         from ( your_query_goes_here,
                                with order by ) a
                        where ROWNUM <= :MAX_ROW_TO_FETCH )
               where rnum >= :MIN_ROW_TO_FETCH;
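
To make the two bind values concrete, here is a small sketch of how a reader might derive them from a page number and page size and wrap an ordered query in the template above. The class `RownumPagination` and its methods are hypothetical and assume zero-based page numbers; ROWNUM itself starts at 1.

```java
// Hypothetical helper; the names are illustrative only.
public class RownumPagination {

    /** Value for :MIN_ROW_TO_FETCH, the first row of a zero-based page. */
    public static long minRow(int page, int pageSize) {
        checkArgs(page, pageSize);
        return (long) page * pageSize + 1;
    }

    /** Value for :MAX_ROW_TO_FETCH, the last row of a zero-based page. */
    public static long maxRow(int page, int pageSize) {
        checkArgs(page, pageSize);
        return ((long) page + 1) * pageSize;
    }

    /** Wraps a query that already contains its ORDER BY in the ROWNUM template. */
    public static String wrap(String orderedQuery) {
        return "select * from ( select /*+ FIRST_ROWS(n) */ a.*, ROWNUM rnum from ( "
                + orderedQuery
                + " ) a where ROWNUM <= :MAX_ROW_TO_FETCH ) where rnum >= :MIN_ROW_TO_FETCH";
    }

    private static void checkArgs(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
    }
}
```

For example, with a page size of 100, page 0 covers rows 1 to 100 and page 2 covers rows 201 to 300.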


              • #22
                Right, but wouldn't that be something like a paginated item reader, rather than a cursor item reader? It seems like two completely different things.


                • #23
                  If the framework is smart enough to know about the different RDBMS flavours, the interface for the user is identical to the existing readers, isn't it? So we could implement it as a new reader, but that would mean two choices to achieve the same goal, with one clear winner (as long as you were on a supported platform).


                  • #24
                    Sorry to come in the middle of a very interesting conversation (this new JDBC reader could be really interesting).

                    I'm glad we fixed the issue.
                    We found a bug in 1.0.0-FINAL JdbcCursorItemReader (sorry !).

                    It happens on the second or subsequent restarts.
                    This is because bufferredReader doesn't remember the previous processedRowCount.
                    bufferredReader#processedRowCount only contains the rows processed during the current restart.

                    When update is called, JdbcCursorItemReader stores in the execution context only the rows processed during this restart. It should store the rows processed during this restart AND the previous ones.

                    This issue can be resolved by changing the JdbcCursorItemReader#open code; see the //CHANGE ... //END CHANGE block at the end. (Note: a new constructor BufferredResultSetReader(ResultSet, RowMapper, long processedRowCount) would have been better than setting the field directly.)
                    Code:
                    public void open(ExecutionContext context) {
                        Assert.state(!initialized, "Stream is already initialized. Close before re-opening.");
                        Assert.isNull(rs);
                        Assert.notNull(context, "ExecutionContext must not be null");
                        executeQuery();
                        initialized = true;

                        long currentProcessedRow = 0;
                        if (context.containsKey(getKey(CURRENT_PROCESSED_ROW))) {
                            try {
                                currentProcessedRow = context.getLong(getKey(CURRENT_PROCESSED_ROW));
                                while (rs.next()) {
                                    if (rs.getRow() == currentProcessedRow) {
                                        break;
                                    }
                                }
                            }
                            catch (SQLException se) {
                                throw getExceptionTranslator().translate("Attempted to move ResultSet to last committed row", sql, se);
                            }
                        }

                        //CHANGE
                        bufferredReader = new BufferredResultSetReader(rs, mapper);
                        bufferredReader.processedRowCount = currentProcessedRow;
                        //END CHANGE
                    }
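
The effect of the missing seed value can be seen in a toy model. This is a simplified sketch, not the real reader: `RestartCountDemo`, `CountingReader`, and `savedAfterRestart` are hypothetical names that only mimic a row counter being written to the execution context.

```java
// Toy model of the restart bug; none of these names exist in Spring Batch.
public class RestartCountDemo {

    /** Minimal stand-in for a buffered reader that counts processed rows. */
    static final class CountingReader {
        private long processedRowCount;

        CountingReader(long initialProcessedRowCount) {
            this.processedRowCount = initialProcessedRowCount;
        }

        void process(int rows) {
            processedRowCount += rows;
        }

        /** What an update call would write to the execution context. */
        long valueToStore() {
            return processedRowCount;
        }
    }

    /**
     * Simulates one restarted run: the cursor was advanced to restoredCount,
     * then rowsThisRun more rows are processed. Returns the stored count.
     */
    public static long savedAfterRestart(long restoredCount, int rowsThisRun, boolean seedCounter) {
        // Without seeding, the counter starts at 0 even though the cursor did not.
        CountingReader reader = new CountingReader(seedCounter ? restoredCount : 0);
        reader.process(rowsThisRun);
        return reader.valueToStore();
    }
}
```

If the first run committed 100 rows and the first restart processes 50 more, the unseeded counter stores 50, so a second restart would skip only 50 rows and reprocess rows 51 to 150. The seeded counter stores 150.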


                    • #25
                      You're right, I'll add an extra constructor:

                      http://jira.springframework.org/browse/BATCH-549


                      • #26
                        It's fixed in 1.0.1 dev.


                        • #27
                          Thanks Lucas!


                          • #28
                            Hi all,

                            Actually, I'm getting this problem again when using a HibernateCursorItemReader. The driver caches every row internally, which results in 500 MB of heap data. This constantly crashes all my bigger data dump tasks.
                            Is the bug that existed in the JdbcCursorItemReader not fixed in the Hibernate one?
                            Any ideas?

                            I'm using the Oracle thin JDBC (type 4) driver on 10g.


                            • #29
                              Dump Files

                              Do you have the crash dump file? Are you using an IBM or Sun JVM?


                              • #30
                                Hi,

                                I will get the crash dump as soon as I can. I found the "leak" by profiling the application with JProfiler: in oracle.jdbc the number of byte[] instances is constantly rising.
                                The driver stores all the rows it has read in the OracleResultSetCacheImpl in the result set. As this is normal behaviour for the Oracle driver, I wonder how I can change it. Any idea?

                                I'm using Sun 1.6; I attached a screenshot.
                                I did a dump by specifying: -XX:+HeapDumpOnOutOfMemoryError
                                I used the SAP Memory Analyzer, which is very good for digging through dumps.

                                h-t-t-p://img-up.net/?up=oracleProb1DC2Xi0.png
                                h-t-t-p://img-up.net/?up=dumpVisualX1oB1n.png
                                Sorry, new links, old ones didn't work.

                                Sorry, the attachments are resized too strongly; I can't link pictures as I'm a new user (need 15 posts).
                                Last edited by hessenmob; Jun 1st, 2008, 05:27 AM.
