  • JdbcCursorInputSource and OutOfMemoryError

    Hi,

    I've built a batch job using JdbcCursorInputSource (Spring Batch M3). The more rows I read, the more Java heap is consumed, and the batch eventually ends with an OutOfMemoryError (250 MB used for 200,000 rows in my database).
    I can run my batch with a larger -Xmx setting, but I don't think that's a good solution:
    the memory consumed should not grow with the number of rows returned by the cursor.

    JdbcCursorInputSource does not seem to close the ResultSet at commit time.
    Is there any planned change to address this problem?
    I would expect JdbcCursorInputSource to create the JDBC resources (Statement and ResultSet) at the beginning of each commit interval and close them at the end of the commit interval. This would help manage database locks and resource usage.

    Thanks in advance.

  • #2
    The cursor doesn't close the connection at commit time because doing so would close the cursor, requiring you to open it again when starting the next transaction, which would be very expensive.

    It shouldn't be taking up much heap, since the only thing it holds onto is the ResultSet itself. Are you holding the output of the ResultSet in some kind of collection that isn't being freed?
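
    For illustration, a minimal sketch (hypothetical, not the original poster's code) of the kind of leak being suspected here: a RowMapper that also stores every mapped row in a collection that is never cleared, so the heap grows with the row count even though the cursor itself is cheap.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    import org.springframework.jdbc.core.RowMapper;

    // Hypothetical illustration of the suspected leak; nothing here is from
    // the original poster's code.
    public class LeakyRowMapper implements RowMapper {

        // Strong references to every row ever mapped: heap usage grows
        // linearly with the number of rows read from the cursor.
        private final List seenRows = new ArrayList();

        public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
            Object row = rs.getString(1); // map something from the current row
            seenRows.add(row);            // the bug: entries are never removed
            return row;
        }
    }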



    • #3
      JdbcCursorInputSource and OutOfMemory

      The only thing I do with the ResultSet is pass it to the mapRow method of my RowMapper, where I read it to create a domain object.
      Then I write the domain object to a file using a writer that extends FlatFileItemWriter.
      I replaced my processor with a MockProcessor that does nothing, and I got the same result: memory consumption keeps growing while the batch runs.

      Do you think there's a solution to my problem?
      Would you like an extract of my code?

      (sorry for my bad English)



      • #4
        I think posting your code would be helpful. I would imagine there's probably a pesky memory leak somewhere, potentially in the RowMapper, but if there is an issue with the InputSource, seeing the code will help narrow it down.



        • #5
          I have run many tests and found that memory grows to 550 MB when I use
          stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY, ResultSet.HOLD_CURSORS_OVER_COMMIT);
          and only to 50 MB when I use
          stmt = con.createStatement();
          So it seems to be a JDBC/Oracle problem.

          I know that JdbcCursorInputSource needs TYPE_SCROLL_INSENSITIVE in order to use the absolute() method of ResultSet.
          If you have any suggestions, I'm happy to read them.

          If you still want my code with JdbcCursorInputSource, or my test without it, just say so.

          I also tried closing the ResultSet at commit time, but you were right: the performance is very bad.
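
          For reference, a minimal sketch of why the scrollable type matters: on restart the reader jumps back to the saved position with ResultSet.absolute(), which a TYPE_FORWARD_ONLY cursor does not allow. The query and saved row number here are made up.

          import java.sql.Connection;
          import java.sql.ResultSet;
          import java.sql.SQLException;
          import java.sql.Statement;

          public class AbsoluteRestartSketch {

              // Resume reading after a previously saved row position; absolute()
              // is only legal on a scrollable ResultSet, which is why the reader
              // asks for TYPE_SCROLL_INSENSITIVE in the first place.
              static void resumeAfter(Connection con, String sql, int lastProcessedRow) throws SQLException {
                  Statement stmt = con.createStatement(
                          ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
                  ResultSet rs = stmt.executeQuery(sql);
                  rs.absolute(lastProcessedRow); // position on the last processed row
                  while (rs.next()) {            // next() continues from the row after it
                      // process the remaining rows
                  }
                  rs.close();
                  stmt.close();
              }
          }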



          • #6
            Hmmm... I would definitely like to look at your test code. Have you tried setting the fetch size and/or max rows?
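
            For anyone following along, these are the two standard JDBC knobs being suggested; the values below are arbitrary examples, not recommendations.

            import java.sql.Connection;
            import java.sql.ResultSet;
            import java.sql.SQLException;
            import java.sql.Statement;

            public class FetchTuningSketch {

                static void readAll(Connection con, String sql) throws SQLException {
                    Statement stmt = con.createStatement();
                    // Hint to the driver: transfer up to 1000 rows per network
                    // round trip instead of its (often small) default.
                    stmt.setFetchSize(1000);
                    // Hard cap: the ResultSet will stop after this many rows.
                    stmt.setMaxRows(200000);
                    ResultSet rs = stmt.executeQuery(sql);
                    while (rs.next()) {
                        // process each row
                    }
                    rs.close();
                    stmt.close();
                }
            }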



            • #7
              To verify that the problem comes from the TYPE_SCROLL... choice at statement-creation time, I created a batch with the following job.xml:
              <bean id="step1" parent="simpleStep">
                  <property name="commitInterval" value="20000" />
                  <property name="saveRestartData" value="false" />
                  <property name="allowStartIfComplete" value="true"/>
                  <property name="tasklet">
                      <bean class="org.springframework.batch.execution.tasklet.RestartableItemProviderTasklet">
                          <property name="itemProvider" ref="DBItemProvider" />
                          <property name="itemProcessor" ref="compositeItemProcessor"/>
                      </bean>
                  </property>
              </bean>
              DBItemProvider:
              <bean id="DBItemProvider" class="mypackage.VersementItemProvider">
                  <property name="dataSource" ref="dataSource" />
                  <property name="mapper">
                      <bean class="mypackage.VersementRowMapper" />
                  </property>
              </bean>
              The provider doesn't actually use the mapper in my test, so I won't give you the mapper's code.

              Processor:
              <bean id="compositeItemProcessor" class="org.springframework.batch.item.processor.CompositeItemProcessor">
                  <property name="itemProcessors">
                      <list>
                          <!--<bean id="fileItemProcessor" class="org.springframework.batch.item.processor.ItemWriterItemProcessor">
                              <property name="itemWriter" ref="fileVersementItemWriter"/>
                          </bean>-->
                          <bean id="mockProcessor" class="com.natixis.sphinx.batch.modele.processor.MockProcessor"/>
                          <!--<bean id="DBItemProcessor" class="org.springframework.batch.item.processor.ItemWriterItemProcessor">
                              <property name="itemWriter" ref="sqlSupportDao"/>
                          </bean>-->
                      </list>
                  </property>
              </bean>
              Only the MockProcessor is used in my test, and it does nothing.
              Its code:
              public class MockProcessor implements ItemProcessor {

                  public void process(Object data) throws Exception {
                      // intentionally empty: this processor does nothing
                  }
              }
              My VersementItemProvider is:
              public class VersementItemProvider implements ItemProvider {

                  private Log log = LogFactory.getLog(VersementItemProvider.class);

                  private static final String SELECT_SUPPORT_RECORD = "SELECT c.ceseqo \"contrat\", m.ceseqo, s.ceseqo \"support\", m.amount \"amount\", s.cotaux \"taux\", t.valcot \"cote\", s.nbpart \"nbpart\" " +
                      " FROM sphmcontrat c, sphmmvt m, sphmsupport s, sphmtypesupport t " +
                      " WHERE m.dateff <= '01/02/2008' and m.coperi = 1 and m.cocont = c.ceseqo and c.ceseqo = s.cocont and t.ceseqo = s.cosupp and (c.coetco = 1 or c.coetco = 2) order by c.ceseqo, m.ceseqo, s.ceseqo";

                  private static final String SELECT_CLIENT_RECORD = "select p.ceseqo, p.cocivi, p.nompat, p.nommar, p.prenom, p.preno2, p.daheme, p.cosecu, p.cositf, a.librue, a.copost, a.libvil, a.copays, a.coposn, a.copayn, a.libvin, c.cochar, c.dahem1, c.numrib from sphmclient c, sphmadresse a, sphmpersonne p where c.cochar = 2 and c.copers = p.ceseqo and p.ceseqo = a.copers order by p.ceseqo";

                  private Connection con;
                  private Statement stmt;
                  protected ResultSet rs;
                  private DataSource dataSource;
                  private RowMapper mapper;
                  private boolean initialized = false;

                  public Object next() {
                      // Lazily open the connection, statement and cursor on the first call.
                      if (!initialized) {
                          try {
                              con = dataSource.getConnection();
                              stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY, ResultSet.HOLD_CURSORS_OVER_COMMIT);
                              rs = stmt.executeQuery(SELECT_CLIENT_RECORD);
                              initialized = true;
                          } catch (SQLException e) {
                              log.error(e);
                              throw new RuntimeException(e);
                          }
                      }
                      Object result = null;
                      try {
                          if (rs.next()) {
                              // The mapper is deliberately bypassed for this test.
                              result = new Object(); // mapper.mapRow(rs, rs.getRow());
                              System.out.println(rs.getRow());
                          } else {
                              // End of data: commit and release all JDBC resources.
                              con.commit();
                              rs.close();
                              stmt.close();
                              con.close();
                          }
                      } catch (SQLException e) {
                          log.error(e);
                          throw new RuntimeException(e);
                      }
                      return result;
                  }

                  public void setDataSource(DataSource dataSource) {
                      this.dataSource = dataSource;
                  }

                  public void setMapper(RowMapper mapper) {
                      this.mapper = mapper;
                  }

                  public Object getKey(Object item) {
                      return item;
                  }

                  public boolean recover(Object data, Throwable cause) {
                      return false;
                  }
              }
              With this code, my batch returns 199,913 rows and consumes 550 MB of memory; the processor time is about 14 s and the total duration is about 1 minute 20 s.

              When I change the statement creation in my provider to:
              stmt = con.createStatement();
              the batch still returns 199,913 rows but consumes only 55 MB of memory; processor time and duration are similar to before.

              I have done some tests using setMaxRows(): the fewer rows are read, the less memory is consumed and the shorter the duration, but I don't have complete results yet.
              If I close the ResultSet every 10,000 records, 85 MB of memory is consumed with the same processor time as before, but the duration is 5 minutes.

              When I increase the fetch size, performance is better (30 s instead of 1 minute 20 s for a fetch size of 10,000), but 550 MB of memory is still consumed.

              If you want details about a test, or the code of the batch that uses JdbcCursorInputSource, just say so.



              • #8
                So it seems like tweaking the fetch size and max rows gives you the result you're looking for?

                550 MB is definitely too much, but you mentioned the tweaks got it down to 55 MB, which seems reasonable, unless I'm missing something.



                • #9
                  It looks to me like there was no improvement in the heap usage, but a large (as expected) improvement in the throughput when the fetch size was tweaked. It also looks like the problem with the heap size must be in the JDBC driver, if anything. I can sort of understand why the driver might want to eat memory when a cursor is open with such lax policies (it might be called on to reset to any row at any time), so it probably comes down to a performance tradeoff made by the RDBMS vendor. Maybe you could try another driver or database platform?



                  • #10
                    Quick fix?

                    I'm running into the same problem:

                    Memory consumption increases a lot with the amount of data.
                    I understand it's not a Spring Batch error, but I would like a workaround.

                    In my particular case, I don't need a ResultSet with random access to rows (no restart in this case).

                    I would be interested in overriding executeQuery() or the createStatement call in my own class, but it's a private method (see the sketch below for the kind of custom provider this implies).

                    Gerard COLLIN
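
                    Since JdbcCursorInputSource keeps its statement creation private, the workaround described above amounts to writing a small cursor provider of one's own. A minimal sketch, assuming restart is genuinely not needed (forward-only cursor, so absolute() can never be called); the class name and row mapping are hypothetical:

                    import java.sql.Connection;
                    import java.sql.ResultSet;
                    import java.sql.SQLException;
                    import java.sql.Statement;
                    import javax.sql.DataSource;

                    // Forward-only replacement for the scrollable cursor: drivers that
                    // emulate scrollability on the client have no row cache to fill here.
                    public class ForwardOnlyCursorProvider {

                        private final DataSource dataSource;
                        private final String sql;
                        private Connection con;
                        private Statement stmt;
                        private ResultSet rs;

                        public ForwardOnlyCursorProvider(DataSource dataSource, String sql) {
                            this.dataSource = dataSource;
                            this.sql = sql;
                        }

                        // Returns the next row, or null when the cursor is exhausted.
                        public Object next() throws SQLException {
                            if (rs == null) {
                                con = dataSource.getConnection();
                                // Plain forward-only, read-only statement: no client-side
                                // scroll cache, so memory stays flat regardless of row count.
                                stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                        ResultSet.CONCUR_READ_ONLY);
                                rs = stmt.executeQuery(sql);
                            }
                            if (rs.next()) {
                                return rs.getObject(1); // map the current row as needed
                            }
                            close();
                            return null;
                        }

                        private void close() throws SQLException {
                            rs.close();
                            stmt.close();
                            con.close();
                        }
                    }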



                    • #11
                      TPTP

                      Did you guys try using Eclipse TPTP to help narrow down which objects are hanging around, and what's referencing them?



                      • #12
                        Well...

                        In fact, I couldn't get the Eclipse profiler to run (it times out in the running process), and the client doesn't want to buy a tool.


                        Anyway, if it's normal behaviour for the Oracle driver to keep data in memory when using a scrollable ResultSet, I guess there's not much I can do.

                        I think I'm going to write a custom JdbcCursorInputSource to work around this problem.

                        Gerard COLLIN



                        • #13
                          TPTP & JInsight

                          You can send the profile data to a file and then analyze it later; this way you shouldn't time out. Also, if you want a better tool, try JInsight from IBM: http://www.alphaworks.ibm.com/tech/jinsightlive



                          • #14
                            I'll try!

                            However, this kind of test is planned for later.

                            Gérard



                            • #15
                              Hello,

                              Very interesting thread.

                              I just googled a bit, looking at how JDBC drivers implement scrollability.

                              It appears most JDBC drivers don't implement server-side scrollability, but emulate it on the client side.
                              ResultSet scrollability is useful only with SQL Server on the jTDS driver.

                              So I'm a bit sceptical about the usability of JdbcCursorItemReader for reading large volumes of data.
                              One way of using JdbcCursorItemReader is, as gcollin said, to remove TYPE_SCROLL_INSENSITIVE and forget about restartability (never call absolute()).
                              Another way is perhaps, on restart, to reread the data from the last line read (using a separate restartSql and storing the PK of the last processed item? see the sketch below).
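
                              For what it's worth, a sketch of what that restart query could look like, assuming the rows are ordered by a monotonically increasing key; the table and column names below are made up (borrowed from the queries earlier in this thread):

                              import java.sql.Connection;
                              import java.sql.PreparedStatement;
                              import java.sql.ResultSet;
                              import java.sql.SQLException;

                              // Hypothetical sketch of the restartSql idea: a fresh run uses the
                              // plain ordered query, while a restart binds the primary key of the
                              // last committed item and resumes strictly after it, so a
                              // forward-only cursor suffices and absolute() is never needed.
                              public class KeysetRestartSketch {

                                  private static final String SELECT_ALL =
                                      "SELECT ceseqo, amount FROM sphmmvt ORDER BY ceseqo";

                                  private static final String RESTART_SQL =
                                      "SELECT ceseqo, amount FROM sphmmvt WHERE ceseqo > ? ORDER BY ceseqo";

                                  static ResultSet open(Connection con, Long lastProcessedKey) throws SQLException {
                                      if (lastProcessedKey == null) {
                                          return con.createStatement().executeQuery(SELECT_ALL); // fresh run
                                      }
                                      PreparedStatement ps = con.prepareStatement(RESTART_SQL);
                                      ps.setLong(1, lastProcessedKey.longValue()); // resume after this key
                                      return ps.executeQuery();
                                  }
                              }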

                              This link shows a list of drivers and whether or not they implement server-side ResultSet scrollability: http://www.hibernate.org/314.html.
                              Here's a summary (I've just copied and pasted the info for quick reference):
                              1. Oracle: [KO] client-side support.
                              2. SQL Server on jTDS: [OK] server-side support.
                              3. Sybase on jTDS: [KO] client-side support.
                              4. MySQL: [KO] client-side support.
                              5. HSQLDB: [KO] scrollable ResultSets not supported.
                              6. Firebird JDBC driver: [KO] client-side support.
                              7. PostgreSQL JDBC driver: ?
                              8. Informix JDBC driver: [OK] by creating a temporary table.
                              [1] Reference documentation for Oracle (http://download.oracle.com/docs/cd/B...t.htm#i1022319).
                              The Oracle reference documentation states that scrollability is implemented on the client side:
                              "Because the underlying server does not support scrollable cursors, Oracle JDBC must implement scrollability in a separate layer. It is important to be aware that this is accomplished by using a client-side memory cache to store rows of a scrollable result set.
                              Important: Because all rows of any scrollable result set are stored in the client-side cache, a situation where the result set contains many rows, many columns, or very large columns might cause the client-side Java Virtual Machine (JVM) to fail. Do not specify scrollability for a large result set."
                              In fact, the default implementation in Oracle JDBC driver 11.1.0.6.0 (oracle.jdbc.driver.OracleResultSetCacheImpl) just stores every row of the ResultSet in a Vector. Rows are added to the Vector as they are requested (for instance, if you call rs.last(), it retrieves every row of the ResultSet and stores them all in the Vector).
                              Oracle lets you override the default behaviour by implementing the OracleResultSetCache interface (for instance, caching only a number of items equal to the commit interval for Spring Batch). I'm not sure that's the way to go, since it's really Oracle-specific.

                              Hope this search helped!

