
  • Records omitted from query using JpaPagingItemReader

    I have a Spring Batch tasklet that reads data from a database based on a Status column using a JpaPagingItemReader.

    <bean id="transactionReader"
          class="org.springframework.batch.item.database.JpaPagingItemReader"
          scope="step">
        <property name="entityManagerFactory" ref="entityManagerFactory" />
        <property name="queryString"
                  value="select t from Batch t where t.exportStatus = 'AWAITING_EXPORT'" />
        <property name="pageSize" value="3" />
    </bean>

    As part of the processing, I need to set the exportStatus to 'COMPLETED_EXPORT' for each Batch. This is done in an ItemProcessor implementation, which calls the attribute's setter. Consider a batch of 9 rows returned as the result set of the initial query. The processor changes the underlying data so that exportStatus is updated as each row is processed. The commit-interval is set to 3 to match the pageSize, so after 3 rows are read, the changed Batch objects are flushed to the database as part of the commit.

    However, the next read skips the middle rows returned by the original query, which leaves half of my Batches unprocessed. I understand why this is happening: the second time around, the query that the JpaPagingItemReader fires only retrieves 6 rows, because the exportStatus was changed by the previous commit. The reader, though, holds onto the index of where it last stopped (in this case 3) and continues from that offset, skipping the first 3 rows of the remaining data that I still need to process.
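The skipping described above can be reproduced without Spring Batch at all. Below is a minimal, self-contained sketch (class and method names are my own, purely illustrative) that simulates offset-based paging over a result set that shrinks as rows are processed, which is effectively what happens when the paging query filters on the very column the processor updates:

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSkipDemo {
    // Simulates offset-based paging over a query whose result set shrinks
    // because "processing" flips the status the query filters on.
    // Returns the row indices that actually got processed.
    static List<Integer> runPaging(int totalRows, int pageSize) {
        boolean[] exported = new boolean[totalRows]; // false = AWAITING_EXPORT
        List<Integer> processed = new ArrayList<>();
        for (int page = 0; ; page++) {
            // Re-run the query: only rows still matching the status filter
            List<Integer> matching = new ArrayList<>();
            for (int i = 0; i < totalRows; i++) {
                if (!exported[i]) matching.add(i);
            }
            // The reader applies its page offset to the *current* result set
            int from = page * pageSize;
            if (from >= matching.size()) break;
            int to = Math.min(from + pageSize, matching.size());
            for (int row : matching.subList(from, to)) {
                exported[row] = true;  // processor updates the status
                processed.add(row);    // flushed at the chunk commit
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(runPaging(9, 3)); // [0, 1, 2, 6, 7, 8]
    }
}
```

With 9 rows and a page size of 3, the simulation processes rows 0-2 and 6-8 and never sees rows 3-5, matching the behaviour described in the post.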

    I could probably get around this by writing an additional after-step to update the status (holding the IDs of the objects to be updated in the execution context). That would add overhead, though, as I'd need to reload the data in the after-step.

    I'd imagine this is quite a common pattern, so I'd be grateful if anyone could suggest a recommended approach to this sort of issue in Spring Batch.

  • #2
    You can't modify the data that the reader is consuming while the step is still active. The most common pattern is to do the updates in another step, or after the step in a listener. If you need to do transactional bookkeeping during the step execution, I don't think there is really any alternative to using a second table to store the updates.
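As a concrete illustration of the "another step" suggestion, here is a sketch in the same XML style as the original configuration. All bean, job, and step names other than transactionReader are assumptions, and the second step's tasklet is presumed to issue a single bulk JPQL update such as `update Batch t set t.exportStatus = 'COMPLETED_EXPORT' where t.exportStatus = 'AWAITING_EXPORT'`, which would avoid reloading individual rows:

```xml
<!-- Illustrative sketch: exportJob, exportStep, markExportedStep, and
     markExportedTasklet are assumed names, not from the original post. -->
<batch:job id="exportJob">
    <!-- Step 1: read and write the batches, but do NOT touch exportStatus
         here, so the paging query stays stable while the reader runs. -->
    <batch:step id="exportStep" next="markExportedStep">
        <batch:tasklet>
            <batch:chunk reader="transactionReader" writer="transactionWriter"
                         commit-interval="3" />
        </batch:tasklet>
    </batch:step>
    <!-- Step 2: once reading is finished, flip the status in one bulk update. -->
    <batch:step id="markExportedStep">
        <batch:tasklet ref="markExportedTasklet" />
    </batch:step>
</batch:job>
```

Because the status flip happens after the reader has finished, the paging offsets are computed against a result set that never changes mid-step.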



    • #3
      We have the same issue in one of our batch job implementations.

      In the Spring Batch documentation, section 6.9.2 Paging ItemReaders, it states:
      "Each query that is executed must specify the starting row number and the number of rows that we want returned for the page."

      If I specify the property 'currentItemCount' in my reader configuration and set it to zero, how does that work? Will the reader always start at first row in the result set when building the page? Or, will it only do that for the very first page, treating it more as an offset?



      • #4
        Originally posted by dannozrx
        If I specify the property 'currentItemCount' in my reader configuration and set it to zero, how does that work?
        See the Javadocs for the reader. Setting currentItemCount=0 has no effect since that is the default. It just says: "unless you know otherwise, start reading from the beginning". Setting it to a non-zero value is a useful feature for partitioned use cases where each reader is responsible for a different range of items.
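To illustrate the partitioned use case mentioned above: currentItemCount and maxItemCount come from the reader's item-counting superclass, and a sketch in the same XML style (bean names, the baseReader parent bean, and the ranges are all illustrative assumptions) might split one result set across two readers like this:

```xml
<!-- Illustrative: two readers, each responsible for a different item range.
     Assumes a "baseReader" parent bean (JpaPagingItemReader) defined elsewhere. -->
<bean id="readerPartition1" parent="baseReader" scope="step">
    <property name="currentItemCount" value="0" />     <!-- start at item 0 -->
    <property name="maxItemCount" value="5000" />      <!-- stop before 5000 -->
</bean>
<bean id="readerPartition2" parent="baseReader" scope="step">
    <property name="currentItemCount" value="5000" />  <!-- start at item 5000 -->
    <property name="maxItemCount" value="10000" />
</bean>
```

In a real partitioned step these values would typically be injected per partition (for example from the step execution context) rather than hard-coded.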
