
  • problem in JdbcPagingItemReader with Timestamp as sortkey

    When there are more records with the same timestamp than the specified page size, some records will be skipped by JdbcPagingItemReader.

    Spring Batch: 2.1.3
    DB: postgres
    queryProvider: SqlPagingQueryProviderFactoryBean

    page size = 10
    records with id 1~20 have timestamp "2010-01-01 00:00:000"
    records with id 21~25 have timestamp "2010-01-02 00:00:000"

    The firstPageSql generated by PagingQueryProvider uses the sortKey column in its order by.
    The remainingPagesSql adds a greater-than (">") condition on the sort key column to the where clause
    (SqlPagingQueryUtils#buildWhereClause).

    remainingPagesSql becomes like this: where sort_key > '2010-01-01 00:00:000'

    So the first page query fetches ids 1~10 (assuming rows are accessed in id order), but the next page retrieves ids 21~25 because of the condition "sort_key > '2010-01-01' ", so ids 11~20 are skipped.
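    The skip is easy to reproduce outside the database. Below is a minimal plain-Java simulation of the paging predicate (class and method names are mine for illustration, not Spring Batch API): each page takes up to 10 rows whose sort key is strictly greater than the last key seen, which is what the generated remainingPagesSql effectively does.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone simulation of keyset paging on a NON-unique sort key.
// Illustrative only; this is not the Spring Batch implementation.
public class NonUniqueSortKeySkip {
    static final int PAGE_SIZE = 10;

    // One page: rows with sortKey strictly greater than lastKey, in table
    // order, limited to PAGE_SIZE -- i.e. "WHERE sort_key > ? ... LIMIT 10".
    static List<int[]> nextPage(List<int[]> table, int lastKey) {
        List<int[]> page = new ArrayList<>();
        for (int[] row : table) {           // row = {id, sortKey}
            if (row[1] > lastKey) {
                page.add(row);
                if (page.size() == PAGE_SIZE) break;
            }
        }
        return page;
    }

    static List<Integer> readAll() {
        List<int[]> table = new ArrayList<>();
        for (int id = 1; id <= 20; id++) table.add(new int[] {id, 1}); // "2010-01-01"
        for (int id = 21; id <= 25; id++) table.add(new int[] {id, 2}); // "2010-01-02"

        List<Integer> seen = new ArrayList<>();
        int lastKey = Integer.MIN_VALUE;
        List<int[]> page;
        while (!(page = nextPage(table, lastKey)).isEmpty()) {
            for (int[] row : page) seen.add(row[0]);
            lastKey = page.get(page.size() - 1)[1];
        }
        return seen; // ids 11~20 are missing
    }

    public static void main(String[] args) {
        System.out.println(readAll());
    }
}
```

    Running it yields only 15 of the 25 ids: the first page ends at id 10 with key "2010-01-01", and the next query asks for a strictly greater key, jumping straight to id 21.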

    This can be avoided by specifying a unique column as the sort key.
    I guess the same thing happens whenever the number of records sharing one sort-key value exceeds the page size.

    I think "OFFSET" should be used to fetch the remaining pages on Postgres, or PagingQueryProvider should use the "generateJumpToItemQuery" method (which seems to use OFFSET) to generate the remaining-pages SQL.
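    For comparison, here is the same 25-row scenario paged by offset, as suggested above ("... ORDER BY sort_key LIMIT 10 OFFSET ?"). Again a stand-alone plain-Java sketch with made-up names, not actual Spring Batch code:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone simulation of OFFSET paging; illustrative only.
public class OffsetPaging {
    static final int PAGE_SIZE = 10;

    static List<Integer> readAll() {
        // 25 rows already ordered by sort_key: ids 1..25
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= 25; id++) ids.add(id);

        List<Integer> seen = new ArrayList<>();
        for (int offset = 0; offset < ids.size(); offset += PAGE_SIZE) {
            // emulates: SELECT id FROM t ORDER BY sort_key LIMIT 10 OFFSET ?
            int end = Math.min(offset + PAGE_SIZE, ids.size());
            seen.addAll(ids.subList(offset, end));
        }
        return seen; // all 25 ids; duplicate sort keys no longer cause skips
    }

    public static void main(String[] args) {
        System.out.println(readAll().size());
    }
}
```

    One caveat worth noting: OFFSET paging only reads every row exactly once if the ORDER BY is deterministic between queries; with duplicate sort keys the database is free to reorder ties, so in practice a unique tie-breaker column in the ORDER BY is still advisable.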

    Should I create a new issue in JIRA?


  • #2
    I don't think the paging reader was designed with that use case in mind. Does it make sense to use a non-unique column as the sort key? I don't think it really does: how would you know, in your case, that a new record with a duplicate timestamp wasn't inserted since the last execution? Did I miss something?


    • #3
      Since a where clause can be specified, the item reader can retrieve records regardless of when they were inserted or updated.
      In my case I wanted to read items sorted by a modified-timestamp column for logging and unit-test purposes. That column was bulk-updated, so many records shared the same timestamp.
      Since it took me some time to figure out why not all records were processed, I thought it would be nice if the sort key worked regardless of uniqueness.
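      One middle ground, assuming you can add a unique tie-breaker such as the id column, is keyset paging over the pair (sort_key, id), i.e. a where clause like "sort_key > ? OR (sort_key = ? AND id > ?)". A stand-alone plain-Java sketch of that predicate (my own names, not Spring Batch API):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone simulation of keyset paging over (sort_key, id),
// i.e. "sort_key > ? OR (sort_key = ? AND id > ?)". Illustrative only.
public class CompositeKeyPaging {
    static final int PAGE_SIZE = 10;

    static List<int[]> page(List<int[]> table, int lastKey, int lastId) {
        List<int[]> out = new ArrayList<>();
        for (int[] r : table) {             // table ordered by (sort_key, id)
            if (r[1] > lastKey || (r[1] == lastKey && r[0] > lastId)) {
                out.add(r);
                if (out.size() == PAGE_SIZE) break;
            }
        }
        return out;
    }

    static List<Integer> readAll() {
        List<int[]> table = new ArrayList<>();
        for (int id = 1; id <= 20; id++) table.add(new int[] {id, 1});
        for (int id = 21; id <= 25; id++) table.add(new int[] {id, 2});

        List<Integer> seen = new ArrayList<>();
        int lastKey = Integer.MIN_VALUE, lastId = Integer.MIN_VALUE;
        List<int[]> p;
        while (!(p = page(table, lastKey, lastId)).isEmpty()) {
            for (int[] r : p) seen.add(r[0]);
            int[] last = p.get(p.size() - 1);
            lastKey = last[1];
            lastId = last[0];
        }
        return seen; // all 25 ids, in (sort_key, id) order -- nothing skipped
    }

    public static void main(String[] args) {
        System.out.println(readAll());
    }
}
```

      This keeps the non-unique timestamp as the primary sort while still reading every row exactly once, since the (sort_key, id) pair is unique.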