Database ItemReaders - which one is better?

  • Database ItemReaders - which one is better?

    Hi all,

    This might not be purely a Spring Batch question, but I thought someone here could help me out.

    I have a table with about a hundred thousand rows, each with about 80 columns. I need to run a job that reads the data and writes it to a CSV file.

    My application runs on one Linux server, and the database is on a different Linux server.

    My questions are:
    (a) If a stored procedure returns a cursor holding about 1000 rows, does each call to ResultSet.next() read from my application server's memory, or does it make a network call to the database server to fetch the data from there?

    (b) According to the Spring Batch documentation, when using JdbcCursorItemReader or StoredProcedureItemReader, calling read() once would suffice, as all the data would be mapped to a POJO via a RowMapper.
    The documentation also says "the big advantage of ItemReader is that it allows items to be streamed". What exactly does "streamed" mean here? Does it mean that all of the cursor data is copied into a list before control is handed back to the step?

    (c) In terms of performance, which is better: StoredProcedureItemReader or JdbcPagingItemReader? The number of rows can go up to 100 thousand.
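To make the trade-off in (c) a little more concrete: a paging reader such as JdbcPagingItemReader issues a new query for every page, restricting each one on the sort key of the last row it saw (keyset paging), while a cursor reader issues one query and keeps its ResultSet open. The plain-Java simulation below involves no database; the class and method names are hypothetical, and the "table" is just a sorted list of primary keys. It only counts how many queries a keyset-paging loop would issue.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of keyset paging; no database involved.
// The "table" is a list of primary keys already sorted ascending.
class KeysetPagingSketch {
    int queriesIssued = 0; // each page costs one separate query

    // Stands in for: SELECT ... WHERE id > :lastId ORDER BY id LIMIT :pageSize
    // (a real database would use an index instead of scanning the list)
    List<Long> fetchPage(List<Long> table, long lastId, int pageSize) {
        queriesIssued++;
        List<Long> page = new ArrayList<>();
        for (long id : table) {
            if (id > lastId) {
                page.add(id);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    // Reads every row page by page, starting each query after the last key seen.
    // Rows are collected here only so the result can be checked.
    List<Long> readAll(List<Long> table, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Long> rows = new ArrayList<>();
        long lastId = Long.MIN_VALUE;
        while (true) {
            List<Long> page = fetchPage(table, lastId, pageSize);
            rows.addAll(page);
            if (page.size() < pageSize) {
                break; // a short (or empty) page means there are no more rows
            }
            lastId = page.get(page.size() - 1);
        }
        return rows;
    }
}
```

With 1,000 rows and a page size of 100, this model issues 11 queries (ten full pages plus one empty page that signals the end), whereas a cursor reader issues a single query. Which one is actually faster depends on the database, its indexes and the driver, so timing both against the real table is the reliable way to settle (c).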

  • #2
    If all you are doing is reading straight out of a table, use a JdbcCursorItemReader. Every time you call read(), your code gets one row, and you can process it.
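The one-row-per-read() contract can be sketched without Spring: a reader hands out one item per call and returns null when the data runs out, and a chunk-oriented step collects items up to its commit interval before writing them. The interface and classes below (RowReader, IteratorRowReader, ChunkLoopSketch) are hypothetical stand-ins modeled on that contract of Spring Batch's ItemReader; they are not the real JdbcCursorItemReader or FlatFileItemWriter, and the "CSV" output is just a StringBuilder.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface modeled on Spring Batch's ItemReader contract:
// each read() returns one item, and null signals the end of the data.
interface RowReader<T> {
    T read();
}

// Walks an iterator one row at a time (stand-in for an open database cursor).
class IteratorRowReader<T> implements RowReader<T> {
    private final Iterator<T> rows;

    IteratorRowReader(Iterator<T> rows) {
        this.rows = rows;
    }

    public T read() {
        return rows.hasNext() ? rows.next() : null;
    }
}

class ChunkLoopSketch {
    int chunksWritten = 0;
    int maxItemsHeld = 0; // largest number of items buffered at any moment

    // Reads items one at a time and writes them in chunks of commitInterval,
    // so at most commitInterval items are held in memory at once.
    <T> void run(RowReader<T> reader, int commitInterval, StringBuilder csvOut) {
        List<T> chunk = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            maxItemsHeld = Math.max(maxItemsHeld, chunk.size());
            if (chunk.size() == commitInterval) {
                write(chunk, csvOut);
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            write(chunk, csvOut); // final, partially filled chunk
        }
    }

    // Appends each item as one line of output.
    private <T> void write(List<T> chunk, StringBuilder csvOut) {
        for (T t : chunk) {
            csvOut.append(t).append('\n');
        }
        chunksWritten++;
    }
}
```

In this model, 1,050 rows with a commit interval of 100 are written as 11 chunks, and no more than 100 items are ever buffered at once, so memory use depends on the chunk size rather than on the size of the table.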

    (Behind the scenes, Spring Batch is streaming it: the JDBC driver pulls rows from the database in batches of, say, 100 or 1000 instead of making one network call per row... but you don't care about this.) All you care about is:
    a) Each read gives you one row from your dataset.
    b) Reads are fast (thanks to that behind-the-scenes batching).
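At the JDBC level, that behind-the-scenes batching is the driver's fetch size: rows are transferred from the database in batches, and most ResultSet.next() calls are answered from rows already sitting on the application server, which is what question (a) is asking about. JdbcCursorItemReader has a fetchSize property that it applies to the underlying statement. The class below is a hypothetical, simplified plain-Java model of the idea; real drivers differ (the fetch size is only a hint, and some drivers load the whole result set by default), so treat it as an illustration rather than a description of any particular driver.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a JDBC cursor with a fetch size: rows come from a local
// buffer, and a (simulated) network round trip happens only when it is empty.
class BufferedCursorSketch {
    private final int totalRows;   // rows the query produces on the server
    private final int fetchSize;   // rows transferred per round trip
    private int nextServerRow = 0; // index of the next row the server will send
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private Integer current;       // row the cursor is positioned on
    int roundTrips = 0;            // simulated network calls so far

    BufferedCursorSketch(int totalRows, int fetchSize) {
        this.totalRows = totalRows;
        this.fetchSize = fetchSize;
    }

    // Analogous to ResultSet.next(): advance to the next row, if there is one.
    boolean next() {
        if (buffer.isEmpty() && nextServerRow < totalRows) {
            roundTrips++; // refill the buffer with up to fetchSize rows
            int end = Math.min(nextServerRow + fetchSize, totalRows);
            for (int r = nextServerRow; r < end; r++) {
                buffer.addLast(r);
            }
            nextServerRow = end;
        }
        current = buffer.pollFirst(); // served from local memory, no network call
        return current != null;
    }

    Integer currentRow() {
        return current;
    }
}
```

Reading 1,000 rows with a fetch size of 100 costs 10 simulated round trips in this model; with a fetch size of 10 it costs 100. Every other next() call is served from the local buffer.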

    You should be able to stream 400,000 rows in a minute or two with a single thread. If you need more throughput, you can multi-thread or partition the step, but don't do this unless you really need to.
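If you do end up partitioning, one common approach is to split the table's key range into contiguous slices and give each partition its own reader restricted to one slice (for example with a WHERE id BETWEEN ? AND ? clause). The helper below is a hypothetical plain-Java sketch of that split, assuming a numeric, roughly evenly distributed primary key; it is similar in spirit to the column-range partitioner in the Spring Batch samples, not a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the key range [minId, maxId] into at most
// gridSize contiguous {start, end} ranges, one per partition/thread.
class RangePartitionSketch {
    static List<long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize > 0 and maxId >= minId");
        }
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            long end = Math.min(start + size - 1, maxId); // last slice may be shorter
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }
}
```

For ids 1 to 400,000 and four partitions this yields 1-100000, 100001-200000, 200001-300000 and 300001-400000. Gaps in the keys make the slices uneven in row count, which is one more reason to stay single-threaded unless it is genuinely too slow.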

    P.S. Your application will only hold a few hundred rows in memory at a time. On any half-decent modern hardware, give Spring Batch 2 or 3 GB of heap and it will be fine.
