  • Processing data by groups

    We have a requirement to read the entire input file, group the data by account, and then process each account as an atomic unit.
    Basically, delete all the existing rows for the account in the database and insert the new rows from the file.

    Here's what I am doing (still in dev):
    Set the commit interval to a large number (the file could hold at most 100k records, so the commit interval is set to 500k). The processor builds a map<acct, dataList> and stores it in the step context; on the writer side I have a wrapper that gets the data map from the step context and calls the service layer for each account.
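
    Roughly, the processor side looks something like this (class names and the AccountRow item type are placeholders, not our real code):
    Code:
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;
    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.batch.item.ItemProcessor;

    public class GroupingProcessor implements ItemProcessor<AccountRow, AccountRow> {

        private StepExecution stepExecution;

        @BeforeStep
        public void saveStepExecution(StepExecution stepExecution) {
            this.stepExecution = stepExecution;
        }

        public AccountRow process(AccountRow row) {
            ExecutionContext ctx = stepExecution.getExecutionContext();
            @SuppressWarnings("unchecked")
            Map<String, List<AccountRow>> groups =
                    (Map<String, List<AccountRow>>) ctx.get("groups");
            if (groups == null) {
                groups = new HashMap<String, List<AccountRow>>();
                ctx.put("groups", groups);
            }
            List<AccountRow> rows = groups.get(row.getAccount());
            if (rows == null) {
                rows = new ArrayList<AccountRow>();
                groups.put(row.getAccount(), rows);
            }
            rows.add(row);
            // the item still flows to the writer, but the writer-side
            // wrapper ignores it and drains the map from the step context
            return row;
        }
    }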

    The data that is passed from the reader to the writer (wrapper) is ignored, so there are basically two sets of data in memory, which is not efficient.

    Is there a better way to do this?

    If I were to return null from the processor for all the records, would the writer still be called?

    We have several small data files that need to be processed with a similar pattern (the data needs to be persisted all-or-nothing at the account level).
    Last edited by Nitty; Jan 19th, 2010, 09:04 PM. Reason: update title

  • #2
    Set the commit interval to a large number
    This is rarely (if ever) a good approach. It loses the benefit of the progress checkpointing that Spring Batch provides and also opens you up to the possibility of running out of memory.

    The better way to do this is to have a first step that loads the file into a special "staging" database table. Then you can have a second step that queries the staging table with a "group by" clause in the SQL so that all of the information for a specific account arrives as one record (for atomicity). This allows you to have checkpointing in both steps (so if a failure occurs, you don't have to redo work) and lets you perform atomic updates of all information for a particular account.
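
    As a rough sketch (table and column names are made up), the second step's reader could then drive off one record per account:
    Code:
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.sql.DataSource;

    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.jdbc.core.RowMapper;

    public class AccountKeyReaderFactory {

        // one item per account: the GROUP BY collapses the staged rows so
        // the step's processor/writer can delete-and-insert everything for
        // that account in a single transaction
        public static JdbcCursorItemReader<String> create(DataSource dataSource) {
            JdbcCursorItemReader<String> reader = new JdbcCursorItemReader<String>();
            reader.setDataSource(dataSource);
            reader.setSql("SELECT account_id FROM staging_rows GROUP BY account_id");
            reader.setRowMapper(new RowMapper<String>() {
                public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                    return rs.getString("account_id");
                }
            });
            return reader;
        }
    }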



    • #3
      Code:
      Then you can have a second step that queries the staging table with a "group by" cause in the SQL to have an all of the information for a specific account as one record (for atomicity).
      There will be several records per account; how do we chunk the reader per account group?
      Last edited by Nitty; Jan 20th, 2010, 08:15 AM. Reason: code tags



      • #4
        You can write a wrapper for the SingleItemPeekableItemReader (which in turn wraps a JdbcCursorItemReader). The wrapper builds a list containing all of the records that make up one account group. It does this by calling peek() to see whether the next record belongs to the same account and, if it does, calling read() to pull it into the list. Once the whole group has been read, it returns the list as a single item.
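
        Something along these lines (the AccountRow type and its getAccount() accessor are assumed):
        Code:
        import java.util.ArrayList;
        import java.util.List;

        import org.springframework.batch.item.ItemReader;
        import org.springframework.batch.item.support.SingleItemPeekableItemReader;

        public class AccountGroupReader implements ItemReader<List<AccountRow>> {

            private SingleItemPeekableItemReader<AccountRow> delegate;

            public void setDelegate(SingleItemPeekableItemReader<AccountRow> delegate) {
                this.delegate = delegate;
            }

            public List<AccountRow> read() throws Exception {
                AccountRow first = delegate.read();
                if (first == null) {
                    return null; // end of input
                }
                List<AccountRow> group = new ArrayList<AccountRow>();
                group.add(first);
                // peek() inspects the next record without consuming it, so we
                // only read() records that belong to the current account
                AccountRow next = delegate.peek();
                while (next != null && next.getAccount().equals(first.getAccount())) {
                    group.add(delegate.read());
                    next = delegate.peek();
                }
                return group;
            }
        }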



        • #5
          After writing the data into the staging table, I need to update some reference data by issuing a SQL statement against the inserted data. Is there a way to do this without a step, or do I need to dummy up the reader/writer and issue the command in the writer?

          On a separate note: we have a requirement where I need to poll a table for a status in between steps. Is there a way to achieve this?



          • #6
            To perform a single action (as opposed to an action for each item), use a custom Tasklet.
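
            For example (the SQL statement here is just a placeholder):
            Code:
            import org.springframework.batch.core.StepContribution;
            import org.springframework.batch.core.scope.context.ChunkContext;
            import org.springframework.batch.core.step.tasklet.Tasklet;
            import org.springframework.batch.repeat.RepeatStatus;
            import org.springframework.jdbc.core.JdbcTemplate;

            public class ReferenceDataUpdateTasklet implements Tasklet {

                private JdbcTemplate jdbcTemplate;

                public void setJdbcTemplate(JdbcTemplate jdbcTemplate) {
                    this.jdbcTemplate = jdbcTemplate;
                }

                public RepeatStatus execute(StepContribution contribution,
                                            ChunkContext chunkContext) {
                    // one-off statement against the freshly staged data
                    jdbcTemplate.update("UPDATE reference_data SET status = 'LOADED'");
                    return RepeatStatus.FINISHED; // run once, then the step completes
                }
            }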

            As for polling, maybe have an intermediate step that loops until the db is ready, sleeping for a bit on each iteration.
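
            A sketch of such a polling step (table, column, and sleep interval are assumptions):
            Code:
            import org.springframework.batch.core.StepContribution;
            import org.springframework.batch.core.scope.context.ChunkContext;
            import org.springframework.batch.core.step.tasklet.Tasklet;
            import org.springframework.batch.repeat.RepeatStatus;
            import org.springframework.jdbc.core.JdbcTemplate;

            public class StatusPollingTasklet implements Tasklet {

                private JdbcTemplate jdbcTemplate;

                public void setJdbcTemplate(JdbcTemplate jdbcTemplate) {
                    this.jdbcTemplate = jdbcTemplate;
                }

                public RepeatStatus execute(StepContribution contribution,
                                            ChunkContext chunkContext) throws Exception {
                    String status = jdbcTemplate.queryForObject(
                            "SELECT status FROM job_control WHERE id = 1", String.class);
                    if ("READY".equals(status)) {
                        return RepeatStatus.FINISHED; // proceed to the next step
                    }
                    Thread.sleep(5000); // back off before checking again
                    // CONTINUABLE makes Spring Batch invoke execute() again
                    return RepeatStatus.CONTINUABLE;
                }
            }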
            Last edited by DHGarrette; Jan 20th, 2010, 09:27 PM.
