  • Handling Duplicates

    I have a use case and would like some ideas on how best to implement it.

    Here goes.
    Flat files are uploaded via a web portal. These files are currently parsed and loaded into a database table.
    The first file uploaded is usually a full export of the data, so I've implemented a simple Step that inserts the records into the database.
    Subsequent files contain records that are new or have been updated since the last export.

    My idea is to run a first step that simply inserts the records into the database.
    Any records that fail with a duplicate-key exception would then be handed to a second step that issues an update statement instead of an insert.
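A minimal sketch of that two-pass idea, in plain Java rather than Spring Batch so it runs on its own. The in-memory `Table`, the `Row` type, and `DuplicateKeyException` are hypothetical stand-ins for the real database table, the parsed flat-file record, and the driver's constraint-violation error. Step 1 inserts everything and sets duplicates aside in a staging list; step 2 updates only those staged records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TwoPassLoad {

    // Hypothetical parsed record: primary key plus the rest of the line.
    record Row(String id, String payload) {}

    // Hypothetical stand-in for the database's duplicate-key error.
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String id) { super("duplicate key: " + id); }
    }

    // Hypothetical table keyed by primary key, standing in for the real table.
    static class Table {
        final Map<String, String> rows = new LinkedHashMap<>();

        void insert(Row r) {
            if (rows.containsKey(r.id())) throw new DuplicateKeyException(r.id());
            rows.put(r.id(), r.payload());
        }

        void update(Row r) {
            rows.put(r.id(), r.payload());
        }
    }

    // Step 1: plain inserts; duplicates go to a staging list instead of failing the load.
    static List<Row> insertStep(Table table, List<Row> file) {
        List<Row> staging = new ArrayList<>();
        for (Row r : file) {
            try {
                table.insert(r);
            } catch (DuplicateKeyException e) {
                staging.add(r);
            }
        }
        return staging;
    }

    // Step 2: iterate the staged duplicates and issue updates instead of inserts.
    static void updateStep(Table table, List<Row> staging) {
        for (Row r : staging) table.update(r);
    }

    public static void main(String[] args) {
        Table table = new Table();
        insertStep(table, List.of(new Row("1", "a"), new Row("2", "b")));             // full export
        List<Row> staged = insertStep(table, List.of(new Row("2", "b2"), new Row("3", "c"))); // delta file
        updateStep(table, staged);
        System.out.println(table.rows); // {1=a, 2=b2, 3=c}
    }
}
```

In a real job the staging list would be the staging table, written by a listener in step 1 and read back by step 2.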

    What's the best way to handle this?
    Should I create a Listener that writes all the duplicate records to a staging table, and have step 2 iterate over it and issue an update statement instead?

  • #2
    It depends on how you are recognizing the duplicates. If you're letting them fail and treating them as skipped records, you could use a SkipListener to write out the failed records. Otherwise, you could check for the duplicate and decide where to write each record based on whether it already exists. That would mean an extra database round-trip per record, though. If you go with the first option, be sure to configure the exception as no-rollback (assuming not rolling back is okay, which depends on whether or not you're writing anything else out).
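The second option (check first, then route) can be sketched in plain Java as below. The `existingKeys` set is a hypothetical stand-in for a `SELECT` against the target table, and the `inserts`/`updates` lists stand in for the insert and update writers; the `lookups` counter makes the extra round-trip per record visible.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RoutingWriter {

    final Set<String> existingKeys;               // simulates "does this id already exist?"
    final List<String> inserts = new ArrayList<>(); // records routed to the INSERT writer
    final List<String> updates = new ArrayList<>(); // records routed to the UPDATE writer
    int lookups = 0;                              // one extra lookup per record

    RoutingWriter(Set<String> existingKeys) {
        this.existingKeys = new HashSet<>(existingKeys);
    }

    boolean exists(String id) {
        lookups++;
        return existingKeys.contains(id);
    }

    void write(List<String> ids) {
        for (String id : ids) {
            if (exists(id)) {
                updates.add(id);
            } else {
                inserts.add(id);
                existingKeys.add(id); // a repeat of this id later in the file becomes an update
            }
        }
    }

    public static void main(String[] args) {
        RoutingWriter w = new RoutingWriter(Set.of("1", "2"));
        w.write(List.of("2", "3", "3"));
        System.out.println("inserts=" + w.inserts + " updates=" + w.updates + " lookups=" + w.lookups);
        // inserts=[3] updates=[2, 3] lookups=3
    }
}
```

This avoids relying on exceptions and rollback settings entirely, at the cost of one lookup per record.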

    In 2.0, you can change the exit code and have the job go to the compensating step only when the first step ends with that code. In 1.x, you'll have to let it always go to that step and have it no-op if there are no skipped records.
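A plain-Java model of the 2.0-style conditional flow, assuming the load step can report how many records it skipped. The exit-code strings and method names here are illustrative, not taken from the thread. In Spring Batch terms this roughly corresponds to a step listener that returns a custom exit status from its after-step callback when the skip count is non-zero, plus a transition in the job definition keyed on that status.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class ConditionalFlow {

    static final String COMPLETED = "COMPLETED";
    static final String COMPLETED_WITH_SKIPS = "COMPLETED WITH SKIPS";

    // Decides the exit code after the load step finishes, based on its skip count.
    static String exitCodeFor(int skipCount) {
        return skipCount > 0 ? COMPLETED_WITH_SKIPS : COMPLETED;
    }

    // Runs the load step, then the compensating (update) step only on the skip exit code.
    static List<String> runJob(IntSupplier loadStep, Runnable compensatingStep) {
        List<String> executed = new ArrayList<>();
        int skips = loadStep.getAsInt(); // the load step returns its skip count
        executed.add("load");
        if (exitCodeFor(skips).equals(COMPLETED_WITH_SKIPS)) {
            compensatingStep.run();
            executed.add("update");
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(runJob(() -> 0, () -> {})); // [load]
        System.out.println(runJob(() -> 2, () -> {})); // [load, update]
    }
}
```

The 1.x workaround described above amounts to dropping the `if` and always running the update step, which then does nothing when the staging data is empty.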