Writing to 2 JDBC datasources in one step vs. 2 steps each with 1 datasource

  • Writing to 2 JDBC datasources in one step vs. 2 steps each with 1 datasource

    Guys,

    I'd like some advice on the scenario below to understand the pros and cons of one combined step versus multiple dedicated steps. I'm worried about the cost of incurring distributed transactions in Option 1 versus the cost of re-reading in Option 2, and about any other issues people immediately see with these approaches.

    If there are other worries/options or my scenario wasn't detailed enough please let me know.

    Thanks

    Danny

    General Comments:
    • Datasources A, B & C would be separate schemas/connections, and could be separate databases.
    • Record set to process could be in the many millions
    • Spring Batch running inside a J2EE container, launched by Quartz
    OPTION 1
    Step 1
    - Read from input datasource A
    - Call api that accepts a chunk of records and writes to datasource B
    - Call api that accepts a chunk of records and writes to datasource C
    Assumption: We'd have to combine the two api calls into a single writer implementation.
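
A minimal plain-Java sketch of the combined writer assumed in Option 1. `ChunkWriter`, `CombinedWriter` and `InMemoryTarget` are hypothetical names invented for illustration and are not Spring Batch types; Spring Batch ships its own `CompositeItemWriter` for this purpose. The in-memory targets stand in for datasources B and C, and the simulated failure shows why the two writes would need a shared transaction: without one, B keeps rows that C never received.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a chunk-oriented writer (not the Spring Batch interface).
interface ChunkWriter<T> {
    void write(List<? extends T> chunk);
}

// Option 1: a single writer that forwards every chunk to each delegate in order.
final class CombinedWriter<T> implements ChunkWriter<T> {
    private final List<ChunkWriter<T>> delegates;

    CombinedWriter(List<ChunkWriter<T>> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void write(List<? extends T> chunk) {
        for (ChunkWriter<T> delegate : delegates) {
            delegate.write(chunk);
        }
    }
}

// Hypothetical in-memory target standing in for datasource B or C.
final class InMemoryTarget implements ChunkWriter<String> {
    final List<String> rows = new ArrayList<>();
    private final boolean failOnWrite;

    InMemoryTarget(boolean failOnWrite) {
        this.failOnWrite = failOnWrite;
    }

    @Override
    public void write(List<? extends String> chunk) {
        if (failOnWrite) {
            throw new IllegalStateException("simulated write failure");
        }
        rows.addAll(chunk);
    }
}

final class CombinedWriterDemo {
    public static void main(String[] args) {
        InMemoryTarget b = new InMemoryTarget(false);
        InMemoryTarget c = new InMemoryTarget(true); // simulate C being unavailable
        List<ChunkWriter<String>> targets = new ArrayList<>();
        targets.add(b);
        targets.add(c);
        try {
            new CombinedWriter<>(targets).write(Arrays.asList("r1", "r2"));
        } catch (IllegalStateException e) {
            // With no transaction spanning both targets, B has the chunk and C does not.
            System.out.println("B rows: " + b.rows.size() + ", C rows: " + c.rows.size());
        }
    }
}
```

In this sketch nothing undoes the write to B when C fails; in a real job that is the role the distributed (XA) transaction would play.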

    OR

    OPTION 2
    Step 1
    - Read from input datasource A
    - Call api that accepts a chunk of records and writes to datasource B
    Step 2
    - Read from input datasource A
    - Call api that accepts a chunk of records and writes to datasource C
    Assumption: We would have to re-read the same records for Step 2 that we did for Step 1.
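
A rough plain-Java sketch of Option 2, to make the re-read cost concrete. `CountingSource` and `SingleTargetStep` are hypothetical names, and the in-memory lists stand in for datasources A, B and C; none of this is Spring Batch API. Each step reads the whole source in chunks and writes to exactly one target, so the source is read twice in total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for datasource A that counts the rows it hands out.
final class CountingSource {
    private final List<String> rows;
    int rowsRead = 0;

    CountingSource(List<String> rows) {
        this.rows = new ArrayList<>(rows);
    }

    // Returns up to 'size' rows starting at 'offset'; an empty list means end of input.
    List<String> readChunk(int offset, int size) {
        int end = Math.min(offset + size, rows.size());
        List<String> chunk = new ArrayList<>();
        if (offset < end) {
            chunk.addAll(rows.subList(offset, end));
        }
        rowsRead += chunk.size();
        return chunk;
    }
}

final class SingleTargetStep {
    // Reads the whole source chunk by chunk and hands each chunk to one target.
    static void run(CountingSource source, int chunkSize, Consumer<List<String>> target) {
        int offset = 0;
        while (true) {
            List<String> chunk = source.readChunk(offset, chunkSize);
            if (chunk.isEmpty()) {
                return;
            }
            target.accept(chunk);
            offset += chunk.size();
        }
    }
}

final class TwoStepDemo {
    public static void main(String[] args) {
        CountingSource a = new CountingSource(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        List<String> b = new ArrayList<>();
        List<String> c = new ArrayList<>();
        SingleTargetStep.run(a, 2, b::addAll); // Step 1: A -> B
        SingleTargetStep.run(a, 2, c::addAll); // Step 2: A -> C, re-reading A
        System.out.println(b.size() + " " + c.size() + " " + a.rowsRead); // prints "5 5 10"
    }
}
```

The doubled read count is the price of keeping each step's writes on a single target.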

  • #2
    Hi Danny,

    Given that two-phase commit is a blocking protocol, I would implement the use case with Option 2.

    As for the extra cost of the re-read, depending on the use case you might want to implement it differently. If the two resulting datasets are symmetric (but why would they be?), I might implement it this way:
    1. Read from A --> perform processing --> Write to B (Spring Batch)
    2. Copy from B --> C using database features


    • #3
      Spring Batch keeps its own metadata (the job repository) in one of those databases, or maybe a different one altogether, so neither of your options avoids the distributed transaction. You probably have to bite the XA bullet. Are you sure it is a problem?

      However, an interesting alternative that I never thought of before is to use a different datasource (and transaction manager) for each step in the job. Might just about work.
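
A speculative plain-Java sketch of that per-step idea, under the assumption that each step commits its chunks through a purely local transaction on its own target. `LocalTxTarget`, `LocalTxStep` and the `poisonRow` parameter are hypothetical illustration devices, not Spring Batch API, and the sketch ignores where the batch metadata itself would be stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one datasource with its own local (non-XA) transaction.
final class LocalTxTarget {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void write(List<String> chunk) { pending.addAll(chunk); }

    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    void rollback() { pending.clear(); }

    List<String> committedRows() { return new ArrayList<>(committed); }
}

final class LocalTxStep {
    // Writes each chunk in its own local transaction; stops and returns false
    // at the first chunk containing 'poisonRow' (a simulated write failure).
    static boolean run(List<String> input, int chunkSize, LocalTxTarget target, String poisonRow) {
        for (int i = 0; i < input.size(); i += chunkSize) {
            List<String> chunk = input.subList(i, Math.min(i + chunkSize, input.size()));
            try {
                target.write(chunk);
                if (chunk.contains(poisonRow)) {
                    throw new IllegalStateException("simulated write failure");
                }
                target.commit();
            } catch (IllegalStateException e) {
                target.rollback(); // only the failing chunk is discarded
                return false;
            }
        }
        return true;
    }
}

final class PerStepTxDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("r1", "r2", "r3", "r4");
        LocalTxTarget b = new LocalTxTarget();
        LocalTxTarget c = new LocalTxTarget();
        boolean step1 = LocalTxStep.run(input, 2, b, null); // Step 1 uses only B
        boolean step2 = LocalTxStep.run(input, 2, c, "r3"); // Step 2 uses only C and fails on chunk 2
        System.out.println(step1 + " " + b.committedRows().size()); // prints "true 4"
        System.out.println(step2 + " " + c.committedRows().size()); // prints "false 2"
    }
}
```

In this sketch a failure in the second step leaves the first step's committed rows untouched, which is the property the per-step arrangement would be relying on.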
