
  • Data source synchronization?

    TLDR: I need to keep two data sources with the same schema loosely in sync (say, every several hours) without having a guaranteed internet connection at any given time. One is the master and the other contains a small amount of the master's data. The two data sources are backed by different RDBMS types. Is there a canonical way to do this in Spring that I'm just not aware of, or does this sort of thing have a common name I can Google that I just don't know?

    I have a single remote ("master") RDBMS data source serving as a data warehouse for several distributed databases, each of which should always contain a strict subset of the master data source's data. The two data sources have an equivalent schema (i.e. tables and columns have the same names). Data may be uploaded to either the remote or local machines, and local machines should be responsible for pulling down data that is relevant to them from time to time (every few hours is fine). This is not a situation where the master needs to "push" updates down to local machines. However, there is a potential for intermittent connection issues for the local machines, so I can't assume that they can talk to the remote whenever they need to. Additionally, the remote machine is running Oracle but the local ones are using an embedded database since they only need a small subset of the warehouse's data. The local machines can be responsible for pulling the data they need; I'm just not sure how to keep them in sync given that updates can occur to either at any time.

    I first looked into Spring's XA support, but dropped it because of the intermittent connection issues. Uploading data to a local machine even if it doesn't currently have a connection to the remote is a valid action, so I don't want transactions rolled back just because a connection wasn't available. I briefly looked into doing this all in the DB via triggers, but if it's possible then it requires more kung-fu than anyone on our staff has at the moment. (Certainly open to that option though.) I would definitely like to do this in a Spring-friendly way if such a thing exists.

    I am prepared to buckle down and do this "the hard way", i.e. writing our own logic to compare and synchronize based on primary keys. This just seems like a problem that's been solved before. If it matters, I'm using Spring version 3.1.2.
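For reference, the "hard way" could look roughly like the sketch below. It is only an illustration under assumptions that are not in the post above: every row carries a hypothetical `lastModified` column, the newer copy wins on conflict, clocks are reasonably aligned, and deletes are ignored. `PrimaryKeySync`, `Row`, and `merge` are made-up names, and the maps stand in for rows already read from each database.

```java
import java.util.HashMap;
import java.util.Map;

public class PrimaryKeySync {

    /** One row of a synchronized table. lastModified is a hypothetical column. */
    static final class Row {
        final long id;            // primary key
        final String payload;     // stand-in for the remaining columns
        final long lastModified;  // epoch millis of the last change

        Row(long id, String payload, long lastModified) {
            this.id = id;
            this.payload = payload;
            this.lastModified = lastModified;
        }
    }

    /**
     * Merges two snapshots by primary key. When both sides hold the same key,
     * the row with the larger lastModified wins; on a tie the local row is kept.
     * The remote map is assumed to be pre-filtered to the rows relevant locally.
     */
    static Map<Long, Row> merge(Map<Long, Row> local, Map<Long, Row> remote) {
        Map<Long, Row> merged = new HashMap<>(local);
        for (Row theirs : remote.values()) {
            Row mine = merged.get(theirs.id);
            if (mine == null || theirs.lastModified > mine.lastModified) {
                merged.put(theirs.id, theirs);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, Row> local = new HashMap<>();
        local.put(1L, new Row(1L, "edited locally", 200L));

        Map<Long, Row> remote = new HashMap<>();
        remote.put(1L, new Row(1L, "stale on master", 100L));
        remote.put(2L, new Row(2L, "new on master", 150L));

        for (Row r : merge(local, remote).values()) {
            System.out.println(r.id + " -> " + r.payload);
        }
    }
}
```

A real implementation would also have to decide which merged rows get written back to the master and how deleted rows are tracked, which is where most of the effort tends to go.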
    Last edited by mzekoff; Aug 21st, 2012, 03:28 PM.

  • #2
    Re: Data source synchronization

    The scenario is often called database synchronization or replication. It sounds like you want asynchronous, multi-master, heterogeneous replication with horizontal filtering. Asynchronous: the data commits locally without delay and is synchronized remotely in the background. Multi-master: making a change on either database will sync to the other one. Horizontal filtering: one database has a subset of the rows from the other one. Heterogeneous: different database systems are being synchronized.
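To make the "asynchronous" part concrete, here is a minimal sketch of the general idea: a change is recorded locally right away and pushed later, whenever a connection happens to be available. This is not how any particular replication tool is implemented; `AsyncOutbox`, `Remote`, and the string-encoded changes are hypothetical names used purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AsyncOutbox {

    /** Hypothetical transport to the other database; returns false when offline. */
    interface Remote {
        boolean send(String change);
    }

    // Changes committed locally but not yet delivered, oldest first.
    private final Deque<String> pending = new ArrayDeque<>();

    /** The local commit never waits on the network; the change is just queued. */
    void commitLocally(String change) {
        pending.addLast(change);
    }

    /**
     * Background step: deliver queued changes in order and stop at the first
     * failure, so nothing is lost or reordered while the link is down.
     * Returns the number of changes delivered in this pass.
     */
    int flush(Remote remote) {
        int sent = 0;
        while (!pending.isEmpty() && remote.send(pending.peekFirst())) {
            pending.removeFirst();
            sent++;
        }
        return sent;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        AsyncOutbox outbox = new AsyncOutbox();
        outbox.commitLocally("INSERT order 1");
        outbox.commitLocally("UPDATE order 1");

        // Offline: nothing is delivered, both changes stay queued.
        System.out.println("sent while offline: " + outbox.flush(change -> false));

        // Online again: the queue drains in the original order.
        System.out.println("sent while online: " + outbox.flush(change -> true));
        System.out.println("still pending: " + outbox.pendingCount());
    }
}
```

In practice the queue would live in a database table rather than in memory so that pending changes survive a restart.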

    Take a look at SymmetricDS, which is an open source database synchronization solution that matches those requirements. It runs as a separate Java process, or it can be embedded within an application. It supports Oracle, H2, HSQLDB, Apache Derby, and many other databases. It was designed to work in low-bandwidth and intermittent connection situations. It uses the Spring Framework, and it's easy to extend and write plug-ins for. There's also production support and consulting from JumpMind if you need it.


    • #3
      It sounds like you want asynchronous, multi-master, heterogeneous replication with horizontal filtering.
      Frankly, I'm blown away right now by 1) the quality of that response, and 2) that such a precisely descriptive name does in fact exist for this problem. Thank you. I'm checking out the SymmetricDS site right now.