Announcement Announcement Module
Collapse
No announcement yet.
Using 2 different datasources with Spring Batch Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using 2 different datasources with Spring Batch

    Hello,

    I am completely new to Spring Batch and am just getting started with it, so I apologize if some of my questions are too obvious!

    In my department we use Microsoft SQL Server databases. Also, we are restricted to the usage of stored procedures when interacting with the database. That's why I am trying to adapt the job and step incrementer to that policy and haven't succeeded so far.

    Therefore, I was wondering if it is possible to use 2 databases simultaneously, the standard one for data processing and keep the embedded HSQLDB (or any other database) only for the framework?

    Cheers,
    Choucri

  • #2
    Choucri,

    You can choose any database that you like to run Spring Batch with. The framework does store metadata in 4 tables (will probably increase with m4 though). You could easily use a database to store metadata in, and another one for application concerns. However, I wouldn't recommend using an in-memory database for the metadata. The value of storing information about a batch run is being to able to look at it in real-time, and report off of it. For instance, you could look back and see what time a job was run two weeks ago, and how many records it processed, etc.

    I'm curious about this 'everything has to be a stored procedure' policy at your company. In a batch scenario, this can cause some serious issues. For example, since the datasets are so large, and take long periods to process, we break them up into 'chunks' and commit after each chunk has been processed. This allows us to restart from the last chunk successfully processed, and keeps transaction sizes at a reasonable size, but not too small for performance reasons. It would not be possible to do this in a stored procedure, since the framework would have no control, and there would be no advantage in a stored procedure that processes one record. Furthermore, most of our 'stock' database item readers do not use stored procedures. The same is true for our metadata storage dao's.

    -Lucas

    Comment


    • #3
      Using Multiple Databases

      I currently use two datasources, one for all the batch tables and sequences, and one for my data store. There is nothing in the framework that prevents you from doing this.

      The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor.

      I do agree that there is a problem with the policy of only using stored procedures. Unfortunately, as Lucas points out, this can potentially cause you to lose a great deal of the benefit of Spring Batch, but in a corporate environment I understand that you can't do much about this. I can't tell you what you should do about this, as I do not know your position. I would personally push back and say "this is a framework that is outside of our control - create a new username for it in the database that it will use solely and leave it as it is." In any case, my recommendation to you is to try using granularly-defined stored procedure definitions with Spring Batch but keep a close eye on performance and document any problems to review with your management.

      Comment


      • #4
        Thanks a lot guys for your quick reply!

        In fact, the 'stored procedure only' policy is mainly enforced for historical reasons, and was introduced way before ORM frameworks emerged. Unfortunately, at my level I have no power to make an exception.
        Originally posted by dkaminsky View Post
        I would personally push back and say "this is a framework that is outside of our control"
        I can't do this either, since this is the first project in my division to use Spring (I mean not only Spring Batch). In fact, introducing Spring has been a personal initiative, therefore I have to prove that it can be tailored to our architecture and adapt to our policies.

        This is why dedicating a database to Spring Batch and thus having the freedom to choose the way of interacting with it, is a compelling alternative.

        Originally posted by lucasward View Post
        However, I wouldn't recommend using an in-memory database for the metadata.
        In fact, the in-memory database is just a temporary solution to get started with the framework.

        Originally posted by dkaminsky View Post
        The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor.
        Indeed, that's my biggest worry! Since I'm new to Spring, I'm not familiar with its transaction management model. I presume that using the DataSourceTransactionManager is no longer sufficient. Do I have to use JTA? With 2 data sources what would be the equivalent declaration of:
        "<bean id="sqlTransactionManager" class="org.springframework.jdbc.datasource.DataSou rceTransactionManager"
        lazy-init="true">
        <property name="dataSource" ref="dataSource" />
        </bean>" ?

        Again, thanks a lot for your help!
        Choucri

        Comment


        • #5
          If you wish to use JTA, spring has a specific JtaTransactionManager implemention that you can use. Please refer to the Spring reference documentation for more information.

          Doug is partly right, the StepExecutor does take a transaction manager, which is for the 'business' transaction. However, the storage of metadata needs to have transactions applied around it using AOP, which would allow for the usage of a separate transaction manager. You can see examples of this in our samples. It should also be noted that the only place this intersects is at certain times when the StepExecution or Step is stored as part of the business transaction, which is the case for restart data, however, this model will change a bit in milestone 4.

          Comment


          • #6
            Clarification

            To clarify my previous point:

            1. You have a data source which represents your metadata tables - ie. the Spring Batch schema. This data source is managed using Spring AOP.

            2. You have one or more other data sources which represent your business data. If you only have one, you do not need to use JTA, since you can create a Spring transaction manager and inject that into your StepExecutor. This is the case with my personal situation. However, if you have several "business" data sources, JTA may be necessary to properly control transactions between them, which is the warning I was trying to convey.

            3. Ditto the exception Lucas mentioned.

            Comment


            • #7
              As you said, I created 2 Spring transaction managers, the business one injected in StepExecutor and the second one in a transaction advice applied to batch repository methods, and it works like heaven!

              Thanks a lot for all your help!

              Comment


              • #8
                2 Spring transaction managers

                Originally posted by Choucri FAHED View Post
                As you said, I created 2 Spring transaction managers, the business one injected in StepExecutor and the second one in a transaction advice applied to batch repository methods, and it works like heaven!

                Thanks a lot for all your help!
                Can someone please post an example of Spring Batch with two transaction managers? We will persiste our batch metric data to database_1 (Oracle) and our batch jobs will read and write to database_2 (SQL Server). Our batch environment will not run within an application container and we will be using Spring JDBC for persistence.

                Thanks!
                Brad

                Comment


                • #9
                  There's not too much to post, the configuration for a Step with a transaction manager is already well defined in the reference documentation:

                  http://static.springframework.org/sp...n.html#d0e3409

                  There's nothing stopping you from using a different transaction manager in your Dao from the one you wired in your step. However, without JTA, when the framework rolls back only the transaction manager wired into the step will rollback, not the one in your dao. Furthermore, you would have to find someway to commit on the dao transaction manager as well. The only way to safely get around this is to use JTA. Since you're running out of container, the only option I can think of is Jotm.

                  Comment


                  • #10
                    Multiple data sources for chunk based processing

                    I have 2 different datasources, one to read and another one to write results like below:

                    <job id="sampleJob" job-repository="jobRepository">
                    <step id="step1" transaction-manager="myTransactionManager">
                    <tasklet>
                    <chunk reader="itemReader" processor="itemProcessor" writer="itemWriter" commit-interval="10"/>
                    <tasklet>
                    </step>
                    </job>

                    ItemReader should get data from dataSource_1.
                    ItemWriter should write data to dataSource_2.

                    As per the documentaion, we can configure a single transaction manager at tasklet


                    In this scenario, how do i use the transaction manager here ?


                    Thanks,
                    Vamsi.

                    Comment


                    • #11
                      Originally posted by lucasward View Post
                      There's not too much to post, the configuration for a Step with a transaction manager is already well defined in the reference documentation:

                      http://static.springframework.org/sp...n.html#d0e3409

                      There's nothing stopping you from using a different transaction manager in your Dao from the one you wired in your step. However, without JTA, when the framework rolls back only the transaction manager wired into the step will rollback, not the one in your dao. Furthermore, you would have to find someway to commit on the dao transaction manager as well. The only way to safely get around this is to use JTA. Since you're running out of container, the only option I can think of is Jotm.
                      Hi,
                      I have similar kind of situation. In my Spring Batch application, I have defined two transaction manager and in transaction manager bean declarations, I have used <qualifier> tag to differentiate these transaction manger in my transactional DAO class.

                      First txn manager:
                      <bean id="transactionManager" class="org.springframework.orm.hibernate3.Hibernat eTransactionManager" >
                      <property name="sessionFactory" ref="crmsSssionFactory" />
                      <qualifier value="wfInternal"/>
                      </bean>

                      Second txn manager:
                      <bean id="cddprofileTxnManager" class="org.springframework.orm.hibernate3.Hibernat eTransactionManager" >
                      <property name="sessionFactory" ref="cddProfileSessionFactory" />
                      <qualifier value="cddProfile"/>
                      </bean>

                      During step execution, I need to update data into two databases.First database is same as what I use to store batch meta data. This one uses the transaction manager used by step execution which is "transactionManager". Second db one uses another transaction manager - cddprofileTxnManager. It is committing db updates in second DB, but roll back is not happening in case of exception. Is there anything I can do without using JTA?My application doesn't use any kind of Java EE container.

                      Thanks,
                      Sandhya

                      Comment


                      • #12
                        Without using global transactions of some kind, you will run into the issues of having one transaction's failure not rolling back the other's.

                        Comment

                        Working...
                        X