
  • Asynchronous updates/inserts

    One idea we are investigating for increasing performance is the ability to do asynchronous updates/inserts to the database.

    Obviously one issue with this would be how to retry/recover if there is an exception.

    But assuming this was figured out, would the Spring Batch way of doing this be through a custom OutputSource? (i.e., can a tasklet make asynchronous calls to an OutputSource or DBA object?)

    One recovery possibility would be to make the LUW a group of records (a partition), and have a partition table somewhere that the OutputSource/InputSource would use to indicate whether a partition had failed and needed to be retried.

    Any other thoughts on this approach or alternatives?

  • #2
    The Spring Batch way of doing it would be through the StepExecutor. You can already get some benefits of asynchronous processing by configuring the RepeatOperations (the stepOperations of SimpleStepExecutor). I would play with that if I were you before assuming that you need anything more complicated. It will handle retries and restarts in the same way as the synchronous case.

    A custom StepExecutor that handles the partitions you mention would be another option - this is on the roadmap for 1.1. But that is fundamentally a distributed computing solution (multi-VM). In a single VM I'm guessing you can always make progress with the SimpleStepExecutor in one form or another.

    You could write a custom OutputSource, but I'm not sure it would help. Your chunk size would go down to 1, because each call to OutputSource.write() would have to be in its own transaction.



    • #3
      Originally posted by struesda View Post
      One thought on increasing performance we are investigating is the ability to do asyncronous update/inserts to the database.
      What do you mean by asynchronous in this case? I think we are looking for something similar: we'd like to write a processor that can somehow use PreparedStatement.addBatch() instead of PreparedStatement.execute(), and then after X iterations a call to PreparedStatement.executeBatch() would do the actual update. We ran a few tests and this can be up to 3 times as fast.
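      The addBatch()/executeBatch() pattern described above can be sketched without a live database. The BatchingWriter class and its callback below are invented for this illustration (not a Spring Batch or JDBC API): rows are buffered and pushed in groups, the way addBatch() calls are accumulated and then sent in one go with executeBatch().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the addBatch()/executeBatch() pattern: rows are
// buffered and flushed in groups of batchSize instead of being executed
// one at a time. The Consumer stands in for PreparedStatement.executeBatch().
class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> executeBatch;
    private final List<T> pending = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> executeBatch) {
        this.batchSize = batchSize;
        this.executeBatch = executeBatch;
    }

    // Analogous to PreparedStatement.addBatch(): buffer the row,
    // flush automatically once batchSize rows have accumulated.
    void write(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Analogous to PreparedStatement.executeBatch(): push all buffered rows at once.
    void flush() {
        if (!pending.isEmpty()) {
            executeBatch.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

      With a batch size of 3 and 7 rows written, this issues two full batches of 3 and a final flush of 1: three round trips instead of seven, which is where the speedup in the tests above would come from.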

      Originally posted by struesda View Post
      Obviously one issue with this would be how to retry/recover if there is an exception. One recovery possibility would be to make the LUW a group of records (a partition)
      Correct, that is the main problem I see with our desired approach as well. Changing the LUW to a group of records, as you rightly point out, would address it, but that's a road I'd rather not go down.

      Originally posted by Dave Syer View Post
      The Spring Batch way of doing it would be through the StepExecutor. You can already get some benefits of asynchronous processing by configuring the RepeatOperations ...
      A custom StepExecutor that handles the partitions you mention would be another option - this is on the roadmap for 1.1.
      Care to elaborate a bit on those 2 options?



      • #4
        close - but a bit further abstracted

        What do you mean by asynchronous in this case? I think we are looking for something similar: we'd like to write a processor that can somehow use PreparedStatement.addBatch() instead of PreparedStatement.execute(), and then after X iterations a call to PreparedStatement.executeBatch() would do the actual update. We ran a few tests and this can be up to 3 times as fast.
        We were thinking of letting the OutputProcessor or DBA object handle the addBatch() and executeBatch() steps. All the processor would do is call OutputProcessor.write(), which would either return immediately and do the addBatch() step behind the scenes asynchronously, or the write() call itself could be made asynchronous so the processor would not be waiting on the return.

        In a batch system these DB inserts are the slowest part of the system, and if we have multiple threads going we only have more inserts to be waiting on. So if we could remove that wait and let the inserts be done on their own, that should be a huge performance win.
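        The fire-and-forget write() described above can be sketched with a queue and a single background thread. The AsyncWriter class here is invented for illustration (it is not a Spring Batch API), with an in-memory list standing in for the database table:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "write() returns immediately": the processor hands
// rows to an in-memory queue, and a single background thread performs the
// (simulated) inserts, so the processing thread never waits on the database.
class AsyncWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> inserted = new ArrayList<>(); // stands in for the DB table
    private final Thread drainer;
    private volatile boolean running = true;

    AsyncWriter() {
        drainer = new Thread(() -> {
            try {
                // Keep draining until shutdown is requested AND the queue is empty.
                while (running || !queue.isEmpty()) {
                    String row = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (row != null) {
                        inserted.add(row); // a real implementation would do the JDBC insert here
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drainer.start();
    }

    // Returns immediately; the insert happens later on the drainer thread.
    void write(String row) {
        queue.add(row);
    }

    // Stop accepting work, wait for the queue to drain, and return what was "inserted".
    List<String> close() {
        running = false;
        try {
            drainer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return inserted;
    }
}
```

        Note this sketch sidesteps exactly the problem raised at the top of the thread: once the insert happens outside the processor's transaction, a failed insert needs its own retry/recovery story (e.g. the partition table idea).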



        • #5
          N.B. there is a JIRA for the batched JDBC processor as a new feature (http://opensource.atlassian.com/proj...rowse/BATCH-76). Joris Kuipers also implemented one as part of a small project, and he allegedly found it quite easy to do using the existing APIs. He said his "ItemProcessor also implements the RepeatInterceptor interface, so it would receive callbacks on each start and end of a chunk. After that, it becomes a piece of cake to do the flushing at the appropriate time using a BatchSqlUpdate."



          • #6
            It's interesting that he used a RepeatInterceptor rather than TransactionSynchronization. It seems like adding the ItemProcessor as a RepeatInterceptor would also require you to wire it into the RepeatTemplate, when you can get the same functionality with the TransactionSynchronizationManager.



            • #7
              Originally posted by lucasward View Post
              It's interesting that he used a RepeatInterceptor rather than TransactionSynchronization. It seems like adding the ItemProcessor as a RepeatInterceptor would also require you to wire it into the RepeatTemplate, when you can get the same functionality with the TransactionSynchronizationManager.
              Actually, I never thought of using TransactionSynchronization
              I based my example on the text on this page: http://static.springframework.org/sp...es/simple.html

              To store up SQL operations until the end of a batch, and take advantage of JDBC driver efficiencies, the client needs to store some state during the batch, and also register a transaction synchronisation. For this kind of scenario we introduce an interceptor framework in the template execution. The template calls back to interceptors, which themselves can strategise clean up and close-type behaviour: ...
              The RepeatInterceptor can be stateful, and can store up inserts until the end of the batch. If the RepeatTemplate.iterate is transactional then they will only happen if the transaction is successful.
              That suggested to me that my approach of using a RepeatInterceptor was the recommended one. It did enable me to do exactly what I wanted, which seems to be the same as the topic of this thread: accumulate SQL statements in a batched statement and flush that batched statement at the end of each chunk. Of course you're right: this requires me to register my ItemProcessor implementation as an interceptor with the RepeatTemplate used as chunkOperations.
              However, my approach will also work with whole-batch transactions, which I guess wouldn't be possible when using a TransactionSynchronization.

              How and where would you register an ItemProcessor as a TransactionSynchronization if you took that approach? Would you use BatchTransactionSynchronizationManager.registerSynchronization? It doesn't seem to have the desired effect in my sample app, but I'm not sure if I have properly configured transactions to run per chunk and not per step.

              Joris



              • #8
                Originally posted by Joris Kuipers View Post
                Would you use BatchTransactionSynchronizationManager.registerSynchronization? It doesn't seem to have the desired effect in my sample app, but I'm not sure if I have properly configured transactions to run per chunk and not per step.
                Strike that: it works, I just called registerSynchronization from the wrong place (from the constructor instead of the process-method). I've implemented TransactionSynchronization in my ItemProcessor and now call BatchSqlUpdate.flush() from beforeCommit and BatchSqlUpdate.reset() from afterCompletion.

                It results in a lot less configuration, since I can now use a SimpleStepConfiguration instead of a ChunkOperationsStepConfiguration with its own chunkOperations RepeatTemplate that has the interceptor configured. That also means that I can rely on the default StepExecutorFactory of the DefaultJobExecutor, which happens to be a SimpleStepExecutorFactory that doesn't support ChunkOperationsStepConfigurations.

                The disadvantage is that the chunk has to be transactional now, while using the RepeatInterceptor will work in all cases (specifically the whole-batch transaction case). For my sample that isn't a problem though, so I think I'll go with the TransactionSynchronization for now. Thanks for the tip!
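                The transaction-synchronization variant can be sketched the same way. Everything below is invented for illustration (a toy transaction runner rather than Spring's TransactionSynchronizationManager): the writer buffers statements, flushes them in beforeCommit, and resets in afterCompletion, mirroring the BatchSqlUpdate.flush()/reset() calls described above.

```java
import java.util.ArrayList;
import java.util.List;

// Invented stand-in for Spring's TransactionSynchronization callbacks.
interface TxSynchronization {
    void beforeCommit();
    void afterCompletion();
}

// Writer that buffers statements and only pushes them just before commit.
class SynchronizedBatchWriter implements TxSynchronization {
    private final List<String> batch = new ArrayList<>();     // stands in for a BatchSqlUpdate
    private final List<String> committed = new ArrayList<>(); // stands in for the database

    void write(String sql) {
        batch.add(sql); // buffered; nothing hits the database yet
    }

    @Override
    public void beforeCommit() {
        committed.addAll(batch); // BatchSqlUpdate.flush() in the real code
    }

    @Override
    public void afterCompletion() {
        batch.clear(); // BatchSqlUpdate.reset() in the real code
    }

    List<String> getCommitted() {
        return committed;
    }
}

// Toy chunk transaction: run the work, fire beforeCommit, commit, fire afterCompletion.
class ToyTransactionRunner {
    static void runChunk(TxSynchronization sync, Runnable work) {
        work.run();
        sync.beforeCommit();
        // (commit would happen here)
        sync.afterCompletion();
    }
}
```

                This shows why the chunk must be transactional for the approach to work: the flush only ever happens inside a commit callback, so with no per-chunk transaction there is no point at which the buffered statements are pushed.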



                • #9
                  Joris,

                  I'm glad to hear that it worked for you. You do bring up a lot of good points, and I think the snippet from the website needs to be modified. I guess it's the difference between being 'transactional' and needing to be notified of transactions. The former can be accomplished as the website says, by making RepeatTemplate.iterate transactional. However, the latter can only be accomplished with TransactionSynchronization.

