
  • Elegant Shutdown

    Hello

    I am wondering how the forum suggests shutting an SI application down without losing messages.

    My application involves IMAP to XML file and XML file to e-mail. I have decorated targets and sources to add some transactional functionality.

    I could have each endpoint listen in to some injected singleton which maintains an imminent shutdown variable to determine whether to accept new messages, but would need a way for the application to monitor whether any messages were still queued before final shutdown.
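
    The flag-plus-counter idea can be sketched with plain JDK classes. Everything below, class and method names included, is hypothetical and not part of SI:

    ```java
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical injected singleton: sources consult it before accepting new
    // work, and the shutdown routine polls it until in-flight messages drain.
    class ShutdownCoordinator {
        private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
        private final AtomicInteger inFlight = new AtomicInteger(0);

        // Sources call this; returns false once shutdown has been requested.
        boolean tryAccept() {
            if (shuttingDown.get()) {
                return false;
            }
            inFlight.incrementAndGet();
            return true;
        }

        // Targets call this once a message is fully processed (committed).
        void complete() {
            inFlight.decrementAndGet();
        }

        // Set the imminent-shutdown flag, then wait until the pipeline drains.
        void shutdownAndAwait() throws InterruptedException {
            shuttingDown.set(true);
            while (inFlight.get() > 0) {
                Thread.sleep(100);
            }
        }
    }
    ```

    A real version would want a timeout and a latch instead of polling, but it shows the two pieces: a gate for sources and a drain count for the final shutdown.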

    I already require that the target speaks to a transaction manager which ensures that input files and e-mails are kept in persistence, but in a state which indicates that they are active in a transaction. When the system starts I do a rollbackAll to ensure that any messages which were not committed due to a crash are reprocessed. It would therefore not be the end of the world to simply kill the system as it could recover but this is not elegant.

    Any thoughts would be appreciated.

    Thanks guys!

  • #2
    I've been thinking along similar lines. For now there are two major things missing:

    1. persistent channels (so that you could shut down without having to wait for the messages to be processed)
    2. channel monitoring. Maybe it would be a good idea to use application events to tell interested observers that a channel is empty.

    If you have time you could play around a bit with it and tell us how you would like it implemented. That would be really useful information.

    • #3
      Yeah, I had noticed the lack of the guaranteed delivery pattern (persistence in the channel). I don't like the idea of that for this problem; the way the system would shut down and hold on to messages without external systems being notified is a bit messy.

      I will have a play with the second option you suggest: preventing sources from producing once the shutdown flag is set, then waiting until all channels are empty before pulling the plug.

      • #4
        Check out INT-256; it may help you solve your problem.

        • #5
          What is the current situation on elegant/graceful shutdown? I read through this thread but it is a little dated. The MessageBusListener that was added for the issue was apparently removed when the new TaskScheduler was introduced.

          My current solution is to instantiate custom ThreadPoolTaskExecutors and set waitForTasksToCompleteOnShutdown to true. This seems to solve the problem of my container (Glassfish) killing the running consumer threads when the application server shuts down. The container now blocks trying to shut down my web application until Spring has shut down successfully, which in turn blocks until all my task executors finish.
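
          The underlying JDK pattern involved here can be sketched as follows (hypothetical demo class, no Spring involved): shutdown() stops new submissions, and awaitTermination() is the call that actually blocks until running tasks finish.

          ```java
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.atomic.AtomicBoolean;

          // Submit a slow task, then shut down gracefully: the in-progress
          // task finishes because we explicitly wait for termination.
          class GracefulExecutorDemo {
              static boolean runAndDrain() throws InterruptedException {
                  ExecutorService executor = Executors.newFixedThreadPool(2);
                  AtomicBoolean finished = new AtomicBoolean(false);
                  executor.submit(() -> {
                      try {
                          Thread.sleep(200); // simulated in-flight message
                      } catch (InterruptedException ignored) {
                      }
                      finished.set(true);
                  });
                  executor.shutdown();                            // non-blocking
                  executor.awaitTermination(5, TimeUnit.SECONDS); // this blocks
                  return finished.get();
              }
          }
          ```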

          This solution works pretty well with two remaining problems:
          1. The task executors block in the destroy() call, which means Spring has already started destroying other beans. If the executing threads use other beans, there is a chance that they were destroyed already.

          2. There is no easy way to set the waitForTasksToCompleteOnShutdown property on the task executor through the integration namespace. This means I have to build custom task executors. That isn't terrible for pollers, but it also means I have to construct the SimpleTaskScheduler myself, because there is no way to set a custom executor without manually constructing the scheduler.

          That leads to two questions:
          1. Is this the correct way to have a graceful shutdown of SI?
          2. Why don't the SimpleTaskScheduler and pollers support a setting similar to waitForTasksToCompleteOnShutdown, so that they block when told to stop until all threads are complete?

          Thanks,
          -mike

          • #6
            If you could add the code you're using, that would be really helpful. Perhaps you can reduce it to a test case. I'd be happy to take it from there and see if we need to fix some holes in the API and namespace.

            What do you mean by "constructing the SimpleTaskScheduler myself"? Are you using a bean element, or the new keyword?

            It looks to me as though invoking shutdown() on the ThreadPoolTaskExecutor during the lifecycle stop (instead of in the destruction phase) should prevent the destroy problem you are describing. I'll ask Mark to comment on this, but I'm thinking this might be an SI responsibility that we've overlooked. In that case, you should log a bug for it.

            Since the executor can be shared by multiple pollers (this is recommended, in fact), we cannot just shut it down from any scheduler's stop method. It's a bit of a tricky problem, but you got that already, I think.

            Again, a simple program/testcase that shows the problem would be really helpful.

            • #7
              iwein,

              I'll try to get a small example together that shows the problem, but the root of it is that the task executors may still be executing tasks after the destroy method on the scheduler is called. The scheduler attempts to cancel any running tasks, but if a task isn't blocked it will never get an InterruptedException and therefore won't stop immediately. Once the application context is shut down, Glassfish kills any remaining threads to stop the application, resulting in a random ThreadDeath in one of my consumers.

              By "manually" constructing the scheduler I mean using the beans directly. I was using the namespace for the thread executors, but the thread-pool-task-executor tag doesn't allow the waitForTasksToCompleteOnShutdown property to be set. At the same time, in IntegrationContextUtils (at line 93) the scheduler is constructed with an anonymous TaskExecutor instance, so there is no way to modify or replace it. To solve this I had to define the entire task scheduler as beans myself (including the error channel setup handled in the IntegrationContextUtils method).

              I agree that this is a tricky problem, especially because the task executors can be shared.

              -mike

              • #8
                Mike,

                Please do provide some example code if possible. It should be fairly simple for us to enable setting 'waitForTasksToCompleteOnShutdown' both for the beans created by the namespace support and for the "main" internally created TaskExecutor instance. That said, I think the idea of a true graceful shutdown may be much more complicated. We would need to know the proper order in which to shut down producers and consumers such that all activity from a given moment in time is able to complete. For the same reason that decoupling producers and consumers makes the system very flexible, it also creates quite the chicken-and-egg situation. Have you considered that part of the problem as well, or are you just looking for a way to avoid stopping tasks forcefully?

                Regards,
                Mark

                • #9
                  Mark et al,

                  I posted a simple sample application here: http://drop.io/mpilone. The zip is a small Eclipse project.

                  In the sample, I create a simple queue channel with a consumer that takes a few seconds to complete each request. If you run the application with the "awaitTermination" method commented out, you'll see that the application just exits before all of the background workers complete (even though they've been started).

                  If you run it with "awaitTermination" enabled, the Spring application context will block on close until all the background threads are closed. This appears to meet the needs of my current application, but I don't think it is an ideal solution.

                  As you pointed out Mark, this is a complex problem. My solution doesn't solve the problem of the other messages still sitting in the queue channel that are just lost. It also doesn't solve the problem that Spring has already started destroying beans by the time my thread factory is destroyed. This means the tasks still executing could be using beans that have already been destroyed.

                  While writing the sample, I realized that setWaitForTasksToCompleteOnShutdown doesn't work as I expected. The documentation indicates that it will wait for termination, but in fact it only allows running tasks to finish and doesn't actually await termination.
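
                  This distinction can be seen with the underlying JDK executor alone (hypothetical demo class): shutdown() initiates an orderly shutdown and returns immediately, so the executor is not yet terminated when the call comes back.

                  ```java
                  import java.util.concurrent.ExecutorService;
                  import java.util.concurrent.Executors;

                  // shutdown() is non-blocking: right after it returns, the
                  // slow task is still running, so the executor is not
                  // terminated yet.
                  class ShutdownIsNonBlocking {
                      static boolean terminatedRightAfterShutdown() {
                          ExecutorService executor = Executors.newSingleThreadExecutor();
                          executor.submit(() -> {
                              try {
                                  Thread.sleep(300); // still running when shutdown() returns
                              } catch (InterruptedException ignored) {
                              }
                          });
                          executor.shutdown(); // does not await termination
                          return executor.isTerminated();
                      }
                  }
                  ```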

                  I know there were suggestions of a "poison pill" approach on the original Jira issue. That might be a better way to go if you need to flush all the channels.
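
                  For reference, the poison-pill idea in its simplest JDK form (all names hypothetical): the shutdown routine enqueues a sentinel after the last real message, and the consumer drains everything up to it before stopping.

                  ```java
                  import java.util.ArrayList;
                  import java.util.List;
                  import java.util.concurrent.ArrayBlockingQueue;
                  import java.util.concurrent.BlockingQueue;

                  // A consumer that drains the channel until it sees the sentinel.
                  class PoisonPillDemo {
                      static final String POISON = "__SHUTDOWN__";

                      static List<String> drain(BlockingQueue<String> queue) throws InterruptedException {
                          List<String> consumed = new ArrayList<>();
                          while (true) {
                              String message = queue.take();
                              if (POISON.equals(message)) {
                                  break; // channel flushed; safe to stop this consumer
                              }
                              consumed.add(message);
                          }
                          return consumed;
                      }

                      static List<String> demo() throws InterruptedException {
                          BlockingQueue<String> channel = new ArrayBlockingQueue<>(10);
                          channel.put("a");
                          channel.put("b");
                          channel.put(POISON); // enqueued by the shutdown routine
                          return drain(channel);
                      }
                  }
                  ```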

                  Let me know if you have any questions about my sample or if you want to discuss approaches further.

                  Thanks,
                  -mike

                  • #10
                    Originally posted by mpilone:
                    Mark et al,

                    I posted a simple sample application here: http://drop.io/mpilone
                    . The zip is a small Eclipse project.

                    ....

                    While writing the sample, I realized that setWaitForTasksToCompleteOnShutdown doesn't work as I expected. The documentation indicates that it will wait for termination but in fact it only allows running tasks to finish but doesn't actually await termination.

                    Nice and crisp sample; what you describe here could be a bug in Spring Core.

                    The javadoc for java.util.concurrent.ThreadPoolExecutor#shutdown states that it "Initiates an orderly shutdown...", while the javadoc for Spring's wrapper states that the property specifies "whether to wait for scheduled tasks to complete". That is not what actually happens if you look at the source code of the wrapper's shutdown method.

                    There is some homework to be done on how this came to be; possibly it's just a matter of updating the javadoc on the wrapper.

                    • #11
                      I logged http://jira.springframework.org/browse/SPR-5387 to track the ThreadPoolExecutor discussion.

                      • #12
                        iwein,

                        I agree that waitForTasksToCompleteOnShutdown is either a bug or a documentation problem. Thanks for filing the issue.

                        However, I don't think this addresses the larger issue for Spring Integration. At a minimum, users need a way to set the waitForTasksToCompleteOnShutdown property (or whatever the resolution of SPR-5387 becomes). But that is only a halfway fix, because the executors only start waiting once destruction has begun, meaning other, possibly dependent, beans have already been destroyed.

                        As Mark pointed out, there is also the even larger and more complicated issue of shutting down channels in the right order to flush pending messages, ideally on the context closed event rather than on destroy.

                        Maybe there could be some kind of shutdown hook in the SI task scheduler where users could configure an ordered list of disposable objects (channels and task executors) to be shut down in sequence. This puts the onus on the user to order the channels correctly to flush messages, and it also controls when, and which, task executors are shut down (in the case of sharing). Just an idea.

                        -mike

                        • #13
                          Originally posted by mpilone:
                          However I don't think that this addresses the larger issue related to Spring Integration. At a minimum users need a way to set the waitForTasksToCompleteOnShutdown (or whatever the resolution of SPR-5387 becomes) property. But this is only a halfway fix because the executors will only wait once destruction has started, meaning other, possibly dependent beans have already been destroyed.

                          You're right. I wasn't trying to imply we should end this discussion or close INT-256. Just decomposing.

                          As Mark pointed out, there is also the even larger and more complicated issue of shutting down channels in the right order to flush pending messages, ideally on the context closed event rather than destroy.

                          Maybe there could be some kind of shutdown hook in the SI task scheduler where users could configure an ordered list of disposable objects (channels and task executors) that should be shutdown in order. This puts the onus on the user to properly order all the channels in order to flush messages and it also controls when/which task executors should be shutdown (in the case of sharing). Just an idea.

                          Maybe you can already get a long way by implementing Lifecycle and Ordered on your components. Still, I can imagine that at some point you need more control over the channel contents during shutdown. Maybe the key is to implement this in the channels. How does this sound:

                          A StoppableChannel would immediately stop receiving messages once stopped; it would then block the stop call until all of its messages have either been purged or consumed. It would also implement the Ordered interface, so you could order the stop invocations on the parts of your system that need to be flushed on shutdown.

                          Code:
                          <channel id="first">
                            <stoppable order="1"/>
                          </channel>
                          <channel id="second">
                            <stoppable order="2"/>
                          </channel>
                          I haven't even bothered to check if this would work, just thinking out loud.
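
                          A very rough sketch of that behavior in plain Java, purely hypothetical and untested against SI: send() is refused once stop() has been called, and stop() blocks until the queue is drained. The order field stands in for the Ordered contract.

                          ```java
                          import java.util.concurrent.BlockingQueue;
                          import java.util.concurrent.LinkedBlockingQueue;

                          // Hypothetical stoppable channel: refuses sends once stopping,
                          // and stop() blocks until consumers drain the remaining messages.
                          class StoppableChannel {
                              private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
                              private volatile boolean stopped = false;
                              private final int order; // stand-in for the Ordered interface

                              StoppableChannel(int order) { this.order = order; }

                              int getOrder() { return order; }

                              boolean send(Object message) {
                                  if (stopped) {
                                      return false; // refuse new messages while stopping
                                  }
                                  return queue.offer(message);
                              }

                              Object receive() {
                                  return queue.poll();
                              }

                              void stop() throws InterruptedException {
                                  stopped = true;
                                  while (!queue.isEmpty()) { // block until the channel is flushed
                                      Thread.sleep(50);
                                  }
                              }
                          }
                          ```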

                          • #14
                            iwein,

                            That's an interesting idea, and I think it accomplishes what I was going for with the ordered list of channels to shut down. So what would happen to messages written to the channels after shutdown? Would an exception be thrown, or would the message just go to the error channel? And what would trigger the shutdown? Maybe the context closing event?

                            I'm using a persistent channel adapter that I wrote to serialize messages into a database. So my use case would really be to stop the back-end processing of messages (where the poller is reading from the inbound channel adapter) but to allow the queuing/persisting of messages on the front end until the application is shut down. That would let me add a 'prepare for shutdown' mode to my application, in which the back-end processes stop but the front end can still consume and persist messages. I know this is probably outside the normal use of SI, though. I guess it could apply to someone using a JMS adapter (where JMS queues the requests but the consumers are prepared for shutdown).

                            It almost seems like you want/need different behavior for polling channels, which potentially have background threads, and direct channels, which can complete before returning to the caller.

                            I'd be curious to see how/if other ESBs handle this.

                            -mike
