Spring Integration and Amazon Simple Workflow

  • Spring Integration and Amazon Simple Workflow

    I've recently been asked to look into AWS's Simple Workflow offering, which appears to be a very specific model of working with a hosted message broker to embody a workflow. There's even a library of annotations that will decorate your classes with implementations to do the polling of its queues, along with some interesting asynchronous functionality.

    There's quite an overlap with Spring Integration, given its polling consumers and remotely stored queues.

    The API just isn't as nice as I'm used to with Spring though. Whilst there are some Spring-friendly helper classes, one has to code to AspectJ-generated classes, and all 'activities' (think @Services) have to be registered in XML rather than by package-scanning.

    I'd be over the moon if SI wrapped some of the nastiness of coding to Amazon's API!

  • #2
    Thanks for the note and sorry for the delayed response.

    When you say you've been asked to "look into" it, does that mean you're still exploring options? Spring Batch and Spring Integration provide a lot of flexibility (as you have already implied), and they can easily run in the cloud. So, rather than thinking about wrapping their API (at least at this point), could you let us know what you see as key requirements that are currently lacking from SB and SI to implement your specific use-cases?

    In fact, keep an eye out as we will be providing samples and blogs about running integration and batch apps in the cloud (most likely with a focus on Cloud Foundry as the runtime, but as you know the programming model is portable).


    • #3
      Hi Mark,

      Thanks for your reply.

      I'm working in a business with many heterogeneous distributed processes dealing with large volumes of data. As a consequence of the company's evolution, a lot of these have been written in less-than-ideal circumstances. Constituent elements of a workflow might involve Pig scripts, RDBMS database transfers, shell scripts, CRON jobs, MongoDB imports, large queues (hundreds of millions of messages, with a lifetime of over a month at times), and a lot of work with Hadoop, HDFS and HBase.

      AWS SWF offers several advantages over setting up our own message broker:
      1. No hardware to look after, and no message broker cluster to understand and configure for the System Admins;
      2. State of a workflow has guaranteed and managed persistence with a web UI, and automated task retries;
      3. A workflow is defined in one place, rather than esoterically between the configuration of consumers, producers, and the message broker routing.

      AWS SWF has a Java Flow Framework (FF) SDK which uses AspectJ to generate source classes at runtime that essentially wrap your workflow implementation logic and consumers, and provides the queue-polling and messaging functionality, making these transparent to the coder (bar some annotations). However, this requires coding to the FF API, meaning that any code you'd like to offer as part of a workflow must know that it's to be accessed as such, and that it must be accessed using AWS SWF FF. The workflow implementation aspect is particularly clunky, involving coding to and referencing generated classes, whereas I'm used to the much more Spring-y way of referencing my interfaces and having them wrapped in a proxy.

      My current thinking is that we can leverage the centralised nature of SWF workflow implementations and the benefits of using SWF's (simplified and constrained) hosted persistent message broker, whilst maintaining decoupling of consumer implementations by writing Spring Integration Adapters that abstract away the polling of the SWF queues. This would mean my consumer code need not know about SWF nor even messaging, and we could swap SWF for AMQP or other solutions with a Spring configuration change at a later date.
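
      The decoupling described above can be sketched transport-free. This is a minimal, hypothetical illustration (all names are mine, not part of the Spring Integration or AWS SWF APIs): the business logic depends only on a plain interface, so an SWF-backed adapter could later be swapped for an AMQP-backed one purely by changing the wiring.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: consumer code that knows nothing about SWF or messaging.
// Names are illustrative, not Spring Integration or AWS SWF API types.

/** Transport-agnostic source of work items; an adapter hides the broker. */
interface TaskSource {
    String receive(); // next task payload, or null if none is available
}

/** Stand-in for an adapter that would poll SWF's hosted queues. */
class InMemoryTaskSource implements TaskSource {
    private final Queue<String> queue = new ArrayDeque<>();
    void enqueue(String payload) { queue.add(payload); }
    @Override public String receive() { return queue.poll(); }
}

/** Business logic: depends only on TaskSource, never on SWF or messaging. */
class ImportConsumer {
    private final TaskSource source;
    ImportConsumer(TaskSource source) { this.source = source; }
    String processNext() {
        String payload = source.receive();
        return payload == null ? null : "imported:" + payload;
    }
}

public class DecouplingSketch {
    public static void main(String[] args) {
        InMemoryTaskSource source = new InMemoryTaskSource();
        source.enqueue("mongo-batch-42");
        // Swapping InMemoryTaskSource for an SWF- or AMQP-backed TaskSource
        // would leave ImportConsumer untouched.
        System.out.println(new ImportConsumer(source).processNext()); // prints "imported:mongo-batch-42"
    }
}
```

      In Spring terms, the `TaskSource` implementation is the piece a channel adapter would replace, and the swap happens in configuration rather than in consumer code.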

      Spring Batch + Integration as a Scheduler?

      One of the other challenges we face with SWF is that it has no scheduling. We need jobs to run at certain times of the week, and are currently relying on localised CRON jobs, which is quite frankly unacceptable, although it is at least resilient, since there is no single point of failure.

      My initial thought when looking at the problem was to consider using Spring scheduling and Spring Integration to fire off messages to always-running consumers that would then pick up these weekly jobs. This was initially disregarded due to concerns about maintaining the scheduling service, tracking the state of a workflow (what happens if step 3 in a process fails?), and making the scheduling service resilient through clustering and distribution.
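
      The shape of that disregarded idea can be sketched with plain JDK primitives; this is only an illustration, with `ScheduledExecutorService` standing in for Spring's `TaskScheduler` and a `BlockingQueue` standing in for a message channel (names and timings are made up):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// JDK-only sketch: a scheduler fires job-trigger messages onto a queue, and
// an always-running consumer picks them up. In a real setup the scheduler
// would be Spring's TaskScheduler and the queue a message channel.
public class ScheduledDispatchSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Stand-in for a weekly cron trigger: enqueue a job request periodically.
        scheduler.scheduleAtFixedRate(
                () -> channel.offer("run-weekly-import"), 0, 50, TimeUnit.MILLISECONDS);

        // Always-running consumer: blocks until a trigger message arrives.
        System.out.println("consumer received: " + channel.poll(2, TimeUnit.SECONDS));
        scheduler.shutdownNow();
    }
}
```

      The concerns above (state tracking, failure of step 3, clustering the scheduler) are exactly what this toy version does not address, which is where Spring Batch's job repository and restartability would come in.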

      If you have any practical advice as to how Spring Integration and Spring Batch might address these concerns, I would be thrilled to hear it. Both technologies offer a great deal of very useful functionality, and it can be easy to miss features that may be useful.


      • #4
        Thanks for the detailed response. It does sound like you have an ideal set of use-cases for us to consider for Spring Integration and Spring Batch in the cloud. This is something we are actively developing, and yet it's mostly a matter of assembling a coherent model out of existing parts. One such part that might be of interest to you is the Spring Batch Admin project. Specifically, it may address your concerns regarding visibility of job status/failures/etc. If you haven't seen it before, I'd recommend starting here:

        Do you think that having that available in the cloud, combined with Spring Batch and Integration processes bound to a RabbitMQ broker (with which you wouldn't interact directly) to provide the messaging layer between integration adapters and batch jobs, would be moving in the right direction?

        Of course, some new features would also be helpful, such as a UI for managing scheduled "triggers" at runtime as a complement to the event-driven triggering. That would not be difficult to add since Trigger strategies can be referenced by scheduled Polling Consumer adapters already.
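
        The Trigger strategy mentioned above is a single-method contract, so runtime management mostly means putting mutable state behind that method. This is a JDK-only mirror of the idea with hypothetical names (Spring's real contract is `org.springframework.scheduling.Trigger`, which takes a `TriggerContext`):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// JDK-only mirror of the Trigger-strategy idea; hypothetical names, not the
// actual Spring API. Mutable state behind the single method is what a
// management UI would tweak at runtime.

/** Minimal stand-in for Spring's Trigger strategy interface. */
interface Trigger {
    Instant nextExecutionTime(Instant lastCompletion);
}

/** Fixed-delay trigger whose delay can be changed at runtime, e.g. from a UI. */
class MutableFixedDelayTrigger implements Trigger {
    private final AtomicReference<Duration> delay;
    MutableFixedDelayTrigger(Duration initial) { this.delay = new AtomicReference<>(initial); }
    void setDelay(Duration d) { delay.set(d); }
    @Override public Instant nextExecutionTime(Instant lastCompletion) {
        return lastCompletion.plus(delay.get());
    }
}

public class MutableTriggerSketch {
    public static void main(String[] args) {
        MutableFixedDelayTrigger trigger = new MutableFixedDelayTrigger(Duration.ofHours(1));
        Instant last = Instant.parse("2012-03-01T00:00:00Z");
        System.out.println(trigger.nextExecutionTime(last)); // one hour after last
        trigger.setDelay(Duration.ofHours(24));              // changed at runtime
        System.out.println(trigger.nextExecutionTime(last)); // now a day after last
    }
}
```

        Because a polling consumer only ever asks the trigger for the next execution time, changing the delay (or cron expression) takes effect on the next poll without restarting anything.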

        Finally, and perhaps most importantly, you may have heard our Spring Hadoop announcements yesterday (this is a good starting point: ). That is an area where we will be very focused on tying Spring Integration and Batch together for "big data" processing in the cloud. Of course, we already have a nice library for working with MongoDB (MongoTemplate in spring-data-mongodb as well as Spring Integration support built on top of that).

        Your idea about abstraction wrappers, even if simply Spring Integration Channel Adapters, is something we will consider as well.