Announcement Announcement Module
No announcement yet.
Aggregating with non-unique correlation data Page Title Module
Move Remove Collapse
Conversation Detail Module
  • Filter
  • Time
  • Show
Clear All
new posts

  • Aggregating with non-unique correlation data


    I'm having an issue where my application is hanging when trying to put a message on the async channel queue.

    I'm using inbound/outbound channel adapters to communicate to another system over tcp/ip
    The channel adapters are set up to round robin between two instances of the other system and failover.
    When we send the request, we are using some echo data to correlate the response. If we do not get a response within our specified timeout, we move on to send the same request to a remote instance of the same system.

    There are a few suspects in the code/design
    • We have configured a task executor in the server side connection factory where we are receiving messages. Each message we receive spawns a new worker thread (as we want.)
    • We have a client side connection factory we use to send a request to the other system whose response is needed for the original server side request. But...
    • We are using a java Future to make the call to the external system to enforce the timeout. We can only give our application 2 seconds to make the call to them, so we're using the Future to spawn that work asynchronously, then in the next line trying to get that async response using a timeout param. (suspect 1)
    • We are not using SI to define the timeout because we found that whenever there was a timeout, the adapter would close the connection and our application would hang. Originally we set the timeout very high (60k ms) but we realized it would be a very bad thing if this scenario ever happend in production. So, with advice of SI team, we set the timeout value to "0."
    • Our timeout is not the same as the other system, so while we may move on to the remote call, the system may still respond to our original request with the same correlation data. So we could possibly get 2 (or more) responses on our inbound channel adapter with the same correlation value. (suspect 2)

    The interesting thing is, when we don't have the task executor (which is WAS work manager) defined in the listening server context, we never get into this hung situation, but we get a lot of "timeouts" trying to contact the other system. I say "timeout" because the other system is actually logging that they are responding to us within a few hundred ms and not taking 2 seconds that we have defined as the java Future timeout.

    Another interesting thing is we are getting this exception before it hangs

    Error TcpSendingMessageHandler - Error creating SocketWriter org.springframework.core.task.TaskRejectedException: CommonJ WorkManager did not accept task: 
    org.springframework.scheduling.commonj.WorkManagerTaskExecutor.execute( at org.springframework.integration.ip.tcp.connection.TcpnetClientConnectionFactory.getOrMakeConnection(
    But we don't have the reason code for why it didn't accept the task. (Timeout, unavailable threads, or "other") I'm not sure if SI has this in the exception it's wrapped in.

    So at a high level, I'm wondering if the problem is with our correlation data possibly being non-unique
    Is it because we are using that java Future to somewhat artificially enforce the timeout.

    Any ideas or hints are greatly appreciated.
    Last edited by carterish; Apr 28th, 2012, 02:32 PM. Reason: title somewhat misleading

  • #2
    Solved. Real issue description here: