Possible bug with TcpNioConnection - v2.0.4.RELEASE

  • Possible bug with TcpNioConnection - v2.0.4.RELEASE

    Hi,

    I've built a system with a TCP inbound adapter and a corresponding TCP outbound adapter, using a connection factory that runs in server mode with NIO enabled. I also configured the connection factory to use a task executor.

    When a message arrives on a socket, I can see that the doRead method of TcpNioConnection is called and that it sets the private field writingToPipe to true. Once execution has passed checkForAssembler, another thread is started that executes the run method of TcpNioConnection.

    The problem is that if the TCP client dies after checkForAssembler has executed, the following code (still within doRead) throws an exception:

    Code:
    	int len = this.socketChannel.read(this.rawBuffer);
    Remember that writingToPipe is still true at this point, since the exception skips the rest of doRead. Meanwhile the run method on the other thread is waiting for writingToPipe to become false (see the dataAvailable method, called from run). Since run is executed from a loop in the ThreadPoolExecutor, that loop never ends.
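The sequence above can be reduced to a small, self-contained model. This is a hypothetical sketch, not the TcpNioConnection source: `StuckFlagDemo`, `readFromDeadClient()` and the simplified `doRead()` are illustrative names, and the dead client is simulated by throwing an `IOException` instead of reading from a real socket. It shows only the core claim: if the read throws between setting the flag and clearing it, the flag stays `true`.

```java
import java.io.IOException;

// Hypothetical, simplified model of the sequence described above; this is
// not the TcpNioConnection source. Only the flag name mirrors the post.
public class StuckFlagDemo {

    // Shared with the assembler thread, which waits for it to become false.
    private volatile boolean writingToPipe;

    // Stand-in for socketChannel.read(rawBuffer) after the client has died.
    private int readFromDeadClient() throws IOException {
        throw new IOException("Connection reset by peer");
    }

    void doRead() throws IOException {
        writingToPipe = true;
        // ... checkForAssembler() would hand the assembler to the executor here ...
        int len = readFromDeadClient(); // throws, so nothing below runs
        // ... len bytes would be written to the pipe here ...
        writingToPipe = false;
    }

    boolean isWritingToPipe() {
        return writingToPipe;
    }

    public static void main(String[] args) {
        StuckFlagDemo conn = new StuckFlagDemo();
        try {
            conn.doRead();
        } catch (IOException e) {
            System.out.println("read failed: " + e.getMessage());
        }
        // prints "writingToPipe = true": nothing will ever reset the flag
        System.out.println("writingToPipe = " + conn.isWritingToPipe());
    }
}
```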

    In fact the situation gets worse, because each time the run method executes, another thread is spawned that does exactly the same thing. Because writingToPipe is true, each execution of the run method of TcpNioConnection concludes that there is more data in the pipe (there isn't) and starts a new assembler, just as the comment says:

    Code:
    	if (dataAvailable()) {
    		// there is more data in the pipe; run another assembler
    		// to assemble the next message, while we send ours
    		this.executionControl.incrementAndGet();
    		this.taskExecutor.execute(this);
    	}
    Eventually all the task executor threads are used up and no other clients can connect. With a large enough thread pool you will even see CPU usage max out, because all the threads are in an infinite loop.
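A hypothetical model of the quoted block, assuming `dataAvailable()` simply reports the stuck flag. Only `dataAvailable()`, `executionControl` and `taskExecutor` come from the post; `RunawayResubmitDemo`, `runs` and the `CAP` guard are illustrative, and the cap exists only so the demo terminates. Each run finds the flag set and hands itself back to the executor, so without the cap the chain never ends and keeps a pool thread busy, which is consistent with the thread exhaustion and CPU usage described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the quoted if (dataAvailable()) block; not the
// Spring Integration source. Message assembly is omitted entirely.
public class RunawayResubmitDemo implements Runnable {

    // Demo-only limit so the chain terminates; the quoted code has no limit.
    static final int CAP = 1_000;

    private final ExecutorService taskExecutor;
    private final AtomicInteger executionControl = new AtomicInteger();
    private final AtomicInteger runs = new AtomicInteger();
    private final CountDownLatch capReached = new CountDownLatch(1);

    // Stuck at true, as after the failed read in doRead().
    private volatile boolean writingToPipe = true;

    RunawayResubmitDemo(ExecutorService taskExecutor) {
        this.taskExecutor = taskExecutor;
    }

    // Assumption: with the flag stuck, dataAvailable() always says "more data".
    private boolean dataAvailable() {
        return writingToPipe;
    }

    @Override
    public void run() {
        if (runs.incrementAndGet() >= CAP) {
            capReached.countDown(); // stop the demo here
            return;
        }
        if (dataAvailable()) {
            // "there is more data in the pipe; run another assembler"
            executionControl.incrementAndGet();
            taskExecutor.execute(this);
        }
    }

    int runs() {
        return runs.get();
    }

    // Waits for the cap; returns false on timeout or interruption.
    boolean awaitCap(long seconds) {
        try {
            return capReached.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        RunawayResubmitDemo conn = new RunawayResubmitDemo(pool);
        pool.execute(conn);
        boolean hitCap = conn.awaitCap(10);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("hit cap: " + hitCap + ", runs: " + conn.runs());
    }
}
```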

    I hope my analysis is accurate. Will you be able to work on a fix for this?

    Regards

    Ted

  • #2
    Are you speaking theoretically, or have you experienced this?

    I do think you might have identified a potential problem, but I don't think the result will be as you describe.

    If the 'writingToPipe' variable gets 'stuck on' in the way you describe (I do see a small possibility of that), the run() thread will hang in convert(), reading from the PipedInputStream while it waits for the data to show up. It won't execute the 'next' assembler until assembly of the current message is complete, and it certainly won't sit in a tight loop waiting for the boolean to be reset.
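The blocking behaviour described in this reply is standard `java.io` pipe semantics: `PipedInputStream.read()` blocks while the pipe is empty and the writer is still active, and returns -1 once the writing side has been closed. The sketch below demonstrates both with plain JDK classes; `PipeReadDemo`, `readAll` and `roundTrip` are illustrative names. The thread does not say how convert() handles end-of-stream, so any link between the -1 case and the null that convert() returns in the next post is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

// Plain JDK pipe behaviour; this is not the Spring Integration convert() method.
public class PipeReadDemo {

    static String readAll(PipedInputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int b;
        // read() blocks while the pipe is empty and the writer has not closed it;
        // it returns -1 once the writer has closed and the buffer is drained.
        while ((b = in.read()) != -1) {
            collected.write(b);
        }
        return collected.toString(StandardCharsets.UTF_8.name());
    }

    static String roundTrip(String payload) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(200); // the reader is typically already blocked in read()
                out.write(payload.getBytes(StandardCharsets.UTF_8));
                out.close();       // signals end-of-stream to the reader
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();
        String result = readAll(in);
        writer.join();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "hello"
    }
}
```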

    If you do believe you have seen what you describe, please attach a debug log (in a zip).

    Thanks

    • #3
      Re: Possible bug with TcpNioConnection - v2.0.4.RELEASE

      Garry,

      This is a real situation. I first noticed the problem while running a saturation test with a few thousand open connections. Eventually my client couldn't create any more connections and died, leaving some of the server threads in the state I describe. To reproduce it, put a breakpoint in the doRead method of TcpNioConnection just after writingToPipe is set to true, then kill your TCP client. When you let the thread continue, the problem should become apparent.
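As a debugger-free way to trigger the failing read, a client can abort its connection. This is a rough sketch under stated assumptions: it uses a plain `ServerSocketChannel` rather than the Spring connection factory, relies on `SO_LINGER` with a timeout of 0 making `close()` send a TCP RST on common stacks, and does not hit the exact window after writingToPipe is set, which still needs the breakpoint (or heavy load). Whether `read()` throws or returns -1 is platform dependent. `AbruptCloseDemo` is an illustrative name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Rough sketch: a client that aborts its connection, followed by the same
// kind of channel read that doRead() performs. Plain JDK NIO, no Spring.
public class AbruptCloseDemo {

    static String readAfterAbruptClose() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Socket client = new Socket()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            client.connect(server.getLocalAddress());
            try (SocketChannel accepted = server.accept()) { // blocking accept
                // SO_LINGER with timeout 0: close() aborts the connection
                // (a TCP RST on common stacks) instead of a graceful FIN.
                client.setSoLinger(true, 0);
                client.close(); // closing again in try-with-resources is a no-op
                ByteBuffer rawBuffer = ByteBuffer.allocate(1024);
                try {
                    int len = accepted.read(rawBuffer);
                    return "read returned " + len;
                } catch (IOException e) {
                    // Typically "Connection reset by peer"; wording varies by OS.
                    return "read threw " + e.getClass().getSimpleName();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterAbruptClose());
    }
}
```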

      The convert method just returns null when I try this.

      I've attached a log file containing only the uk.co.steria.telematrix.ip.tcp.connection category at TRACE level. If you need more, let me know.

      Regards

      Ted

      • #4
        Thanks; I am about to get on a 'plane - I will look at this over the weekend.

        • #5
          BTW, please create a bug issue in JIRA here: https://jira.springsource.org/browse/INT

          Thanks

          • #6
            Raised here https://jira.springsource.org/browse/INT-1937

            Thanks for the fast response to this.

            • #7
              Hi, the fix went into last night's build. Please change your pom to use 2.0.5.BUILD-SNAPSHOT and give it a spin.
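The thread does not describe what the 2.0.5 fix actually changed. One conventional guard against this class of bug is to clear the flag in a `finally` block, so that a failed read cannot leave it set; the sketch below shows that idea in isolation. `GuardedRead` and its `Source` interface are hypothetical and are not the INT-1937 change.

```java
import java.io.IOException;

// Hypothetical guard, not the actual INT-1937 change: clear the shared flag
// in a finally block so a failed read cannot leave it set.
public class GuardedRead {

    private volatile boolean writingToPipe;

    // Stands in for socketChannel.read(rawBuffer).
    interface Source {
        int read() throws IOException;
    }

    int doRead(Source source) throws IOException {
        writingToPipe = true;
        try {
            return source.read();
        } finally {
            writingToPipe = false; // runs on both normal return and exception
        }
    }

    boolean isWritingToPipe() {
        return writingToPipe;
    }

    public static void main(String[] args) {
        GuardedRead conn = new GuardedRead();
        try {
            conn.doRead(() -> { throw new IOException("Connection reset by peer"); });
        } catch (IOException e) {
            System.out.println("read failed: " + e.getMessage());
        }
        System.out.println("writingToPipe = " + conn.isWritingToPipe()); // prints "writingToPipe = false"
    }
}
```

In a real connection the assembler side would presumably also need to learn that the connection has closed; this sketch covers only the flag.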

              Thanks again for your work analysing the problem.
