Message loss after a network disconnect

  • Message loss after a network disconnect

    Hi,

    If I disconnect my consumer server from the RabbitMQ server (simply by unplugging the network cable) and then reconnect it, the first message sent after this event is lost, but all subsequent messages are received perfectly by the consumer. The queue consumers are configured with the auto-ack property. After I send the first message to the queue, the queue immediately looks empty, but the message is never received by the consumer.

    Am I missing a configuration setting or is it a bug?
    I am using RabbitMQ 2.8.4 and spring-amqp 1.1.2

    Thanks for your help.

  • #2
    Pulling cables is never a good idea unless you have heartbeats enabled, so that both ends find out the connection was broken.

    That said, I wouldn't expect a complete loss of a message, but it's certainly possible to be left with an un-acked message in a queue after such a failure.

    • #3
      Hi Gary,
      No message is left in the queue after this. This problem is consistently reproducible.

      Are you recommending that we enable heartbeats?
      You said "Pulling cables is never a good idea". What is the best way to test a network interruption?

      Thanks,

      • #4
        All I am saying is that, because of the way TCP works, if you pull an Ethernet cable then, depending on the network topology, one end or the other might not know that the socket was disconnected.

        The underlying rabbit connection factory supports heartbeats which can be used to detect this condition.
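A minimal sketch of what heartbeat-based detection buys you, written as a plain-Java toy model rather than the RabbitMQ client's actual code. It assumes the rule described in RabbitMQ's documentation that a peer is considered unreachable after roughly two missed heartbeat intervals; the class `HeartbeatMonitor` and its methods are hypothetical.

```java
// Toy model of heartbeat-based failure detection (not the RabbitMQ client's code).
// Assumption: a peer counts as unreachable once no inbound frame has arrived for
// two full heartbeat intervals.
final class HeartbeatMonitor {
    private final long intervalMillis;
    private long lastFrameMillis;

    HeartbeatMonitor(long intervalMillis, long startMillis) {
        if (intervalMillis <= 0) {
            // An interval of 0 means heartbeats are disabled: a silent peer is never detected.
            throw new IllegalArgumentException("heartbeats disabled");
        }
        this.intervalMillis = intervalMillis;
        this.lastFrameMillis = startMillis;
    }

    // Any inbound frame (heartbeat or data) shows the peer is still reachable.
    void onFrameReceived(long nowMillis) {
        lastFrameMillis = nowMillis;
    }

    // True once two heartbeat intervals have passed with no inbound traffic.
    boolean isPeerDead(long nowMillis) {
        return nowMillis - lastFrameMillis >= 2 * intervalMillis;
    }
}
```

With a 10-second interval, silence shorter than 20 seconds is never flagged, which fits the suspicion that heartbeats may not catch very brief outages.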

        It appears that the new RabbitMQ 3.0.0 release enables heartbeats by default...

        https://www.rabbitmq.com/release-notes/README-3.0.0.txt
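For context on "enables heartbeats by default": the heartbeat timeout is negotiated when the connection opens. The sketch below encodes the negotiation rule as described in RabbitMQ's heartbeat guide (if either side proposes 0, the larger value is used; otherwise the smaller one). Treat it as an illustration only, since older client libraries may have negotiated differently; `HeartbeatNegotiation.negotiate` is a hypothetical helper and values are in seconds.

```java
// Sketch of heartbeat timeout negotiation, following the rule in RabbitMQ's
// heartbeat guide. Values are in seconds. A 0 on one side defers to the other
// side's value; only 0 on both sides disables heartbeats.
final class HeartbeatNegotiation {
    private HeartbeatNegotiation() {}

    static int negotiate(int clientRequested, int serverProposed) {
        if (clientRequested == 0 || serverProposed == 0) {
            // One side has no preference: the larger (possibly non-zero) value wins.
            return Math.max(clientRequested, serverProposed);
        }
        // Both sides want heartbeats: use the smaller (more frequent) timeout.
        return Math.min(clientRequested, serverProposed);
    }
}
```

Under this rule, a client that requests 0 still gets heartbeats when the broker proposes a non-zero value, which is how a broker-side default can switch them on without client changes.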

        As I said, I would not expect messages to be lost, but unack'd messages could exist; they wouldn't be in the queue, but the broker won't resubmit them until it finds out the connection was lost.
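To make the "unack'd, not in the queue, not yet resubmitted" state concrete, here is a toy in-memory queue in plain Java. It is illustrative only, not RabbitMQ's implementation; `ToyQueue`, `deliver`, and `connectionClosed` are hypothetical names. A delivered message moves from the ready list into an unacked map, so the queue looks empty, and it only returns to the ready list once the broker learns the consumer's connection is closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy broker-side queue (illustrative only, not RabbitMQ's implementation).
final class ToyQueue {
    private final Deque<String> ready = new ArrayDeque<>();
    // Delivered but not yet acknowledged, keyed by delivery tag, in delivery order.
    private final Map<Long, String> unacked = new LinkedHashMap<>();
    private long nextDeliveryTag = 1;

    void publish(String body) {
        ready.addLast(body);
    }

    // Hands the head message to the consumer; the broker keeps it until it is acked.
    long deliver() {
        String body = ready.pollFirst();
        if (body == null) {
            throw new IllegalStateException("no ready messages");
        }
        long tag = nextDeliveryTag++;
        unacked.put(tag, body);
        return tag;
    }

    void ack(long deliveryTag) {
        unacked.remove(deliveryTag);
    }

    // Called once the broker learns the consumer's connection is gone:
    // every unacked delivery is put back at the head of the queue, oldest first.
    void connectionClosed() {
        List<String> pending = new ArrayList<>(unacked.values());
        unacked.clear();
        for (int i = pending.size() - 1; i >= 0; i--) {
            ready.addFirst(pending.get(i));
        }
    }

    int readyCount() { return ready.size(); }

    int unackedCount() { return unacked.size(); }

    String peekReady() { return ready.peekFirst(); }
}
```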

        If you can reproduce it with a simple example, and/or provide a log, I will take a look.

        • #5
          The following simple example can reproduce the problem.

          Code:
          <rabbit:connection-factory id="connectionFactory" host="myhost" port="5672" channel-cache-size="10" />
          <rabbit:admin connection-factory="connectionFactory" />

          <!-- Mirrored (HA) queue via the x-ha-policy queue argument used by RabbitMQ 2.x -->
          <rabbit:queue id="jobExecQ" name="Q_JobExec" queue-arguments="haQ" />
          <rabbit:queue-arguments id="haQ">
             <entry key="x-ha-policy" value="all" />
          </rabbit:queue-arguments>

          <!-- acknowledge="none" is AMQP auto-ack: the broker treats each message as acknowledged once it is sent -->
          <rabbit:listener-container connection-factory="connectionFactory" concurrency="5" acknowledge="none">
               <rabbit:listener ref="jobExecutor" queues="jobExecQ" />
          </rabbit:listener-container>
          <bean id="jobExecutor" class="test.amqp.JobExecutor" />
          And here is the MessageListener code:

          Code:
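          package test.amqp;

          import java.nio.charset.StandardCharsets;

          import org.apache.commons.logging.Log;
          import org.apache.commons.logging.LogFactory;
          import org.springframework.amqp.core.Message;
          import org.springframework.amqp.core.MessageListener;
          import org.springframework.context.support.ClassPathXmlApplicationContext;

          public class JobExecutor implements MessageListener {

            private static final Log log = LogFactory.getLog(JobExecutor.class);

            // Invoked by the listener container for each delivery from Q_JobExec.
            public void onMessage(Message rmqMessage) {
              // Decode explicitly (assumes UTF-8 bodies) rather than relying on the platform charset.
              String messageText = new String(rmqMessage.getBody(), StandardCharsets.UTF_8);
              log.info("Received message: " + messageText);
            }

            public static void main(String[] args) throws Exception {
              // Loading the context starts the listener container.
              new ClassPathXmlApplicationContext("rabbitConfiguration.xml");
            }
          }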
          To reproduce:
          1. start the app using the main method in the class above
          2. send a message to Q_JobExec queue using the default exchange (I used rabbitmq management console for this)
          3. check the log to see the received message
          4. unplug the network cable for a few seconds
          5. send second message as you did in step 2
          6. send third/fourth/... messages

          The message in step 5 is lost.

          Thanks for your help.
          Last edited by rasadoll; Nov 22nd, 2012, 10:48 PM.

          • #6
            Please use [ code ] ... [ /code ] tags (no spaces inside brackets) around code and config.

            Did you pull the cable on the consumer, or the server?
            Is there a network switch between the consumer and the server? (Likely there is, because the server won't find out the connection was broken until it tries to send the next message).
            Do you see anything interesting in the logs on the consumer (with TRACE level) regarding the "lost" message after the connection is re-established?
            What about in the Rabbit log on the server?
            If you see no evidence of the "lost" message arriving in the consumer's logs, I suggest you raise this question on the RabbitMQ mailing list.

            • #7
              Hi Gary,
              The cable is pulled on the consumer side.
              I don't know the details of our company's network architecture. However, the consumer is on my machine and the server is on a VM.
              There is nothing interesting in the RabbitMQ log, and I don't think it is a RabbitMQ issue. Before switching to spring-amqp we were using the RabbitMQ client API and had implemented an HA client to survive connection loss, etc. With that implementation we don't see this problem, and all messages are received after a reconnect.
              I have also found that enabling heartbeats fixes this issue. Is this the right/best solution?

              Thanks

              • #8
                Interesting - I can't reproduce your problem; my client is running on Linux and I don't have a native Windows box I can use. What is your client's OS?

                However, I am running with RabbitMQ 3.0.0 (which enables heartbeats by default according to the release notes).

                What's weird is (with Linux), I see no connection break - when I pull the cable the connection remains 'established' (with a non-zero send-Q) and when I reconnect everything works fine.

                I have to go out for turkey day now, but I'll see if I can reproduce by disabling heartbeats over the holidays.

                I assume you do realize that auto-ack (acknowledge="none" in spring-amqp parlance) is dangerous, though, right?

                • #9
                  Nope - I see no difference with heartbeats disabled (except the netstat send-Q stays at zero while the cable is unplugged) but, after reconnecting, the next message is received ok.

                  I'll see if I can load up 2.8.4 on another VM later; can you try 3.0.0?

                  • #10
                    I'll try with RabbitMQ 3.0 tomorrow. Is spring-amqp 1.1.2 fully compatible with RabbitMQ 3.0?

                    My environment is all Windows. Client: Windows 7, Server: Windows Server 2008.

                    Regarding your point about the auto-ack setting, we picked that option because we don't want messages redelivered in case of a consumer or broker crash.

                    • #11
                      Another observation:
                      It works fine when acknowledge="auto" on the container.

                      I have to admit that we only enabled auto-ack (acknowledge="none") when we switched to spring-amqp. My previous test with the plain RabbitMQ client API acknowledged messages programmatically, so I will try auto-ack there to verify whether it is a RabbitMQ issue.
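For comparison, acknowledge="auto" in Spring AMQP does not use AMQP auto-ack at all: the container consumes with manual acks and sends the ack itself after the listener returns, or rejects the message (requeueing it, by default) if the listener throws. The toy model below sketches that flow in plain Java; `AutoAckContainer` and the strings it records are hypothetical, not Spring AMQP source. Because the broker keeps the message unacknowledged until the container's ack actually arrives, a delivery written into a dead connection is requeued rather than lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of container-managed acknowledgement (Spring AMQP's acknowledge="auto").
// Not Spring AMQP source: it only records which acknowledgement the container would send.
final class AutoAckContainer {
    private final List<String> sent = new ArrayList<>();

    void dispatch(long deliveryTag, String body, Consumer<String> listener) {
        try {
            listener.accept(body);
            // Listener returned normally: the ack is sent only now, after processing.
            sent.add("ack " + deliveryTag);
        } catch (RuntimeException e) {
            // Listener failed: reject and requeue (Spring AMQP's default behavior).
            sent.add("reject " + deliveryTag + " requeue=true");
        }
    }

    List<String> sent() { return sent; }
}
```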

                      Thanks,
                      Last edited by rasadoll; Nov 22nd, 2012, 03:46 PM.

                      • #12
                        I was able to reproduce your issue with a windows client (in a VM - disabling the VM's network interface is the equivalent of pulling the cable, and I see the connection broken in the log).

                        There was no activity at all on the client for the lost message after the connection was restored.

                        I have to believe this is an artifact of auto-ack - if you think about it - the broker sends the message after the interruption; auto-acks it; and THEN is informed by the network that the connection was broken - too late to recover the message.

                        This explains why acknowledge="auto" and/or using heartbeats solves the issue, but I suspect that heartbeats may not suffice for very brief network outages.
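The sequence described above can be sketched as a toy timeline in plain Java. It is illustrative only: `DisconnectRace` and its steps are hypothetical and model broker bookkeeping, not real networking. In auto-ack mode the broker treats a message as acknowledged as soon as it is sent, so when it later detects the dead connection there is nothing left to requeue; with manual acks the same delivery is still tracked as unacknowledged and goes back to the queue.

```java
// Toy timeline (broker bookkeeping only, no real networking) of a delivery
// written into a connection whose peer has silently gone away.
final class DisconnectRace {
    enum AckMode { AUTO_ACK, MANUAL_ACK }

    private DisconnectRace() {}

    // Returns true if the broker still holds the message after it detects the failure.
    static boolean survivesSilentDisconnect(AckMode mode) {
        boolean inReadyQueue = true;      // message published and waiting
        boolean trackedAsUnacked = false;

        // 1. Broker delivers: the message leaves the ready queue, and the TCP write
        //    appears to succeed because it only lands in the local send buffer.
        inReadyQueue = false;
        if (mode == AckMode.MANUAL_ACK) {
            trackedAsUnacked = true;      // broker waits for basic.ack from the client
        }
        // In AUTO_ACK mode the broker treats the message as acknowledged right here.

        // 2. The consumer never receives the bytes (cable unplugged).

        // 3. Later the broker detects the dead connection (socket error or missed
        //    heartbeats) and requeues whatever is still unacknowledged on it.
        if (trackedAsUnacked) {
            trackedAsUnacked = false;
            inReadyQueue = true;
        }
        return inReadyQueue;
    }
}
```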


                        So, the bottom line is I don't believe it's a spring-amqp OR RabbitMQ issue - it's just an artifact of the way auto-ack works. If you disagree, I suggest you raise it on the RabbitMQ mailing list. If you do, please post a link here.
                        Last edited by Gary Russell; Nov 22nd, 2012, 09:30 PM.

                        • #13
                          > Is spring-amqp 1.1.2 fully compatible with rabbitmq 3.0?
                          I just started testing (for this thread, and with spring-amqp 1.1.3 - the current release) and haven't found any problems yet. I do know that immediate mode and x-federated exchanges are no longer supported so we need to remove (deprecate) support for those from spring-amqp too. But, I am not aware of anything that should prevent s-a being used with Rabbit 3.0.0 (as long as you are not using immediate or x-federated exchanges).

                          • #14
                            Thank you very much Gary,
                            You are absolutely right about the auto-ack behavior. I had the impression that auto-ack was simply an explicit ack sent by the client just before calling the client code. I was wrong: it is part of the AMQP protocol and is handled by the server.

                            I also found the following link which answers my questions:
                            https://groups.google.com/forum/?fro...ss/SapkdSr6MK0

                            For our particular case, I will change the ack mode on the container to "manual" and ack the message before running our business logic, since we want to avoid re-processing a message in case of a consumer or broker crash.
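A sketch of that "ack first, then process" approach for acknowledge="manual". To keep it self-contained it uses a hypothetical `AckChannel` interface standing in for the one method it needs from the RabbitMQ client's `Channel`, namely `basicAck(long deliveryTag, boolean multiple)`, which in the real client also throws `IOException`. In a Spring AMQP application the same logic would go in a `ChannelAwareMessageListener`, with the tag taken from `message.getMessageProperties().getDeliveryTag()`. `AckFirstListener` and its `process` step are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single Channel method this sketch needs.
interface AckChannel {
    void basicAck(long deliveryTag, boolean multiple);
}

// "Ack first, then process" for a container configured with acknowledge="manual".
final class AckFirstListener {
    private final List<String> processed = new ArrayList<>();

    void onMessage(String body, long deliveryTag, AckChannel channel) {
        // Acknowledge this single delivery (multiple=false) before doing any work.
        // Once the broker has this ack it will not redeliver the message, even if
        // processing fails or the consumer crashes: at-most-once processing.
        channel.basicAck(deliveryTag, false);
        process(body);
    }

    // Placeholder for the real business logic.
    private void process(String body) {
        if (body.isEmpty()) {
            throw new IllegalArgumentException("empty job");
        }
        processed.add(body);
    }

    List<String> processed() { return processed; }
}
```

The trade-off is at-most-once delivery: once the ack reaches the broker, a crash during processing loses that work. A redelivery is still possible in the narrow window where the ack is written locally but never reaches the broker.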

                            Thanks
