Announcement Announcement Module
Collapse
No announcement yet.
rabbitmq mirrored queue disappears Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • rabbitmq mirrored queue disappears

    Using RabbitMQ (2.7.0) with RabbitMQ java client (2.5.1) and spring-amqp (1.0.0.RELEASE).

    We have encountered a weird scenario when attempting to set up a Mirrored queue. When restarting one of our nodes the queue disappears.

    We have the following in our test environment:

    - Node 1 (master)
    - Node 2 (slave)
    - Producer (same host as slave)

    Producer sends messages to Node 1, and the queue is automatically created with the following parameters:

    Durable = true
    Exclusive = false
    Auto-delete = false
    "x-ha-policy", "all"

    The queue is also replicated on Node 2 as synchronized.

    We then stop Node 1 and start it again (rabbitmqctl stop_app/start_app) to simulate a network failure. Now Node 2 "owns" the queue and Node 1 is mirrored (unsynchronized).

    Everything looks OK until we start sending messages again via our Producer to Node 1. We then get the following stacktrace and our queue seems to get deleted from rabbitMQ:

    Code:
    org.springframework.amqp.AmqpException: failed to commit RabbitMQ transaction
    	at org.springframework.amqp.rabbit.connection.RabbitResourceHolder.commitAll(RabbitResourceHolder.java:137)
    	at org.springframework.amqp.rabbit.transaction.RabbitTransactionManager.doCommit(RabbitTransactionManager.java:175)
    	at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:754)
    	at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:723)
    	at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:393)
    	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:120)
    	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    	at $Proxy13.send(Unknown Source)
    	at com.memnon.apport.eventhorizon.EHClient.send(EHClient.java:33)
    	at com.memnon.eventhorizon.EHConnection.sendDummy(EHConnection.java:68)
    	at com.memnon.transnet.admin.controller.EventHorizonTestAction$MessageSender.run(EventHorizonTestAction.java:69)
    	at java.lang.Thread.run(Thread.java:662)
    Caused by: java.io.IOException
    	at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:107)
    	at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:131)
    	at com.rabbitmq.client.impl.ChannelN.txCommit(ChannelN.java:878)
    	at com.rabbitmq.client.impl.ChannelN.txCommit(ChannelN.java:59)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    	at java.lang.reflect.Method.invoke(Method.java:597)
    	at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:298)
    	at $Proxy15.txCommit(Unknown Source)
    	at org.springframework.amqp.rabbit.connection.RabbitResourceHolder.commitAll(RabbitResourceHolder.java:134)
    	... 12 more
    Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; reason: {#method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - partial tx completion, class-id=90, method-id=20),null,""}
    	at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67)
    	at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:33)
    	at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:328)
    	at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:201)
    	at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:125)
    	... 21 more
    Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; reason: {#method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - partial tx completion, class-id=90, method-id=20),null,""}
    	at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:384)
    	at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:235)
    	at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:151)
    	at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:96)
    	at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:441)
    After this has occured the Producer seems to be sending messages when trying again, but they do not appear in RabbitMQ, and no new queue is created (looking in RabbitMQ web gui and through rabbitmqctl list_queues). We do not get any error messages or strack traces after this.

    Anyone have any ideas about this? What are we missing?

  • #2
    Mirror queues are a new feature since Spring AMQP was released and we haven't really tested the behaviour you describe. I can't offhand see anything wrong with what you are doing, but then again we don't have the full details yet. I would recommend looking at the protocol traffic with the Tracer utility (from the Rabbit Java client). If you can post the result of the logs from a Tracer in front of each of your brokers we can see what is going on in more detail. If it's a broker problem you'll have to ask the Rabbit team on the broker mailing list.

    Comment


    • #3
      This is what we get in the Tracer utility when doing the scenario described in my previous post, when Node 1 has restarted and we attempt to send messages to it again.

      Code:
      1321892474178: <Tracer-11> ch#0 <- {#method<connection.start>(version-major=0, version-minor=9, server-properties={product=RabbitMQ, information=Licensed under the MPL.  See http://www.rabbitmq.com/, platform=Erlang/OTP, capabilities={exchange_exchange_bindings=true, consumer_cancel_notify=true, basic.nack=true, publisher_confirms=true}, copyright=Copyright (C) 2007-2011 VMware, Inc., version=2.7.0}, mechanisms=PLAIN AMQPLAIN, locales=en_US), null, "[B@1948cc8c"}
      1321892474182: <Tracer-11> ch#0 -> {#method<connection.start-ok>(client-properties={product=RabbitMQ, information=Licensed under the MPL. See http://www.rabbitmq.com/, platform=Java, capabilities={exchange_exchange_bindings=true, publisher_confirms=true, basic.nack=true, consumer_cancel_notify=true}, copyright=Copyright (C) 2007-2011 VMware, Inc., version=2.5.1}, mechanism=PLAIN, response=guestguest, locale=en_US), null, "[B@1948cc8c"}
      1321892474213: <Tracer-11> ch#0 <- {#method<connection.tune>(channel-max=0, frame-max=131072, heartbeat=0), null, "[B@1948cc8c"}
      1321892474246: <Tracer-11> ch#0 -> {#method<connection.tune-ok>(channel-max=0, frame-max=131072, heartbeat=0), null, "[B@1948cc8c"}
      1321892474246: <Tracer-11> ch#0 -> {#method<connection.open>(virtual-host=/, capabilities=, insist=false), null, "[B@1948cc8c"}
      1321892474283: <Tracer-11> ch#0 <- {#method<connection.open-ok>(known-hosts=), null, "[B@1948cc8c"}
      1321892474317: <Tracer-11> ch#1 -> {#method<channel.open>(out-of-band=), null, "[B@1948cc8c"}
      1321892474354: <Tracer-11> ch#1 <- {#method<channel.open-ok>(channel-id=), null, "[B@1948cc8c"}
      1321892474386: <Tracer-11> ch#1 -> {#method<tx.select>(), null, "[B@1948cc8c"}
      1321892474426: <Tracer-11> ch#1 <- {#method<tx.select-ok>(), null, "[B@1948cc8c"}
      1321892474466: <Tracer-11> ch#1 -> {#method<queue.declare>(ticket=0, queue=mirroreventqueue, passive=false, durable=true, exclusive=false, auto-delete=false, nowait=false, arguments={x-ha-policy=all}), null, "[B@1948cc8c"}
      1321892474505: <Tracer-11> ch#1 <- {#method<queue.declare-ok>(queue=mirroreventqueue, message-count=2, consumer-count=0), null, "[B@1948cc8c"}
      1321892474546: <Tracer-11> ch#2 -> {#method<channel.open>(out-of-band=), null, "[B@1948cc8c"}
      1321892474582: <Tracer-11> ch#2 <- {#method<channel.open-ok>(channel-id=), null, "[B@1948cc8c"}
      1321892474616: <Tracer-11> ch#2 -> {#method<tx.select>(), null, "[B@1948cc8c"}
      1321892474653: <Tracer-11> ch#2 <- {#method<tx.select-ok>(), null, "[B@1948cc8c"}
      1321892474686: <Tracer-11> ch#2 -> {#method<basic.publish>(ticket=0, exchange=, routing-key=mirroreventqueue, mandatory=false, immediate=false), #contentHeader<basic>(?), "[B@224c47db"}
      1321892474686: <Tracer-11> ch#2 -> {#method<tx.commit>(), null, "[B@1948cc8c"}
      1321892474828: <Tracer-11> ch#2 <- {#method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - partial tx completion, class-id=90, method-id=20), null, "[B@1948cc8c"}
      1321892474868: <Tracer-11> ch#2 -> {#method<channel.close-ok>(), null, "[B@1948cc8c"}
      1321892474869: <Tracer-11> ch#2 -> {#method<channel.open>(out-of-band=), null, "[B@1948cc8c"}
      1321892474870: <Tracer-11> ch#2 <- {#method<channel.open-ok>(channel-id=), null, "[B@1948cc8c"}
      1321892474908: <Tracer-11> ch#2 -> {#method<tx.select>(), null, "[B@1948cc8c"}
      1321892474943: <Tracer-11> ch#2 <- {#method<tx.select-ok>(), null, "[B@1948cc8c"}
      1321892474977: <Tracer-11> ch#2 -> {#method<tx.rollback>(), null, "[B@1948cc8c"}
      1321892475013: <Tracer-11> ch#2 <- {#method<tx.rollback-ok>(), null, "[B@1948cc8c"}
      We have also been trying a couple of other different scenarios.

      Scenario 1
      With the same setup as in my previous post the Producer sent messages to Node 1. We then shut down the Producer and restarted Node 1. Now the queue is "owned" by Node 2. We then started sending messages to Node 2 and got the exact same exception.

      Scenario 2
      With the same setup as in my previous post we stopped Node 2 and started sending messages from the Producer to Node 1. We then restarted Node 1 and began sending messages to it again. Now this is working as it should without loosing any messages. So when we only have one node it's working.

      Scenario 3
      We wrote another test producer not using spring and tried the same thing as in my previous post and now everything seems to be working.


      From what I can tell it seems spring can't handle mirrored queues when the master node dies.
      Last edited by ahem; Nov 21st, 2011, 10:28 AM.

      Comment


      • #4
        At least it's obvious now why no messages are being delivered - you have an uncommitted transaction when the channel is closed by the broker. The trace log looks pretty normal apart from the exception, which smells like a broker problem, but it is possible that we are tickling it in a way that is avoidable. My guess (uneducated) is that there is something going on in the brokers that makes them reject the tx.commit because they haven't finished linking up the mirror.

        Can you post the Tracer log from your Scenario 3? Clearly you are doing something different on the client, so it would be good to see exactly what that is. Were you using a transaction in that case?

        Comment


        • #5
          It seems we missed the transaction part when running our Scenario 3.

          We have now made a new attempt with some sort of transaction, still a bit unsure of how to do that properly but it looks ok.

          Code:
                  
          channel.txSelect();
          channel.basicPublish("", Conf.QUEUE_NAME, MessageProperties.PERSISTENT_TEXT_PLAIN, Converter.toByteArray(message));
          channel.txCommit();
          However we now get the same error as we did with spring. So it seems there is either a problem with rabbitMQ, or we are doing something wrong.

          This is what we got in the trace:

          Code:
          1321955867226: <Tracer-4> ch#0 <- {#method<connection.start>(version-major=0, version-minor=9, server-properties={product=RabbitMQ, information=Licensed under the MPL.  See http://www.rabbitmq.com/, platform=Erlang/OTP, capabilities={exchange_exchange_bindings=true, consumer_cancel_notify=true, basic.nack=true, publisher_confirms=true}, copyright=Copyright (C) 2007-2011 VMware, Inc., version=2.7.0}, mechanisms=PLAIN AMQPLAIN, locales=en_US), null, "[B@1948cc8c"}
          1321955867228: <Tracer-4> ch#0 -> {#method<connection.start-ok>(client-properties={product=RabbitMQ, information=Licensed under the MPL. See http://www.rabbitmq.com/, platform=Java, capabilities={exchange_exchange_bindings=true, publisher_confirms=true, basic.nack=true, consumer_cancel_notify=true}, copyright=Copyright (C) 2007-2011 VMware, Inc., version=2.5.1}, mechanism=PLAIN, response=guestguest, locale=en_US), null, "[B@1948cc8c"}
          1321955867262: <Tracer-4> ch#0 <- {#method<connection.tune>(channel-max=0, frame-max=131072, heartbeat=0), null, "[B@1948cc8c"}
          1321955867300: <Tracer-4> ch#0 -> {#method<connection.tune-ok>(channel-max=0, frame-max=131072, heartbeat=0), null, "[B@1948cc8c"}
          1321955867300: <Tracer-4> ch#0 -> {#method<connection.open>(virtual-host=/, capabilities=, insist=false), null, "[B@1948cc8c"}
          1321955867332: <Tracer-4> ch#0 <- {#method<connection.open-ok>(known-hosts=), null, "[B@1948cc8c"}
          1321955867370: <Tracer-4> ch#1 -> {#method<channel.open>(out-of-band=), null, "[B@1948cc8c"}
          1321955867413: <Tracer-4> ch#1 <- {#method<channel.open-ok>(channel-id=), null, "[B@1948cc8c"}
          1321955867451: <Tracer-4> ch#1 -> {#method<queue.declare>(ticket=0, queue=mirrorqueuenotspring, passive=false, durable=true, exclusive=false, auto-delete=false, nowait=false, arguments={x-ha-policy=all}), null, "[B@1948cc8c"}
          1321955867494: <Tracer-4> ch#1 <- {#method<queue.declare-ok>(queue=mirrorqueuenotspring, message-count=1, consumer-count=0), null, "[B@1948cc8c"}
          1321955867531: <Tracer-4> ch#1 -> {#method<tx.select>(), null, "[B@1948cc8c"}
          1321955867562: <Tracer-4> ch#1 <- {#method<tx.select-ok>(), null, "[B@1948cc8c"}
          1321955867625: <Tracer-4> ch#1 -> {#method<basic.publish>(ticket=0, exchange=, routing-key=mirrorqueuenotspring, mandatory=false, immediate=false), #contentHeader<basic>(?), "[B@331f2ee1"}
          1321955867625: <Tracer-4> ch#1 -> {#method<tx.commit>(), null, "[B@1948cc8c"}
          1321955867765: <Tracer-4> ch#1 <- {#method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - partial tx completion, class-id=90, method-id=20), null, "[B@1948cc8c"}
          1321955867800: <Tracer-4> ch#1 -> {#method<channel.close-ok>(), null, "[B@1948cc8c"}
          We also tried this with the latest version of the java client (2.7.0) with the same results.

          Comment


          • #6
            OK, then I suggest you copy the trace logs from this last post over to rabbitmq-discuss and we can find out from the broker guys what is going on.

            If you don't need transactions, it is possible you are good to go, but if you do then it looks like there is a lower level problem that we need to resolve either in the Java client or in the broker.

            Comment

            Working...
            X