py-amqp crashes with AttributeError: 'NoneType' object has no attribute 'drain_events' when publishing to a nonexistent exchange #218
Comments
Another improvement can be to ensure that after connection is closed - after calling Lines 498 to 501 in 0e793de
@auvipy you can close this issue now...
thanks!!
Running 2.4.0, we are still getting the same exception:
Is anyone else still getting it? Any thoughts on what is causing it?
@AvnerCohen I was not able to reproduce your exception. Could you provide a simple failing example?
:( I'm afraid not. We get it randomly in production; random workers crash. @matusvalo What's interesting is that your suggested fix basically exposes the "correct" exception. In our case, we seem to be getting both of these errors:
From amqp/channel.py in _on_close at line 282:
The two errors happen in parallel, and random workers fail on them. Maybe this is no longer an amqp issue but a core celery issue?
@AvnerCohen I was afraid it would be a random issue. I am not sure the problem is on the kombu/celery side, because the exception is coming from py-amqp. Even if the issue is on the celery/kombu side, py-amqp must at worst handle wrong usage correctly. Could you post the full tracebacks for the not-found errors and for the attribute error here? Also, how often do you get these errors? I know of at least one problem on the py-amqp side: it does not fully follow the AMQP standard because it processes incoming messages in
@matusvalo thanks so much for the insights and time. We get this during deploys of new code or any maintenance that involves a large-scale stop/start of workers (by the way, celery/celery#4618 could be relevant, but it looks like there is no extra info there). Here are the two raw stack traces:
AttributeError: 'NoneType' object has no attribute 'drain_events'
File "celery/worker/worker.py", line 205, in start
self.blueprint.start(self)
File "celery/bootsteps.py", line 119, in start
step.start(parent)
File "celery/bootsteps.py", line 369, in start
return self.obj.start()
File "celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "celery/bootsteps.py", line 119, in start
step.start(parent)
File "celery/worker/consumer/consumer.py", line 596, in start
c.loop(*c.loop_args())
File "celery/worker/loops.py", line 91, in asynloop
next(loop)
File "kombu/asynchronous/hub.py", line 354, in create_loop
cb(*cbargs)
File "kombu/transport/base.py", line 236, in on_readable
reader(loop)
File "kombu/transport/base.py", line 218, in _read
drain_events(timeout=0)
File "amqp/connection.py", line 500, in drain_events
while not self.blocking_read(timeout):
File "amqp/connection.py", line 506, in blocking_read
return self.on_inbound_frame(frame)
File "amqp/method_framing.py", line 79, in on_frame
callback(channel, msg.frame_method, msg.frame_args, msg)
File "amqp/connection.py", line 510, in on_inbound_method
method_sig, payload, content,
File "amqp/abstract_channel.py", line 126, in dispatch_method
listener(*args)
File "amqp/channel.py", line 1616, in _on_basic_deliver
fun(msg)
File "kombu/messaging.py", line 624, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
File "kombu/messaging.py", line 590, in receive
[callback(body, message) for callback in callbacks]
File "celery/worker/pidbox.py", line 51, in on_message
self.reset()
File "celery/worker/pidbox.py", line 66, in reset
self.stop(self.c)
File "celery/worker/pidbox.py", line 63, in stop
self.consumer = self._close_channel(c)
File "celery/worker/pidbox.py", line 71, in _close_channel
ignore_errors(c, self.node.channel.close)
File "kombu/common.py", line 298, in ignore_errors
return fun(*args, **kwargs)
File "amqp/channel.py", line 226, in close
wait=spec.Channel.CloseOk,
File "amqp/abstract_channel.py", line 60, in send_method
return self.wait(wait, returns_tuple=returns_tuple)
File "amqp/abstract_channel.py", line 80, in wait
self.connection.drain_events(timeout=timeout)
File "amqp/connection.py", line 500, in drain_events
while not self.blocking_read(timeout):
File "amqp/connection.py", line 506, in blocking_read
return self.on_inbound_frame(frame)
File "amqp/method_framing.py", line 79, in on_frame
callback(channel, msg.frame_method, msg.frame_args, msg)
File "amqp/connection.py", line 510, in on_inbound_method
method_sig, payload, content,
File "amqp/abstract_channel.py", line 126, in dispatch_method
listener(*args)
File "amqp/channel.py", line 1616, in _on_basic_deliver
fun(msg)
File "kombu/messaging.py", line 624, in _receive_callback
return on_m(message) if on_m else self.receive(decoded, message)
File "kombu/messaging.py", line 590, in receive
[callback(body, message) for callback in callbacks]
File "celery/worker/pidbox.py", line 51, in on_message
self.reset()
File "celery/worker/pidbox.py", line 67, in reset
self.start(self.c)
File "celery/worker/pidbox.py", line 54, in start
self.node.channel = c.connection.channel()
File "kombu/connection.py", line 266, in channel
chan = self.transport.create_channel(self.connection)
File "kombu/transport/pyamqp.py", line 100, in create_channel
return connection.channel()
File "amqp/connection.py", line 491, in channel
channel.open()
File "amqp/channel.py", line 437, in open
spec.Channel.Open, 's', ('',), wait=spec.Channel.OpenOk,
File "amqp/abstract_channel.py", line 60, in send_method
return self.wait(wait, returns_tuple=returns_tuple)
File "amqp/abstract_channel.py", line 80, in wait
self.connection.drain_events(timeout=timeout)
And:
NotFound: Basic.consume: (404) NOT_FOUND - no queue ' [email protected]' in vhost '/'
File "celery/worker/worker.py", line 205, in start
self.blueprint.start(self)
File "celery/bootsteps.py", line 119, in start
step.start(parent)
File "celery/bootsteps.py", line 370, in start
return self.obj.start()
File "celery/worker/consumer/consumer.py", line 316, in start
blueprint.start(self)
File "celery/bootsteps.py", line 119, in start
step.start(parent)
File "celery/worker/pidbox.py", line 55, in start
self.consumer = self.node.listen(callback=self.on_message)
File "kombu/pidbox.py", line 91, in listen
consumer.consume()
File "kombu/messaging.py", line 477, in consume
self._basic_consume(T, no_ack=no_ack, nowait=False)
File "kombu/messaging.py", line 598, in _basic_consume
no_ack=no_ack, nowait=nowait)
File "kombu/entity.py", line 737, in consume
arguments=self.consumer_arguments)
File "amqp/channel.py", line 1572, in basic_consume
returns_tuple=True
File "amqp/abstract_channel.py", line 60, in send_method
return self.wait(wait, returns_tuple=returns_tuple)
File "amqp/abstract_channel.py", line 80, in wait
self.connection.drain_events(timeout=timeout)
File "amqp/connection.py", line 500, in drain_events
while not self.blocking_read(timeout):
File "amqp/connection.py", line 506, in blocking_read
return self.on_inbound_frame(frame)
File "amqp/method_framing.py", line 55, in on_frame
callback(channel, method_sig, buf, None)
File "amqp/connection.py", line 510, in on_inbound_method
method_sig, payload, content,
File "amqp/abstract_channel.py", line 126, in dispatch_method
listener(*args)
File "amqp/channel.py", line 282, in _on_close
reply_code, reply_text, (class_id, method_id), ChannelError,
@matusvalo Is there anything we can help with, or any additional information we can provide?
Thank you @AvnerCohen. I need to find some spare time to look into that... For now, no.
@thedrow for now, no. I am not able to reproduce the issue. There are multiple open issues that point to use of a closed connection/channel, e.g. celery/kombu#1027. There are only two places where Lines 145 to 156 in 756d60d
Lines 452 to 464 in 756d60d
One possible cause could be multiple threads sharing a single connection...
One possibility came to my mind:
The other possibility is that when Client sends
Unfortunately, the py-amqp library does not conform to the spec in this case.
@AvnerCohen could you please check master branch if your problem still occurs? |
@matusvalo On our end, we have seen this behavior when starting some 80 celery workers (each with a concurrency between 2 and 8) in parallel.
Hi everyone, I found this issue while looking for a solution to this problem. Env: Starting with RabbitMQ 3.8.15, this new feature was introduced: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout When the timeout is triggered, the
Do you know how to prevent this? Celery should not break on timeout errors.
For me, missing permissions on the exchange (or virtual host) lead to the behavior above.
@matusvalo I'm able to reproduce the
I've also created a py-amqp-only example to reproduce the issue: https://github.com/povilasb/celery-issues/#py-amqp-example What happens, I believe, is that RabbitMQ responds with a "Channel Close" message Line 142 in 98f6d36
After reading the docs, I tend to think that this may be expected from the AMQP library:
Anyway, I'm gonna stay away from this issue now, since I guess this is more of a Kombu/Celery issue for not reopening a new channel. My issue may not be related to the original one either.
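The idea of reopening a channel after the broker closes it can be sketched with stand-in classes. Everything here (FakeConnection, FakeChannel, ChannelClosed, Publisher, demo) is hypothetical and only illustrates the pattern; it is not how Kombu or Celery actually manage channels.

```python
class ChannelClosed(Exception):
    """Hypothetical error: the broker closed the channel."""


class FakeChannel:
    """Stand-in for a channel; not the real amqp.Channel."""

    def __init__(self, exchanges):
        self.exchanges = exchanges
        self.is_open = True

    def publish(self, exchange, body):
        if not self.is_open:
            raise ChannelClosed("channel is already closed")
        if exchange not in self.exchanges:
            # Mimic the broker closing the channel on a 404.
            self.is_open = False
            raise ChannelClosed("no exchange %r" % exchange)
        return "published %r to %s" % (body, exchange)


class FakeConnection:
    """Stand-in for a connection that hands out fresh channels."""

    def __init__(self, exchanges):
        self.exchanges = set(exchanges)

    def channel(self):
        return FakeChannel(self.exchanges)


class Publisher:
    def __init__(self, connection):
        self.connection = connection
        self.channel = connection.channel()

    def publish(self, exchange, body):
        try:
            return self.channel.publish(exchange, body)
        except ChannelClosed:
            # The old channel is unusable: replace it, then re-raise
            # so the caller still sees the original error.
            self.channel = self.connection.channel()
            raise


def demo():
    """Publish to a missing exchange, then to an existing one."""
    publisher = Publisher(FakeConnection(["events"]))
    try:
        publisher.publish("test_exc", "hello")
        failed = False
    except ChannelClosed:
        failed = True
    # The second publish goes through the freshly opened channel.
    return failed, publisher.publish("events", "hello")
```

The design choice in this sketch is to surface the error to the caller while still leaving the publisher in a usable state for the next call.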
Steps to reproduce:
Make sure the test_exc exchange does not exist.
Example code:
Executing the script ends with the following stack trace:
I was able to replicate this issue on RabbitMQ 3.7.8 and the master branch of py-amqp.
I found that the connection was set to None at the following line:
py-amqp/amqp/connection.py
Line 464 in 0e793de
After inserting a breakpoint, I captured the following tracebacks:
The tracebacks show that:
1. Client sends the Close method to the server.
2. Client waits for CloseOk. It starts a drain_events loop.
3. Client receives a Close method instead of CloseOk.
4. Client sends the Open method (part of the _do_revive() method) and waits for OpenOk. It starts another (!) drain_events loop.
5. Client receives the CloseOk method. It clears the connection (sets self.channels = None).
6. Client waits for the OpenOk method. Client receives some method but crashes since Connection.channels == None.
In general, the problem is that the client executes the Channel._do_revive() method even when the connection is closing:
py-amqp/amqp/channel.py
Lines 276 to 280 in 0e793de
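The nested drain loop can be mimicked without a broker. The following is a minimal toy model, not py-amqp code: ToyConnection, ToyChannel, simulate(), and the frame names are hypothetical stand-ins, and the frame order is an assumption based on the sequence of events in this report. It only shows how a second drain_events loop can outlive the teardown triggered by CloseOk.

```python
from collections import deque


class ToyChannel:
    """Hypothetical stand-in for a channel; not the real amqp.Channel."""

    def __init__(self, connection):
        self.connection = connection
        self.received = set()

    def wait(self, expected):
        # Keep draining until the expected frame has arrived.
        while expected not in self.received:
            self.connection.drain_events()

    def dispatch(self, frame):
        self.received.add(frame)
        if frame == "Channel.Close":
            # Revive step: "send" Open and wait for OpenOk.
            # This starts a second, nested drain loop.
            self.wait("Channel.OpenOk")
        elif frame == "Connection.CloseOk":
            # Teardown: the channel loses its connection reference.
            self.connection = None


class ToyConnection:
    """Hypothetical stand-in for a connection; not the real amqp.Connection."""

    def __init__(self, inbound):
        self.inbound = deque(inbound)  # frames the toy "server" sends, in order
        self.channel = ToyChannel(self)

    def drain_events(self):
        # Deliver exactly one inbound frame to the channel.
        self.channel.dispatch(self.inbound.popleft())

    def close(self):
        # "Send" Close, then wait for CloseOk (the outer drain loop).
        self.channel.wait("Connection.CloseOk")


def simulate():
    """Return the error message produced by the toy close(), or None."""
    conn = ToyConnection(
        ["Channel.Close", "Connection.CloseOk", "Channel.OpenOk"]
    )
    try:
        conn.close()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(simulate())
```

In this model the failure happens inside the nested loop: the CloseOk handler has already dropped the connection reference by the time the inner wait tries to drain again.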
A possible solution would be roughly as follows:
1. Set a flag when the Connection.close() method is called:
2. Check that flag in Channel._do_revive():
After this fix, the correct exception is raised:
Moreover, does it make sense to revive the connection just before raising the exception?
py-amqp/amqp/channel.py
Lines 276 to 280 in 0e793de
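A guard of the kind proposed here could look roughly like the sketch below. This is a toy model under stated assumptions, not a patch for py-amqp: the class names, the closing attribute, ToyNotFound, and the frame names are all hypothetical.

```python
from collections import deque


class ToyNotFound(Exception):
    """Hypothetical stand-in for the broker's 404 channel error."""


class ToyChannel:
    """Stand-in for a channel; not the real amqp.Channel."""

    def __init__(self, connection):
        self.connection = connection
        self.received = set()
        self.revived = False

    def wait(self, expected):
        # Keep draining until the expected frame has arrived.
        while expected not in self.received:
            self.connection.drain_events()

    def revive(self):
        # "Send" Open and wait for OpenOk, as a revive step would.
        self.revived = True
        self.wait("Channel.OpenOk")

    def dispatch(self, frame):
        self.received.add(frame)
        if frame == "Channel.Close":
            # Guard: skip the revive while the connection is closing.
            if not self.connection.closing:
                self.revive()
            raise ToyNotFound("(404) NOT_FOUND")


class ToyConnection:
    """Stand-in for a connection; not the real amqp.Connection."""

    def __init__(self, inbound):
        self.inbound = deque(inbound)
        self.closing = False  # hypothetical flag name
        self.channel = ToyChannel(self)

    def drain_events(self):
        self.channel.dispatch(self.inbound.popleft())

    def close(self):
        self.closing = True  # remember that close() is in progress
        self.channel.wait("Connection.CloseOk")


def close_with_guard():
    """Return (exception name or None, whether revive ran)."""
    conn = ToyConnection(["Channel.Close", "Connection.CloseOk"])
    try:
        conn.close()
    except Exception as exc:
        return type(exc).__name__, conn.channel.revived
    return None, conn.channel.revived
```

With the flag set before the wait starts, the toy close() surfaces the not-found error directly and never enters a second drain loop.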