Heartbeat and starvation #71

Open
ghost opened this issue Jun 3, 2013 · 5 comments

Comments

ghost commented Jun 3, 2013

Hi all,

I am using tornadio2 from the Git repository (b463209).

I recently discovered a problem in the way heartbeats are handled by Tornadio.

Incoming messages go through the on_message method of the TornadioWebSocketHandler class (in persistent.py). This entry point is used both by Socket.IO events and heartbeat frames. If the connection suddenly gets flooded (in a perfectly valid use case), Tornadio stops processing heartbeat frames, because they are queued far behind all the other messages.

I implemented my server using a top-half / bottom-half pattern, but it is not enough, and the connection keeps getting closed although everything is working fine.

I have one solution, but you must know I am no Python expert. :)

Tornadio should first read messages as fast as possible to immediately read heartbeat frames. Event processing can be delayed to another worker. Basically, this is implementing the top-half / bottom-half logic in Tornadio itself.
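A minimal sketch of the top-half / bottom-half idea described above, written with the standard library only. It is not tornadio2 code: `TwoStageReceiver`, `on_frame` and `drain` are hypothetical names, and the `"2:"` prefix assumes the Socket.IO 0.x frame layout, where a heartbeat is sent as `2::`.

```python
from collections import deque

# Assumption: Socket.IO 0.x frames start with a numeric type, and "2" is heartbeat.
HEARTBEAT_PREFIX = "2:"


class TwoStageReceiver:
    """Hypothetical receiver: cheap top half, deferred bottom half."""

    def __init__(self, handle_event):
        self.handle_event = handle_event  # slow, application-level callback
        self.pending = deque()            # events waiting for the bottom half
        self.heartbeats_seen = 0

    def on_frame(self, frame):
        # Top half: do as little as possible per frame.
        if frame.startswith(HEARTBEAT_PREFIX):
            self.heartbeats_seen += 1     # a real server would reset its timeout here
            return
        self.pending.append(frame)

    def drain(self, max_events):
        # Bottom half: process a bounded batch, then give control back.
        processed = 0
        while self.pending and processed < max_events:
            self.handle_event(self.pending.popleft())
            processed += 1
        return processed
```

With this split, a heartbeat arriving behind a burst of events is acknowledged as soon as it is read, while the events themselves are handled in batches of at most `max_events`. It only helps if frames actually get read, which is the point discussed in the replies below.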

What do you think? There might be a better idea. I am willing to help if needed.

Thanks,

Owner

mrjoes commented Jun 3, 2013

Is there a reason why your on_message blocks?

Just in case: Tornado is a single-threaded asynchronous framework. If you do a blocking operation, everything stops, including the tornadio2 heartbeat logic. If you need to do some kind of CPU-intensive computation, you can do it in a separate thread or process. If you do some expensive IO, then try using an asynchronous library, or offload the work to a different thread as well.
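As an illustration of offloading blocking work to a thread, here is a small sketch. It uses `asyncio` from the standard library as a stand-in for Tornado's IOLoop, so none of this is Tornado or tornadio2 API; `expensive_work` and `heartbeat` are hypothetical names.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def expensive_work(payload):
    # Stand-in for a blocking call (CPU-heavy code or synchronous IO).
    time.sleep(0.2)
    return payload.upper()


async def heartbeat(ticks, interval=0.05, count=4):
    # Stand-in for periodic heartbeat handling on the event loop.
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        hb = asyncio.ensure_future(heartbeat(ticks))
        # The blocking call runs in the pool; the loop stays free for heartbeats.
        result = await loop.run_in_executor(pool, expensive_work, "event")
        await hb
    return result, len(ticks)


result, tick_count = asyncio.run(main())
```

If `expensive_work` were called directly inside `main`, the heartbeat coroutine could not run until it returned, which is the blocking behaviour described above.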

As far as the problem is concerned: if on_message blocks, tornadio2 is blocked as well. It won't read anything from the socket, so there's no way it can receive any pending heartbeats and handle them.

Author

ghost commented Jun 3, 2013

My on_message doesn't really block, there are just too many events to handle: the heartbeat frame is stacked, but not read before the timeout...

(Moreover, the machine is slow and I can't change anything about it!)

Owner

mrjoes commented Jun 3, 2013

The socket.io protocol does not support throttling, so there's no way to limit the data sent by the client.

Also, even if tornadio2 were able to respond to pings, it wouldn't reduce the burden on the server: the backlog queue would continue to grow until you run out of memory.

How many messages per second are you receiving?

Author

ghost commented Jun 3, 2013

The load is under control, it won't grow indefinitely. I can't even improve scheduling on the client side, because I don't know how many clients there will be. The heartbeat frame is in there somewhere, we just need Tornadio to read it!

I am surprised, because it seems I am the only one to encounter this problem (according to Google).

Owner

mrjoes commented Jun 3, 2013

OK, here's how it works:

  1. Read binary blob from socket
  2. Parse blob into data
  3. For each message in received data, call on_message
  4. Repeat from step 1

As you can see, the process is synchronous. Tornadio2 won't receive anything from the socket while it is blocked at step 3. If there's no heartbeat frame in the received data, then there's nothing to handle. Also, a missed heartbeat means that the IOLoop was blocked for more than 20 seconds. That's a lot.
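The four steps can be sketched roughly as follows. This is only an illustration of the control flow, not tornadio2's actual code: `parse` and `serve` are hypothetical names, and the newline-delimited framing is an assumption made to keep the example short.

```python
def parse(blob):
    """Split a blob into messages (hypothetical newline-delimited framing)."""
    return [m for m in blob.split("\n") if m]


def serve(blobs, on_message, log):
    for blob in blobs:                # 1. read a blob from the "socket"
        log.append("read")
        for message in parse(blob):   # 2. parse the blob into messages
            on_message(message)       # 3. call on_message for each one
        # 4. repeat from step 1


log = []
serve(["a\nb", "heartbeat"], log.append, log)
```

Here `log` ends up as `["read", "a", "b", "read", "heartbeat"]`: the second blob, which carries the heartbeat, is not even read until both on_message calls for the first blob have returned.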

Can you measure the time spent in on_message? How long does it take to process one message?
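One possible way to take that measurement is a small timing decorator. This is a sketch using only the standard library; `timed`, `durations` and the `Connection` class are hypothetical stand-ins, not part of tornadio2.

```python
import functools
import time


def timed(stats):
    """Record the wall-clock duration of every call into the list `stats`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even if the handler raises.
                stats.append(time.perf_counter() - start)
        return wrapper
    return decorator


durations = []


class Connection:
    """Hypothetical stand-in for an application's connection class."""

    @timed(durations)
    def on_message(self, message):
        time.sleep(0.01)  # placeholder for the real message handling
        return len(message)
```

After some traffic, `max(durations)` and `sum(durations) / len(durations)` give the worst-case and average time per message, which can be compared against the heartbeat timeout.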
