-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GELF Benchmarks are slow vs UDP #31
Comments
UDP is a connectionless protocol (in contrast to TCP, which isn't being used here), so I don't think that this is the reason for the GELF gem being "slow" because there simply isn't any overhead for creating and closing a connection. Actually it doesn't surprise me at all that sending an almost empty UDP packet is faster than building a full-fledged GELF message from arbitrary parameters (see GELF specification) and sending the resulting message via UDP. Don't get me wrong, I'm all for optimizing the GELF gem if there's a real bottleneck, but the benchmarks in this issue are skewed and simply measure the wrong things. |
I thought I had an error in my early benchmarks as well but to mimic the behavior of the Gelf gem in the simplest form try this: run a udp server: require 'socket'
s = UDPSocket.new
s.bind(nil, 1234)
loop do
text, sender = s.recvfrom_nonblock(16)
end Here's how Gelf opens and closes connections to send messages: require 'socket'
s = UDPSocket.new
before_time = Time.now
100_000.times do |i|
s.send("#{i}", 0, 'localhost', 1234)
end
puts "Time elapsed was #{Time.now - before_time}" Here's the same implementation except we make the connection outside the loop: require 'socket'
s = UDPSocket.new
before_time = Time.now
s.connect('127.0.0.1', 1234)
100_000.times do |i|
s.send("#{i}", 0)
end
puts "Time elapsed was #{Time.now - before_time}" Here's the output of that benchmark:
Here's the gist to run it: https://gist.github.com/08ab203dec55f1ebd1ad |
I'm seeing similar performance numbers as @allcentury. Of course, the gem supports multiple addresses so multiple UDP sockets would need to be created for multiple addresses to see this performance gain. I'd be okay with keeping multiple pre-bound UDPSockets alive. Since this is UDP, it's only really keeping around a file descriptor and a It looks like, from the MRI source, that prebinding the socket causes |
@allcentury Interesting, I wasn't aware that there's such a performance hit when not explicitly creating a "connection" in a UDP socket. Sorry I came across a bit harsh in my last comment. @markglenn Thanks for the quick analysis of the As always, pull requests fixing this issue are welcome. |
Executing |
When sending a datagram, non-bound sockets require a DNS lookup. Instead, bind the socket to the given addresses when creating the socket. Also, lazily create the socket connections, which allows the notify methods to capture the exceptions as expected. If the sockets are created and bound at initialize, an exception would occur then. Lastly, allow resetting the connections if the addresses are changed. Fixes graylog-labs#31
@bernd That is correct. I'm not sure how often the IP changes for an organization's Graylog server that a 50x (again my run through of the benchmark) slowdown is worth it. Our organization made the IP static, but I know I can't make that same assumption for everyone. Unfortunately because of UDP's fire and forget method of sending packets, there is no real way of checking this without doing another TCP has a similar issue, but I believe the connection is dropped when the IP changes, so it's easy to detect. I feel like there can be a middle ground here, but I don't see it off hand. Perhaps we can do another DNS lookup every few messages, but that guarantees at least a few lost messages when an IP change does occur. I'm not terribly familiar with the Graylog server, but is there a way to keep a TCP connection alive just for this check? |
@markglenn I totally agree that it should be fast by default. Maybe you can add some kind of option that controls the behavior? Otherwise people that use round-robin DNS or something like that will have problems. The Graylog server has an API endpoint that can be used to check the status. |
I vote that this should be closed. If you have a static IP address, you can just use the IP instead of the server name and get similar results as pre-binding the connection. I ran the gist from @allcentury but with '127.0.0.1' instead of 'localhost' and saw the following results:
Maybe a note in the docs to reflect how much slower using a name can be? Any real check to make sure the message is going to the right server would be slower than doing a DNS lookup anyway. |
Very interesting and a great find @markglenn ! |
Normally I would pair this issue with a pull request but your guys requirement to sign a disclosure and waiver is something I can't do. That 'release form' is completely different than the license you've attached to this repo so I don't know what to believe.
That all said, I'm going to list out what I would fix if this was a normal open source project in hopes that you agree and will implement the change.
First, the problem. GELF was slowing down our servers, which seems odd especially considering GELF uses UDP under the hood which is a 'fire and forget' protocol. I decided to Benchmark UDP vs. GELF's implementation of UDP.
The Benchmark results:
Staggering really - UDP can log 90_000-100_000 msgs / second, GELF's implementation can do 2500-3000 msgs / second.
The benchmark script looks like this:
You'll need to run a UDP server in a separate process for those to work:
So what's different? Well it's really a small change. In the GELF gem, we never open the connection until we call the
send
method with arguments containing host and port.It looks like this
@socket.send(datagram, 0, host, port)
. That block of code is here.When we use send with host & port, a connection must first be established before sending then the packet gets sent and finally the connection closes for EVERY message. Instead if you see in my benchmark script, I first connect then log with just
send(arg, 0)
. Since the connection is already established, the message is fired and forgotten.The text was updated successfully, but these errors were encountered: