-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Less reliance on global_id #188
Conversation
There's nothing useful they can do with it anyway.
…acklog Will now report -1 any time a message comes directly from a specific channel. It is not possible for an arbitrary backend implementation to efficiently provide the `global_id` of messages fetched from channel-specific storage mechanisms.
Besides this initial impediment to using Streams' native IDs, we have the problem that they are not strictly numeric and are certainly not sequential integers, as message_bus currently assumes IDs to be.
Even with a complex representation as in Streams' native IDs, IDs can be considered ordered. Math on Streams' IDs can be implemented for the simple cases where tests simply require "an ID greater than X" or "an ID greater than Y" quite easily. Measured look-back is also implemented in a backend-specific way, and while on Redis Sorted Sets this must be done by calculating some ID range (based on the assumption of sequential integers), on Redis Streams (and others) it may be done by reading X messages from the stream sorted in reverse order. My proposal is for message_bus to cease to guarantee that message IDs be sequential integers, though in practice on some backends they might be, and provide opaque IDs over the network (may be defined as String), and complex IDs (a @SamSaffron What do you think? |
Actually, there's other places where IDs are assumed to be sequential integers, independent of the backlog in use:
I believe these can be moved to the backend, and I'm working on that now. |
Also probably improves performance by fetching full backlogs less often.
Subscribing from some future point is unnecessary and heavy to implement. See ae0d7cd#commitcomment-31541983
Hmm how do we do firehose global server side consumers without a global id? How do we reliably catch up globally? Can redis do stream publishing in LUA and then come with some guarantees of both ids lining up and messages having a consistent global and local id? Regarding breaking the API to support other types of identifiers, I am not in principal against but it is an enormous and painful change if we are to amend the apis, I would like to avoid that. So stuff like Regarding pushing stuff that ought to be in the backend in to the backend, I support this, BUT, we should be careful here with regards to copy and pasting this block of code 3 times, so if the base backend needs to implement and then redis stream is stuck overriding I guess that is ok. There is one huge fundamental problem here though, I think you need to do some rough numbers of how much faster stream is going to be. If after we end up doing all this work we end up with a backend that is no faster than what we have today, this ends up being a pure "intellectual" exercise vs something generally useful. We got to get numbers here. |
I'm not proposing removing the concept of global IDs, but simply not exposing them to clients subscribed to specific channels. With the changes here, global IDs are not exposed to Javascript clients, and are only exposed to server-side application code when acting on the global backlog (
No. Insertion into the stream is an atomic operation which internally assigns an ID (in the format
If we define the IDs to be backend-specific types (which can all be converted to strings for the wire format), the Redis and PG can continue using Integers, while Streams can use their native ID strings. I'm pretty sure we can make specifically
Yeah, I managed to mostly avoid that, check out the later commits on this branch.
Yes, I'm concerned about that also. If you want to hold off on merging this until Streams has some numbers, that's fine, but unfortunately this is all pretty necessary for performant usage of Streams (see #173 (comment); Streams' native IDs solve this problem). |
Understood, but we can do some basics to test the underlying structures, in particular compare native implementations of both patterns in a standalone ruby program. The lua script captures the whole publish part. Once that is done maybe you do the refactoring in the redis stream branch till you get to the state you can do full benchmark? |
Discourse only cares about the global ID here: https://github.com/discourse/discourse/blob/96168cb3c6b06e2bc8683300b532790ae5a9666f/lib/site_setting_extension.rb#L314, though I am not sure why it's using the
Indeed there are tests for that in message_bus' test suite.
Honestly I cannot imagine what it might be.
Yes, I will do this. |
When client code tries to subscribe to a channel using a `last_id` that doesn't exist on that channel, we raise an exception. See ae0d7cd#commitcomment-31544627 for details.
053d4e0
to
e62f532
Compare
Thanks @benlangfeld for all the hard work 👍 @SamSaffron do we still want this? |
Since we are kind of giving up on redis streams for now I am shelfing this. The change is just too big and risky. |
These are backward incompatible changes. This change will break discourse, which uses the
global_id
for some sort of dashboard, but I'm pretty sure it could use themessage_id
instead since it's only interested in one particular channel.See individual commit messages for details of the motivation here. More specifically, this is to support removal of the ID counter keys for Redis Streams support, so that we can use the native automatic IDs that Streams provide, in which case we cannot have a circular dependency between the global and channel-specific storage; we must publish to the channel-specific storage first to obtain the ID which is to be included as a pointer in the global storage later.
Also moves some message ID math down into the backend implementations so that the rest of message_bus is less coupled to sequential integer IDs.