Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: immediately finish requests on disconnect #391

Open
pananton opened this issue Dec 24, 2020 · 8 comments · Fixed by #392
Open

Question: immediately finish requests on disconnect #391

pananton opened this issue Dec 24, 2020 · 8 comments · Fixed by #392
Assignees
Labels

Comments

@pananton
Copy link

pananton commented Dec 24, 2020

Can you please give me a hint how I can solve the following problem?

Suppose that my connection to nats server is unstable. To send requests I manually create single inbox subscription and use natsConnection_PublishRequest function with auto-incremented reply-to. When answer is received I can determine original request using reply-to and call necessary callback (all this dispatching is performed by my wrapper). Also I implemented my own request timeout mechanism.

There is a potential problem if request was sent and then publisher disconnects before answer is received. I suppose that with a high probablity answer will not be received after reconnect (because it was received on the server-side when requestor was away and was dropped). Which means that request will be timed out and other system components waiting for it will stall for a request timeout value. For me it is much better to immediately finish such requests with error (and ignore answer if it is received after reconnect).

I can use onDisconnected cb to finish such request, but I need to be able to distinguish them from those requests that were published when connection is already disconnected (because this requests will be queued to reconnect buffer and will be successfully handled after reconnect). Is there any secure way to determine whether request is actually sent or is queued to reconnect buffer because connection is disconnected? Maybe you can suggest other solution?

@pananton pananton changed the title Qiestion: immediately finish requests on disconnect Question: immediately finish requests on disconnect Dec 24, 2020
@kozlovic kozlovic self-assigned this Dec 28, 2020
@kozlovic
Copy link
Member

Is there any secure way to determine whether request is actually sent or is queued to reconnect buffer because connection is disconnected? Maybe you can suggest other solution?

Unfortunately no. Even the use of the disconnected callback is not "safe" in that this is invoked asynchronously and by the the connection may have been restored, or not yet restored but after the library switched to memory pending buffer and so would cancel requests that have been buffered.

I am not sure what you can do if you implement the requests on your own (the way you describe). Note that it really looks like what the library does with new style of requests (the default behavior) when the library listens to a wildcard and requests are incrementing the last token and map that to the appropriate "request" when processing the reply.

We currently do not fail pending requests during a disconnect event but we do it for "flush" (PING/PONG) requests that were done before the reconnect thread starts, which I realize that is probably wrong and should be done earlier, before creating the pending buffer. We could also fail pending requests there, but that would not solve your issue since you do requests on your own.

@kozlovic
Copy link
Member

I keep taking things back after thinking more about it :-)

I will definitively fix the fact that we fail Flush too late (in reconnect thread instead of before creating the pending buffer), but as of failing the pending requests, I am not so sure anymore. The RequestX() APIs are doing direct flush of the request (meaning that the library write to socket asap), so the request is out already.

If the service provider (the replier) gets the reply and send it either after the reconnect, or while disconnected it may get buffered and flushed once reconnect happens. There is still a chance that the reply make it back to the requestor. If the replier was not connected to the same server and replies before the requestor reconnects, then yes, that server would have dropped the message due to the lack of subject interest.

So I could make it an option, but should not change the default behavior to not fail those pending requests.

kozlovic added a commit that referenced this issue Dec 28, 2020
If enabled, any pending request (using new style) will fail with
NATS_CONNECTION_DISCONNECTED status.

Also fixed the failing of pending flush requests which was done
in the doReconnect thread while it should be done prior to
creating the pending buffer and starting the reconnect thread.

Resolves #391

Signed-off-by: Ivan Kozlovic <[email protected]>
@kozlovic
Copy link
Member

@pananton I know that this is not helping you very much, but if you want to have a look at #392 to see if that looks ok?

@pananton
Copy link
Author

pananton commented Dec 29, 2020

Note that it really looks like what the library does with new style of requests (the default behavior) when the library listens to a wildcard and requests are incrementing the last token and map that to the appropriate "request" when processing the reply.

Earlier we discussed how to implement requests without using blocking natsConnection_Request, so I've taken your advices) And yes, I'm also doing my own mapping of request-reply pair using the technique described.

I know that this is not helping you very much, but if you want to have a look at #392 to see if that looks ok?

Yes, looks fine. Anyway, I think it's a useful option for other library users. Agreed that default behaviour should not be changed (at least for backward compatability).

I'm not insisting at all, but may be you would like to consider adding a function that takes the best from both worlds (for new style requests):

  1. from natsConnection_Request: ability to track published requests and fail them on disconnect/timeout
  2. from natsConnection_PublishRequest: it's asynchronous behaviour

Of course, such a function should provide some request-id (which you have already to perform internal request-reply mapping). and a way to specify callback that will be invoked on reply or error (this callback will provide reply message and request-id). natsConnection_PublishRequest is great but it requires to manually implement request timeout (not a big deal) and suffers from the issue described (can't be implemented).

@kozlovic
Copy link
Member

Yes, looks fine. Anyway, I think it's a useful option for other library users. Agreed that default behaviour should not be changed (at least for backward compatability).

Thanks, I will merge the PR then.

I'm not insisting at all, but may be you would like to consider adding a function that takes the best from both worlds (for new style requests):

  1. from natsConnection_Request: ability to track published requests and fail them on disconnect/timeout

That's what would happen with the new option in the discussed PR.

  1. from natsConnection_PublishRequest: it's asynchronous behaviour
    Of course, such a function should provide some request-id (which you have already to perform internal request-reply mapping). and a way to specify callback that will be invoked on reply or error (this callback will provide reply message and request-id).

The request ID would be more for the user though, no? Because the library does not need. It is exactly what is happening right now: a new reply subject is created, and the multiplex response handler uses the last token to find it in the map of requests and notify the proper response object condition variable. In an async way, the response object would be associated the callback and therefore could be invoked there. That would not be a problem. Also, the failing of pending requests (if new option is set) would work too.

The issue that would need way more coding is the timeout handling. There would be a need for a connection or library thread that deals with all request timeouts so that if no response is received.

I will noodle on that. Please let me know if you think that these RequestAsync() calls really need a user given request ID? As always when creating callbacks, I tend to have a "void *closure" argument anyway, but not sure if the request-ID thing should be required/formalized?

@kozlovic kozlovic reopened this Dec 29, 2020
@pananton
Copy link
Author

The request ID would be more for the user though, no? Because the library does not need. It is exactly what is happening right now: a new reply subject is created, and the multiplex response handler uses the last token to find it in the map of requests and notify the proper response object condition variable.

I mean that request-id nats.c library can return to user code is this last token from reply subject (I think it is unique because you need a way to demultiplex replies). I thought that this might be the simplest solution for you but your suggestion for providing void* closure as a way to identify original request when processsing reply is way better IMO.

The issue that would need way more coding is the timeout handling. There would be a need for a connection or library thread that deals with all request timeouts so that if no response is received.

Note that I have implemented timeout logic myself and can live without it. But I think if you provide such RequestAsync function it will be inconsistent to require request timeout management from user...

@kozlovic
Copy link
Member

Note that I have implemented timeout logic myself and can live without it. But I think if you provide such RequestAsync function it will be inconsistent to require request timeout management from user...

I meant that RequestAsync would require me to implement a timeout management for those async request, not the user ;-).

@pananton
Copy link
Author

I meant that RequestAsync would require me to implement a timeout management for those async request, not the user ;-).

I meant that please don't blame me for this additional work) In my code I've done this myself, thinking about the others)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants