Stop keep alive when sync is aborted #3686

finsterwalder · 2023-08-29T20:37:20Z

Type: [defect] T-Defect
This PR solves issue: #3678

When you start a client, which has connection problems, a keepAlive is started.
But when the connection is actively stopped e.g. via calling stopSync(), the connection is aborted.
When the connection is aborted, it does not make sense to continue to try to connect, because it will never succeed.

This is solved by this PR.

I could not find any tests for the sync, so I didn't find a place to extend tests to check this special case.

Checklist

Tests written for new code (and old code if feasible)
Linter and other CI checks pass
Sign-off given on the changes (see CONTRIBUTING.md)

Here's what your changelog entry will look like:

✨ Features

Stop keep alive when sync is aborted (#3686). Contributed by @finsterwalder.

…pSync() to prevent endeless looping

richvdh

Thank you for the contribution!

However... I don't really understand this, and it seems dangerous.

First of all, please can you explain the code path more clearly. Why is an AbortError thrown in this situation? I'm not very familiar with this area of the code, I'm afraid. In any case, the new code needs more comments to explain what is going on.

Second: how can you be sure that this is the only reason that an error like this will be thrown? I note that the code that handles regular /sync requests doesn't do anything like this. Instead, it checks this.running (

matrix-js-sdk/src/sync.ts

Line 1038 in a33edb2

if (!this.running) {

). It seems like we should do the same thing here.

Third: please do add some tests for this case, otherwise there is a high chance of a regression. I think the best place is spec/integ/matrix-client-syncing.spec.ts.

richvdh · 2023-08-30T10:41:08Z

@finsterwalder Also: I see that you have ticked the box for "Sign-off given on the changes" but I don't see your sign-off here. Please do read CONTRIBUTING.md for details on the DCO process.

Signed-off-by: Malte Finsterwalder <[email protected]>

…terwalder/matrix-js-sdk into stop_keepAlive_when_sync_is_aborted

finsterwalder · 2023-08-30T19:53:25Z

Thanks for checking my PR so quickly.

The code path is as follows.
When the sync is started, but fails initially, recoverFromSyncStartupError() is called, which indirectly calls pokeKeepAlive(), which keeps polling the versions endpoint.
This polling only stops, when we get a http response, that is either successful or has at least http status 400 or 404.
In all other cases it keeps polling.
When sync.stop() is called in this situation, abortController.abort() is called, which leads to all further http calls resulting in an AbortError (the http request itself is not even executed anymore). The above keep alive does not stop, though and will simply keep polling indefinitely.
So my rational was, that getting an AbortError is a sure sign, that polling can never succeed and should end.

But you are right. if (!this.running) is probably the more appropriate check.
It's set to true, right before the abortController is set and it's set to false, right before abortController.abort() is called. So it should have the same effect as checking for an AbortError.
I will update my PR accordingly.

I added a sign-off now. I did that on a different branch and then forgot it. ;-)

I looked into the test file, but I didn't see any tests, that check the keep alive feature.
That's quite a task for me to get acquainted enough to write such a test for the first time...

richvdh · 2023-09-04T14:13:00Z

src/sync.ts

+                    if (!this.running) {
+                        clearTimeout(this.keepAliveTimer);
+                        if (this.connectionReturnedDefer) {
+                            this.connectionReturnedDefer.reject();
+                            this.connectionReturnedDefer = undefined;
+                        }


In any case, the new code needs more comments to explain what is going on.

please do give this code some comments to explain what it is doing. Imagine you are unfamiliar with this code and trying to understand it. When does it happen, why is this the right thing to do? Are there other places in this file that have similar logic that it might be helpful to link to?

richvdh · 2023-09-04T14:15:22Z

I added a sign-off now. I did that on a different branch and then forgot it. ;-)

I see a sign-off on one of the commits in this PR, but not the ones with all the new code! Best thing to do is to update the description of the PR to show that your signoff covers all the changes in the PR.

richvdh

I looked into the test file, but I didn't see any tests, that check the keep alive feature.
That's quite a task for me to get acquainted enough to write such a test for the first time...

I talked to some other members of the team. We're agreed that we should not make an exception from our policy that all changes should include tests, and we think it's important that this change be tested. We don't feel like we can land it without tests.

It doesn't have to be a perfect test - all we're looking for is something that fails before your change, and passes afterwards.

I'm sorry that the existing code is not well tested, and I'm also sorry that nobody is in a position to give better guidance on writing a new test.

finsterwalder · 2023-09-05T07:35:11Z

Ok. Good points. I will add documentation.
I will also fix the sign-off. Sorry for that, have never done that before. :-)
I will also look into creating a test. But that might take a little while...

Thinking about my problem a little more, I have some ideas for followup improvements. Was wondering, what you think about it:

This is a keepAlive, which keeps retrying the call to the matrix server. When I stop the sync, the retry should obviously stop. That's what my fix is about.
But maybe there are other error cases, where we know that retrying is useless, because it can never succeed?
e.g. A wrong auth token will never fix itself by just retrying. That's exactly the case that is at the root of my problem. Just that I need to detect the auth error myself (listening for an HttpApiEvent.SessionLoggedOut even) and the stopping the sync myself. With my fix, that stopSync call will stop the keepAlive now. But wouldn't it make sense to stop the keepAlive already, once a broken auth token is detected? So a response with http status 401 or 403 are pointless to retry IMHO. And maybe there are other cases, where retrying is pointless? How about 407 Proxy Authentication Required or 410 Gone (for good)? 414 URI Too Long? 431 Request Header Fields Too Large?

richvdh · 2023-09-05T09:25:49Z

But maybe there are other error cases, where we know that retrying is useless, because it can never succeed?
e.g. A wrong auth token will never fix itself by just retrying. That's exactly the case that is at the root of my problem. Just that I need to detect the auth error myself (listening for an HttpApiEvent.SessionLoggedOut even) and the stopping the sync myself. With my fix, that stopSync call will stop the keepAlive now. But wouldn't it make sense to stop the keepAlive already, once a broken auth token is detected?

You're probably right that there are lots of other calls that aren't worth retrying. However: One problem is that, as things currently stand, the spec (https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3sync) has very little to say about which error codes might be returned, and what you can infer from them.

A second problem: what should we actually do on one of these errors? If we just stop the sync loop, it just leaves the user with a dead client. Should we raise a SessionLoggedOut event? That's a bit of a nuclear bomb because it will make the client forget all of its state (including E2EE keys, etc). Who is to say that it is not a temporary outage?

So we might be able to improve this, but I strongly recommend focussing on stopSync during keepalive for now.

finsterwalder · 2023-09-05T12:53:35Z

I understand that including other cases would need a little more thinking. I just wanted to raise awareness and start this discussion.
In this change I will only do the handling of !isRunning to stop after calling stopSync.

The 401 case is currently notified via the HttpApiEvent.SessionLoggedOut event, so there exists a possible handling strategy. And it's also clear, that this will not fix itself with retries. Would it make sense to include this case soon?

finsterwalder · 2023-09-12T09:01:51Z

It was easier to create a new, clean Pull Request: #3720

Malte Finsterwalder and others added 2 commits August 29, 2023 22:21

stop keepAlive when the connection was aborted as done by calling sto…

1872a8b

…pSync() to prevent endeless looping

Merge branch 'matrix-org:develop' into develop

a33edb2

finsterwalder requested a review from a team as a code owner August 29, 2023 20:37

finsterwalder requested review from germain-gg and richvdh August 29, 2023 20:37

github-actions bot added the Z-Community-PR Issue is solved by a community member's PR label Aug 29, 2023

finsterwalder changed the title ~~Stop keep alive when sync is aborted~~ T-Defect Stop keep alive when sync is aborted Aug 29, 2023

finsterwalder changed the title ~~T-Defect Stop keep alive when sync is aborted~~ Stop keep alive when sync is aborted Aug 29, 2023

germain-gg added the T-Enhancement label Aug 30, 2023

github-actions bot deployed to PR Documentation Preview August 30, 2023 10:02 View deployment

richvdh requested changes Aug 30, 2023

View reviewed changes

germain-gg removed their request for review August 30, 2023 16:45

finsterwalder and others added 3 commits August 30, 2023 21:31

Merge branch 'matrix-org:develop' into develop

58728e6

Signed-off-by: Malte Finsterwalder <[email protected]>

Merge branch 'stop_keepAlive_when_sync_is_aborted' of github.com:fins…

6122e93

…terwalder/matrix-js-sdk into stop_keepAlive_when_sync_is_aborted

check syncs this.running to abort a keep alive

18dab2f

richvdh self-requested a review August 31, 2023 06:04

richvdh requested changes Sep 4, 2023

View reviewed changes

github-actions bot deployed to PR Documentation Preview September 4, 2023 14:15 View deployment

richvdh requested changes Sep 4, 2023

View reviewed changes

finsterwalder closed this Sep 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop keep alive when sync is aborted #3686

Stop keep alive when sync is aborted #3686

finsterwalder commented Aug 29, 2023 •

edited

Loading

richvdh left a comment

richvdh commented Aug 30, 2023

finsterwalder commented Aug 30, 2023

richvdh Sep 4, 2023

richvdh commented Sep 4, 2023

richvdh left a comment

finsterwalder commented Sep 5, 2023

richvdh commented Sep 5, 2023

finsterwalder commented Sep 5, 2023

finsterwalder commented Sep 12, 2023

Stop keep alive when sync is aborted #3686

Stop keep alive when sync is aborted #3686

Conversation

finsterwalder commented Aug 29, 2023 • edited Loading

Checklist

✨ Features

richvdh left a comment

Choose a reason for hiding this comment

richvdh commented Aug 30, 2023

finsterwalder commented Aug 30, 2023

richvdh Sep 4, 2023

Choose a reason for hiding this comment

richvdh commented Sep 4, 2023

richvdh left a comment

Choose a reason for hiding this comment

finsterwalder commented Sep 5, 2023

richvdh commented Sep 5, 2023

finsterwalder commented Sep 5, 2023

finsterwalder commented Sep 12, 2023

finsterwalder commented Aug 29, 2023 •

edited

Loading