Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests can flake if we send a message before Olm sessions are established (I think) #41

Closed
andybalaam opened this issue Mar 26, 2024 · 6 comments · Fixed by #42
Closed
Assignees

Comments

@andybalaam
Copy link
Contributor

I was seeing some tests failing on my local machine today, even though the CI is green. I have begun tracking down the cause, so am writing my thoughts and theories here, most of which are probably wrong.

I was getting this pattern of failures for TestRoomKeyIsCycledAfterEnoughMessages:

    --- PASS: TestRoomKeyIsCycledAfterEnoughMessages/{js_hs1}|{js_hs1} (22.84s)
    --- PASS: TestRoomKeyIsCycledAfterEnoughMessages/{js_hs1}|{rust_hs1} (3.12s)
    --- FAIL: TestRoomKeyIsCycledAfterEnoughMessages/{rust_hs1}|{js_hs1} (7.46s)
    --- FAIL: TestRoomKeyIsCycledAfterEnoughMessages/{rust_hs1}|{rust_hs1} (5.84s)

So we failed when we sent the message from Rust.

I saw messages like this:

   rust.go:586: TimelineDiff change: &{ID:$kkI-w3w1GZnC2JpYJ1NGPbCnYNTBQs85lnR92yrikKA Text: Sender:@user-7-alice:hs1 Target: Membership: FailedToDecrypt:true}

(Notice the FailedToDecrypt:true.)

When I add a 1-second pause after WithAliceAndBobSyncing the tests pass, so this makes me think that the message is being sent before the Olm session is established, or something similar.

When I add the pause before WithAliceAndBobSyncing (after (MustJoinRoom) the tests still fail.

This may be a real bug in the Rust SDK, or may be a test thing. Investigating.

@andybalaam andybalaam self-assigned this Mar 26, 2024
@kegsay
Copy link
Member

kegsay commented Mar 26, 2024

This is due to your change in the memory store in rust SDK. It causes a delay when claiming OTKs. I tried to improve this by inverting the client creation order in #33 but it isn't enough.

If you don't use latest main then this will likely be okay for you.

@andybalaam
Copy link
Contributor Author

Confirmed that these tests pass if I revert matrix-org/matrix-rust-sdk#3221

Side note: I was worried that I wasted time by breaking those PRs into small chunks, but I am very grateful now.

@andybalaam
Copy link
Contributor Author

Returning an empty Vec from load_tracked_users is enough to make the tests pass. Investigating.

@poljar
Copy link
Contributor

poljar commented Mar 27, 2024

Returning an empty Vec from load_tracked_users is enough to make the tests pass. Investigating.

There is a possibility that the culprit is the fact the implementation that stores the tracked users is buggy. It's using a Vec which means that inserting the same user with different dirty flags multiple times is possible: https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk-crypto/src/store/memorystore.rs#L59 and https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk-crypto/src/store/memorystore.rs#L331-L333

If you load the tracked users, there's no guarantee which one we will use to check if the member actually is dirty.

@andybalaam
Copy link
Contributor Author

Yeah I spotted the bug in the impl and have a fix for it, but unfortunately it doesn't fix the problem... continuing to investigate.

@andybalaam
Copy link
Contributor Author

Bug fix is here: matrix-org/matrix-rust-sdk#3282

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants