Crypto: Posthog analytics for problems when sending message keys over to-device messages #2409

richvdh · 2024-04-29T16:04:21Z

There are various failure modes that can lead to problems sending to-device messages containing message keys, which will in turn lead to UTD errors. Currently, these are not reported in Posthog, so we lack visibility into how often they happen.

Likely root causes are the target user's homeserver being unreachable (related: #2154), or our own homeserver being unresponsive. More specific examples include:

We lack a device list and the /keys/query request failed (or the user is on a server we backed off from).
We failed to /keys/claim for a given device (or have backed off for this device).
We couldn't send the to-device message itself.

See also #234 which covers the receiving side of this (and is IMHO much lower-hanging fruit).

Question

A single sent message could result in hundreds or thousands of errors, depending on the number of devices in the room. Similarly, a single failing user could cause lots of different sent messages to have some sort of error. Should we report an event for each device for each user for each message? Or something more intelligent? What exactly are we trying to achieve with these metrics?
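One possible shape for "something more intelligent" is a single summary per sent message, counting failed devices by failure kind and the number of distinct users affected. The sketch below is only an illustration of that option, not a decision: `FailureKind`, `DeviceFailure`, `MessageFailureSummary` and `summarise` are hypothetical names that do not exist in matrix-sdk-crypto.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical failure kinds, mirroring the three examples listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FailureKind {
    KeysQuery,
    KeysClaim,
    ToDeviceSend,
}

/// Hypothetical record of one device that did not receive the message key.
#[derive(Debug, Clone)]
pub struct DeviceFailure {
    pub user_id: String,
    pub device_id: String,
    pub kind: FailureKind,
}

/// One summary per sent message, instead of one event per device.
#[derive(Debug, PartialEq, Eq)]
pub struct MessageFailureSummary {
    /// Number of failed devices, grouped by failure kind.
    pub devices_by_kind: BTreeMap<FailureKind, usize>,
    /// Number of distinct users with at least one failed device.
    pub affected_users: usize,
}

/// Collapse all per-device failures for one message into a single summary.
pub fn summarise(failures: &[DeviceFailure]) -> MessageFailureSummary {
    let mut devices_by_kind = BTreeMap::new();
    let mut users = BTreeSet::new();
    for failure in failures {
        *devices_by_kind.entry(failure.kind).or_insert(0) += 1;
        users.insert(failure.user_id.as_str());
    }
    MessageFailureSummary {
        devices_by_kind,
        affected_users: users.len(),
    }
}
```

With this shape, a message sent to a room with thousands of devices would produce at most one analytics event, at the cost of losing per-device detail.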

Implementation design

This is slightly tricky because the things we need to report on are scattered around the codebase, though they are mostly within matrix-sdk-crypto. I think the first step here is to define an interface in matrix-sdk-crypto which emits an enum of potential error codes.

We can then add a method OlmMachine::share_room_keys_failure_stream, which returns a Stream; each time something on the list above goes wrong, we write a new entry to the stream. The stream could then be wrapped in both (Rust) matrix-sdk and matrix-js-sdk, so that its entries can be turned into Posthog events.
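A minimal sketch of what that interface might look like follows. It is an assumption-laden illustration, not the real API: the enum `ShareRoomKeysFailure`, its variants, and the `FailureReporter` type are hypothetical, and a `std::sync::mpsc` channel stands in for the async `Stream` so that the example needs only the standard library.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical error codes, one per failure mode listed earlier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ShareRoomKeysFailure {
    /// No device list: /keys/query failed, or the server is backed off.
    KeysQueryFailed { user_id: String },
    /// /keys/claim failed (or is backed off) for one device.
    KeysClaimFailed { user_id: String, device_id: String },
    /// The to-device message itself could not be sent.
    ToDeviceSendFailed { user_id: String, device_id: String },
}

/// Hypothetical stand-in for the part of OlmMachine that owns reporting.
pub struct FailureReporter {
    subscribers: Vec<Sender<ShareRoomKeysFailure>>,
}

impl FailureReporter {
    pub fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    /// Synchronous analogue of the proposed share_room_keys_failure_stream:
    /// each call registers a new subscriber and returns its receiving end.
    pub fn share_room_keys_failure_stream(&mut self) -> Receiver<ShareRoomKeysFailure> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Called from wherever a failure is detected. Subscribers whose
    /// receiver has been dropped are removed.
    pub fn report(&mut self, failure: ShareRoomKeysFailure) {
        self.subscribers.retain(|tx| tx.send(failure.clone()).is_ok());
    }

    /// Number of live subscribers (useful for tests).
    pub fn subscriber_count(&self) -> usize {
        self.subscribers.len()
    }
}
```

A wrapper in matrix-sdk or matrix-js-sdk would then read entries from the returned receiver (or, in the real design, poll the Stream) and map each variant to a Posthog event.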


BillCarsonFr · 2024-05-13T13:57:29Z

Having analytics for failures to decrypt to-device messages would be more useful right now.

richvdh added the A-E2EE label Apr 29, 2024

richvdh mentioned this issue Apr 29, 2024

Crypto | Posthog analytics for to-device decryption errors #234

Open

6 tasks

richvdh added the T-Feature and A-Telemetry labels Apr 29, 2024

andybalaam changed the title ~~Crypto: Posthog analytics for problems when sending message keys keys over to-device messages~~ Crypto: Posthog analytics for problems when sending message keys over to-device messages May 13, 2024

andybalaam closed this as completed Oct 17, 2024

andybalaam reopened this Oct 17, 2024
