diff --git a/proposals/4140-delayed-events-futures.md b/proposals/4140-delayed-events-futures.md new file mode 100644 index 00000000000..62dfb925b39 --- /dev/null +++ b/proposals/4140-delayed-events-futures.md @@ -0,0 +1,830 @@ +# MSC4140: Cancellable delayed events + +This MSC proposes a mechanism by which a Matrix client can schedule an event (including a state event) to be sent into +a room at a later time. + +The client does not have to be running or in contact with the Homeserver at the time that the event is actually sent. + +Once the event has been scheduled, the user's homeserver is responsible for actually sending the event at the appropriate +time and then distributing it as normal via federation. + + +- [Background and motivation](#background-and-motivation) +- [Proposal](#proposal) + - [Scheduling a delayed event](#scheduling-a-delayed-event) + - [Managing delayed events](#managing-delayed-events) + - [Getting delayed events](#getting-delayed-events) + - [On demand](#on-demand) + - [On push](#on-push) + - [Homeserver implementation details](#homeserver-implementation-details) + - [Power levels are evaluated at the point of sending](#power-levels-are-evaluated-at-the-point-of-sending) + - [Delayed state events are cancelled by a more recent state event](#delayed-state-events-are-cancelled-by-a-more-recent-state-event) + - [Rate-limiting at the point of sending](#rate-limiting-at-the-point-of-sending) +- [Use case specific considerations](#use-case-specific-considerations) + - [MatrixRTC](#matrixrtc) + - [Background](#background) + - [How this MSC would be used for MatrixRTC](#how-this-msc-would-be-used-for-matrixrtc) + - [Self-destructing messages](#self-destructing-messages) +- [Potential issues](#potential-issues) + - [Compatibility with Cryptographic Identities](#compatibility-with-cryptographic-identities) +- [Alternatives](#alternatives) + - [Delegating delayed events](#delegating-delayed-events) + - [Batch sending](#batch-sending) + - [Not reusing 
the `send`/`state` endpoint](#not-reusing-the-sendstate-endpoint) + - [Batch delayed events with custom endpoint](#batch-delayed-events-with-custom-endpoint) + - [Batch Response](#batch-response) + - [EventId template variable](#eventid-template-variable) + - [Allocating the event ID at the point of scheduling the send](#allocating-the-event-id-at-the-point-of-scheduling-the-send) + - [MSC4018 (use client sync loop)](#msc4018-use-client-sync-loop) + - [Federated delayed events](#federated-delayed-events) + - [MQTT style Last Will](#mqtt-style-last-will) + - [`M_INVALID_PARAM` instead of `M_MAX_DELAY_EXCEEDED`](#m_invalid_param-instead-of-m_max_delay_exceeded) + - [Naming](#naming) + - [Don't provide a `send` action](#dont-provide-a-send-action) + - [Use `DELETE` HTTP method for `cancel` action](#use-delete-http-method-for-cancel-action) + - [[Ab]use typing notifications](#abuse-typing-notifications) +- [Security considerations](#security-considerations) +- [Unstable prefix](#unstable-prefix) +- [Dependencies](#dependencies) + + +## Background and motivation + +This proposal originates from the needs of VoIP signalling in Matrix: + +The Client-Server API currently has a [Voice over IP module](https://spec.matrix.org/v1.11/client-server-api/#voice-over-ip) +that uses room messages to communicate the call state. However, it only allows for calls with two participants. + +[MSC3401: Native Group VoIP Signalling](https://github.com/matrix-org/matrix-spec-proposals/pull/3401) proposes a scheme +that allows for more than two participants by using room state events. + +In this arrangement each device signals its participant in a call by sending a state event that represents the device's +"membership" of a call. Once the device is no longer in the call, it sends a new state event to update the call state and +communicate that the device is no longer a member. + +This works well when the client is running and can send the state events as needed. 
However, if the client is not able to +communicate with the homeserver (e.g. the user closes the app or loses connection) the call state is not updated to say +that the participant has left. + +The motivation for this MSC is to allow call member state events to be updated after the user has disconnected, by making it possible to +schedule/delay/timeout/expire events in a generic way. + +The ["reliability requirements for the room state"](https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/matrixRTC/proposals/4143-matrix-rtc.md#reliability-requirements-for-the-room-state) +section of [MSC4143: MatrixRTC](https://github.com/matrix-org/matrix-spec-proposals/pull/4143) has more details on the +use case. + +There are numerous possible solutions to the call member event expiration problem. They are covered in detail +in the [Use case specific considerations/MatrixRTC](#use-case-specific-considerations) section, because they are not part +of this proposal. + +This proposal enables a Matrix client to schedule a "hangup" state event to be sent after a specified time period. +The client can then periodically restart the timer whilst it is running. If the client is no longer running +or able to communicate, then the timer would expire and the homeserver would send the "hangup" event on behalf of the client. + +Such an arrangement can also be described as a "heartbeat" mechanism. The client sends a "heartbeat" to the homeserver +in the form of a "restart" of the delayed event to keep the call "alive". +The homeserver will automatically send the "hangup" if it does not receive a "heartbeat". 
+ +## Proposal + +The following operations are added to the client-server API: + +- Schedule an event to be sent at a later time +- Get a list of delayed events +- Restart the timer of a delayed event +- Send the delayed event immediately +- Cancel a delayed event so that it is never sent + +At the point of an event being scheduled the homeserver is [unable to allocate the event ID](#allocating-the-event-id-at-the-point-of-scheduling-the-send). +Instead, the homeserver allocates a `delay_id` to the scheduled event which is used during the above API operations. + +### Scheduling a delayed event + +An optional `delay` query parameter is added to the existing +[`PUT /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey}`](https://spec.matrix.org/v1.11/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey) +and +[`PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`](https://spec.matrix.org/v1.11/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid) +endpoints. + +The new query parameter is used to configure the event scheduling: + +- `delay` - Optional number of milliseconds the homeserver should wait before sending the event. If no `delay` is provided, +the event is sent immediately as normal. + +The body of the request is the same as it is currently. + +If a `delay` is provided, the homeserver schedules the event to be sent with the specified delay and responds with a +`delay_id` field (omitting the `event_id` as it is not available): + +```http +200 OK +Content-Type: application/json + +{ + "delay_id": "1234567890" +} +``` + +The homeserver can optionally enforce a maximum delay duration. If the requested delay exceeds the maximum, the homeserver +can respond with a [`400`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400) status code +and a body with a Matrix error code `M_MAX_DELAY_EXCEEDED` and the maximum allowed delay (`max_delay` in milliseconds). 
+ +For example, the following specifies a maximum delay of 24 hours: + +```http +400 Bad Request +Content-Type: application/json + +{ + "errcode": "M_MAX_DELAY_EXCEEDED", + "error": "The requested delay exceeds the allowed maximum.", + "max_delay": 86400000 +} +``` + +The homeserver **should** apply rate limiting to the scheduling of delayed events to provide mitigation against the +[High Volume of Messages](https://spec.matrix.org/v1.11/appendices/#threat-high-volume-of-messages) threat. + +The homeserver **may** apply a limit on the maximum number of outstanding delayed events in which case the Matrix error code +`M_MAX_DELAYED_EVENTS_EXCEEDED` can be returned: + +```http +400 Bad Request +Content-Type: application/json + +{ + "errcode": "M_MAX_DELAYED_EVENTS_EXCEEDED", + "error": "The maximum number of delayed events has been reached." +} +``` + +### Managing delayed events + +A new authenticated client-server API endpoint at `POST /_matrix/client/v1/delayed_events/{delay_id}` allows scheduled events +to be managed. + +The body of the request is a JSON object containing the following fields: + +- `action` - The action to take on the delayed event.\ +Must be one of: + - `send` - Send the delayed event immediately. + - `cancel` - Cancel the delayed event so that it is never sent. + - `restart` - Restart the timeout of the delayed event. + +For example, the following would send the delayed event with delay ID `1234567890` immediately: + +```http +POST /_matrix/client/v1/delayed_events/1234567890 +Content-Type: application/json + +{ + "action": "send" +} +``` + +Where the `action` is `send`, the homeserver **should** apply rate limiting to provide mitigation against the +[High Volume of Messages](https://spec.matrix.org/v1.11/appendices/#threat-high-volume-of-messages) threat. 
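
As an illustration of the management endpoint described in this section, the following Python sketch builds (but does not send) the request for a given action. The function name `build_manage_request`, the base URL, and the use of an `Authorization: Bearer` header are assumptions made for this example. The proposal only states that the endpoint is authenticated and that `action` must be one of `send`, `cancel`, or `restart`.

```python
import json
from urllib.parse import quote

# The three actions listed in this proposal.
VALID_ACTIONS = ("send", "cancel", "restart")


def build_manage_request(base_url: str, delay_id: str, action: str, access_token: str) -> dict:
    """Describe the HTTP request for managing a delayed event (hypothetical helper).

    Nothing is sent over the network; the caller passes the result to an HTTP client.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    # Percent-encode the delay ID so that it is safe to use as a path segment.
    path = f"/_matrix/client/v1/delayed_events/{quote(delay_id, safe='')}"
    return {
        "method": "POST",
        "url": base_url.rstrip("/") + path,
        "headers": {
            "Content-Type": "application/json",
            # Assumption: bearer-token auth, as used elsewhere in the client-server API.
            "Authorization": f"Bearer {access_token}",
        },
        "body": json.dumps({"action": action}),
    }
```

Validating the action on the client side is a design choice of this sketch: an unsupported value is rejected before a request is made.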
+ +### Getting delayed events + +#### On demand + +New authenticated client-server API endpoints `GET /_matrix/client/v1/delayed_events/scheduled` and +`GET /_matrix/client/v1/delayed_events/finalised` allows clients to get a list of +all the delayed events owned by the requesting user that have been scheduled to send, have been sent, or failed to be sent. + +The endpoints accepts a query parameter `from` which is a token that can be used to paginate the list of delayed events as +per the [pagination convention](https://spec.matrix.org/v1.11/appendices/#pagination). The homeserver can choose a suitable +page size. + +The response is a JSON object containing the following fields: + +- For the `GET /_matrix/client/v1/delayed_events/scheduled` endpoint: + - `delayed_events` - Required. An array of delayed events that have been scheduled to be sent, + sorted by `running_since + delay` in increasing order (event that will timeout soonest first). + - `delay_id` - Required. The ID of the delayed event. + - `room_id` - Required. The room ID of the delayed event. + - `type` - Required. The event type of the delayed event. + - `state_key` - Optional. The state key of the delayed event if it is a state event. + - `delay` - Required. The delay in milliseconds before the event is to be sent. + - `running_since` - Required. The timestamp (as Unix time in milliseconds) when the delayed event was scheduled or + last restarted. + - `content` - Required. The content of the delayed event. This is the body of the original `PUT` request, not a preview + of the full event after sending. + - `next_batch` - Optional. A token that can be used to paginate the list of delayed events. + +- For the `GET /_matrix/client/v1/delayed_events/finalised` endpoint: + - `finalised_events` - Required. An array of finalised delayed events, that have either been sent or resulted in an error, + sorted by `origin_server_ts` in decreasing order (latest finalised event first). + - `delayed_event` - Required. 
Describes the original delayed event in the same format as the `delayed_events` array. + - `outcome`: `"send"|"cancel"` + - `reason`: `"error"|"action"|"delay"` + - `error`: Optional Error. A matrix error (as defined by [Standard error response](https://spec.matrix.org/v1.11/client-server-api/#standard-error-response)) + to explain why this event failed to be sent. The Error can either be the `M_CANCELLED_BY_STATE_UPDATE` or any of the + Errors from the client server send and state endpoints. + - `event_id` - Optional EventId. The `event_id` this event got in case it was sent. + - `origin_server_ts` - Optional Timestamp. The timestamp the event was sent. + - `next_batch` - Optional. A token that can be used to paginate the list of finalised events. + +The batch size and the amount of terminated events that stay on the homeserver can be chosen, by the homeserver. +The recommended values are: + +- `finalised_events` retention: 7 days +- `finalised_events` batch size: 10 +- `finalised_events` max cached events: 1000 + +There is no guarantee for a client that all events will be available in the +finalised events list if they exceed the limits of their homeserver. +Additionally, a homeserver may discard finalised delayed events that have been returned by a +`GET /_matrix/client/v1/delayed_events/finalised` response. 
+ +An example for a response to the `GET /_matrix/client/v1/delayed_events/scheduled` endpoint: + +```http +200 OK +Content-Type: application/json + +{ + "delayed_events": [ + { + "delay_id": "1234567890", + "room_id": "!roomid:example.com", + "type": "m.room.message", + "delay": 15000, + "running_since": 1721732853284, + "content":{ + "msgtype": "m.text", + "body": "I am now offline" + } + }, + { + "delay_id": "abcdefgh", + "room_id": "!roomid:example.com", + "type": "m.call.member", + "state_key": "@user:example.com_DEVICEID", + "delay": 5000, + "running_since": 1721732853284, + "content":{ + "memberships": [] + } + } + ], + "next_batch": "b12345" +} +``` + +Unless the delayed event is updated beforehand, the event will be sent after `running_since` + `delay`. + +This can be used by clients to display events that have been scheduled to be sent in the future. + +For use cases where the existence of a delayed event is also of interest for other room members +(e.g. self-destructing messages), it is recommended to include this information in the original/affected event itself. + +#### On push + +A new optional key, `finalised_events`, is added to the response body of `/sync`. The shape of its +value is equivalent to that of the response body of `GET /_matrix/client/v1/delayed_events/finalised`. +It is an array of the syncing user's delayed events that were sent or failed to be sent after the +`since` timestamp parameter of the associated `/sync` request, or all of them for full `/sync`s. +When no such delayed events exist, the `finalised_events` key is absent from the `/sync` response. + +A new key, `finalised_events`, is defined for `POST /_matrix/client/v3/user/{userId}/filter`. +Its value is a boolean which, if set to `false`, causes an associated `/sync` response to exclude +any `finalised_events` key it may have otherwise included. 
+ +The only delayed events included in `finalised_events` are ones that have been retained by the homeserver, +as per the same retention policies as for the `GET /_matrix/client/v1/delayed_events/finalised` endpoint. +Additionally, a homeserver may discard finalised delayed events that have been returned by a `/sync` response. + +The `finalised_events` key is added to the request bodies of the appservice API `/transactions` endpoint. +It has the same content as the key for `/sync`, and contains all of the target appservice's delayed events +that were sent or failed to be sent since the previous transaction. + +### Homeserver implementation details + +#### Power levels are evaluated at the point of sending + +Power levels are evaluated for each event only once the delay has elapsed and the event is about to be distributed/inserted into the +DAG. This implies a delayed event can fail if it violates power levels at the time the delay passes. + +Conversely, it's also possible to successfully schedule an event that the user has no permission to send at the time of scheduling. +If the power level situation has changed at the time the delay passes, the event can even reach the DAG. + +#### Delayed state events are cancelled by a more recent state event + +> [!NOTE] +> Special rule for delayed state events: +> A delayed event `D` gets cancelled if: +> +> - `D` is a state event with key `k` and type `t` from sender `s`. +> - A new state event `N` with type `t` and key `k` is sent into the room. +> - The sender of `D` is different to the sender of `N`. + +If a new state event is sent to the same room at the same entry (`event_type`, `state_key` pair) as a delayed event by a +**different Matrix user**, any delayed event for this entry (`event_type`, `state_key` pair) is cancelled. + +This only happens if it's a state update from a different user. If it is from the same user, the delayed event will not get cancelled. 
+If the same user is updating the state which has associated delayed events, this user is in control of those delayed events. +They can just cancel and check the events manually using the `/delayed_events` and the `/delayed_events/scheduled` endpoints. + +In the case where the delayed event gets cancelled due to a different user updating the same state, there +is no race condition here since a possible race between timeout and the _new state event_ will always converge to +the _new state event_: + +- timeout for _delayed event_ followed by _new state event_: the room state will be updated twice: once by the content of + the delayed event but later with the content of _new state event_. +- _new state event_ followed by timeout for _delayed event_: the _new state event_ will cancel the outstanding _delayed event_. + +The finalised delayed event as represented by the finalised list of the GET endpoint (see [Getting delayed events](#getting-delayed-events)) +will be stored with the following outcome: + +```json +"outcome": "cancel", +"reason": "error", +"error": { + "errcode": "M_CANCELLED_BY_STATE_UPDATE", + "error": "The delayed event was not sent because a different user updated the same state event. So the scheduled event might change it in an undesired way."} +``` + +Note that this behaviour does not apply to regular (non-state) events as there is no concept of a (`event_type`, `state_key`) +pair that could be overwritten. + +#### Rate-limiting at the point of sending + +Further to the rate limiting of the API endpoints, the homeserver **should** apply rate limiting to the sending +of delayed messages at the point that they are inserted into the DAG. 
+ +This is to provide mitigation against the +[High Volume of Messages](https://spec.matrix.org/v1.11/appendices/#threat-high-volume-of-messages) threat where a malicious +actor could schedule a large volume of events ahead of time without exceeding a rate limit on the initial `PUT` request, +but has specified a `delay` that corresponds to a common point of time in the future. + +A limit on the maximum number of delayed events that can be outstanding at one time could also provide some mitigation against +this attack. + +## Use case specific considerations + +Delayed events can be used for many different features: tea timers, reminders, or ephemeral events could be implemented +using delayed events, where clients send room events with +intentional mentions or a redaction as a delayed event. +It can even be used to send temporal power levels/mutes or bans. + +### MatrixRTC + +In this section, an overview is given how this MSC is used in [MSC4143: MatrixRTC](https://github.com/matrix-org/matrix-spec-proposals/pull/4143) +and alternative expiration systems are evaluated. + +#### Background + +MatrixRTC makes it necessary to have real time information about the current MatrixRTC session. +To properly display room tiles and header in the room list (or compute a list of ongoing calls), it's required to know: + +- If there is a running session. +- What type that session has. +- Who and how many people are currently participating. + +A particular delicate situation is that clients are not able to inform others if they lose connection. +There are numerous approaches to solve such a situation. They split into two categories: + +- Polling based + - Ask the users if they are still connected. + - Ask an RTC backend (SFU) who is connected. +- Timeout based + - Update the room state every x seconds. + This allows clients to check how long an event has not been updated and ignore it if it's expired. 
+ - Use delayed events with a 10s timeout to send the disconnect from the call + less than 10s after the user stops pinging the `/delayed_events` endpoint + (or delegate the disconnect action to a service attached to the SFU). + - Use the client sync loop as a special case timeout for call member events + (see [Alternatives/MSC4018 (use client sync loop)](#msc4018-use-client-sync-loop)). + +Polling based solutions have a large overhead in complexity and network requests on the clients. +For example: + +> A room list with 100 rooms where there has been a call before in every room +> (or there is an ongoing call) would require the client to send a to-device message +> (or a request to the SFU) to every user that has an active state event to check if +> they are still online. All this is just to display the room tile properly. + +For displaying the room list, timeout based approaches are much more reasonable because they allow computing MatrixRTC +metadata for a room to be synchronous. + +The current solution updates the room state every X minutes. +This is not elegant since room state gets repeatedly sent with the same content. +In large calls, this could result in high traffic and increase the size of the room DAG. + +A call with 100 call members implies 100 state events every X minutes. X cannot be a +long duration because +it is the duration after which the event can be considered expired. Improper +disconnects would result in the user being displayed as "still in the call" for +X minutes (which should be as short as possible). + +Additionally, this approach requires perfect time synchronization between server and client to compute the expiration. +This is currently not possible over federation since `unsigned.age` is not available over federation. + +#### How this MSC would be used for MatrixRTC + +With this proposal, the client can use delayed events to implement a "heartbeat" mechanism. 
+ +On joining the call, the client sends a "join" state event as normal to indicate that it is participating: + +```http +PUT /_matrix/client/v3/rooms/!wherever:example.com/state/m.call.member/@someone:example.com +Content-Type: application/json + +{ + "memberships": [ + { + ...membership data here... + } + ] +} +``` + +Before sending the join event, it also schedules a delayed "hangup" state event with `delay` of around 5-20 seconds that +marks the end of its participation: + +```http +PUT /_matrix/client/v3/rooms/!wherever:example.com/state/m.call.member/@someone:example.com?delay=10000 +Content-Type: application/json + +{ + "memberships": [] +} +``` + +Let's say the homeserver returns a `delay_id` of `1234567890`. + +The client then periodically sends a "heartbeat" in the form of a "restart" of the delayed "hangup" state event to keep +the call membership "alive". + +For example, it could make the request every 5 seconds (or some other period less than the `delay`): + +```http +POST /_matrix/client/v1/delayed_events/1234567890 +Content-Type: application/json + +{ + "action": "restart" +} +``` + +This would have the effect that if the homeserver does not receive a "heartbeat" from the client for 10 seconds, then +it will automatically send the "hangup" state event for the client. + +Since the delayed event is sent first, a client can guarantee (at the time they are sending +the join event) that it will eventually leave. 
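
The timing behaviour of the heartbeat described in this section can be illustrated with a small, network-free Python simulation. It is only a sketch: the `DelayedEvent` class and `hangup_time` function are hypothetical names for this example, and the rule that a restart arriving at or after the due time has no effect is an assumption about how a homeserver might behave, not something this proposal specifies.

```python
from dataclasses import dataclass


@dataclass
class DelayedEvent:
    """Minimal model of a scheduled event: it is due at running_since + delay."""
    delay_ms: int
    running_since_ms: int

    def due_at(self) -> int:
        return self.running_since_ms + self.delay_ms

    def restart(self, now_ms: int) -> bool:
        """Restart the timer. Assumption: a restart at or after the due time is too late."""
        if now_ms >= self.due_at():
            return False
        self.running_since_ms = now_ms
        return True


def hangup_time(delay_ms: int, heartbeat_ms: int, client_stops_at_ms: int) -> int:
    """Return when the "hangup" would be sent if heartbeats stop at client_stops_at_ms."""
    if heartbeat_ms <= 0:
        raise ValueError("heartbeat_ms must be positive")
    event = DelayedEvent(delay_ms=delay_ms, running_since_ms=0)
    now = heartbeat_ms
    while now < client_stops_at_ms:
        if not event.restart(now):
            break  # the timer already expired before this heartbeat arrived
        now += heartbeat_ms
    return event.due_at()
```

With a `delay` of 10000 ms and a heartbeat every 5000 ms, a client that stops sending heartbeats at 60000 ms sends its last restart at 55000 ms, so the simulated "hangup" is due at 65000 ms. If the heartbeat period is longer than the `delay`, the first restart arrives too late and the event is due at the original 10000 ms.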
+ +### Self-destructing messages + +This MSC also allows an implementation of "self-destructing" messages using redaction: + +First send (or generate the PDU when +[MSC4080: Cryptographic Identities](https://github.com/matrix-org/matrix-spec-proposals/pull/4080) +is available): +`PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}` + +```jsonc +{ + "msgtype": "m.text", + "body": "this message will self-redact in 10 minutes" +} +``` + +then send: +`PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.redaction/{txnId}?delay=600000` + +```jsonc +{ + "redacts": "{event_id}" +} +``` + +This would redact the message with body `this message will self-redact in 10 minutes` after 10 minutes. + +## Potential issues + +### Compatibility with Cryptographic Identities + +Ideally, this proposal should be compatible with other proposals such as +[MSC4080: Cryptographic Identities](https://github.com/matrix-org/matrix-spec-proposals/pull/4080) which introduce mechanisms +to allow the recipient of an event to determine whether it was sent by a client as opposed to having been spoofed/injected +by a malicious homeserver. + +In the context of this proposal, the delayed events should be signed with the same cryptographic identity as the client +that scheduled them. + +This means that the content of the original scheduled event must be sent "as is" without modification by the homeserver. +The consequence is an implementation detail that client developers must be aware of: if the content of the delayed +event contains a timestamp, then it would be the timestamp of when the event was originally scheduled rather than +anything later. + +However, the `origin_server_ts` of the delayed event should be the time that the event is actually sent by the homeserver. + +This is a general problem that arises with the introduction +of [Cryptographic Identities](https://github.com/matrix-org/matrix-spec-proposals/pull/4080). 
+A user can, intentionally or due to network conditions, delay the signing and sending of an event. +A possible solution would be the introduction of a `signing_ts` (in the signed section) and keep the `origin_server_ts` +in the unsigned section. +Both are reasonable data points that clients might want to use. +This would solve issues related to delayed events since +it would make it transparent to clients when an event was scheduled and when it was distributed over federation. + +## Alternatives + +### Delegating delayed events + +It is useful for external services to also interact with delayed events. If a client disconnects, an external service can +be the best source to send the delayed event/"last will". + +This is not covered in this MSC but could be realized with scoped access tokens. +A scoped token that only allows interaction with the `delayed_events` endpoint and only with a subset of `delay_id`s +would be used. + +With this, an SFU that tracks the current client connection state could be given the power to control the delayed event. +The client would share the scoped token and the required details, so that the SFU can +restart the delayed event while a user is connected +and can send the delayed event once the user disconnects +(using a `{"action": "restart"}` and a `{"action": "send"}` `/delayed_events` request). +This way, the SFU can be used as the source of truth for the call member room state event without knowing anything about +the Matrix call. + +Since the SFU has a much lower chance of running into a network issue, +`{"action": "restart"}` calls may be sent much more infrequently. +Instead of calling the `/delayed_events` endpoint every couple of seconds, a delayed event's +timeout can be set to be long (e.g. 6 hours), as the SFU can be expected to not forget sending the `{"action": "send"}` action +when it detects a disconnecting client. 
+ +### Batch sending + +In some scenarios it is important to allow to send an event with an associated +delay at the same time. + +- One example would be redacting an event. It only makes sense to redact the event if it exists. + It might be important to have the guarantee that the delayed redact is received + by the server at the time where the original message is sent. +- In the case of a state event, a user might want to set the state to `A` and after a + timeout change it back to `{}`. By using two separate requests, sending `A` could work, + but the event with content `{}` could fail. The state would not automatically + reset to `{}`. + +For this use case, batch sending of multiple delayed events would be desired. + +Batch sending is not included in the proposal of this MSC however since batch sending should +become a generic Matrix concept as proposed with `/send_pdus`. (see: [MSC4080: Cryptographic Identities](https://github.com/matrix-org/matrix-spec-proposals/pull/4080)) + +[MSC2716: Incrementally importing history into existing rooms](https://github.com/matrix-org/matrix-spec-proposals/pull/2716) +already proposes a `batch_send` endpoint. However, it is limited to application services and focuses on historic +data. Since the additional capability to use a template `event_id` parameter is also needed, this probably is not a good fit. + +### Not reusing the `send`/`state` endpoint + +Alternatively, new endpoints could be introduced to not overload the `send` and `state` endpoint. +Those endpoints could be called: + +`PUT /_matrix/client/v1/rooms/{roomId}/send_delayed_event/{eventType}/{txnId}?delay={delay_ms}` + +`PUT /_matrix/client/v1/rooms/{roomId}/state_delayed_event/{eventType}/{stateKey}?delay={delay_ms}` + +This would allow the response for the `send` and `state` endpoints to remain as they are currently, +and to have a different return type for the new `send_delayed_event` and `state_delayed_event` endpoints. 
+ +### Allocating the event ID at the point of scheduling the send + +This was considered, but when sending a delayed event the `event_id` is not yet available: + +The Matrix spec says that the `event_id` must use the [reference hash](https://spec.matrix.org/v1.10/rooms/v11/#event-ids) +which is [calculated from the fields](https://spec.matrix.org/v1.10/server-server-api/#calculating-the-reference-hash-for-an-event) +of an event including the `origin_server_ts` as defined in [this list](https://spec.matrix.org/v1.10/rooms/v11/#client-considerations) + +Since the `origin_server_ts` should be the timestamp the event has when entering the DAG (required for call +duration computation), the `event_id` cannot be computed when using the `send` endpoint before the delayed event has resolved. + +### MSC4018 (use client sync loop) + +[MSC4018: Reliable call membership](https://github.com/matrix-org/matrix-spec-proposals/pull/4018) also +proposes a way to make call memberships reliable. It uses the client sync loop as +an indicator to determine if the event is expired, instead of letting the SFU +inform about the call termination or using the call app ping/refresh loop as proposed earlier in this MSC. + +The advantage is that this does not require introducing a new ping system +(as is proposed here by using the `delayed_events` restart action). +Though with cryptographic identities, the client needs to create the leave event. + +The sync timeout is much longer than would be desirable (30s vs 5s). + +With a widget implementation for calls, it cannot be guaranteed that the widget is running during the sync loop. +So one either has to move the hangup logic to the hosting client or let the widget run all the time. + +A dedicated ping (independent of the sync loop) is more flexible and allows for the widget to +execute the timer restart. +If the widget dies, the call membership will disconnect. 
+ +Additionally, the specification should not include specific +custom server rules if possible. +Sending an event on behalf of a user based on the client sync loop if there is an event with a specific type and specific +content is quite a server-specific behaviour, and also would not work well with encrypted state events and cryptographic +identities. +This proposal is a general behaviour valid for all event types. + +### Federated delayed events + +Delayed events could be sent over federation immediately and then have the receiving servers process (send down to clients) +them at the appropriate time. + +Downsides of this approach that have been considered are that: + +- individual "heartbeats"/restarts would need to be distributed via federation, meaning more traffic and processing +to be done. +- if any homeservers missed the federated "heartbeat"/restart message, then they might decide that the event is visible +to clients whereas +other homeservers might have received it and come to a different conclusion. If the event was later cancelled then +resolving the inconsistency feels more complex than if the event was never sent in the first place. + +[MSC3277: Scheduled messages](https://github.com/matrix-org/matrix-spec-proposals/pull/3277) proposes a similar feature +and there is an extensive analysis of the pros and cons of this MSC vs MSC3277 +[here](https://github.com/matrix-org/matrix-spec-proposals/pull/4140#discussion_r1653083566). + +If it's not needed to allow modification of a delayed event after it has been scheduled, there is a benefit in +federating the scheduled event (adding it to the DAG immediately). It increases resilience: the sender's homeserver can +disconnect and the delayed message still will enter non-soft-failed state (will be sent). + +However, for the MatrixRTC use case it's required to be able to modify the event after it has been scheduled. As such, +this approach has been discounted. 
+ +### MQTT style Last Will + +[MQTT](https://mqtt.org/) has the concept of a Will Message that is published by the server when a client disconnects. + +The client can set a Will Message when it connects to the server. If the client disconnects unexpectedly, the server will +publish the Will Message if the client is not back online within a specified time. + +A similar concept could be applied to Matrix by having the client specify a set of "Last Will" events and have the +homeserver trigger them if the client (possibly identified by device ID) does not send an API request within a specified +time. + +The main differentiator is that this type of approach might use the sync loop as the "heartbeat" equivalent similar to +[MSC4018](https://github.com/matrix-org/matrix-spec-proposals/pull/4018). + +A benefit compared to this proposal is that theoretically there would be no additional network traffic overhead. + +Some complications are: + +- in order to avoid additional network traffic, the homeserver would need to proactively realise that a connection +has dropped. Depending on the network/load balancer stack this might be problematic. +- as an alternative, the client could reduce the long poll timeout (from a typical 30s down to, say, 5s) which would +result in a traffic increase. +- As syncing is a per-client concept, the MatrixRTC app has to either run in the same process as the client so that a +MatrixRTC app failure triggers the client Last Will or the client has to observe the MatrixRTC app and simulate the Last +Will if the MatrixRTC app fails. + +### `M_INVALID_PARAM` instead of `M_MAX_DELAY_EXCEEDED` + +The existing `M_INVALID_PARAM` error code could be used instead of introducing a new error code `M_MAX_DELAY_EXCEEDED`. 
+ +### Naming + +The following alternative names for this concept were considered: + +- Future +- DelayedEvents +- PostponedEvents +- LastWill + +### Don't provide a `send` action + +Instead of providing a `send` action for delayed events, the client could cancel the outstanding delayed event and send +a new non-delayed event instead. + +This would simplify the API, but it's less efficient since the client would have to send two requests instead of one. + +### Use `DELETE` HTTP method for `cancel` action + +Instead of providing a `cancel` action for delayed events, the client could send a `DELETE` request to the same endpoint. + +This feels more elegant, but it does not suggest a natural mapping for the other actions. + +### [Ab]use typing notifications + +Some exploration of using typing notifications to indicate that a user is still connected to a call was done. + +The idea of extending [MSC3038: Typed typing notifications](https://github.com/matrix-org/matrix-spec-proposals/pull/3038) +to allow for additional metadata (like device ID and call ID) was considered. + +A perceived benefit was that if the delayed events were federated, then the typing notification EDUs might provide an +efficient transport. + +However, as the conclusion was to [not federate the delayed events](#federated-delayed-events), this approach was +discounted in favour of a dedicated endpoint. + +### Alternative to `running_since` field + +Some alternatives for the `running_since` field on the `GET` response are: + +- `delaying_from` +- `delayed_since` +- `delaying_since` +- `last_restart` - but this feels less clear than `running_since` for a delayed event that hasn't been restarted + +## Security considerations + +All new endpoints are authenticated. + +Servers **should** impose a maximum delay of not more than a month. 
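The maximum-delay limit could be enforced at scheduling time roughly as follows. This is a minimal sketch, not part of the proposal: the `validate_delay` helper, the `MAX_DELAY_MS` constant and the 30-day ceiling are all assumptions chosen for illustration. The error body mirrors the `M_MAX_DELAY_EXCEEDED` shape shown elsewhere in this MSC.

```python
from typing import Optional

# Assumed ceiling of 30 days, in milliseconds. The MSC only asks for
# "not more than a month"; a homeserver may choose a lower value.
MAX_DELAY_MS = 30 * 24 * 60 * 60 * 1000


def validate_delay(delay_ms: int, max_delay_ms: int = MAX_DELAY_MS) -> Optional[dict]:
    """Return None if the requested delay is acceptable, else an error body.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if delay_ms > max_delay_ms:
        return {
            "errcode": "M_MAX_DELAY_EXCEEDED",
            "error": "The requested delay exceeds the allowed maximum.",
            "max_delay": max_delay_ms,
        }
    return None
```

A homeserver would call such a check before storing the delayed event, and return the error body to the client instead of a `delay_id` when the check fails.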
+ +As described [above](#power-levels-are-evaluated-at-the-point-of-sending), the homeserver **must** evaluate and enforce the +power levels at the time of the delayed event being sent (i.e. added to the DAG). + +This has the risk that this feature could be used by a malicious actor to circumvent existing rate limiting measures which +corresponds to the [High Volume of Messages](https://spec.matrix.org/v1.11/appendices/#threat-high-volume-of-messages) +threat. The homeserver **should** apply rate-limiting to both the scheduling of delayed events and the later sending to +mitigate this risk. + +## Unstable prefix + +Whilst the MSC is in the proposal stage, the following should be used: + +- `org.matrix.msc4140.delay` should be used instead of the `delay` query parameter. +- `POST /_matrix/client/unstable/org.matrix.msc4140/delayed_events/{delay_id}` should be used instead of + the `POST /_matrix/client/v1/delayed_events/{delay_id}` endpoint. +- `GET /_matrix/client/unstable/org.matrix.msc4140/delayed_events` should be used instead of + the `GET /_matrix/client/v1/delayed_events` endpoint. +- `org.matrix.msc4140.finalised_events` should be used as keys of `/sync`, `/transactions`, and + `/filter` instead of `finalised_events`. 
+- The `M_UNKNOWN` `errcode` should be used instead of `M_MAX_DELAY_EXCEEDED` as follows: + +```json +{ + "errcode": "M_UNKNOWN", + "error": "The requested delay exceeds the allowed maximum.", + "org.matrix.msc4140.errcode": "M_MAX_DELAY_EXCEEDED", + "org.matrix.msc4140.max_delay": 86400000 +} +``` + +instead of: + +```json +{ + "errcode": "M_MAX_DELAY_EXCEEDED", + "error": "The requested delay exceeds the allowed maximum.", + "max_delay": 86400000 +} +``` + +- The `M_UNKNOWN` `errcode` should be used instead of `M_MAX_DELAYED_EVENTS_EXCEEDED` as follows: + +```json +{ + "errcode": "M_UNKNOWN", + "error": "The maximum number of delayed events has been reached.", + "org.matrix.msc4140.errcode": "M_MAX_DELAYED_EVENTS_EXCEEDED" +} +``` + +instead of: + +```json +{ + "errcode": "M_MAX_DELAYED_EVENTS_EXCEEDED", + "error": "The maximum number of delayed events has been reached." +} +``` + +- The `M_UNKNOWN` `errcode` should be used instead of `M_CANCELLED_BY_STATE_UPDATE` as follows: + +```json +{ + "errcode": "M_UNKNOWN", + "org.matrix.msc4140.errcode": "M_CANCELLED_BY_STATE_UPDATE", + "error": "The delayed event was not sent because a different user updated the same state event, so the scheduled event might change it in an undesired way." +} +``` + +instead of: + +```json +{ + "errcode": "M_CANCELLED_BY_STATE_UPDATE", + "error": "The delayed event was not sent because a different user updated the same state event, so the scheduled event might change it in an undesired way." +} +``` + +Additionally, the feature is to be advertised as an unstable feature in the `GET /_matrix/client/versions` response, with +the key `org.matrix.msc4140` set to `true`. So, the response could then look as follows: + +```json +{ + "versions": ["..."], + "unstable_features": { + "org.matrix.msc4140": true + } +} +``` + +## Dependencies + +None.