Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove //base threading & scheduling dependency #13

Open
domfarolino opened this issue Aug 23, 2023 · 0 comments
Open

Remove //base threading & scheduling dependency #13

domfarolino opened this issue Aug 23, 2023 · 0 comments
Labels
threading //base threading infra

Comments

@domfarolino
Copy link
Owner

As described in the Mage documentation, specifically in https://github.com/domfarolino/mage#threading--task-scheduling, Mage has a strong dependency on the //base threading & scheduling library which was written with Mage in mind. We have a goal of decoupling this library from //base for this reason:

  • For Mage to be an unopinionated, drop-in IPC utility for any application that wants seamless IPC, it must integrate with applications that utilize arbitrary threading & scheduling providers (that meet a certain baseline API criteria). We don't want to mandate that Mage embedders must use the //base library for all of their scheduling — the integration cost would be too high, and that's generally a bad design goal.

Technically we don't have to remove full reliance on the //base library; if Mage continues to depend on //base for random utilities like CHECK(), that wouldn't prevent the project from being useful and moving forward — it would just be unnecessary. So specifically, the goal is to remove Mage's reliance on the threading & scheduling components of //base (which of course is the primary thing that library is responsible for, but still being precise about this is important).

Decoupling Mage from //base requires making Mage generically thread-safe. Today, Mage is only partially thread-safe. Specifically:

  1. ✅ Some parts are completely thread-safe, like:
  2. 🟠 Some parts are thread-safe only by virtue of the way we use //base; we assert that some methods are only ever called on certain well-named threads, so we don't require locks in some places, because the assertions ensure some data structures are only ever handled on threads that are implicitly safe
    • CreateMessagePipesAndGetEndpoints() is an example of this, where we modify Node::local_endpoints_ without locking access to it, because we only ever add to it on the main thread. Even this might not be fully true actually, but that is the intent, which is simply insufficient once these sorts of operations can truly be called from arbitrary threads (within what the documented Mage API allows for)
  3. ❌ Some parts can today be called on arbitrary threads, but literally aren't protected by any lock or anything. These would need to be fixed even if we kept the //base dependency; these are just bugs plain and simple
    • Node::pending_invitations_ map: insertions are made on the UI thread when an invitation is sent, and removals/look-ups happen on the IO thread when acceptance is received. For a single invitation this is "safe" because receipt of an invitation can only ever follow the sending of an invitation (which includes the insertion which must be finished at that point). But when sending many invitations, some acceptances can be received while others are still being sent. This leads to thread-unsafe mutations of the map
    • Channel's use of write() is not locked based on socket. This means multiple messages being sent in parallel to the same socket (bound for different endpoints, perhaps) can lead to corrupted/interleaving messages

In order to decouple Mage from the threading & scheduling infrastructure that //base provides, we have to fix the 🟠 and ❌ cases — all of them, not just the examples listed above — and at the same time document which methods of the Mage public API are thread-safe, vs. which group of APIs must be called on the same thread (if any of these exist).

@domfarolino domfarolino added the threading //base threading infra label Aug 23, 2023
domfarolino added a commit that referenced this issue Sep 19, 2023
domfarolino added a commit that referenced this issue Sep 19, 2023
domfarolino added a commit that referenced this issue Sep 19, 2023
Makes progress on #13. This PR removes the final usage of `CHECK_ON_THREAD()`
from `Node`, which was in `PrepareToForwardUserMessage()`. This PR:

 1. Removes that usage, and the `thread_checker.h` include
 2. Refactor's the signature of `PrepareToForwardUserMessage()` to make it a
    little more self-documenting
 3. Adds more documentation to that whole method, to better describe its flow
    and how it is used.
domfarolino added a commit that referenced this issue Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
threading //base threading infra
Projects
None yet
Development

No branches or pull requests

1 participant