Skip to content

Commit

Permalink
Document bug and how to reproduce it
Browse files Browse the repository at this point in the history
  • Loading branch information
edsko committed Nov 7, 2023
1 parent df8815f commit 9ebee70
Show file tree
Hide file tree
Showing 2 changed files with 58 additions and 1 deletion.
57 changes: 57 additions & 0 deletions STMBUG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
This version of the branch demonstrates the STM bug.

To reproduce, remove the workaround from `Network.GRPC.Util.Concurrency` by defining

```haskell
atomically :: forall a. STM a -> IO a
atomically = STM.atomically
```

and then run with

```
while (cabal run test-grapesy -- -p earlyTermination3 2>log); do date; done
```

You should hit the "blocked indefinitely" very quickly (usually already on the
first run).

Critical pieces of code:

* The "blocked indefinitely" exception is raised by running a call to `recv`
(src/Network/GRPC/Util/Session/Channel.hs).

* This first gets the "thread interface" (call to `getThreadInterface`) which,
when this is first called, will return an `TMVar` containing `FlowStateRegular`
of another `TMVar`. It is the call to `takeTMVar` on that second `TMVar` that
is blocking.

* It is true that the thread that is meant to write to that `TMVar` dies in the
test case. That thread is spawned in src/Network/GRPC/Util/Session/Client.hs,
specifically in `forkRequest`.

* However, the thread body is wrapped with `threadBody`
(src/Network/GRPC/Util/Thread.hs), and then it dies, the final call to
`writeTVar` in `threadBody` will write the new status to the `TVar`
containing that thread status.

* I have confirmed that this write actually happens, _but_ the call to `recv`
(which as one of the first things that it does is _read_ that thread status)
is not retried. _If_ it would be retried (which is basically what my awful
`atomically` hack does), then it would notice that the thread status has
indeed changed and it will throw an appropriate exception ("connection
closed") rather than that "blocked indefinitely" exception.

Related reports:

* https://gitlab.haskell.org/ghc/ghc/-/issues/9401 /
https://github.com/simonmar/async/issues/14
Different to our case: this is about _two_ threads both being deadlocked, and
an attempt to resolve the deadlock by catching `BlockedIndefinitelyOnSTM`.
This is not the case for us (indeed, one of the threads involved runs to
completion).

* https://gitlab.haskell.org/ghc/ghc/-/issues/11001
This is to do with the other reference to the `T(M)Var` being from a
finalizer. Also not the case in our example.

2 changes: 1 addition & 1 deletion src/Network/GRPC/Util/Concurrency.hs
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ wrapThreadBody body = body
-- but that is not our case here.
--
-- This is obviously a huge hack and has got to be of the worst functions I have
-- ever written (and I have written some pretty horrible functions in my past).
-- ever written (and I have written some pretty horrible functions in my time).
--
-- TODO: Fix this properly.
-- TODO: Is 1 guarded attempt enough..? Do we need more..? Unlimited..?
Expand Down

0 comments on commit 9ebee70

Please sign in to comment.