Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOSE_WAIT #590

Closed
kazu-yamamoto opened this issue Nov 17, 2024 · 15 comments
Closed

CLOSE_WAIT #590

kazu-yamamoto opened this issue Nov 17, 2024 · 15 comments
Assignees

Comments

@kazu-yamamoto
Copy link
Collaborator

#589 changed gracefullClose as follows:

  1. With the runtime which provides the IO manager, STM race for packet arrival and timeout is introduced.
  2. Otherwise, timeout is used.

I'm testing the approach 1) on my server. Unfortunately, CLOSE_WAIT happened again.
Since my server did not accept new connections, I'm sure that CLOSE_WAIT represents SYN packets in the listen queue.
This means that IO manager is blocked.

I need to understand why writeTVar is blocked.

@kazu-yamamoto
Copy link
Collaborator Author

Cc: @nh2, @khibino

@kazu-yamamoto kazu-yamamoto self-assigned this Nov 17, 2024
@kazu-yamamoto
Copy link
Collaborator Author

One possibility is that we should not acquire two resources in one bracket.

recvEOFevent :: Socket -> Int -> Ptr Word8 -> IO ()
recvEOFevent s tmout0 buf = do
    tmmgr <- Ev.getSystemTimerManager
    tvar <- newTVarIO False
    E.bracket (setup tmmgr tvar) teardown $ \(wait, _) -> do
        waitRes <- wait
        case waitRes of
          TimeoutTripped -> return ()
          -- We don't check the (positive) length.
          -- In normal case, it's 0. That is, only FIN is received.
          -- In error cases, data is available. But there is no
          -- application which can read it. So, let's stop receiving
          -- to prevent attacks.
          MoreData       -> void $ recvBufNoWait s buf bufSize
  where
    setup tmmgr tvar = do
        -- millisecond to microsecond
        key <- Ev.registerTimeout tmmgr (tmout0 * 1000) $
            atomically $ writeTVar tvar True
        (evWait, evCancel) <- waitAndCancelReadSocketSTM s
        let toWait = do
                tmout <- readTVar tvar
                check tmout
            toCancel = Ev.unregisterTimeout tmmgr key
            wait = atomically ((toWait >> return TimeoutTripped)
                           <|> (evWait >> return MoreData))
            cancel = evCancel >> toCancel
        return (wait, cancel)
    teardown (_, cancel) = cancel

Nested bracket should be used:

recvEOFevent :: Socket -> Int -> Ptr Word8 -> IO ()
recvEOFevent s tmout0 buf = do
    tmmgr <- Ev.getSystemTimerManager
    tvar <- newTVarIO False
    E.bracket (setupTimeout tmmgr tvar) (cancelTimeout tmmgr) $ \_ -> do
        E.bracket setupRead cancelRead $ \(evWait,_) -> do
            let toWait = do
                    tmout <- readTVar tvar
                    check tmout
                wait = atomically ((toWait >> return TimeoutTripped)
                               <|> (evWait >> return MoreData))
            waitRes <- wait
            case waitRes of
              TimeoutTripped -> return ()
              -- We don't check the (positive) length.
              -- In normal case, it's 0. That is, only FIN is received.
              -- In error cases, data is available. But there is no
              -- application which can read it. So, let's stop receiving
              -- to prevent attacks.
              MoreData       -> void $ recvBufNoWait s buf bufSize
  where
    -- millisecond to microsecond
    setupTimeout tmmgr tvar =
        Ev.registerTimeout tmmgr (tmout0 * 1000) $
            atomically $ writeTVar tvar True
    cancelTimeout = Ev.unregisterTimeout
    setupRead = waitAndCancelReadSocketSTM s
    cancelRead (_,cancel) = cancel

@kazu-yamamoto
Copy link
Collaborator Author

Page 162 of Parallel and Concurrent Programming in Haskell says:

The IO actions passed in as before and after are performed inside mask. The bracket
function guarantees that if before returns, after will be executed in the future. It is
normal for before to contain a blocking operation; if an exception is raised while before
is blocked, then no harm is done. But before should perform only one blocking operation.
An exception raised by a second blocking operation would not result in after
being executed. If you need to perform two blocking operations, the right way is to nest
calls to bracket, as we did with modifyMVar.

@nh2
Copy link
Member

nh2 commented Nov 18, 2024

Parallel and Concurrent Programming in Haskell says

@kazu-yamamoto Does it say that somewhere in the docs as well? If not, it really should, because book-driven documentation is very bad.

@nh2
Copy link
Member

nh2 commented Nov 18, 2024

@kazu-yamamoto I think it would also be good if we can create a test for this CLOSE_WAIT situation you managed to reproduce.

@kazu-yamamoto
Copy link
Collaborator Author

@kazu-yamamoto Does it say that somewhere in the docs as well? If not, it really should, because book-driven documentation is very bad.

I don't think so.
I will bring this to ghc-devs.

@kazu-yamamoto
Copy link
Collaborator Author

kazu-yamamoto commented Nov 18, 2024

The same thing happens to the nested version:

2 IOManager on cap 0: ThreadBlocked BlockedOnForeignCall
3 IOManager on cap 1: ThreadBlocked BlockedOnMVar
4 TimerManager: ThreadBlocked BlockedOnForeignCall

In the normal situation, IOManager should be in the ThreadBlocked BlockedOnForeignCall state since it's calling epoll_wait().
In this case, IOManager is in the ThreadBlocked BlockedOnMVar state while we don't use MVar at all.

@kazu-yamamoto
Copy link
Collaborator Author

The normal recv (not recvBufNoWait) is essentially takeMVar.
IO manager calls putMVar when packets arrive.
I imagine that IO manager calls putMVar more than once.

@kazu-yamamoto
Copy link
Collaborator Author

For the record, listThreads shows:

2 IOManager on cap 0: ThreadBlocked BlockedOnForeignCall
3 IOManager on cap 1: ThreadBlocked BlockedOnMVar
4 TimerManager: ThreadBlocked BlockedOnForeignCall
5 Mighty main: ThreadBlocked BlockedOnMVar
6 TLS ticket manager: ThreadBlocked BlockedOnMVar
7 Date string cacher of FastLogger (AutoUpdate): ThreadBlocked BlockedOnMVar
8 Loggerset of FastLogger (Debounce): ThreadBlocked BlockedOnMVar
9 (no name): ThreadBlocked BlockedOnSTM
10 Mighty main (bound thread): ThreadBlocked BlockedOnMVar
11 HTTP3 server: ThreadBlocked BlockedOnSTM
12 concurrently: ThreadBlocked BlockedOnMVar
13 HTTP1/2 server: ThreadBlocked BlockedOnMVar
14 HTTP1/2 over TLS server: ThreadBlocked BlockedOnMVar
15 Date cacher (AutoUpdate): ThreadBlocked BlockedOnMVar
16 Date cacher (AutoUpdate): ThreadBlocked BlockedOnMVar
17 Date cacher (AutoUpdate): ThreadBlocked BlockedOnMVar
18 QUIC token manager: ThreadBlocked BlockedOnMVar
19 QUIC dispatcher: ThreadBlocked BlockedOnMVar
20 QUIC dispatcher: ThreadBlocked BlockedOnSTM
24 WAI timeout manager (Reaper): ThreadBlocked BlockedOnMVar
3462 Warp HTTP/1.1 [::ffff:81.161.238.88]:53801: ThreadBlocked BlockedOnMVar
3464 Warp HTTP/1.1 [::ffff:81.161.238.88]:53803: ThreadBlocked BlockedOnMVar
3467 Warp HTTP/1.1 [::ffff:81.161.238.88]:53831: ThreadBlocked BlockedOnMVar
3570 Warp HTTP/1.1 [::ffff:81.161.238.88]:56968: ThreadBlocked BlockedOnMVar
3575 Warp HTTP/1.1 [::ffff:81.161.238.88]:57059: ThreadBlocked BlockedOnMVar
3578 Warp HTTP/1.1 [::ffff:81.161.238.88]:57134: ThreadBlocked BlockedOnMVar
3581 Warp HTTP/1.1 [::ffff:81.161.238.88]:57176: ThreadBlocked BlockedOnMVar
3583 Warp HTTP/1.1 [::ffff:81.161.238.88]:57226: ThreadBlocked BlockedOnMVar
3586 Warp HTTP/1.1 [::ffff:81.161.238.88]:57318: ThreadBlocked BlockedOnMVar
3588 Warp HTTP/1.1 [::ffff:81.161.238.88]:57320: ThreadBlocked BlockedOnMVar
3591 Warp HTTP/1.1 [::ffff:81.161.238.88]:57399: ThreadBlocked BlockedOnMVar
3593 Warp HTTP/1.1 [::ffff:81.161.238.88]:57407: ThreadBlocked BlockedOnMVar
3596 Warp HTTP/1.1 [::ffff:81.161.238.88]:57482: ThreadBlocked BlockedOnMVar
3598 Warp HTTP/1.1 [::ffff:81.161.238.88]:57484: ThreadBlocked BlockedOnMVar
3629 Warp HTTP/1.1 [::ffff:81.161.238.88]:58790: ThreadBlocked BlockedOnMVar
3631 Warp HTTP/1.1 [::ffff:81.161.238.88]:58794: ThreadBlocked BlockedOnMVar
3633 Warp HTTP/1.1 [::ffff:81.161.238.88]:58904: ThreadBlocked BlockedOnMVar
3635 Warp HTTP/1.1 [::ffff:81.161.238.88]:58907: ThreadBlocked BlockedOnMVar
3637 Warp HTTP/1.1 [::ffff:81.161.238.88]:58973: ThreadBlocked BlockedOnMVar
3639 Warp HTTP/1.1 [::ffff:81.161.238.88]:58978: ThreadBlocked BlockedOnMVar
3641 Warp HTTP/1.1 [::ffff:81.161.238.88]:59033: ThreadBlocked BlockedOnMVar
3643 Warp HTTP/1.1 [::ffff:81.161.238.88]:59044: ThreadBlocked BlockedOnMVar
3645 Warp HTTP/1.1 [::ffff:81.161.238.88]:59091: ThreadBlocked BlockedOnMVar
3647 Warp HTTP/1.1 [::ffff:81.161.238.88]:59110: ThreadBlocked BlockedOnMVar
3649 Warp HTTP/1.1 [::ffff:81.161.238.88]:59122: ThreadBlocked BlockedOnMVar
3651 Warp HTTP/1.1 [::ffff:81.161.238.88]:59183: ThreadBlocked BlockedOnMVar
3654 Warp HTTP/1.1 [::ffff:81.161.238.88]:59245: ThreadBlocked BlockedOnMVar
3656 Warp HTTP/1.1 [::ffff:81.161.238.88]:59246: ThreadBlocked BlockedOnMVar
3659 Warp HTTP/1.1 [::ffff:81.161.238.88]:59280: ThreadBlocked BlockedOnMVar
3661 Warp HTTP/1.1 [::ffff:81.161.238.88]:59310: ThreadBlocked BlockedOnMVar
3663 Warp HTTP/1.1 [::ffff:81.161.238.88]:59319: ThreadBlocked BlockedOnMVar
3666 Warp HTTP/1.1 [::ffff:81.161.238.88]:59371: ThreadBlocked BlockedOnMVar
3668 Warp HTTP/1.1 [::ffff:81.161.238.88]:59379: ThreadBlocked BlockedOnMVar
3671 Warp HTTP/1.1 [::ffff:81.161.238.88]:59421: ThreadBlocked BlockedOnMVar
3673 Warp HTTP/1.1 [::ffff:81.161.238.88]:59435: ThreadBlocked BlockedOnMVar
3675 Warp HTTP/1.1 [::ffff:81.161.238.88]:59461: ThreadBlocked BlockedOnMVar
3677 Warp HTTP/1.1 [::ffff:81.161.238.88]:59498: ThreadBlocked BlockedOnMVar
3679 Warp HTTP/1.1 [::ffff:81.161.238.88]:59518: ThreadBlocked BlockedOnMVar
3685 Warp HTTP/1.1 [::ffff:81.161.238.88]:59626: ThreadBlocked BlockedOnMVar
3698 Warp HTTP/1.1 [::ffff:81.161.238.88]:59833: ThreadBlocked BlockedOnMVar
3709 Warp HTTP/1.1 [::ffff:81.161.238.88]:60029: ThreadBlocked BlockedOnMVar
3712 Warp HTTP/1.1 [::ffff:81.161.238.88]:60065: ThreadBlocked BlockedOnMVar
3715 Warp HTTP/1.1 [::ffff:81.161.238.88]:60115: ThreadBlocked BlockedOnMVar
3726 Warp HTTP/1.1 [::ffff:81.161.238.88]:60876: ThreadBlocked BlockedOnMVar
3728 Warp HTTP/1.1 [::ffff:81.161.238.88]:60892: ThreadBlocked BlockedOnMVar
3730 Warp HTTP/1.1 [::ffff:81.161.238.88]:60920: ThreadBlocked BlockedOnMVar
3731 Warp HTTP/1.1 [::ffff:81.161.238.88]:60921: ThreadBlocked BlockedOnMVar
3734 Warp HTTP/1.1 [::ffff:81.161.238.88]:60948: ThreadBlocked BlockedOnMVar
3736 Warp HTTP/1.1 [::ffff:81.161.238.88]:60949: ThreadBlocked BlockedOnMVar
3744 Warp HTTP/1.1 [::ffff:81.161.238.88]:61028: ThreadBlocked BlockedOnMVar
3746 Warp HTTP/1.1 [::ffff:81.161.238.88]:61080: ThreadBlocked BlockedOnMVar
3750 Warp HTTP/1.1 [::ffff:81.161.238.88]:61118: ThreadBlocked BlockedOnMVar
3752 Warp HTTP/1.1 [::ffff:81.161.238.88]:61137: ThreadBlocked BlockedOnMVar
3754 Warp HTTP/1.1 [::ffff:81.161.238.88]:61183: ThreadBlocked BlockedOnMVar
3756 Warp HTTP/1.1 [::ffff:81.161.238.88]:61201: ThreadBlocked BlockedOnMVar
3758 Warp HTTP/1.1 [::ffff:81.161.238.88]:61205: ThreadBlocked BlockedOnMVar
3762 Warp HTTP/1.1 [::ffff:81.161.238.88]:61292: ThreadBlocked BlockedOnMVar
3764 Warp HTTP/1.1 [::ffff:81.161.238.88]:61295: ThreadBlocked BlockedOnMVar
3768 Warp HTTP/1.1 [::ffff:81.161.238.88]:61350: ThreadBlocked BlockedOnMVar
3770 Warp HTTP/1.1 [::ffff:81.161.238.88]:61352: ThreadBlocked BlockedOnMVar
3773 Warp HTTP/1.1 [::ffff:81.161.238.88]:61383: ThreadBlocked BlockedOnMVar
3775 Warp HTTP/1.1 [::ffff:81.161.238.88]:61417: ThreadBlocked BlockedOnMVar
3777 Warp HTTP/1.1 [::ffff:81.161.238.88]:61431: ThreadBlocked BlockedOnMVar
3782 Warp HTTP/1.1 [::ffff:81.161.238.88]:61464: ThreadBlocked BlockedOnMVar
3784 Warp HTTP/1.1 [::ffff:81.161.238.88]:61492: ThreadBlocked BlockedOnMVar
3788 Warp HTTP/1.1 [::ffff:81.161.238.88]:61533: ThreadBlocked BlockedOnMVar
3794 Warp HTTP/1.1 [::ffff:81.161.238.88]:61664: ThreadBlocked BlockedOnMVar
3797 Warp HTTP/1.1 [::ffff:81.161.238.88]:61724: ThreadBlocked BlockedOnMVar
3815 Warp HTTP/1.1 [::ffff:81.161.238.88]:62657: ThreadBlocked BlockedOnMVar
3817 Warp HTTP/1.1 [::ffff:81.161.238.88]:62778: ThreadBlocked BlockedOnMVar
3819 Warp HTTP/1.1 [::ffff:81.161.238.88]:62783: ThreadBlocked BlockedOnMVar
3821 Warp HTTP/1.1 [::ffff:81.161.238.88]:62821: ThreadBlocked BlockedOnMVar
3823 Warp HTTP/1.1 [::ffff:81.161.238.88]:62826: ThreadBlocked BlockedOnMVar
3826 Warp HTTP/1.1 [::ffff:81.161.238.88]:62859: ThreadBlocked BlockedOnMVar
3828 Warp HTTP/1.1 [::ffff:81.161.238.88]:62890: ThreadBlocked BlockedOnMVar
3832 Warp HTTP/1.1 [::ffff:81.161.238.88]:62929: ThreadBlocked BlockedOnMVar
3836 Warp HTTP/1.1 [::ffff:81.161.238.88]:62937: ThreadBlocked BlockedOnMVar
3839 Warp HTTP/1.1 [::ffff:81.161.238.88]:62969: ThreadBlocked BlockedOnMVar
3841 Warp HTTP/1.1 [::ffff:81.161.238.88]:62970: ThreadBlocked BlockedOnMVar
3849 Warp HTTP/1.1 [::ffff:81.161.238.88]:63010: ThreadBlocked BlockedOnMVar
3853 Warp HTTP/1.1 [::ffff:81.161.238.88]:63051: ThreadBlocked BlockedOnMVar
3857 Warp HTTP/1.1 [::ffff:81.161.238.88]:63073: ThreadBlocked BlockedOnMVar
3859 Warp HTTP/1.1 [::ffff:81.161.238.88]:63086: ThreadBlocked BlockedOnMVar
3861 Warp HTTP/1.1 [::ffff:81.161.238.88]:63089: ThreadBlocked BlockedOnMVar
3863 Warp HTTP/1.1 [::ffff:81.161.238.88]:63114: ThreadBlocked BlockedOnMVar
3865 Warp HTTP/1.1 [::ffff:81.161.238.88]:63164: ThreadBlocked BlockedOnMVar
3889 Warp HTTP/1.1 [::ffff:81.161.238.88]:64508: ThreadBlocked BlockedOnMVar
3892 Warp HTTP/1.1 [::ffff:81.161.238.88]:64545: ThreadBlocked BlockedOnMVar
3894 Warp HTTP/1.1 [::ffff:81.161.238.88]:64591: ThreadBlocked BlockedOnMVar
4434 Warp HTTP/1.1 [::ffff:81.161.238.88]:61757: ThreadBlocked BlockedOnMVar
4536 Warp HTTP/1.1 [::ffff:81.161.238.88]:64395: ThreadBlocked BlockedOnMVar
4573 Warp HTTP/1.1 [::ffff:81.161.238.88]:49464: ThreadBlocked BlockedOnMVar
4630 Warp HTTP/1.1 [::ffff:81.161.238.88]:52726: ThreadBlocked BlockedOnMVar
4633 Warp HTTP/1.1 [::ffff:81.161.238.88]:52783: ThreadBlocked BlockedOnMVar
4635 Warp HTTP/1.1 [::ffff:81.161.238.88]:52810: ThreadBlocked BlockedOnMVar
4657 Warp HTTP/1.1 [::ffff:81.161.238.88]:54468: ThreadBlocked BlockedOnMVar
6394 Warp HTTP/1.1 [::ffff:81.161.238.88]:55015: ThreadBlocked BlockedOnMVar
6397 Warp HTTP/1.1 [::ffff:81.161.238.88]:55045: ThreadBlocked BlockedOnMVar
6400 Warp HTTP/1.1 [::ffff:81.161.238.88]:55087: ThreadBlocked BlockedOnMVar
6420 Warp HTTP/1.1 [::ffff:81.161.238.88]:56432: ThreadBlocked BlockedOnMVar
6423 Warp HTTP/1.1 [::ffff:81.161.238.88]:56505: ThreadBlocked BlockedOnMVar
6426 Warp HTTP/1.1 [::ffff:81.161.238.88]:56543: ThreadBlocked BlockedOnMVar
6428 Warp HTTP/1.1 [::ffff:81.161.238.88]:56579: ThreadBlocked BlockedOnMVar
6432 Warp HTTP/1.1 [::ffff:81.161.238.88]:56617: ThreadBlocked BlockedOnMVar
6439 Warp HTTP/1.1 [::ffff:66.249.77.107]:59727: ThreadBlocked BlockedOnMVar
6458 Warp HTTP/1.1 [::ffff:81.161.238.88]:57797: ThreadBlocked BlockedOnMVar
6461 Warp HTTP/1.1 [::ffff:81.161.238.88]:57856: ThreadBlocked BlockedOnMVar
6463 Warp HTTP/1.1 [::ffff:81.161.238.88]:57865: ThreadBlocked BlockedOnMVar
6466 Warp HTTP/1.1 [::ffff:81.161.238.88]:57892: ThreadBlocked BlockedOnMVar
6469 Warp HTTP/1.1 [::ffff:81.161.238.88]:57938: ThreadBlocked BlockedOnMVar
6471 Warp HTTP/1.1 [::ffff:81.161.238.88]:57949: ThreadBlocked BlockedOnMVar
6575 Info signale handler: ThreadRunning

@kazu-yamamoto
Copy link
Collaborator Author

kazu-yamamoto commented Nov 18, 2024

Some of threads labeled with ::ffff:81.161.238.88 is in the CLOSE_WAIT state.
So, their connections are established but close (in gracefulClose) is not called.

The server is not responding.
So, some packets in CLOSE_WAIT are in the listen queue.

The following thread is leaked:

24 WAI timeout manager (Reaper): ThreadBlocked BlockedOnMVar

EDIT: thread 24 is not leaking. It's working since all connections are not terminated yet.

@kazu-yamamoto
Copy link
Collaborator Author

@kazu-yamamoto I think it would also be good if we can create a test for this CLOSE_WAIT situation you managed to reproduce.

I cannot reproduce this by myself so far, sigh.

@kazu-yamamoto
Copy link
Collaborator Author

It appeared that gracefullClose is called twice at the same time.
yesodweb/wai#1013

@kazu-yamamoto
Copy link
Collaborator Author

Registered a documentation issue to GHC:

https://gitlab.haskell.org/ghc/ghc/-/issues/25506

@kazu-yamamoto
Copy link
Collaborator Author

After preventing calling gracefulClose twice, I don't see CLOSE_WAIT.
I'm going to merge the nested-bracket branch.

@kazu-yamamoto
Copy link
Collaborator Author

Closing.

kazu-yamamoto added a commit to kazu-yamamoto/network that referenced this issue Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants