Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed to close writer: context canceled #474

Open
lukaspj opened this issue Aug 30, 2024 · 4 comments
Open

failed to close writer: context canceled #474

lukaspj opened this issue Aug 30, 2024 · 4 comments
Labels

Comments

@lukaspj
Copy link

lukaspj commented Aug 30, 2024

I'm using this library to create a zero-trust tunnel into a Kubernetes cluster, when I encountered this error message:

failed to write msg: failed to close writer: failed to acquire lock: context canceled

So what happened was that when client closed the connection, I would send a "connection closed" message over the WebSocket connection, but I (mistakenly) used the request context for the write command.
This context would get cancelled somewhere around this Write call, sometimes after message was written but before it released the lock.

As a result, the lock would never get released and the websocket was no longer usable.

The solution was to create a new context for the write:

	writeCtx, _ := context.WithTimeout(context.Background(), 2*time.Second)

	err = c.Websocket.Write(writeCtx, websocket.MessageText, marshal)

And I don't think I should have used the request context to begin with, but I do think it's not optimal behaviour that the websocket can be left in a broken state by mistimed context cancel.

In any case, just thought I'd share it. Thanks for maintaining this awesome library!

@mafredri
Copy link
Member

mafredri commented Aug 30, 2024

Hey @lukaspj, thanks for the user experience report.

You are definitely not the first! This is a very common mistake to make. There is one valid use-case for r.Context() after the connection being hijacked1, but in general it's almost never what you want.

I see this issue is further exacerbated by the readme and examples using it incorrectly (https://github.com/coder/websocket?tab=readme-ov-file#server).

We should at least update the readmes and examples, but I'll try to think about some alternative solutions as well.

Perhaps the most obvious way would be to put a context on websocket.Conn, but defining it's behavior would be a bit tricky.

Anyway, if you have anything else to add, how you wish this had worked instead, what lead you down the wrong path, etc, please feel free to share.

1 A valid use-case is when you're running the websocket logic in a separate goroutine, and want it to stop when the HTTP handler exits. Usually the logic runs in the HTTP handler, though, where using r.Context() is useless. (https://pkg.go.dev/net/http#Hijacker)

@lukaspj
Copy link
Author

lukaspj commented Aug 30, 2024

I think you pin-pointed it, if the readme and examples explained it, it would probably help.

Additionally, it would be nice if the function documentation would state this danger of closing the context. Something like "if the context is cancelled, it can cause a deadlock" or just "the context should be the context of the websocket not the context of the message being written". Idk if there's a nice way to write it tbh, but would be nice if it showed up on the documentation somewhere on "how to use this library".

@spikecurtis
Copy link

@mafredri I don't think a docs update alone will resolve this issue.

@lukaspj seems to be saying that if the context you pass to Write() gets canceled, it can leave the Websocket in a broken state (a lock that never gets released). While it's true that you probably don't want to use the r.Context() for calls to Write() in most cases, I would still consider it a bug if an ill-timed context timeout leaves the websocket in a broken state.

@mafredri
Copy link
Member

mafredri commented Sep 9, 2024

@spikecurtis I was thinking about the original report, but that's a good callout. We should split the second issue into a new ticket.

There may be multiple issues in how connection closure is propagated and context cancellation is handled. There seems to be multiple code-paths depending on scenario.

I also noticed while looking at #476 that a read/write context that is cancelled actually ends up closing the connection abruptly, whereas I would expect we just give up on the write.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants