Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic from http client #4377

Open
gmichelo opened this issue Oct 25, 2023 · 5 comments
Open

Panic from http client #4377

gmichelo opened this issue Oct 25, 2023 · 5 comments

Comments

@gmichelo
Copy link

Hi! We've observed the following panic in our production use of buildkitd.

panic: runtime error: invalid memory address or nil pointer dereference 
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xcb976e] 

goroutine 5545034 [running]: 
 go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace.(*clientTracer).end(0xc005f17800, {0x19c0a25, 0xc}, {0x0?, 0x0?}, {0xc0037e4100?, 0x4, 0x4}) 
 	/src/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go:231 +0x76e 
 go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace.(*clientTracer).gotConn(0x1c26d08?, {{0x1c2e778?, 0xc002b58000?}, 0xb8?, 0x89?, 0x1?}) 
 	/src/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go:288 +0x645 
 net/http.http2traceGotConn(0xc007fda480?, 0xc008f01680, 0x0) 
	/usr/local/go/src/net/http/h2_bundle.go:10158 +0x1ee 
 net/http.(*http2Transport).RoundTripOpt(0xc000bdd560, 0xc00b62fe00, {0x80?}) 
 	/usr/local/go/src/net/http/h2_bundle.go:7584 +0x1ac 
 net/http.(*http2Transport).RoundTrip(...) 
 	/usr/local/go/src/net/http/h2_bundle.go:7537 
 net/http.http2noDialH2RoundTripper.RoundTrip({0x274ed60?}, 0xc00b62fe00?) 
 	/usr/local/go/src/net/http/h2_bundle.go:10122 +0x1b 
 net/http.(*Transport).roundTrip(0x274ed60, 0xc00b62fe00) 
 	/usr/local/go/src/net/http/transport.go:548 +0x3ca 
 net/http.(*Transport).RoundTrip(0xc008720300?, 0x1c26d08?) 
 	/usr/local/go/src/net/http/roundtrip.go:17 +0x19 
 go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Transport).RoundTrip(0xc0001d4700, 0xc00b62fa00) 
 	/src/vendor/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/transport.go:116 +0x5e2 
 github.com/moby/buildkit/source/http.(*sessionHandler).RoundTrip(0x0?, 0x1c12080?) 
 	/src/source/http/transport.go:25 +0x54 
 net/http.send(0xc00b62fa00, {0x1c12080, 0xc00f0abe30}, {0x10?, 0x1978020?, 0x0?}) 
 	/usr/local/go/src/net/http/client.go:252 +0x5f7 
 net/http.(*Client).send(0xc00f0abe60, 0xc00b62fa00, {0x7fe48b91f158?, 0xc00301c2d0?, 0x0?}) 
 	/usr/local/go/src/net/http/client.go:176 +0x9b 
 net/http.(*Client).do(0xc00f0abe60, 0xc00b62fa00) 
 	/usr/local/go/src/net/http/client.go:716 +0x8fb 
 net/http.(*Client).Do(...) 
 	/usr/local/go/src/net/http/client.go:582 
 github.com/moby/buildkit/source/http.(*httpSourceHandler).CacheKey(0xc004ed1c80, {0x1c26d08?, 0xc00f0aba10}, {0x1c11d00?, 0xc00b22cd80}, 0xc00f0abce0?) 
 	/src/source/http/httpsource.go:214 +0x91d 
 github.com/moby/buildkit/solver/llbsolver/ops.(*SourceOp).CacheMap(0xc004ed1c00, {0x1c26d08, 0xc00f0aba10}, {0x1c11d00, 0xc00b22cd80}, 0x1?) 
 	/src/solver/llbsolver/ops/source.go:82 +0x97 
 github.com/moby/buildkit/solver.(*sharedOp).CacheMap.func2({0x1c26c60, 0xc00ba49a40}) 
 	/src/solver/jobs.go:832 +0x533 
 github.com/moby/buildkit/util/flightcontrol.(*call[...]).run(0xc00cf234d8) 
 	/src/util/flightcontrol/flightcontrol.go:121 +0x62 
 sync.(*Once).doSlow(0x4432c5?, 0xc008446fd0?) 
 	/usr/local/go/src/sync/once.go:74 +0xc2 
 sync.(*Once).Do(0x1c12220?, 0xc00f0aacf0?) 
 	/usr/local/go/src/sync/once.go:65 +0x1f 
 created by github.com/moby/buildkit/util/flightcontrol.(*call[...]).wait 
 	/src/util/flightcontrol/flightcontrol.go:165 +0x4d1
@tonistiigi
Copy link
Member

@gmichelo Any specific OpenTelemetry config that you use?

@MrAlias @Aneurysm9 Any ideas what might be going on here? The panic comes from https://github.com/open-telemetry/opentelemetry-go-contrib/blob/v1.20.0/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L231 so only possible case should be that ct.root is nil. https://github.com/open-telemetry/opentelemetry-go-contrib/blob/v1.20.0/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L201 should guarantee that it can't be nil . So only possible case I see is that GetConn did not get called before GotConn although stdlib seems to guarantee it https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/net/http/httptrace/trace.go;l=80 . If you think this is a otel problem I can create an issue there for further discussion.

@gmichelo
Copy link
Author

gmichelo commented Nov 1, 2023

Hey @tonistiigi. No, we don't set OpenTelemetry config.

@tonistiigi
Copy link
Member

Marking this as "needs investigation". Atm I don't see any possible fix than adding a nil check in the otel library (although why the nil value appears there isn't completely clear). The only thing we could do from buildkit side is add a recover() block to handle panic.

@koivunej
Copy link

Still hitting this today while trying to push to ECR from github actions. Is it possible to avoid it by setting some bogus opentelemetry configuration?

@tonistiigi
Copy link
Member

Still hitting this today while trying to push to ECR from github actions. Is it possible to avoid it by setting some bogus opentelemetry configuration?

I don't think so. The error is not in opentelemetry reporting but the wrappers around the connections.

If you have some way to reproduce, it might be a way to get open-telemetry/opentelemetry-go-contrib#5187 moving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants