
Heavily contended lock in TraceIDGenerator.generateID #767

Open · ankon opened this issue Aug 10, 2023 · 5 comments

ankon (Contributor) commented Aug 10, 2023

func (tg *TraceIDGenerator) generateID(len int) string {
	var bits [maxIDByteLen]byte
	tg.Lock()
	// The deferred unlock holds the mutex through the hex encoding below.
	defer tg.Unlock()
	tg.rnd.Read(bits[:len])
	return hex.EncodeToString(bits[:len])
}

In a heavily used server, a lot of segments end at the same time, and from what I see in the profiles, this lock is a problem.

  1. Can it be avoided at all? For instance, with some kind of "per-thread" generator (as much as that is possible in Go; perhaps just having more than one rand instance would work? See the sketch after this list.)
  2. If it cannot be avoided, can the critical section be made smaller? The code uses defer to unlock, which means the mutex is released only after the bits are converted to the hex representation needed for the ID. Maybe it can be explicitly unlocked at least before that step?
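
A minimal sketch of idea (1), assuming a sync.Pool of independently seeded math/rand generators (illustrative only, not the agent's code; as noted later in this thread, a pool of generators can raise the duplicate-ID risk depending on how each instance is seeded):

package main

import (
	"encoding/hex"
	"math/rand"
	"sync"
	"time"
)

const maxIDByteLen = 16

// Each pooled generator carries its own state, so concurrent callers
// rarely share an instance and no global mutex is needed.
var rndPool = sync.Pool{
	New: func() any {
		// Timestamp seeding is a placeholder; see the seeding caveats below.
		return rand.New(rand.NewSource(time.Now().UnixNano()))
	},
}

func generateID(n int) string {
	rnd := rndPool.Get().(*rand.Rand)
	var bits [maxIDByteLen]byte
	rnd.Read(bits[:n])
	rndPool.Put(rnd)
	return hex.EncodeToString(bits[:n])
}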

[pprof screenshot: profile showing the contended lock in generateID]

ankon (Contributor, Author) commented Oct 22, 2024

FWIW: I tried replacing the defer tg.Unlock() with an explicit unlock right after tg.rnd.Read, but benchmarks seem to imply that, at least on my local machine, that change makes things slower.
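
The benchmarked variant looks roughly like this (a sketch; the struct is a minimal stand-in for the agent's actual type, not copied from the repo):

package main

import (
	"encoding/hex"
	"math/rand"
	"sync"
)

const maxIDByteLen = 16

// Minimal stand-in for the agent's TraceIDGenerator (assumed shape).
type TraceIDGenerator struct {
	sync.Mutex
	rnd *rand.Rand
}

func (tg *TraceIDGenerator) generateID(n int) string {
	var bits [maxIDByteLen]byte
	tg.Lock()
	tg.rnd.Read(bits[:n])
	// Unlock before encoding so the mutex covers only the random read.
	tg.Unlock()
	return hex.EncodeToString(bits[:n])
}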

What does help (see #977) is avoiding the allocation from EncodeToString and doing the encoding in place.
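
A sketch of that in-place idea (assumed, not necessarily the exact #977 code), reusing the stand-in struct from the sketch above: encode into a fixed-size buffer with hex.Encode instead of letting hex.EncodeToString allocate an intermediate slice:

func (tg *TraceIDGenerator) generateID(n int) string {
	var bits [maxIDByteLen]byte
	// Two hex characters per random byte; the array stays on the stack.
	var out [maxIDByteLen * 2]byte
	tg.Lock()
	defer tg.Unlock()
	tg.rnd.Read(bits[:n])
	hex.Encode(out[:n*2], bits[:n])
	// Only the final string conversion allocates.
	return string(out[:n*2])
}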

iamemilio (Contributor) commented Oct 22, 2024

I do think that an explicit unlock may improve the concurrent performance of the system as a whole, since the lock no longer has to be held through the encoding step, even if it isn't necessarily faster for this function in isolation. According to your pprof output, it does look like a majority of the execution time is spent allocating and garbage-collecting memory. We will take a look at your PR; I just enabled the tests and left a comment.

ankon (Contributor, Author) commented Oct 23, 2024

The fun one is: I benchmarked moving the unlock ... and it makes things slower.

goos: linux
goarch: amd64
pkg: github.com/newrelic/go-agent/v3/internal
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
BenchmarkTraceIDGenerator
BenchmarkTraceIDGenerator-16                    	25461540	        44.09 ns/op	      32 B/op	       2 allocs/op
BenchmarkTraceIDGeneratorParallel
BenchmarkTraceIDGeneratorParallel-16            	10721215	       107.5 ns/op	      32 B/op	       2 allocs/op
BenchmarkTraceIDGeneratorBaseline
BenchmarkTraceIDGeneratorBaseline-16            	24150187	        44.59 ns/op	      32 B/op	       2 allocs/op
BenchmarkTraceIDGeneratorBaselineParallel
BenchmarkTraceIDGeneratorBaselineParallel-16    	15261769	        74.36 ns/op	      32 B/op	       2 allocs/op

(baseline is simply the original generator; non-baseline is the generator with the defer replaced by an explicit unlock)
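
For context, the parallel cases above are presumably RunParallel-style benchmarks; a sketch against the stand-in type from the earlier snippet (the real benchmark in github.com/newrelic/go-agent/v3/internal may differ; this also imports "testing" in addition to the earlier packages):

func BenchmarkTraceIDGeneratorParallel(b *testing.B) {
	// Fixed seed only for reproducibility of the sketch.
	tg := &TraceIDGenerator{rnd: rand.New(rand.NewSource(12345))}
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			tg.generateID(maxIDByteLen)
		}
	})
}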

iamemilio (Contributor) commented Oct 23, 2024

> The fun one is: I benchmarked moving the unlock ... and it makes things slower.

🤯 How weird..

Anyway, the proposed changes will be in the next release. Thanks for taking the time. I am going to copy the merge comment from the PR here:

I think what you have proposed is a good incremental improvement, and I'm happy to merge it. There are a few oddities about the design and implementation of this object IMO, and I think you're right about the seeding. If anything, seeding based on timestamps may actually be decreasing the randomness. Luckily, the odds of a duplicate are incredibly low. I don't see the point of tests with a specific seed either; that doesn't really verify much of the expected or intended behavior.

If you wanted to keep playing with this, I don't mind reviewing another PR. Otherwise, we can create a risk item to address this. I think a buffered channel of pre-generated Transaction_IDs and another for Span_IDs may do the trick. Creating a pool of generators may increase the risk of duplication depending on how they are seeded, and I am not sure the possible performance upside would be worth the added complexity of mitigating that.
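
A sketch of that buffered-channel idea (all names hypothetical): a background goroutine keeps a channel topped up with pre-generated IDs, so the hot path reduces to a channel receive:

package main

import (
	"encoding/hex"
	"math/rand"
	"time"
)

const traceIDByteLen = 16

// startIDFiller returns a channel that a background goroutine keeps
// filled with hex-encoded IDs; callers just receive from it.
func startIDFiller(buf int) <-chan string {
	ids := make(chan string, buf)
	go func() {
		// Seeding strategy deliberately left open, per the caveat above.
		rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
		bits := make([]byte, traceIDByteLen)
		for {
			rnd.Read(bits)
			ids <- hex.EncodeToString(bits)
		}
	}()
	return ids
}

Hot-path usage would then be id := <-ids; the trade-off is one long-lived goroutine per ID kind and a potential stall if the buffer ever drains.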

ankon (Contributor, Author) commented Oct 24, 2024

> 🤯 How weird..

Yup. In Go 1.14 defers got pretty fast (see https://go.googlesource.com/go/+/be64a19d99918c843f8555aad580221207ea35bc; this is indeed an open-coded defer), but that doesn't explain why the deferred version ends up faster.

Playing in Compiler Explorer (https://godbolt.org/z/4W4a17jdn) shows some differences, but my low-level Go-fu isn't good enough to understand them. Benchmarks to the rescue :)

> I think a buffered channel of pre-generated Transaction_IDs and another for Span_IDs may do the trick. Creating a pool of generators may increase the risk of duplication depending on how they are seeded, and I am not sure the possible performance upside would be worth the added complexity of mitigating that.

I might play with that eventually; for now I'm looking forward to seeing how much this improves our situation in the real world. Iterating, piece by piece.

iamemilio self-assigned this Oct 24, 2024