
Heavily contended lock in TraceIDGenerator.generateID #767

Open · ankon opened this issue Aug 10, 2023 · 5 comments

ankon (Contributor) commented Aug 10, 2023

func (tg *TraceIDGenerator) generateID(len int) string {
	var bits [maxIDByteLen]byte
	tg.Lock()
	// The deferred unlock holds the mutex through the hex encoding below.
	defer tg.Unlock()
	tg.rnd.Read(bits[:len])
	return hex.EncodeToString(bits[:len])
}

In a heavily used server, a lot of segments end at the same time, and from what I see in the profiles, this lock is a problem.

  1. Can it be avoided at all? For instance, with some kind of "per-thread" generator (as much as that is possible in Go; perhaps just having more than one rand instance would work? See the sketch after this list.)
  2. If it cannot be avoided, can the critical section be made smaller? The code uses defer to unlock, which means the mutex is released only after the bits are converted to the hex representation needed for the ID. Maybe it can be explicitly unlocked at least before that step?
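
A minimal sketch of idea (1), assuming a sync.Pool of independently seeded math/rand generators (illustrative only, not the agent's code; as noted later in this thread, a pool of generators can raise the duplicate-ID risk depending on how each instance is seeded):

package main

import (
	"encoding/hex"
	"math/rand"
	"sync"
	"time"
)

const maxIDByteLen = 16

// Each pooled generator carries its own state, so concurrent callers
// rarely share an instance and no global mutex is needed.
var rndPool = sync.Pool{
	New: func() any {
		// Timestamp seeding is a placeholder; see the seeding caveats below.
		return rand.New(rand.NewSource(time.Now().UnixNano()))
	},
}

func generateID(n int) string {
	rnd := rndPool.Get().(*rand.Rand)
	var bits [maxIDByteLen]byte
	rnd.Read(bits[:n])
	rndPool.Put(rnd)
	return hex.EncodeToString(bits[:n])
}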

[pprof screenshot: profile showing the contended lock in generateID]

ankon (Contributor, Author) commented Oct 22, 2024

FWIW: I tried replacing the defer tg.Unlock() with an explicit unlock right after tg.rnd.Read, but benchmarks seem to imply that, at least on my local machine, that change makes things slower.
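
The benchmarked variant looks roughly like this (a sketch; the struct is a minimal stand-in for the agent's actual type, not copied from the repo):

package main

import (
	"encoding/hex"
	"math/rand"
	"sync"
)

const maxIDByteLen = 16

// Minimal stand-in for the agent's TraceIDGenerator (assumed shape).
type TraceIDGenerator struct {
	sync.Mutex
	rnd *rand.Rand
}

func (tg *TraceIDGenerator) generateID(n int) string {
	var bits [maxIDByteLen]byte
	tg.Lock()
	tg.rnd.Read(bits[:n])
	// Unlock before encoding so the mutex covers only the random read.
	tg.Unlock()
	return hex.EncodeToString(bits[:n])
}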

What does help (see #977) is avoiding the allocation from EncodeToString and doing the encoding in place.
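
A sketch of that in-place idea (assumed, not necessarily the exact #977 code), reusing the stand-in struct from the sketch above: encode into a fixed-size buffer with hex.Encode instead of letting hex.EncodeToString allocate an intermediate slice:

func (tg *TraceIDGenerator) generateID(n int) string {
	var bits [maxIDByteLen]byte
	// Two hex characters per random byte; the array stays on the stack.
	var out [maxIDByteLen * 2]byte
	tg.Lock()
	defer tg.Unlock()
	tg.rnd.Read(bits[:n])
	hex.Encode(out[:n*2], bits[:n])
	// Only the final string conversion allocates.
	return string(out[:n*2])
}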

iamemilio (Contributor) commented Oct 22, 2024

I do think that an explicit unlock may improve the concurrent performance of the system as a whole, since the lock no longer has to be held through the encoding step, even if it isn't necessarily faster for this function in isolation. According to your pprof output, it does look like a majority of the execution time is spent allocating and garbage-collecting memory. We will take a look at your PR; I just enabled the tests and left a comment.

ankon (Contributor, Author) commented Oct 23, 2024

The fun one is: I benchmarked moving the unlock ... and it makes things slower.

goos: linux
goarch: amd64
pkg: github.com/newrelic/go-agent/v3/internal
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
BenchmarkTraceIDGenerator
BenchmarkTraceIDGenerator-16                    	25461540	        44.09 ns/op	      32 B/op	       2 allocs/op
BenchmarkTraceIDGeneratorParallel
BenchmarkTraceIDGeneratorParallel-16            	10721215	       107.5 ns/op	      32 B/op	       2 allocs/op
BenchmarkTraceIDGeneratorBaseline
BenchmarkTraceIDGeneratorBaseline-16            	24150187	        44.59 ns/op	      32 B/op	       2 allocs/op
BenchmarkTraceIDGeneratorBaselineParallel
BenchmarkTraceIDGeneratorBaselineParallel-16    	15261769	        74.36 ns/op	      32 B/op	       2 allocs/op

(baseline is simply the original generator; non-baseline is the generator with the defer replaced by an explicit unlock)
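
For context, the parallel cases above are presumably RunParallel-style benchmarks; a sketch against the stand-in type from the earlier snippet (the real benchmark in github.com/newrelic/go-agent/v3/internal may differ; this also imports "testing" in addition to the earlier packages):

func BenchmarkTraceIDGeneratorParallel(b *testing.B) {
	// Fixed seed only for reproducibility of the sketch.
	tg := &TraceIDGenerator{rnd: rand.New(rand.NewSource(12345))}
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			tg.generateID(maxIDByteLen)
		}
	})
}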

iamemilio (Contributor) commented Oct 23, 2024

> The fun one is: I benchmarked moving the unlock ... and it makes things slower.

🤯 How weird..

Anyway, the proposed changes will be in the next release. Thanks for taking the time. I am going to copy the merge comment from the PR here:

I think what you have proposed is a good incremental improvement, and I'm happy to merge it. There are a few oddities about the design and implementation of this object IMO, and I think you're right about the seeding. If anything, seeding based on timestamps may actually be decreasing the randomness. Luckily, the odds of a duplicate are incredibly low. I don't see the point of tests with a specific seed either; that doesn't really verify much of the expected or intended behavior.

If you wanted to keep playing with this, I don't mind reviewing another PR. Otherwise, we can create a risk item to address this. I think a buffered channel of pre-generated Transaction_IDs and another for Span_IDs may do the trick. Creating a pool of generators may increase the risk of duplication depending on how they are seeded, and I am not sure the possible performance upside would be worth the added complexity of mitigating that.
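
A sketch of that buffered-channel idea (all names hypothetical): a background goroutine keeps a channel topped up with pre-generated IDs, so the hot path reduces to a channel receive:

package main

import (
	"encoding/hex"
	"math/rand"
	"time"
)

const traceIDByteLen = 16

// startIDFiller returns a channel that a background goroutine keeps
// filled with hex-encoded IDs; callers just receive from it.
func startIDFiller(buf int) <-chan string {
	ids := make(chan string, buf)
	go func() {
		// Seeding strategy deliberately left open, per the caveat above.
		rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
		bits := make([]byte, traceIDByteLen)
		for {
			rnd.Read(bits)
			ids <- hex.EncodeToString(bits)
		}
	}()
	return ids
}

Hot-path usage would then be id := <-ids; the trade-off is one long-lived goroutine per ID kind and a potential stall if the buffer ever drains.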

ankon (Contributor, Author) commented Oct 24, 2024

> 🤯 How weird..

Yup. In Go 1.14 defers got pretty fast (see https://go.googlesource.com/go/+/be64a19d99918c843f8555aad580221207ea35bc; this is indeed an open-coded defer), but that doesn't explain why the deferred version ends up faster.

Playing in Compiler Explorer (https://godbolt.org/z/4W4a17jdn) shows some differences, but my low-level Go-fu isn't good enough to understand them. Benchmarks to the rescue :)

> I think a buffered channel of pre-generated Transaction_IDs and another for Span_IDs may do the trick. Creating a pool of generators may increase the risk of duplication depending on how they are seeded, and I am not sure the possible performance upside would be worth the added complexity of mitigating that.

I might play with that eventually; for now I'm looking forward to seeing how much this improves our situation in the real world. Iterating, piece by piece.

iamemilio self-assigned this Oct 24, 2024