Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(tree): use const and type declaration #421

Merged
merged 4 commits into from
Jan 17, 2024

Conversation

weiihann
Copy link
Contributor

@weiihann weiihann commented Dec 2, 2023

This PR does the following:

  1. Use Stem []byte instead of []byte, as well as KeyToStem(key []byte) instead of key[:31] in most occurrences that require stem. This enhances code readability, without affecting the underlying logic.
  2. Replaces 32 with constants LeafValueSize or KeySize for code maintainability.

BREAKING CHANGE:
The GetProofItems function in the VerkleNode interface now returns []Stem instead of [][]byte, which affects geth's usage of this library

Benchmark results:

                                        old allocs/op   new allocs/op  delta               
CommitLeaves/insert/leaves/1000-8         16.82k ± ∞ ¹   16.96k ± ∞ ¹  +0.87% (p=0.008 n=5)
CommitLeaves/insert/leaves/10000-8        155.4k ± ∞ ¹   155.3k ± ∞ ¹  -0.11% (p=0.008 n=5)
CommitFullNode-8                          3.795k ± ∞ ¹   3.795k ± ∞ ¹       ~ (p=1.000 n=5) ²
ModifyLeaves-8                            164.4k ± ∞ ¹   164.3k ± ∞ ¹       ~ (p=1.000 n=5)

Copy link
Collaborator

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

You did a good job describing the benefits of the change, and we've considered doing something like this before. Unfortunately, it comes with an unfortunate important side-effect which is more allocations, and impacts performance.

In the current (master) version, since we use []byte this is a light reference to an existing place in memory, so we don't have to do these copy(...) in many places. These copies end up in arrays that escape to the heap, and thus count as heap allocations and adds pressure to the GC.

In the past, we had to be very strict with avoiding allocations to improve the performance of this library, so I'm not sure we can do any change that makes a performance regression on that front without being sure we'll need to revert this change later in the future.

To be a bit more objective, here's a benchmark difference between master and this branch (showing only relevant lines with perf change):

name                                      old allocs/op  new allocs/op  delta
CommitLeaves/insert/leaves/1000-16           17.0k ± 0%     18.5k ± 0%   +8.91%  (p=0.016 n=4+5)
CommitLeaves/insert/leaves/10000-16           155k ± 0%      176k ± 0%  +12.93%  (p=0.008 n=5+5)
CommitFullNode-16                            3.80k ± 0%     4.06k ± 0%   +6.74%  (p=0.008 n=5+5)
ModifyLeaves-16                               162k ± 0%      192k ± 0%  +18.45%  (p=0.008 n=5+5)

This isn't surprising by the nature of the change, since as I expected, this would mean more allocations and thus more GC work.

I think one option is using a feature since Go 1.17 to make ExtStem be a *[31]byte, and point that to the key. (More about this in the spec in the section Conversions from slice to array or array pointer). So basically, it's an array of 31 bytes that will point to the same location in memory, so avoiding the performance problem. But I'm not sure using pointers to arrays just to signal a length of 31 is worth it.

Another option that still avoids the allocation, and gets most of the type system checking and semantics, is to define type ExtStem { stem []byte }. Despite this doesn't have the type-length internally, you could have a KeyToStem(key []byte) ExtStem, which would be mostly a ExtStem { stem: key[:31] } so "by definition", if we always use KeyToStem(...) (and nobody plays tricks creating ExtStem directly), we're much safer. And every place that needs to have a stem value would use ExtStem which already signals readers what this type means, it's length (checked at creation time), and this still avoids "copying bytes".

We could also use a more relaxed type ExtStem []byte, but in function calls for func something(a []ExtStem) you could actually pass any []byte as a parameter. Still isn't that bad since the user is seeing that it should pass ExtStem but it allows coercion at the language level.

cc @gballet if also has some preference around this refactor in general.

@weiihann
Copy link
Contributor Author

weiihann commented Dec 4, 2023

Thanks for the comments! I'm personally more keen to do something like ExtStem { stem: key[:31] } and KeyToStem(...) so it's safer and for better code readability. Will push the latest code changes ASAP.

Copy link
Member

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the argument for increased readability, but before we even consider merging the newer version of this PR (given its huge footprint), we need to be a 100% sure that there are no speed penalty for that.

So if you want to proceed with that ExtStem{ ... } approach, make sure you have benchmarks to back it up.

conversion.go Outdated Show resolved Hide resolved
debug.go Outdated Show resolved Hide resolved
@weiihann weiihann changed the title refactor(tree): replace []byte with ExtStem [31]byte for stems refactor(tree): use const and type declaration Dec 4, 2023
@weiihann
Copy link
Contributor Author

weiihann commented Dec 4, 2023

@gballet @jsign I've pushed a completely new set of changes and changed the PR title and description, please review again. I've decided to go with the ExtStem []byte structure instead of ExtStem { stem: key[:31] } structure. For code readability, I think it's better comparatively and has less code footprint.

Copy link
Collaborator

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unfortunate that type Stem []byte can't achieve what your original intention was, since we can literally put anything there as before. But, at least we have some extra semantical "signaling" out there...

The constant additions/changes were good. 👍

Left a review comment for consideration but overall approved, but I'd like @gballet final take on this PR in general.

Thank you again!

tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
Copy link
Member

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

tree.go Outdated Show resolved Hide resolved
@gballet gballet merged commit 22541f1 into ethereum:master Jan 17, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants