-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(tree): use const and type declaration #421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
You did a good job describing the benefits of the change, and we've considered doing something like this before. Unfortunately, it comes with an unfortunate important side-effect which is more allocations, and impacts performance.
In the current (master
) version, since we use []byte
this is a light reference to an existing place in memory, so we don't have to do these copy(...)
in many places. These copies end up in arrays that escape to the heap, and thus count as heap allocations and adds pressure to the GC.
In the past, we had to be very strict with avoiding allocations to improve the performance of this library, so I'm not sure we can do any change that makes a performance regression on that front without being sure we'll need to revert this change later in the future.
To be a bit more objective, here's a benchmark difference between master
and this branch (showing only relevant lines with perf change):
name old allocs/op new allocs/op delta
CommitLeaves/insert/leaves/1000-16 17.0k ± 0% 18.5k ± 0% +8.91% (p=0.016 n=4+5)
CommitLeaves/insert/leaves/10000-16 155k ± 0% 176k ± 0% +12.93% (p=0.008 n=5+5)
CommitFullNode-16 3.80k ± 0% 4.06k ± 0% +6.74% (p=0.008 n=5+5)
ModifyLeaves-16 162k ± 0% 192k ± 0% +18.45% (p=0.008 n=5+5)
This isn't surprising by the nature of the change, since as I expected, this would mean more allocations and thus more GC work.
I think one option is using a feature since Go 1.17 to make ExtStem
be a *[31]byte
, and point that to the key. (More about this in the spec in the section Conversions from slice to array or array pointer). So basically, it's an array of 31 bytes that will point to the same location in memory, so avoiding the performance problem. But I'm not sure using pointers to arrays just to signal a length of 31 is worth it.
Another option that still avoids the allocation, and gets most of the type system checking and semantics, is to define type ExtStem { stem []byte }
. Despite this doesn't have the type-length internally, you could have a KeyToStem(key []byte) ExtStem
, which would be mostly a ExtStem { stem: key[:31] }
so "by definition", if we always use KeyToStem(...)
(and nobody plays tricks creating ExtStem
directly), we're much safer. And every place that needs to have a stem value would use ExtStem
which already signals readers what this type means, it's length (checked at creation time), and this still avoids "copying bytes".
We could also use a more relaxed type ExtStem []byte
, but in function calls for func something(a []ExtStem)
you could actually pass any []byte
as a parameter. Still isn't that bad since the user is seeing that it should pass ExtStem
but it allows coercion at the language level.
cc @gballet if also has some preference around this refactor in general.
Thanks for the comments! I'm personally more keen to do something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the argument for increased readability, but before we even consider merging the newer version of this PR (given its huge footprint), we need to be a 100% sure that there are no speed penalty for that.
So if you want to proceed with that ExtStem{ ... }
approach, make sure you have benchmarks to back it up.
use StemSize instead of 31 use ExtStem instead of []byte
@gballet @jsign I've pushed a completely new set of changes and changed the PR title and description, please review again. I've decided to go with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate that type Stem []byte
can't achieve what your original intention was, since we can literally put anything there as before. But, at least we have some extra semantical "signaling" out there...
The constant additions/changes were good. 👍
Left a review comment for consideration but overall approved, but I'd like @gballet final take on this PR in general.
Thank you again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
This PR does the following:
Stem []byte
instead of[]byte
, as well asKeyToStem(key []byte)
instead ofkey[:31]
in most occurrences that require stem. This enhances code readability, without affecting the underlying logic.32
with constantsLeafValueSize
orKeySize
for code maintainability.BREAKING CHANGE:
The
GetProofItems
function in theVerkleNode
interface now returns[]Stem
instead of[][]byte
, which affects geth's usage of this libraryBenchmark results: