
Performance improvement: Non-recursively drop TokenStreams with a visitor #489

Closed

Conversation

WalkerKnapp

Hello! While running the benchmarks in the syn repository under a profiler on my system, I noticed that an outsized amount of time is spent in the non-recursive Drop implementation for TokenStream that was introduced in #232 to solve the stack overflow noted in #55. A large portion of this time can be eliminated by using a visitor pattern to drop segments of the TokenStream instead of flattening groups into the inner Vec.

This adds some complexity, which may be undesirable, but (at least in my tests) it yields good results when repeatedly running those benchmarks.
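For context, the existing non-recursive Drop from #232 can be sketched roughly like this (a minimal toy model with hypothetical types, not proc-macro2's actual code): every token is moved into a shared worklist Vec, and nested groups are flattened into it, so no destructor ever recurses no matter how deeply the input is nested.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts tokens dropped, just for the demo below.
static TOKENS_DROPPED: AtomicUsize = AtomicUsize::new(0);

// Toy stand-ins for proc_macro2::TokenTree / TokenStream.
enum TokenTree {
    Ident(String),
    Group(TokenStream),
}

#[derive(Default)]
struct TokenStream {
    inner: Vec<TokenTree>,
}

impl Drop for TokenStream {
    fn drop(&mut self) {
        // Flatten nested groups into one worklist instead of letting
        // each Group's destructor recurse into its children.
        let mut stack = std::mem::take(&mut self.inner);
        while let Some(tree) = stack.pop() {
            TOKENS_DROPPED.fetch_add(1, Ordering::Relaxed);
            if let TokenTree::Group(mut group) = tree {
                // Steal the group's tokens; the group itself then
                // drops with an empty Vec, so its own Drop does no work.
                stack.append(&mut group.inner);
            }
        }
    }
}

// Builds a stream nested `depth` Groups deep with one leaf token.
fn nest(depth: usize) -> TokenStream {
    let mut ts = TokenStream {
        inner: vec![TokenTree::Ident("leaf".to_string())],
    };
    for _ in 0..depth {
        ts = TokenStream {
            inner: vec![TokenTree::Group(ts)],
        };
    }
    ts
}

fn main() {
    // A depth this large would overflow the stack under a naive
    // recursive Drop; the worklist handles it iteratively.
    drop(nest(100_000));
    // prints "dropped 100001 tokens" (100_000 Groups + 1 Ident)
    println!("dropped {} tokens", TOKENS_DROPPED.load(Ordering::Relaxed));
}
```

Note the cost this sketch makes visible: every non-Group token takes an extra hop through the worklist Vec before it is finally dropped, which is the overhead this PR is aiming to avoid.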

Running syn's rust benchmark with proc-macro2 1.0.92:

tokenstream_parse: elapsed=0.602s (+/- 0.005s)
syn_parse: elapsed=2.558s (+/- 0.024s)

Running syn's rust benchmark with this fork:

tokenstream_parse: elapsed=0.563s (+/- 0.005s)
syn_parse: elapsed=2.236s (+/- 0.030s)

This yields about a 7% performance improvement in the tokenstream_parse test and a 12% performance improvement in the syn_parse test on my system.

Opening as a draft to get feedback and testing with other use-cases. If no regressions are found, I will mark it ready for review. Thanks!

@dtolnay
Owner

Thanks for the PR!

If I understand correctly, the speedup in this PR comes from moving non-Group tokens from heap to stack directly, instead of heap to heap to stack. In real-world inputs there are always far more non-Group tokens than Group tokens, so that reduction in moving matters.

To implement that, you have created a linked list of Groups, where a new linked list node is heap-allocated for every Group token visited (other than top-level ones).

I don't think allocating a separate linked list node for every Group is the best representation for this algorithm. Would you be willing to try benchmarking this against #490?
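The linked-list representation described above might look roughly like this (a hypothetical sketch with invented names, not the actual code from this PR or #490): each nested Group suspends the current stream behind a freshly heap-allocated list node, and non-Group tokens are dropped in place without passing through a shared worklist.

```rust
// Toy token types; a Group owns its sub-stream directly.
enum TokenTree {
    Ident(String),
    Group(Vec<TokenTree>),
}

// One node per stream whose traversal is in progress. Every nested
// Group visited costs one Box allocation, which is the overhead
// questioned above.
struct Node {
    tokens: std::vec::IntoIter<TokenTree>,
    next: Option<Box<Node>>,
}

// Iteratively drops a stream, returning how many tokens were visited.
fn drop_stream(top: Vec<TokenTree>) -> usize {
    let mut visited = 0;
    let mut head = Some(Box::new(Node { tokens: top.into_iter(), next: None }));
    while let Some(mut node) = head.take() {
        match node.tokens.next() {
            Some(TokenTree::Group(inner)) => {
                visited += 1;
                // Descend into the group; the current stream waits
                // behind a newly allocated list node.
                head = Some(Box::new(Node {
                    tokens: inner.into_iter(),
                    next: Some(node),
                }));
            }
            Some(other) => {
                visited += 1;
                drop(other); // non-Group token freed immediately
                head = Some(node);
            }
            None => head = node.next, // stream exhausted; resume parent
        }
    }
    visited
}

fn main() {
    let mut ts = vec![TokenTree::Ident("leaf".to_string())];
    for _ in 0..100_000 {
        ts = vec![TokenTree::Group(ts)];
    }
    // Traversal is iterative, so deep nesting cannot overflow the stack.
    println!("visited {} tokens", drop_stream(ts)); // visited 100001 tokens
}
```

A Vec used as an explicit stack of pending iterators would give the same traversal order while amortizing those per-Group allocations, which appears to be the direction #490 was benchmarked against.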

@WalkerKnapp
Author

Sure! This is what I'm seeing with that branch on my machine:

tokenstream_parse: elapsed=0.563s (+/- 0.039s)
syn_parse: elapsed=2.240s (+/- 0.047s)

Seems to be within the margin of error of my fork, and I'd agree that your solution is quite a bit simpler :). I'll close this as obsolete in that case, but I'm glad to have been able to spark the idea!

@dtolnay
Owner

dtolnay commented Jan 11, 2025

Published in 1.0.93.

I am glad you are profiling syn benchmarks. I have not looked at that in years.

You might be interested in picking up #336.
