pkg: version comparison fastpath #11273
I think this is a cool idea! For optimizations like this, it would be good to have a simple microbenchmark that tells us whether a change makes things faster or slower. I have some ideas, but I would like to be able to estimate the performance impact of the suggestion.
How about using the version comparison from this blog post? https://roscidus.com/blog/blog/2024/07/22/performance-2/ Would it not give us enough of a benefit? Also, since this is a change in the vendored library, it needs a corresponding patch in our "fork" repo.
I'll try out the idea from that post and compare it with my change. For benchmarks, I think a macrobenchmark would be more useful, so that our optimizations are guided by real-life workloads. Perhaps we could use the solve times of a few packages of different sizes?
The problem with real-life package tests is that, while they are also useful, they introduce a lot of noise from I/O and can be slow. If I want to compare 1000 runs of the optimized code to 1000 runs of the older code, the difference is harder to see if each run also, e.g., loads 1000 OPAM files via calls to
I'm inclined to agree with Marek. It should be possible to write a version comparison function that is faster on all inputs, so a microbenchmark on, say, 100 common version comparison pairs should demonstrate that one algorithm is strictly superior to the other. At first glance, your approach seems the more promising one, though I would like to confirm this first.
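A microbenchmark of the kind described above could be as small as the following sketch, which uses only the OCaml standard library. `Sys.time` measures processor time rather than wall-clock time, which keeps I/O out of the picture. The names `sample_pairs`, `sign`, `agree`, `time_it` and `report` are hypothetical, and `reference`/`candidate` stand in for the existing and proposed comparison functions. The `agree` check matters as much as the timing: a faster comparison is only useful if it orders every pair the same way.

```ocaml
(* Hypothetical microbenchmark harness; standard library only. *)

(* A few illustrative pairs; a real run would use ~100 pairs taken
   from real package versions. *)
let sample_pairs : (string * string) array =
  [| ("1.2.3", "1.2.4"); ("0.9.0", "1.0.0"); ("4.14.1", "4.14.1");
     ("1.0.0~beta1", "1.0.0"); ("v0.16.0", "v0.17.0") |]

(* Normalise a comparison result to -1, 0 or 1. *)
let sign x = if x < 0 then -1 else if x > 0 then 1 else 0

(* True when [f] and [g] order every pair in [pairs] the same way. *)
let agree f g pairs =
  Array.for_all (fun (a, b) -> sign (f a b) = sign (g a b)) pairs

(* Processor time, in seconds, spent running [cmp] over [pairs] [iters] times. *)
let time_it ~iters cmp pairs =
  let sink = ref 0 in
  let start = Sys.time () in
  for _i = 1 to iters do
    Array.iter (fun (a, b) -> sink := !sink + cmp a b) pairs
  done;
  let elapsed = Sys.time () -. start in
  (* Consume [sink] so the compiler cannot discard the loop body. *)
  ignore (Sys.opaque_identity !sink);
  elapsed

(* Check agreement, then time both implementations on the same pairs. *)
let report ~iters ~reference ~candidate pairs =
  if not (agree reference candidate pairs) then
    prerr_endline "warning: implementations disagree on some pair";
  let t_ref = time_it ~iters reference pairs in
  let t_new = time_it ~iters candidate pairs in
  Printf.printf "reference: %.3fs  candidate: %.3fs\n" t_ref t_new
```

Usage would look like `report ~iters:100_000 ~reference:old_compare ~candidate:new_compare sample_pairs`, where `old_compare` and `new_compare` are hypothetical placeholders for the two implementations under test.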
Force-pushed from 480bbda to c5ec620.
I've closed the gap between this PR and main by implementing some rather simple optimizations. Nevertheless, this PR seems very valuable. I think there are still some bugs lurking here, though, as CI is failing.
Signed-off-by: Stephen Sherratt
Signed-off-by: Rudi Grinberg
Force-pushed from c5ec620 to 54d7cee.
Signed-off-by: Rudi Grinberg
`String.starts_with` requires OCaml 4.13.
Signed-off-by: Rudi Grinberg
Pushed a few fixes to make this ready. Thanks!
Solving dependencies requires performing many package version comparisons. Opam's logic for comparing versions is complicated in the worst case; however, in practice most package versions follow an approximation of semantic versioning. In such cases, the components of the version can be packed into a single integer value, which can be compared efficiently.
This change introduces a pre-processing step that packs package versions into an int if possible, and uses int comparison when comparing package versions whenever it can.
This is based on an optimization in Python's uv package manager.
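A minimal sketch of the idea, under stated assumptions: the fast path only accepts versions of the exact shape `X.Y.Z`, with one to six decimal digits per component, and it assumes a 64-bit OCaml where `int` has 63 bits. The names `parse_component`, `pack`, `version`, `of_string` and `compare_versions`, the 20-bit layout, and the `~slow` parameter (a stand-in for opam's full comparison) are all hypothetical; this is not the code in this PR or in uv. For strings of this shape, comparing the packed ints should agree with opam's ordering, since opam compares digit runs numerically and the `.` separators line up.

```ocaml
(* Hypothetical packed-int fast path for version comparison.
   Assumes a 64-bit OCaml (63-bit [int]); packed keys use 60 bits. *)

let bits = 20        (* bits per component *)
let max_digits = 6   (* 999_999 < 2^20, so six digits always fit *)

(* Parse 1 to [max_digits] ASCII digits; anything else gives [None].
   [int_of_string] is avoided because it also accepts "0x1f", "-1", "1_0". *)
let parse_component (s : string) : int option =
  let len = String.length s in
  if len = 0 || len > max_digits then None
  else begin
    let acc = ref 0 and ok = ref true in
    String.iter
      (fun c ->
        if c >= '0' && c <= '9' then
          acc := (!acc * 10) + (Char.code c - Char.code '0')
        else ok := false)
      s;
    if !ok then Some !acc else None
  end

(* Pack "X.Y.Z" as (X lsl 40) lor (Y lsl 20) lor Z; other shapes give [None]. *)
let pack (v : string) : int option =
  match String.split_on_char '.' v with
  | [ x; y; z ] -> (
      match (parse_component x, parse_component y, parse_component z) with
      | Some a, Some b, Some c -> Some ((a lsl (2 * bits)) lor (b lsl bits) lor c)
      | _ -> None)
  | _ -> None

(* Pre-processed version: the packed key is computed once, up front. *)
type version = { raw : string; packed : int option }

let of_string (raw : string) : version = { raw; packed = pack raw }

(* One int comparison when both keys exist; otherwise defer to [slow]. *)
let compare_versions ~(slow : string -> string -> int) (a : version) (b : version) : int =
  match (a.packed, b.packed) with
  | Some pa, Some pb -> Int.compare pa pb
  | _ -> slow a.raw b.raw
```

With this sketch, `compare_versions ~slow (of_string "1.2.3") (of_string "1.10.0")` is negative without ever calling `slow`, while a version such as `1.2.3~beta` has no packed key and always takes the slow path. Requiring exactly three components is deliberate: padding `1.2` to `1.2.0` would make the two compare equal, whereas opam orders `1.2` before `1.2.0`.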
I benchmarked this change by solving https://github.com/gridbugs/climate; the time to solve went from ~2.6s to ~2.3s.
With #11264 applied, the solve time goes from ~0.65s to ~0.55s.
For larger projects (e.g. bonsai), the improvement is, surprisingly, less pronounced. I don't yet understand why.
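For reference, the relative reductions in solve time for climate implied by the figures above are roughly:

```math
\frac{2.6 - 2.3}{2.6} \approx 11.5\%, \qquad \frac{0.65 - 0.55}{0.65} \approx 15.4\%
```

so, in relative terms, the gain is somewhat larger with #11264 applied.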