pkg: version comparison fastpath #11273

Merged
merged 6 commits into from
Jan 19, 2025


gridbugs
Collaborator

@gridbugs gridbugs commented Jan 6, 2025

Solving dependencies requires performing many package version comparisons. Opam's logic for comparing versions is complicated in the worst case, but in practice most package versions follow an approximation of semantic versioning. In such cases the components of the version can be packed into a single integer value which can be efficiently compared.

This change introduces a pre-processing step that packs each package version into an int if possible, and compares versions as ints whenever it can.

This is based on an optimization in Python's uv package manager.
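
To make the packing idea concrete, here is a minimal sketch. It is not the code from this PR: the names `pack`, `of_string` and `compare_version`, the restriction to exactly three numeric components, and the 16 bits per component are all assumptions made for illustration, and it assumes a 64-bit OCaml `int`. Anything that does not fit the simple shape falls back to a caller-supplied `slow` comparison, which stands in for opam's full algorithm.

```ocaml
(* Hypothetical sketch, not the PR's implementation.
   Assumes 63-bit ints and 16 bits per version component. *)
let bits_per_component = 16
let max_component = (1 lsl bits_per_component) - 1

(* Parse a short run of ASCII digits with no redundant leading zero.
   Returns None for anything else, or for values above max_component. *)
let parse_component s =
  let len = String.length s in
  if len = 0 || len > 5 || (len > 1 && s.[0] = '0') then None
  else
    let rec go i acc =
      if i = len then Some acc
      else
        match s.[i] with
        | '0' .. '9' as c ->
          go (i + 1) ((acc * 10) + Char.code c - Char.code '0')
        | _ -> None
    in
    match go 0 0 with
    | Some n when n <= max_component -> Some n
    | _ -> None

(* Pack "MAJOR.MINOR.PATCH" into one int, most significant component first,
   so that integer order agrees with component-wise order. *)
let pack version =
  match String.split_on_char '.' version with
  | [ a; b; c ] -> (
    match parse_component a, parse_component b, parse_component c with
    | Some a, Some b, Some c ->
      Some
        ((a lsl (2 * bits_per_component))
         lor (b lsl bits_per_component)
         lor c)
    | _ -> None)
  | _ -> None

(* The packed form is computed once, when the version is created. *)
type t = { raw : string; packed : int option }

let of_string raw = { raw; packed = pack raw }

(* Fast path when both sides were packed; otherwise defer to [slow]. *)
let compare_version ~slow a b =
  match a.packed, b.packed with
  | Some x, Some y -> Int.compare x y
  | _ -> slow a.raw b.raw
```

In this sketch a version such as `1.2.3~beta` or `1.2` is simply left unpacked, so correctness for unusual versions rests entirely on the `slow` function.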

I benchmarked this change by solving https://github.com/gridbugs/climate where the time to solve went from ~2.6s to ~2.3s.

With #11264 applied the solve time goes from ~0.65s to ~0.55s.

For larger projects (e.g. bonsai) the improvement is surprisingly less pronounced. I don't yet understand why this is.

@Leonidas-from-XIV
Collaborator

I think this is a cool idea! For such optimizations I think it would be good to have a simple microbenchmark thing to be able to tell whether some change makes it faster or slower - I have some ideas but I would like to be able to estimate the performance impact of the suggestion.

@rgrinberg
Member

How about using the version comparison in this blog post? https://roscidus.com/blog/blog/2024/07/22/performance-2/ Does it not yield us enough of a benefit?

Also, since this is a change in the vendored library, it needs to have a patch in our "fork" repo.

@gridbugs
Collaborator Author

gridbugs commented Jan 9, 2025

I'll try out the idea from that post and compare it to my change. For benchmarks I think a macrobenchmark would be more useful so our optimizations are guided by real-life workloads. Perhaps we could use the solve-times of a few different packages of different sizes?

@Leonidas-from-XIV
Collaborator

The problem with real-life package testing, useful as it is, is that it introduces a lot of noise due to I/O and is potentially slow. If I want to compare 1000 runs of the optimized code to 1000 runs of the older code, the difference is harder to see if each run also, e.g., loads 1000 OPAM files via calls to git and I have to wait an hour for them to finish.

@rgrinberg
Member

I'm inclined to agree with Marek. It should be possible to write a version comparison function that is faster on all inputs. So a micro benchmark on say 100 common version comparison pairs should hopefully demonstrate that one algorithm is strictly superior to the other one.
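
A microbenchmark of this kind could look roughly like the sketch below. The harness name `time_compare`, the sample pairs, and the use of `Sys.time` (processor time) are assumptions for illustration; nothing here comes from the PR or from dune's benchmark setup.

```ocaml
(* Hypothetical harness: run [cmp] over every pair in [pairs],
   [iterations] times, and report the processor time used. *)
let time_compare ~iterations cmp pairs =
  let sink = ref 0 in
  let start = Sys.time () in
  for _i = 1 to iterations do
    List.iter (fun (a, b) -> sink := !sink + cmp a b) pairs
  done;
  let elapsed = Sys.time () -. start in
  (* The accumulated results are returned so that the comparisons
     contribute to an observable value. *)
  elapsed, !sink

(* A few made-up pairs; a real run would use pairs drawn from
   actual package metadata. *)
let sample_pairs =
  [ "1.2.3", "1.10.0"
  ; "0.9.0", "0.10.0"
  ; "2.0.0", "2.0.0"
  ; "1.0~beta", "1.0"
  ]
```

Calling `time_compare ~iterations:100_000 cmp sample_pairs` once per candidate comparison function gives two timings that can be compared directly, with no I/O involved.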

At first glance, it seems like your approach is the more promising one, though I would like to confirm this first.

@rgrinberg rgrinberg force-pushed the version-comparison-fastpath branch from 480bbda to c5ec620 Compare January 14, 2025 23:20
@rgrinberg
Member

I've closed the gap between this PR and main by implementing some rather simple optimizations. Nevertheless, this PR seems very valuable. I think there are still some bugs lurking here though as CI is failing.

gridbugs and others added 4 commits January 18, 2025 21:46
Solving dependencies requires performing many package version
comparisons. Opam's logic for comparing versions is complicated in the
worst case, but in practice most package versions follow an
approximation of semantic versioning. In such cases the components of
the version can be packed into a single integer value which can be
efficiently compared.

This change introduces a pre-processing step that packs each package
version into an int if possible, and compares versions as ints
whenever it can.

This is based on an optimization in Python's uv package manager.

Signed-off-by: Stephen Sherratt <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>


Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>


Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
@rgrinberg rgrinberg force-pushed the version-comparison-fastpath branch from c5ec620 to 54d7cee Compare January 18, 2025 23:11
Signed-off-by: Rudi Grinberg <[email protected]>
starts_with requires 4.13

Signed-off-by: Rudi Grinberg <[email protected]>
@rgrinberg
Member

Pushed a few fixes to make this ready. Thanks!

@rgrinberg rgrinberg merged commit 4a5ece6 into ocaml:main Jan 19, 2025
26 of 27 checks passed
4 participants