Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Sorting Algorithm #173

Open
3 tasks done
ashvardanian opened this issue Sep 27, 2024 · 0 comments
Open
3 tasks done

New Sorting Algorithm #173

ashvardanian opened this issue Sep 27, 2024 · 0 comments
Labels
core Work on the algorithm design and implementation enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed huge Large task potentially involving architectural or breaking changes performance Performance related discussion or suggestion

Comments

@ashvardanian
Copy link
Owner

Describe what you are looking for

For lexicographic sorting of strings, StringZilla uses a "hybrid-hybrid" approach with $O(n * log(n))$ and.

  1. Radix sort for first bytes exported into a continuous buffer for locality.
  2. IntroSort on partially ordered chunks to balance efficiency and worst-case performance.
    1. IntroSort begins with a QuickSort.
    2. If the recursion depth exceeds a certain threshold, it switches to a HeapSort.

That approach has several issues.

  • The current solution can't work for arrays over 4 billion entries.
  • It's also potentially illformed on 32-bit systems.
  • It doesn't use SIMD anywhere, not even for the Radix step.
  • It loses to hybrid algorithm prototype built with std::sort by a factor of 3!

So there is a lot we can improve, and there several links to keep in mind for anyone considering collaborating here.

  1. Sorting networks are marvelous structures and there is a collection of fast circuits by @bertdobbelaere we can implement with SIMD.
  2. Similar to TimSort, once sorted chunks are produced, one can merge them more efficiently using SIMD techniques implemented in SimSIMD for set intersections in v5.3 for SVE2, AVX-512, and AVX2.

This is not an inclusive list, and I'm open to other algorithmic ideas.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ashvardanian ashvardanian added enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers performance Performance related discussion or suggestion core Work on the algorithm design and implementation huge Large task potentially involving architectural or breaking changes labels Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Work on the algorithm design and implementation enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed huge Large task potentially involving architectural or breaking changes performance Performance related discussion or suggestion
Projects
None yet
Development

No branches or pull requests

1 participant