MRG: refactor calculate_gather_stats to disallow repeated downsampling #3352

Merged: 29 commits, Oct 15, 2024


ctb (Contributor) commented Oct 14, 2024

This PR builds on the refactoring in #3342 to do less downsampling, and it also avoids computing intersections twice (per #3196).

Benchmarks in sourmash-bio/sourmash_plugin_branchwater#471 are pretty astonishing...

Fixes #3196
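To make the two goals above concrete, here is a minimal Rust sketch. It is not the sourmash implementation: `Sketch`, `GatherStats`, `calculate_stats`, and their fields are hypothetical, and downsampling is modeled on the FracMinHash rule of keeping only hashes at or below `u64::MAX / scaled`. The statistics function refuses to downsample on its own (it returns an error when the `scaled` values differ), so the caller downsamples once up front, and every statistic is derived from a single intersection count.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified FracMinHash sketch (NOT sourmash's `KmerMinHash`).
#[derive(Debug, Clone)]
struct Sketch {
    scaled: u64,
    hashes: Vec<u64>, // assumed to contain no duplicates
}

impl Sketch {
    /// Keep only hashes <= u64::MAX / new_scaled. Downsampling can only
    /// coarsen a sketch, so a smaller `new_scaled` is rejected.
    fn downsample(&self, new_scaled: u64) -> Result<Sketch, String> {
        if new_scaled < self.scaled {
            return Err(format!(
                "cannot downsample from scaled={} to scaled={}",
                self.scaled, new_scaled
            ));
        }
        let max_hash = u64::MAX / new_scaled;
        Ok(Sketch {
            scaled: new_scaled,
            hashes: self.hashes.iter().copied().filter(|&h| h <= max_hash).collect(),
        })
    }
}

/// Hypothetical subset of gather-style statistics.
#[derive(Debug, PartialEq)]
struct GatherStats {
    intersect_bp: u64,
    f_query: f64,
    f_match: f64,
}

/// Requires both sketches to share `scaled`; never downsamples internally.
fn calculate_stats(query: &Sketch, matched: &Sketch) -> Result<GatherStats, String> {
    if query.scaled != matched.scaled {
        return Err(format!(
            "scaled mismatch: {} vs {}",
            query.scaled, matched.scaled
        ));
    }
    // Compute the intersection exactly once...
    let q: HashSet<u64> = query.hashes.iter().copied().collect();
    let n_common = matched.hashes.iter().filter(|h| q.contains(*h)).count() as u64;
    // ...and derive every statistic from that single count.
    Ok(GatherStats {
        intersect_bp: n_common * query.scaled,
        f_query: n_common as f64 / query.hashes.len() as f64,
        f_match: n_common as f64 / matched.hashes.len() as f64,
    })
}

fn main() {
    let query = Sketch { scaled: 100, hashes: vec![1, 2, 3, 4, u64::MAX / 500] };
    let matched = Sketch { scaled: 1000, hashes: vec![2, 3, 4, 5, 6] };

    // Mismatched `scaled` values are rejected rather than silently downsampled.
    assert!(calculate_stats(&query, &matched).is_err());

    // The caller downsamples once, up front, to the coarser `scaled` value.
    let common = query.scaled.max(matched.scaled);
    let query_ds = query.downsample(common).unwrap();
    let matched_ds = matched.downsample(common).unwrap();
    let stats = calculate_stats(&query_ds, &matched_ds).unwrap();
    println!("{:?}", stats); // prints GatherStats { intersect_bp: 3000, f_query: 0.75, f_match: 0.6 }
}
```

Rejecting mismatched `scaled` values, instead of downsampling inside the statistics function, means repeated calls cannot quietly redo the same downsampling work. This mirrors the intent stated in the PR title, though the actual sourmash code may enforce it differently.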


codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.42%. Comparing base (7d11173) to head (72d2044).
Report is 1 commit behind head on latest.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/core/src/index/mod.rs | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##           latest    #3352   +/-   ##
=======================================
  Coverage   86.42%   86.42%
=======================================
  Files         137      137
  Lines       16069    16070    +1
  Branches     2211     2211
=======================================
+ Hits        13888    13889    +1
  Misses       1874     1874
  Partials      307      307
```
| Flag | Coverage Δ |
|---|---|
| hypothesis-py | `25.47% <ø> (ø)` |
| python | `92.39% <ø> (ø)` |
| rust | `62.06% <90.90%> (+0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.


@ctb ctb changed the title WIP: refactor calculate_gather_stats to disallow repeated downsampling MRG: refactor calculate_gather_stats to disallow repeated downsampling Oct 14, 2024
ctb (Contributor, Author) commented Oct 14, 2024

ready for review @luizirber @bluegenes

@luizirber luizirber self-requested a review October 15, 2024 16:58
Base automatically changed from refactor_rs_downsample to latest October 15, 2024 17:12
@ctb ctb merged commit 4bae86b into latest Oct 15, 2024
44 checks passed
@ctb ctb deleted the gather_stats_refactor branch October 15, 2024 18:06
ctb added a commit that referenced this pull request Oct 15, 2024
## [0.16.0] - 2024-10-15

MSRV: 1.65

Changes/additions:

* refactor `calculate_gather_stats` to disallow repeated downsampling
(#3352)
* improve downsampling behavior on `KmerMinHash`; fix `RevIndex::gather`
bug around `scaled`. (#3342)
* derive Hash for `HashFunctions` (#3344)
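Deriving `Hash` (together with `Eq`) on an enum is what allows it to be used as a `HashMap` or `HashSet` key. A minimal sketch using a hypothetical stand-in enum; the variant names are illustrative and not copied from sourmash:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an enum like `HashFunctions`;
/// variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum HashFunctions {
    Dna,
    Protein,
    Dayhoff,
}

fn main() {
    // With `Hash` and `Eq` derived, the enum can key a HashMap directly,
    // e.g. to count sketches by the hash function they were built with.
    let mut counts: HashMap<HashFunctions, usize> = HashMap::new();
    for hf in [HashFunctions::Dna, HashFunctions::Protein, HashFunctions::Dna] {
        *counts.entry(hf).or_insert(0) += 1;
    }
    println!("{:?}", counts.get(&HashFunctions::Dna)); // prints Some(2)
    println!("{}", counts.contains_key(&HashFunctions::Dayhoff)); // prints false
}
```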

Updates:

* Bump web-sys from 0.3.70 to 0.3.72 (#3354)
* Bump tempfile from 3.12.0 to 3.13.0 (#3340)
Development

Successfully merging this pull request may close these issues.

refactor calculate_gather_stats to avoid calculating intersections twice
2 participants