Feature requests: R bindings and early stopping #80

Open
traversc opened this issue Aug 8, 2023 · 1 comment
Labels
feature Feature request

traversc commented Aug 8, 2023

Nice work!

Would it be possible to get R bindings for this? I put together a minimal example here: https://github.com/traversc/WavefrontAlignR (feel free to do whatever with it)

I'd also like to request an "early stopping" feature: if the best possible alignment distance exceeds a user-defined threshold, stop the alignment and return a flag value (like INT_MAX). Assuming this doesn't add too much overhead, it would be useful because I'm mostly interested in finding only highly similar sequences between two sets.
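To make the requested behaviour concrete, here is a minimal sketch of a thresholded edit distance. It uses a plain row-by-row dynamic-programming Levenshtein computation, not the wavefront algorithm, and the function name `bounded_edit_distance` is hypothetical (it is not part of WFA2-lib). It only illustrates the contract described above: give up and return INT_MAX as soon as the distance is known to exceed the threshold.

```cpp
#include <algorithm>
#include <climits>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper, not part of WFA2-lib: returns the Levenshtein distance
// between a and b, or INT_MAX once it is certain the distance exceeds max_dist.
int bounded_edit_distance(const std::string& a, const std::string& b, int max_dist) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());

    // The length difference is a lower bound on the edit distance.
    if (std::abs(n - m) > max_dist) return INT_MAX;

    std::vector<int> prev(m + 1), curr(m + 1);
    for (int j = 0; j <= m; ++j) prev[j] = j;  // distance from the empty prefix of a

    for (int i = 1; i <= n; ++i) {
        curr[0] = i;
        int row_min = curr[0];
        for (int j = 1; j <= m; ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // match or mismatch
            row_min = std::min(row_min, curr[j]);
        }
        // Row minima never decrease from one row to the next, so once every
        // cell in a row is above the threshold the final distance must be too.
        if (row_min > max_dist) return INT_MAX;
        std::swap(prev, curr);
    }
    return prev[m] <= max_dist ? prev[m] : INT_MAX;
}
```

A caller filtering for highly similar pairs would treat INT_MAX as "too distant" and skip the pair. For example, `bounded_edit_distance("kitten", "sitting", 3)` returns 3, while the same call with a threshold of 2 returns INT_MAX.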

Last, I ran a quick benchmark against an existing R package. Is this a fair comparison? The code used to run WFA2 is here: https://github.com/traversc/WavefrontAlignR/blob/main/src/WFA_bindings.cpp

# Benchmark for a 10,000 x 10,000 alignment
# "seqs" is a vector of DNA sequences on average 43 bp long
library(WavefrontAlignR)
library(stringdist)
library(tictoc)

# WFA2 levenshtein
tic()
y1 <- WavefrontAlignR::edit_dist_matrix(seqs, seqs)
toc()
# 191.452 sec elapsed, 522324 alignments / sec

# stringdist levenshtein
tic()
y2 <- stringdist::stringdistmatrix(seqs, seqs, method = "lv", nthread=1)
toc()
# 677.356 sec elapsed, 147633 alignments / sec

smarco (Owner) commented Sep 25, 2023

Sorry for the late reply (I was about to send this message, and then it slipped my mind...).

(1) R bindings

Yes, sure, that would be awesome. At the moment, I don't have the bandwidth to implement this feature, but it is definitely something I would like to have. Thanks for the example and the request.

If you feel like it, you could wrap your example under bindings/r (linked to the current version) and make a pull request. I would be very happy if you took over and took the credit for it, but only if you want to.

(2) Early stop

There is actually one already. The function wavefront_aligner_set_max_alignment_steps lets you set the maximum number of steps (i.e., the maximum alignment score) to reach before quitting. Have a look and let me know if that is what you are looking for.

Let me know,
Thanks.

(3) (NxN) benchmark

In principle, it seems fair to me (edit distance, score only, ...).

smarco added the feature (Feature request) label Sep 25, 2023