suboptimal alignment in endsfree mode when match score != 0 #102

Open
joyeuxnoel8 opened this issue Oct 22, 2024 · 1 comment

@joyeuxnoel8

Hi,

I noticed that the program can return a suboptimal alignment in ends-free mode when the match score is nonzero.
Given a pattern (sequencing read) GGGGCGCGTCGGGCTCCGGGTGTGGGGGGGGTGTGGGGGGGGGGGGTGGTGTGTGGGGGTGTGGCTGGTGAATGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGTGAGGGTGGTGAATGGGGTGAGGGTGGTGAGTGGGGT,
and text (repetitive DNA in tandem repeat region) CCTGAGGCCCCGGGTGTGGAGCGGAGGTGGACCAGAGGTGGACACAGACCCACGGGCCGCCAAGGCCCACCCAGGATCCCCCGGGGGCCATCCACATCTGGTAAAGCCGAGGTGTGGGCGGACCCCAGGAAGCAGCCCCCACCCCTGCCCCCAGTGGCTCAGGCCTGGGCAGAGAAAACAGGCCCAGCAGGGCGGCAGGGTGGGATCCCCACGATTCACCGAGGATGCGTCTTCCACAGGGAGAGTTTGGGGGAGCTGTGTGTGAAAATGTGAGTAACGTACATAAATCAGTATCACAGGAATCCAGGCGGGCGGAGGATGCATGACTGAACTTGGAGGACGCTCATCAGGGAGGTCAGTGCTCCCCTCCGGGGACAGGATCCTGCCTTCGCCTGGCCTGCGGGACAGGGCTCCCCTTGCCGGCCAGGGGCTACTGGCCACTGATGCTCACTTTGGGCTTCCGCCCCCCAGGGGAAGGGGTGCTGAGAGCCCCGTGTCCGGAGGGCTGGTGAGTGGGGCTGAGGCTGGTGGAGTGGGGGTGAGGCTGGTGAATGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGTGGGTGAGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGTGAGGGTGGTGAATGGGGTGAGGCTGGTGAATGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGGTGGTGAATGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGCTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAATGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGTGAGGGTGGTGAATGGGGTGAGGGTGATGAGTGGAGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAATGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGTGAGGGTGGTGAATGGGGTGAGGCTGGTGAATGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGCTGGTGAATGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGCTGGTGAATGGGGTGAGGGTGGTGAGTGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGGTGGTGAGTGGAGTGAGGGTGGTGAGTGGGGGTGAGGGTGGTGAGTGGGGGTGAGGCTGGTGAGTGGGGTGAGGGTAGTGGGTGGGGCTGAGGTTATTCCAGCCTCGGGCACTGGATCTTCTCGGGGTGGGGGGGTTTGTGAGCGCTGACCCCCTGGGCTGTCTCCACCTTGTCCTGGG
GCTGGGTCCCCGGACGACGCGGCCACAGCTCCTGGGAGAGTGGCCAGCCCTCGGACAGCTGTGAGCCCCCACGGGGGTGTCTGGGTTCGAGGCCACGTTGCAGACCCGCTGGCTGCTGGGGCTCAGGGAGGAAATGACCTGGCCTCCTGGAGCTTCAGATTCCTCATCTGTGTGCTGAGGGAAGGGGCACATCTCGGAGCCTGGGGACTCCCGGCGTGTGGGCTGCTTGCCTGGCACCCGCTCACCCAGGAGTTGTCCTTGCTGTGGGCTCTGAGCCTCCGGGATGGAGTGGGGCTGAGAGCGTGTCCACCACCTCCACCACATCAGCCTGTCCCTGGTCCTGCTCCGCCAGATGACAAATCTCTGGGAAATCTTCTTTAATTTTGTTCTCTGGGAAGTGGTAGGTTTTGGAGA,
the output has an alignment score of 136 with the CIGAR string being

The optimal alignment should have a score of 160 against this substring in text GGGGTGAGGCTGGTGAATGGGGTGAGGGTGGTGAGTGGGGTGAGGCTGGTGAGTGGGGGTGAGGCTGGTGAATGGGGTGAGGGTGGTGAGTGGGGTGAGGGTGGTGAGTGGGGGTGAGGGTGATGAGTGGGGTGAGGGTGGTGAGTGGGGT with the following CIGAR string MMMMXMXMXMXMMXXXXXMMXMMMXMMMXMMMMXMXMMMMXMXMMXMMMMMXMMMMMMMMMXMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMXMMMMMMMMMMMMMMMDMMMMMMMMMMMMXMMMXMMMMMMMMMMMMMMMMMMMMMMMM.

On inspection, the suboptimal alignment has a better prefix than the optimal alignment, as shown below (M replaced with = for visualization):
====X=X=X=X==XXXXX==X===X===X====X=X====X=X==X=====X=========X===============================X===============D============X===X======================== optimal cigar
====X=X=X=X==XXXXX==X===X===X====X=X====X=X========X=========X==X==X===X=====================X======X============D========M===X===X===================X suboptimal cigar
It seems that a nonzero match score, combined with the better prefix of the suboptimal substring, causes the program to keep extending the suboptimal wavefront as long as its alignment score does not drop below that of the second-best wavefront. I'm not sure how easily this can be fixed. My initial thought is that, under a nonzero match score, we may need to consider each wavefront's potential to surpass the best wavefront.

This test was performed with the following C++ template:

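```cpp
#include "bindings/cpp/WFAligner.hpp"
using namespace wfa;

// Penalties: match = -2, mismatch = 4, gap opening = 6, gap extension = 2
WFAlignerGapAffine aligner(-2,4,6,2,WFAligner::Alignment,WFAligner::MemoryHigh);
int freelen = text.size() - pattern.size();
// text goes in the first slot with freelen free characters at each end;
// the read (pattern) goes in the second slot with no free ends
aligner.alignEndsFree(text, freelen, freelen, pattern, 0, 0);
cigar = aligner.getAlignment();
score = aligner.getAlignmentScore();
```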

Thanks,
Tony

@joyeuxnoel8
Author

joyeuxnoel8 commented Oct 22, 2024

Another note: I also tried a match score of 0 on the above example, using WFAlignerGapAffine aligner(0,4,6,2,WFAligner::Alignment,WFAligner::MemoryHigh);, but the program did not return a reasonable alignment and simply aligned the pattern to the end of the text. The output alignment score is -310, whereas the optimal alignment CIGAR would have a score of -91 under this scoring scheme.

I'm not sure whether I missed anything. Any help would be appreciated. Thanks!
