merge k-mers of different lengths #3

Open
Leo-ccc opened this issue Nov 15, 2024 · 1 comment

Comments

@Leo-ccc

Leo-ccc commented Nov 15, 2024

Hi,

The features (k-mers) I'm interested in have different lengths. The 'index' step did not return any warning, but the 'merge' module failed with this error message:
[ERROR] KaMRaT-merge relies on the index in k-mer mode, please rerun KaMRaT-index with -klen option

Would it be possible to add, in a future version, a method that merges k-mers of different lengths into contigs?

@hl-xue
Collaborator

hl-xue commented Nov 16, 2024

Hi,

Thanks for using KaMRaT.

Currently, KaMRaT does not support merging features of variable length, because we define a k-mer as a "sequence of fixed length k". Also, when kamrat index is launched without the -klen INT argument, the features are treated as general character strings (i.e., not necessarily A/C/G/T sequences; they can be gene names such as TP53, transcript IDs such as ENST00000714409, etc.). The index step therefore does not check feature lengths and does not warn about k-mers of variable length. However, it should normally raise the warning "indexing in general: features are not considered as k-mers". Please let me know if this is not the case.
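To see in advance whether a feature list would qualify for k-mer mode, a quick check outside of KaMRaT can help. The following is only a minimal sketch in Python, not part of KaMRaT: the function name `summarize_features` and the example feature list are hypothetical, and it simply reports the distribution of feature lengths and which features contain characters other than A/C/G/T.

```python
from collections import Counter


def summarize_features(features):
    """Return (length distribution, features that are not pure A/C/G/T)."""
    # Count how many features have each length.
    lengths = Counter(len(f) for f in features)
    # Keep features that contain any character outside A/C/G/T.
    non_acgt = [f for f in features if set(f.upper()) - set("ACGT")]
    return lengths, non_acgt


# Hypothetical feature list mixing sequences of two lengths with general strings.
features = ["ACGTACGT", "ACGTAC", "TP53", "ENST00000714409"]
lengths, non_acgt = summarize_features(features)

print("feature lengths:", dict(lengths))
print("non-A/C/G/T features:", non_acgt)
if len(lengths) > 1:
    print("features have more than one length, so they are not fixed-length k-mers")
```

If more than one length is reported, the features do not fit the fixed-length definition above, whatever their alphabet.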

One possible workaround for your use case: first run a k-mer counting tool (e.g., jellyfish) to break the variable-length sequences into fixed-length k-mers, then join the resulting count vectors into a count table. After this, you should be able to run KaMRaT on the newly obtained k-mer count table (please run kamrat index with -klen INT so that the analysis is done in k-mer mode).
As for the k-mer length, I would suggest k=31 if possible, or otherwise an odd number no smaller than 21.
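The decomposition and joining steps described above can be sketched in a few lines of Python. This is a toy stand-in for a real k-mer counter such as jellyfish, not a replacement for it: the function names, the sample names, the `feature` header label, and the tab-separated layout are all assumptions made for illustration, strands are not canonicalized, and k=5 is used only to keep the example readable (the advice above is k=31). Please check the KaMRaT documentation for the exact table format that kamrat index expects.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of fixed length k from seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(seqs, k):
    """Count fixed-length k-mers over a list of variable-length sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(kmers(s.upper(), k))
    return counts


def join_counts(per_sample):
    """Join per-sample count vectors into one table; absent k-mers get a count of 0."""
    samples = list(per_sample)
    all_kmers = sorted(set().union(*(per_sample[s] for s in samples)))
    header = ["feature"] + samples
    rows = [[km] + [per_sample[s].get(km, 0) for s in samples] for km in all_kmers]
    return header, rows


# Hypothetical input: variable-length sequences for two samples.
k = 5
per_sample = {
    "sampleA": count_kmers(["ACGTACG", "ACGTA"], k),
    "sampleB": count_kmers(["CGTACGT"], k),
}
header, rows = join_counts(per_sample)

# Print the joined table as tab-separated text.
print("\t".join(header))
for row in rows:
    print("\t".join(str(x) for x in row))
```

Every row of the resulting table has the same feature length k, which is the property required for k-mer mode.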

A future version of KaMRaT may support merging with variable k-mer length; this is part of our long-term plan.

Please feel free to comment below if you have any further questions.

Kind regards,
Haoliang
