PolyA tail inquiry #199

Open
simonecaniONT opened this issue Jan 14, 2025 · 2 comments

@simonecaniONT

Hello,

Not sure if this is the right channel to ask, but I'll try.
A user wants to train an algorithm to detect ribonucleotides that can be incorporated in both DNA and RNA.
To achieve this, they are about to order 60 bp oligos, which can be sequenced because the minimum read length allowed is now 50 bp for both DNA and RNA sequencing.

Can these short molecules be affected by low-quality basecalling given their reduced length?
The user asked whether adding a polyA tail (either during oligo synthesis or during prep) could improve overall sequencing quality.
Does anyone have suggestions on how to get the best quality out of an experiment like this?

Many thanks

@marcus1487
Collaborator

A number of issues here. I'll try to break these down into separate responses.

Models for RNA and DNA would be completely separate. Detecting incorporation in RNA and in DNA would therefore require separate training datasets and separate models.

Short molecules can certainly produce noisier training data. This is not necessarily a bad thing for modified base training. We train production models from strands that are approximately 100-120 bases in length, so 60 bases is short but not infeasible. We do not have specific testing at this range, so the best advice I can give is for the user to test model training from these strands and iterate based on the results.
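
As a rough starting point for that kind of testing, a minimal sketch (not part of any ONT tool) could check how many basecalled reads clear the 50-base minimum mentioned above and what their mean quality looks like. It assumes plain 4-line FASTQ records with Phred+33 quality strings. The function names `parse_fastq`, `mean_read_q` and `summarize_reads` are hypothetical helpers invented for this illustration.

```python
import math
from statistics import mean


def parse_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        yield records[i][1:], records[i + 1], records[i + 3]


def mean_read_q(quality, offset=33):
    """Mean Phred score of one read, averaged in error-probability space.

    Assumes Phred+33 encoding (an assumption; check your basecaller output).
    """
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return -10 * math.log10(mean(probs))


def summarize_reads(lines, min_len=50):
    """Return ([(name, length, mean_q), ...], number of reads shorter than min_len)."""
    rows = []
    n_short = 0
    for name, seq, qual in parse_fastq(lines):
        if len(seq) < min_len:
            n_short += 1
        rows.append((name, len(seq), mean_read_q(qual)))
    return rows, n_short
```

Passing an open FASTQ file handle, e.g. `summarize_reads(open("reads.fastq"))`, gives a per-read table that can be compared between the 60-base oligos and longer control strands. The file name here is a placeholder.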

I would suggest that such a task has a high probability of failure and would require a high level of expertise across wet lab protocols, machine learning and bioinformatics. This is not a simple project for which straightforward "catch all" advice can be provided. I can give further advice on specific aspects of this project if the user can supply more details about the intended final application and the exact nature of the training data.

@simonecaniONT
Author

Thanks a lot for the feedback!
I'll point out all the critical aspects to the customer.
If they are willing to share more about the project, I'll let you know and follow up.
