
How many peaks should I use for the input of scATAC-seq data? #15

Open
Smilenone opened this issue Jul 27, 2024 · 1 comment

Comments

@Smilenone

The model ran very slowly when I used a 30k scATAC-seq dataset with the top 50k peaks. How many peaks should I use as input for scATAC-seq data?

@PeterZZQ
Owner

Yes, the running time of the model depends on the number of features you use, especially the number of peaks, because scDART builds a larger neural network when there are more peaks. That is why we filtered peaks before running the model.
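Here is one way to see why peaks dominate the cost. If the peak vector feeds a dense layer, that layer alone holds n_peaks × hidden_dim weights plus hidden_dim biases. The sketch below does only this arithmetic. The hidden width of 512 is a hypothetical value, not scDART's actual configuration.

```python
def first_layer_params(n_peaks: int, hidden_dim: int) -> int:
    """Weights plus biases of a dense layer mapping n_peaks inputs to hidden_dim units."""
    return n_peaks * hidden_dim + hidden_dim


# hidden_dim=512 is a hypothetical width chosen for illustration only.
for n_peaks in (10_000, 50_000):
    print(n_peaks, first_layer_params(n_peaks, hidden_dim=512))
# prints:
# 10000 5120512
# 50000 25600512
```

Going from 10k to 50k peaks makes this layer five times larger (about 5.1M to 25.6M parameters). The layer's compute per mini-batch grows in the same proportion.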

To make the model run faster, you can:

  1. Reduce the mini-batch size when training scDART.
  2. Select highly variable peaks to reduce the number of peaks.
  3. Bin closely located peaks into larger peaks to reduce the overall peak count.
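Options 2 and 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scDART's own preprocessing. It ranks peaks by raw variance across cells, a simple stand-in for more careful highly-variable-feature methods. It merges peaks on the same chromosome whose gap is at most a chosen distance, then sums their counts. The function names, the toy matrix, and the 500 bp gap are all hypothetical.

```python
from statistics import pvariance


def top_variable_peaks(counts, k):
    """Return indices of the k peaks (columns) with the highest variance across cells.

    counts: list of rows, one per cell, each a list of per-peak counts.
    """
    n_peaks = len(counts[0])
    variances = [pvariance([row[j] for row in counts]) for j in range(n_peaks)]
    ranked = sorted(range(n_peaks), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])  # keep the original peak order


def merge_nearby_peaks(peaks, max_gap):
    """Merge (chrom, start, end) peaks lying within max_gap bp of each other.

    Returns the merged intervals and, for each input peak, the index of the
    merged interval it was assigned to.
    """
    order = sorted(range(len(peaks)), key=lambda i: (peaks[i][0], peaks[i][1]))
    merged, assignment = [], [None] * len(peaks)
    for i in order:
        chrom, start, end = peaks[i]
        # Extend the previous interval if it is on the same chromosome and close enough
        # (overlapping peaks have a negative gap and are merged too).
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
        assignment[i] = len(merged) - 1
    return merged, assignment


def sum_counts_by_bin(counts, assignment, n_bins):
    """Collapse per-peak counts into per-bin counts for each cell."""
    binned = []
    for row in counts:
        new_row = [0] * n_bins
        for j, value in enumerate(row):
            new_row[assignment[j]] += value
        binned.append(new_row)
    return binned


# Toy data: 3 cells x 4 peaks (hypothetical values).
counts = [
    [0, 5, 1, 0],
    [0, 0, 1, 3],
    [0, 4, 1, 0],
]
peaks = [("chr1", 100, 300), ("chr1", 600, 800), ("chr1", 5000, 5200), ("chr2", 100, 300)]

print(top_variable_peaks(counts, 2))  # [1, 3]
merged, assignment = merge_nearby_peaks(peaks, max_gap=500)
print(merged)  # [('chr1', 100, 800), ('chr1', 5000, 5200), ('chr2', 100, 300)]
print(sum_counts_by_bin(counts, assignment, len(merged)))  # [[5, 1, 0], [0, 1, 3], [4, 1, 0]]
```

Raw variance tends to favor peaks with high average counts. Normalizing or binarizing the counts before ranking is a common refinement.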

There is no recommended number of peaks for scATAC-seq data. Fewer peaks make the model run faster, but they can also cause important biological information to be lost. There is definitely a trade-off, and where to strike it depends heavily on the sequencing quality of your scATAC-seq data.
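One rough way to explore this trade-off on your own data is to rank peaks by a score, such as per-peak variance across cells. You can then check how much of the total score the top-k peaks keep for several values of k. The sketch below assumes such scores are already computed. The function name and the score values are hypothetical, and variance is only a crude proxy for biological information.

```python
def retained_fraction(peak_scores, ks):
    """Fraction of the summed score kept by the top-k peaks, for each k in ks.

    peak_scores: one non-negative number per peak, e.g. its variance across cells.
    """
    ranked = sorted(peak_scores, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("all peak scores are zero")
    return {k: sum(ranked[:k]) / total for k in ks}


# Hypothetical per-peak variances, for illustration only.
scores = [8.0, 4.0, 2.0, 1.0, 0.5, 0.5]
print(retained_fraction(scores, ks=[1, 2, 3, 6]))  # {1: 0.5, 2: 0.75, 3: 0.875, 6: 1.0}
```

A reasonable starting point is the smallest k after which the retained fraction stops growing much. You can then confirm the choice by checking downstream results.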
