
Issue merging many large bsseq objects with biscuiteer::unionize() #25

Open

deanpettinga opened this issue Apr 20, 2020 · 5 comments

@deanpettinga
I'm working on my first analysis with BISCUIT/biscuiteer, but I've encountered some issues handling the data. I have 20 gzipped/tabixed VCFs (15–20 GB each) with accompanying bed.gz files. Biscuiteer works fine with small/toy datasets, but I've been having trouble merging all of these samples into a single bsseq object. I think part of the problem is simply the large number of samples and the amount of data per sample. I have tried two approaches, both of which have failed so far:

  1. Run biscuiteer::readBiscuit() on each sample individually, then use biscuiteer::unionize() to combine them into a single object.
  2. Merge the vcf.gz and bed.gz files on the command line, then import them together with biscuiteer::readBiscuit().
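For reference, approach 1 can be sketched roughly as below. This is a minimal illustration, not code from the thread: the file-name pattern and the loop are assumptions, the readBiscuit() argument names (BEDfile, VCFfile, merged) follow biscuiteer's documented interface, and merged = FALSE assumes CpGs were not strand-collapsed upstream (adjust to match how the BED files were produced). Reducing over unionize() assumes it combines two bsseq objects pairwise.

```r
library(biscuiteer)

# Assumed naming convention: sample1.bed.gz pairs with sample1.vcf.gz
beds <- list.files(pattern = "\\.bed\\.gz$")
vcfs <- sub("\\.bed\\.gz$", ".vcf.gz", beds)

# Read each sample into its own bsseq object
bsseqs <- mapply(function(bed, vcf) {
  readBiscuit(BEDfile = bed, VCFfile = vcf, merged = FALSE)
}, beds, vcfs, SIMPLIFY = FALSE)

# Collapse the list into one bsseq object, merging pairwise
combined <- Reduce(unionize, bsseqs)
```

With 20 samples at this size, each readBiscuit() call and each pairwise union touches a large amount of data, which is why this route can become slow or memory-bound.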

Do you have any advice for a better/ideal approach in this situation?

thanks in advance!

@ttriche (Member)

ttriche commented Apr 20, 2020 via email

@ttriche (Member)

ttriche commented Apr 20, 2020 via email

@deanpettinga (Author)

Tim,

I suppose this isn't so much an issue as a question; hence my lack of sessionInfo() output and an error message. I think the package is working as intended. I was just hoping to understand best practice for improving performance/speed.

I'll move forward with your suggestion of jointly calling variants with BISCUIT into a single VCF. Feel free to close this issue unless you'd like further information about my experience.

thanks much!
Dean

@ttriche (Member)

ttriche commented Apr 20, 2020 via email

@deanpettinga (Author)

Just to clarify, I don't have any errors. I'm not used to handling objects of this magnitude in R, so I was just looking for direction on an optimal approach :)
