AttributeError when running OGset.py #59
Comments
Hi Sebastian,
It would be great if you could start over from an empty folder and run with
I would be happy to try read2tree with your read dataset if it is possible to share it with us.
Best,
P.S. It would also be helpful to make sure that all the
Hi Sina,
Thanks for your reply! I have run the same command with the
The
I would really appreciate it if you could try it on your side with my input, but the FASTQs seem to be too big (376 MB) for GitHub. Is there a way I can share them with you via a drive or something else? These are a 4% subset of the reads for samples from Kuderna et al. 2023. I can also try subsetting a smaller percentage of the reads. Let me know what suits you best.
Thanks again!
Sebastian
Thanks. I assume you got the same error (the error that appears in the terminal is not reported in the log file). Could you please upload your read set to Google Drive and share the link with me?
Yes, the error is the same as before. Here's the link to the folder; it contains the FASTQs for one sample, the folder with the marker genes, and the dna_ref FASTA with all the OGs. I hope you can reproduce it on your side. Thanks again! Sebastian
Thanks for providing the data. It seems that the sequencing coverage is too low to generate the consensus sequences needed to build the MSA and tree, as all of the
I can see there are around 2.6M reads in the sample, so the sequencing coverage would be about 0.2x. We have had some cases where we were able to infer trees at such coverage; however, this depends on the read sampling strategy and on the marker genes. In your case, very few of the reads mapped to the 200 genes. Would it be possible for you to run with more reads, for example at 5x coverage: 5 × 3 Gbp / 300 bp = 50M reads? Best,
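The coverage arithmetic in the reply above can be sketched as a tiny calculator. It only reuses the round numbers from the comment (a ~3 Gbp genome and 300 bp per read); both are assumptions, not measured values for this sample. With them, 2.6M reads give about 0.26x, the same order as the ~0.2x quoted, and the 65M reads from a full lane mentioned elsewhere in the thread give about 6.5x. Note that this is a genome-wide mean: only a small fraction of those reads will fall on 200 marker genes.

```python
import math

# Assumptions taken from the comment's round numbers, not measured values:
# a ~3 Gbp genome and ~300 bp of sequence per read.
GENOME_SIZE_BP = 3_000_000_000
READ_LENGTH_BP = 300


def coverage(n_reads: int, read_length_bp: int = READ_LENGTH_BP,
             genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_coverage(target_x: float, read_length_bp: int = READ_LENGTH_BP,
                       genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Number of reads needed to reach a target mean depth (rounded up)."""
    return math.ceil(target_x * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    print(f"{coverage(2_600_000):.2f}x")  # 0.26x
    print(reads_for_coverage(5))          # 50000000
```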
Hi Sina, Thanks for running it on your side! Interesting; yes, the coverage factor makes sense, especially given that there are only 200 markers. The sampling setup was originally designed to produce input for MitoFinder, to eventually build mitochondrial phylogenies. 4% of a single FASTQ seemed to do the trick in most cases (also because the program used A LOT of memory, and giving it a higher proportion of reads would have been a computational burden). Another small detail is that these reads have been trimmed with cutadapt. From what I've understood, read2tree removes the need for all these filtering procedures, so I don't know whether that could affect anything (maybe not at all). I will try sampling a higher proportion of reads to reach the numbers you propose and run read2tree again. I'll let you know if it works. Thanks a lot again. Sebastian
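Subsampling a fixed fraction of reads, as described in the comment above, is usually done with dedicated tools such as seqtk. As a standard-library-only sketch of the idea (not the procedure used for these samples, whose details are not given in the thread), the code below assumes 4-line FASTQ records and paired files listing mates in the same order. A single seeded random draw per pair keeps R1 and R2 in sync and makes the subset reproducible.

```python
import io
import itertools
import random
from typing import Iterator, List, TextIO

# One FASTQ record = 4 lines: header, sequence, '+', qualities.
Record = List[str]


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def subsample_pairs(r1: TextIO, r2: TextIO, out1: TextIO, out2: TextIO,
                    fraction: float, seed: int = 42) -> int:
    """Keep each read pair with probability `fraction`; return pairs kept.

    One random draw per pair keeps R1 and R2 in sync. Assumes both files list
    mates in the same order; zip() stops silently at the shorter file.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    kept = 0
    for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
        if rng.random() < fraction:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
    return kept


if __name__ == "__main__":
    # Toy in-memory example; for real data open files with
    # gzip.open(path, "rt") or open(path) instead of io.StringIO.
    def toy(n: int, mate: int) -> io.StringIO:
        return io.StringIO("".join(
            f"@read{i}/{mate}\nACGT\n+\nIIII\n" for i in range(n)))

    o1, o2 = io.StringIO(), io.StringIO()
    print(subsample_pairs(toy(1000, 1), toy(1000, 2), o1, o2, fraction=0.04))
```

Drawing once per pair, rather than once per file, is the design choice that matters here: sampling each file independently would break mate pairing.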
Hi Sina, I've been trying to run with different subsampling levels and even with the whole FASTQ (from a given lane; the samples are multiplexed, so around 65M reads), but the error stays the same. I've also tried other samples, to no avail. The Also, the first step that generates the 01-03 directories should only be run once unless you change the set of OGs/species you build your reference with, right? I could also get the full FASTQ for a given sample (without demultiplexing) by converting alignments we have already produced, giving the full 30x coverage. But other than that, I don't have many more ideas to explore. Thanks. Sebastian
Thanks for the follow-up, and sorry for the late reply. I'm downloading the ERR10941432 dataset; I will run read2tree and update you afterwards.
It looks like it's working.
Could you also send us the
P.S. This is the tree that I got (after rooting; the output of IQ-TREE/read2tree is not rooted). The leaf names are listed at https://omabrowser.org/oma/export_markers after selecting primates. The sample is grouped with the rest of the species.
Hi Sina, Apologies for the late reply. I've run read2tree again on a couple of different samples, and this time it worked with 65-69M reads! When it was failing before on the same samples, I had deleted the output directory but not the mplog file, so that may have been why it failed previously. My question now is: if read2tree is run in multispecies mode and it fails for one pair of sequences, should the mplog file be deleted together with the corresponding subdirectories in the output directory before fixing the error and relaunching read2tree? Thanks a lot for the help. I think that once this last question is resolved, the issue can be closed. Sebastian
You're welcome. read2tree tries to parse the mplog.log file and the output folder to estimate the progress and resume the run. I'm not sure how it handles such a scenario, sorry. @dvdylus, who wrote the code, might know. Otherwise, I would find out by trial and error.
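Since it is unclear how the resume logic treats a partially deleted run, a conservative option, in line with the earlier suggestion to start over from an empty folder, is to remove both the output directory and mplog.log before relaunching. The helper below is a hypothetical sketch: reset_run and the example paths are invented for illustration, and it assumes nothing about read2tree's internal folder layout.

```python
import shutil
import tempfile
from pathlib import Path


def reset_run(output_dir: Path, log_file: Path) -> None:
    """Remove a previous run's output directory and its log file so the next
    launch starts from an empty state instead of attempting to resume.

    Hypothetical helper: both paths are whatever you used for your own run;
    nothing here is part of read2tree.
    """
    if output_dir.exists():
        shutil.rmtree(output_dir)  # deletes the directory and all subdirectories
    if log_file.exists():          # may already be gone if it lived inside output_dir
        log_file.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "output"
        (out / "species_A").mkdir(parents=True)  # hypothetical subfolder name
        log = Path(tmp) / "mplog.log"
        log.write_text("stale progress\n")
        reset_run(out, log)
        print(out.exists(), log.exists())  # False False
```

This trades reuse of finished work for certainty; deleting only selected subdirectories would require knowing how the resume step reads mplog.log, which the thread leaves open.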
Hi Sina,
I've been trying to run read2tree in multispecies mode using a custom set of 200 marker genes for 24 primate species, downloaded from OMA following the tutorial you've provided. The first step of building the 01-03 folders worked, but when running with the reads of the first species, it stops abruptly after finishing the mapping step, just as it starts to run OGSet.py.
Traceback (most recent call last):
  File "/scratch_isilon/groups/compgen/scuadros/r2t/bin/read2tree", line 4, in <module>
    __import__('pkg_resources').run_script('read2tree==0.1.5', 'read2tree')
  File "/scratch_isilon/groups/compgen/scuadros/r2t/lib/python3.10/site-packages/pkg_resources/__init__.py", line 706, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/scratch_isilon/groups/compgen/scuadros/r2t/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1555, in run_script
    exec(script_code, namespace, namespace)
  File "/scratch_isilon/groups/compgen/scuadros/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/EGG-INFO/scripts/read2tree", line 16, in <module>
  File "/scratch_isilon/groups/compgen/scuadros/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/main.py", line 360, in main
  File "/scratch_isilon/groups/compgen/scuadros/r2t/lib/python3.10/site-packages/read2tree-0.1.5-py3.10.egg/read2tree/OGSet.py", line 494, in add_mapped_seq
AttributeError: 'Mapper' object has no attribute 'og_records'
I installed read2tree from source: I created a conda environment, installed all dependencies with conda, and then downloaded and installed read2tree. I tested the installation with the toy dataset in read2tree/tests and it worked without problems. I've attached the mplog file as well:
mplog.log
Lastly, I'm running this on a Slurm-scheduled HPC cluster, requesting 70 GB of memory.
I don't know whether this is related to the FASTQs I am giving as input or to the installation of read2tree. I would appreciate any ideas you have about this problem.
Many thanks!
Sebastian
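The traceback ends in add_mapped_seq in OGSet.py, which reads og_records from a Mapper object that never received that attribute. One common way this happens in Python is an attribute that is assigned only on one code path, for example only when some reads mapped. That would fit the low-coverage diagnosis in Sina's reply, but it is an assumption. The classes below are hypothetical stand-ins, not read2tree's actual code. They show the failure and the usual fix of always initialising the attribute, which turns the crash into an empty result that a caller could report as "no reads mapped".

```python
class FragileMapper:
    """Hypothetical stand-in; og_records exists only if something mapped."""

    def __init__(self, mapped_reads: dict) -> None:
        if mapped_reads:
            self.og_records = dict(mapped_reads)


class SafeMapper:
    """Hypothetical stand-in; og_records is always defined (possibly empty)."""

    def __init__(self, mapped_reads: dict) -> None:
        self.og_records = dict(mapped_reads)


def count_records(mapper) -> int:
    # Raises AttributeError for a FragileMapper built from an empty result.
    return len(mapper.og_records)
```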