Value Error in prune_ngsLD.py #43
Comments
Do you have any missing values in your input file?
Yes, I generated the LD_input file from a beagle file, and I guess I do have some NaNs in there.
If this helps, these are my steps ...
angsd -GL 2 -out $OUT/chr25_gl -minMapQ 30 -minQ 20 -nThreads 10 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1 -bam $BAM\
#2. make LD input file
ngsLD --geno chr25_gl.beagle.gz --probs --ignore_miss_data --rnd_sample 0.05 --seed 1 --n_ind 20 --n_sites 78000 --pos sites --out LD_input
#3. run py script for pruning
prune_ngsLD.py --input LD_input --max_dist 50000 --min_weight 0.1 --out testLD_unlinked.pos
Can you paste here some lines where you have missing data?
25:449 25:3558 3109 0.135633 -0.000000 nan nan
I suspect that the problem is that some of those sites are not variable. If so, you can try increasing the option If not, can you send me the information for sites
Thanks for your reply, Mafs Beagle Mafs Beagle
What is the coverage of these samples?
The coverage is ~10x, and we also want to run the analysis at 4x and 2x coverage in the future.
But with 10x coverage, how do you have so much missing data (those two sites have 70% missing data)?! Is it whole genome or targeted sequencing?
Yeah, 70% seems like a lot. It's whole genome sequencing; I used only one chromosome just to check whether the script runs OK.
Could it be some repetitive regions? If so, maybe it is good to use a
So, I tested with -minind 10 (I have 20 ind overall), then ran ngsLD with min_maf 0.03 and no missing data, and I still get the same error... Quick explanation: those 20 samples are from 2 populations (10 ind per pop) with an Fst of 2%. The species I'm working with has low diversity anyway. Regardless of this issue, should the py script ignore NaNs? Is there some way to work around it? When I deleted the lines with nan, the py script worked just fine.
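For anyone hitting the same error, the workaround described here (deleting the lines with nan before pruning) can be sketched roughly as follows. This is only an illustration and not part of ngsLD: the helper names `has_nan` and `drop_nan_rows` are made up for this example, and it assumes the LD file is whitespace-delimited with one site pair per line, like the sample line pasted earlier in the thread.

```python
import math


def has_nan(line):
    """Return True if any whitespace-separated field parses as a float NaN."""
    for field in line.split():
        try:
            if math.isnan(float(field)):
                return True
        except ValueError:
            # Non-numeric fields such as "25:449" are site labels, not NaN.
            continue
    return False


def drop_nan_rows(lines):
    """Keep only the lines that contain no NaN-valued field (hypothetical helper)."""
    return [line for line in lines if not has_nan(line)]


def filter_file(src_path, dst_path):
    """Copy src_path to dst_path, skipping every line with a NaN field."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not has_nan(line):
                dst.write(line)
```

Something like `filter_file("LD_input", "LD_input.nonan")` would then produce a file to pass to `prune_ngsLD.py --input`; the output file name is just an example. Note that dropping these rows discards the affected site pairs entirely, so it is worth checking first why they are NaN.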
It should be ok to remove Just noticed that it seems you run
Ah sorry, I guess it got lost in copy pasting, I have a bamlist file with all 20 bam files:) Thanks a lot for your quick replies!
If you used Can you also paste here the extended output of some sites with missing data? I also just made a faster alternative to the pruning scripts; do you think you could give it a try and see how it goes?
Hi again:)
I am trying now to run prune_ngsLD.py script, but I get an error:
Checking if file is gzipped...
Reading in data...
Filtering edges by distance...
Filtering edges by weight...
Beginning pruning by dropping heaviest position...
Starting with 77988 positions with 393573 edges between them...
Traceback (most recent call last):
  File "/home/ubuntu/ngsLD/scripts/prune_ngsLD.py", line 160, in <module>
    map_property_values(G.ep["weight"], edge_weight, lambda x: int(x * weight_precision))
  File "/home/ubuntu/anaconda3/envs/ngsLD_N/lib/python3.11/site-packages/graph_tool/__init__.py", line 1191, in map_property_values
    libcore.property_map_values(u._Graph__graph,
  File "/home/ubuntu/ngsLD/scripts/prune_ngsLD.py", line 160, in <lambda>
    map_property_values(G.ep["weight"], edge_weight, lambda x: int(x * weight_precision))
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot convert float NaN to integer
Do you have any suggestions or ideas on how to fix this issue?
Thanks in advance
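For context on the traceback: Python's `int()` raises this exact `ValueError` whenever it is handed a NaN float, so a single `nan` edge weight is enough to stop the conversion. The sketch below reproduces the failure and shows one possible guard. It is not the fix used in `prune_ngsLD.py`: the function name `to_int_weight`, the value 1000 for the precision, and the choice to map NaN to 0 are all assumptions made for illustration, and mapping NaN to 0 silently treats the pair as unlinked, which may not be what you want.

```python
import math


def to_int_weight(x, weight_precision, nan_value=0):
    """Scale a float weight to an int, returning nan_value for NaN instead of raising.

    Hypothetical helper; nan_value=0 is an assumption, not ngsLD behaviour.
    """
    if math.isnan(x):
        return nan_value
    return int(x * weight_precision)


# Reproduce the failure from the traceback with an assumed precision of 1000.
try:
    int(float("nan") * 1000)
except ValueError as err:
    print(err)
```

Whether to guard the conversion like this or to remove the NaN rows from the input beforehand is a design choice; removing them makes the data loss explicit, while the guard keeps the edges in the graph with a placeholder weight.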