Segmentation fault during scanning for optimal CRF weight for certain chromosomes only #19

Open
rozaimirazali opened this issue May 29, 2019 · 4 comments

@rozaimirazali

Hello.

Chromosomes 6 to 22 run fine, but I get a segmentation fault for the first five human chromosomes (1 to 5).

Loading genetic map for chromosome 1 ... done
Mapping samples ... 8720 samples combined
Scanning input VCFs for common SNPs on chromosome 1 ... 1994754 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
setting up CRF points and random forest windows...
computing random forest window spacing overlay...
initializing apriori reference subpop across CRF...
setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
2531889212 (7.3%) variant alleles 0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 1132 samples from 263 randomly selected reference parents.

Scanning for optimal CRF Weight....
/home/rmohamadra_qgp/.lsbatch/1558862753.183980: line 8: 218012 Segmentation fault (core dumped) rfmix --query-file=qgp_chrall.vcf.gz --reference-file=1kgp_snp_only.vcf.gz --sample-map=subpop_1kg.txt --genetic-map=genetic_map_hg19_withX_3col.txt --output-basename=REF_1kg_QUERY_qgp_chr1 --chromosome=1 --n-threads=143

It creates a core dump file: core.XXXXX

Initially, I thought it was due to memory, but the maximum amount of memory it used was well below what is available. This is the LSF summary of the job:

Exited with exit code 139.

Resource usage summary:

CPU time :                                   1554621.88 sec.
Max Memory :                                 646447 MB
Average Memory :                             456611.47 MB
Total Requested Memory :                     -
Delta Memory :                               -
Max Swap :                                   -
Max Processes :                              5
Max Threads :                                147
Run time :                                   109857 sec.
Turnaround time :                            110020 sec.

My command:
rfmix --query-file=qgp_chrall.vcf.gz --reference-file=1kgp_snp_only.vcf.gz --sample-map=subpop_1kg.txt --genetic-map=genetic_map_hg19_withX_3col.txt --output-basename=REF_1kg_QUERY_qgp_chr1 --chromosome=1 --n-threads=143

As I mentioned earlier, I get the error only for chr1 through chr5; the rest work fine. Any idea what is going on?
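A note on reading the exit code 139 in the LSF summary above: by shell convention, a status above 128 usually means the process was terminated by a signal whose number is the status minus 128. The sketch below decodes such a status with Python's standard `signal` module. The helper name `describe_exit_code` is made up for illustration, and 137 is included only as a comparison, since that is the status a shell reports for a process ended by SIGKILL. It assumes a Linux/POSIX host and says nothing about why RFMix itself crashed.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    By shell convention, 128 + N means "terminated by signal N".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for status in (139, 137, 0):
        print(status, "->", describe_exit_code(status))
```

Status 139 maps to signal 11 (SIGSEGV), which matches the "Segmentation fault (core dumped)" line in the log. A SIGKILL, by contrast, is often (though not always) sent by the kernel's out-of-memory killer or by a scheduler or container enforcing a memory limit.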

@vicbp1

vicbp1 commented Sep 8, 2020

I have had the same problem. My data is 33551 individuals with 248611 SNPs (for one chromosome), and it still fails after allocating 300 GB of RAM.

@xumousheng

Has this issue been resolved? I get core dumps as well. The cause might be slightly different, though: the segmentation faults occur only when "-e 1" is used. If the same command is run without "-e 1", everything works fine. Memory is preset to 250 GB.

@lmtani

lmtani commented May 25, 2021

Similar problem here. I'm running on 100 samples; some of the chromosomes work fine, but for others the process gets Killed.

Also, I'm using this docker image with rfmix: quay.io/biocontainers/rfmix:2.03.r0.9505bfa--h1b792b2_2

RFMIX v2.03-r0 - Local Ancestry and Admixture Inference
(c) 2016, 2017 Mark Koni Hamilton Wright
Bustamante Lab - Stanford University School of Medicine
Based on concepts developed in RFMIX v1 by Brian Keith Maples, et al.

This version is licensed for non-commercial academic research use only
For commercial licensing, please contact [email protected]

--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry
Inference. Am. J. Hum. Genet. 93, 278-288


Loading genetic map for chromosome chr16 ...  done
Mapping samples ... 4223 samples combined
Scanning input VCFs for common SNPs on chromosome chr16 ...   16379 SNPs
Loading haplotypes... 
Warning: chr16.vcf.gz - 354431 unphased genotypes treated as phased
done
Defining and initializing conditional random field...  
   setting up CRF points and random forest windows... 
   computing random forest window spacing overlay... 
   initializing apriori reference subpop across CRF... 
   setting up random forest probability estimation arrays... done
Defining and initializing conditional random field...   done
29264344 (21.2%) variant alleles	134640 (0.1%) missing alleles

Generating internal simulation samples...    
Internally simulated 1200 samples from 416 randomly selected reference parents.
/cromwell_root/script: line 31:    22 Killed                  rfmix -f subset-query.vcf.gz -r chr16.vcf.gz -m SuperPopulationMap.txt -g chr16.b38.gmap -o rfmix --chromosome=chr16

Do you have any updates on this issue?

@xumousheng

@lmtani, my guess is that the core dump has nothing to do with the number of samples you are painting, since rfmix paints samples one by one. It looks like the core dump is related to the number of SNPs used: bigger chromosomes have more SNPs, require more memory, and are therefore more likely to fail.
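To put rough numbers on the "more SNPs, more memory" guess, here is a minimal sketch that counts the alleles loaded for the two runs posted in this thread, using the sample and SNP counts from their logs. The function name `allele_count` is made up for illustration, and treating each sample as two haplotypes is an assumption. How many bytes RFMix actually spends per allele, or on its random forest and CRF structures, is not known from this thread, so this is only a relative size comparison, not a memory estimate.

```python
def allele_count(samples: int, snps: int, ploidy: int = 2) -> int:
    """Total alleles in the combined panel: samples x haplotypes per sample x SNPs.

    Hypothetical helper; ploidy=2 assumes diploid, phased data.
    """
    return samples * ploidy * snps


# Counts copied from the logs posted in this thread.
runs = {
    "chr1 (8720 samples, 1994754 SNPs)": (8720, 1994754, 2531889212),
    "chr16 (4223 samples, 16379 SNPs)": (4223, 16379, 29264344),
}

for label, (samples, snps, variant_alleles) in runs.items():
    total = allele_count(samples, snps)
    share = 100.0 * variant_alleles / total
    print(f"{label}: {total:,} alleles, {share:.1f}% variant")
```

The variant-allele percentages computed this way agree with the ones RFMix prints (7.3% and 21.2%), which suggests the count is the right quantity to compare. On that count the chr1 run is roughly 250 times larger than the chr16 run, so the two failures may well not share a cause.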
