Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEQLinkage output run in MERLIN results in "Requires impossible recombination pattern" Error #19

Open
oleraj opened this issue Mar 26, 2020 · 4 comments

Comments

@oleraj
Copy link

oleraj commented Mar 26, 2020

Hi,

I'm running SEQLinkage on 6 families with an autosomal dominant trait we have previously mapped. I'm using this set of families to test SEQLinkage to see if we can recapitulate the results.
I have no issues running SEQLinkage:

seqlink --fam ${base}.ped --vcf ${base}.recode.ex.16.vcf.gz --freq ExAC_AF -f MERLIN --blueprint ../SEQLinkage-master/data/genemap.hg19.txt
...
MESSAGE: Checking local resources 5/5 ...                                                                
MESSAGE: 31 samples found in ...
MESSAGE: 9 samples found in FAM file but not in VCF file:
Gma10
Gma17
Dad9
Dad18
Gpa9
mGma22
Dad10
Mom11
pGma22
MESSAGE: 6 families with a total of 31 samples will be scanned for 28,488 pre-defined units
MESSAGE: 573 units (from 2,271 variants) processed; 0 Mendelian inconsistencies and 424 recombination events handled
MESSAGE: 27,907 units ignored due to absence in VCF file
MESSAGE: 8 units ignored due to absence of variation in samples
MESSAGE: Archiving regional marker data to directory 
MESSAGE: 573 units will be converted to MERLIN format
MESSAGE: 573 units successfully converted to MERLIN format
MESSAGE: Archiving MERLIN format to directory ...
MESSAGE: Saving data to ...

However when I run the results in MERLIN all of the families are skipped:

i=16; merlin -d LINKAGE/MERLIN/LINKAGE.chr${i}.dat -p LINKAGE/MERLIN/LINKAGE.chr${i}.ped -m LINKAGE/MERLIN/LINKAGE.chr${i}.map --model model | awk '$0 ~ /Para|HLOD/ || $4 > 1.0 {print}'; 
References for this version of Merlin:
The following parameters are in effect:
                     Data File : LINKAGE/MERLIN/LINKAGE.chr16.dat (-dname)
                 Pedigree File : LINKAGE/MERLIN/LINKAGE.chr16.ped (-pname)
            Missing Value Code :         -99.999 (-xname)
                      Map File : LINKAGE/MERLIN/LINKAGE.chr16.map (-mname)
            Allele Frequencies : ALL INDIVIDUALS (-f[a|e|f|m|file])
                   Random Seed :          123456 (-r9999)
          Limits : --bits [24], --megabytes, --minutes
Estimating allele frequencies... [using all genotypes]
Done estimating frequencies for 703 markers
Analysis models will be retrieved from file [model]...
Table processed. 1 models recognized
Family: Family-22 - Founders: 5  - Descendants: 7  - Bits: 10      
  SKIPPED: Requires impossible recombination pattern
Family: Family-868 - Founders: 3  - Descendants: 4  - Bits: 5 
  SKIPPED: Requires impossible recombination pattern
Family: Family-2435 - Founders: 3  - Descendants: 3  - Bits: 3 
  SKIPPED: Requires impossible recombination pattern
Family: Family-2439 - Founders: 3  - Descendants: 3  - Bits: 3 
  SKIPPED: Requires impossible recombination pattern
Family: Family-2443 - Founders: 3  - Descendants: 2  - Bits: 2 
  SKIPPED: Requires impossible recombination pattern
Family: Family-2452 - Founders: 2  - Descendants: 2  - Bits: 2 
  SKIPPED: Requires impossible recombination pattern
Parametric Analysis, Model Rare_Dominant
       POSITION        LOD      ALPHA       HLOD

The documentation for MERLIN says this impossible recombination pattern error is often a result of multiple markers having the same position, which is indeed the case in the SEQLinkage output. For example:

grep PTX4 LINKAGE/MERLIN/LINKAGE.chr16.map
16	PTX4[1]	5.180964049091157	9.445039156840059	1.1235577008138633
16	PTX4[2]	5.180964049091157	9.445039156840059	1.1235577008138633

It appears that some of the markers are being split into two separate bins, but the position is the same for both, which is apparently a problem for MERLIN. It seems that SEQLinkage should construct a new position for each split marker based to avoid issues with MERLIN. I'm using the default option for --bin, which uses LD to bin the variants. Is there any way to use this LD binning method that will produce output suitable for MERLIN, i.e., different position for each marker?

Thanks!

Andrew

@oleraj
Copy link
Author

oleraj commented Mar 26, 2020

I also tried using --bin 0 option and that also splits the markers so it appears the --bin option is not being recognized or I'm misunderstanding how to use it. --bin 1 produces identical results as well.

grep PTX4 LINKAGE/MERLIN/LINKAGE.chr16.map
16	PTX4[1]	5.180964049091157	9.445039156840059	1.1235577008138633
16	PTX4[2]	5.180964049091157	9.445039156840059	1.1235577008138633

I would like to have optimized binning into haplotypes with a separate position for each haplotype. Any suggestions on how to do that would be appreciated.

@gaow
Copy link
Owner

gaow commented Mar 26, 2020

Thanks @oleraj for the report. This issue is related to recombination events which we most recently discussed in a related project:

statgenetics/rvnpl#2

We should make some updates this week on that issue, depending on @percylinhai 's schedule. I suppose we can try fix the genetic map distance along the line. I'll discuss with @percylinhai and make adjustments. Options --bin happens after these sub-regional markers are generated.

@oleraj
Copy link
Author

oleraj commented Mar 26, 2020

Great, thanks for the reply and glad to hear you're planning some updates on this.

@gaow
Copy link
Owner

gaow commented Apr 29, 2020

@oleraj sorry we've done this a while ago but forgot to update you here. @percylinhai has implemented a fix that should give different genetic distance for subregions, based on their physical positions. Please try pulling our docker images again and test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants