You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a bit of a niche case, but came up when I was trying to create some consensus VCFs using trios. Here's a minimal example:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 HG003
chr1 219783256 . AAC ACAC,A . . SYNC=219783257;BASE=FN_CA;CALL=FP_CA;AN=4;AC=1,2 GT 2/1 ./.
chr1 219783256 . A ACACACACACACAC . . SYNC=219783257;CALL=FP_CA;AN=2;AC=1 GT ./. 0/1
Here we have a two-sample VCF, with two sites with the same POS. The key point is the second entry shows HG003 having GT 0/1, wheras the first as missing ./.. The meanings are different: the first signifies some amount of confidence in seeing both reference and alt supporting reads, whereas the latter reflects uncertainty about how those alts relate to the sample. When running bcftools norm -m +any on this, we get the following:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 HG003
chr1 219783256 . AAC ACAC,A,ACACACACACACACAC . . SYNC=219783257;BASE=FN_CA;CALL=FP_CA;AN=4;AC=1,2,1 GT 2/1 ./3
Now the HG003 GT at this one site is ./3 (corresponding to ./1 with the previous order). In other words, the confidence about seeing a reference haplotype here is "lost" in the normalization. I would consider this a bug, since the tool is deciding to throw away confidence about matching the reference and replacing with some uncertainty. This is actually an important distinction, as this example shows that setting HG003 to have GT 0/3 here (matching the original) is a definitive mendelian violation compared with the 2/1 GT in the son, whereas ./3 leaves open the possibility that an MV hasn't occurred. Some tools are sensitive to this, and I'd expect the behavior involving 0/3 if I didn't catch this example. It's also worth noting if you throw away the other sample (make it a single sample VCF), the behavior is as I expected: the site collapses into a 0/1 site.
Can anyone confirm if this is expected behavior (and why), or if this is a bug? This was done with bcftools v1.20.
The text was updated successfully, but these errors were encountered:
This is a bit of a niche case, but came up when I was trying to create some consensus VCFs using trios. Here's a minimal example:
Here we have a two-sample VCF, with two sites with the same POS. The key point is the second entry shows HG003 having GT
0/1
, wheras the first as missing./.
. The meanings are different: the first signifies some amount of confidence in seeing both reference and alt supporting reads, whereas the latter reflects uncertainty about how those alts relate to the sample. When runningbcftools norm -m +any
on this, we get the following:Now the HG003 GT at this one site is
./3
(corresponding to./1
with the previous order). In other words, the confidence about seeing a reference haplotype here is "lost" in the normalization. I would consider this a bug, since the tool is deciding to throw away confidence about matching the reference and replacing with some uncertainty. This is actually an important distinction, as this example shows that setting HG003 to have GT0/3
here (matching the original) is a definitive mendelian violation compared with the2/1
GT in the son, whereas./3
leaves open the possibility that an MV hasn't occurred. Some tools are sensitive to this, and I'd expect the behavior involving0/3
if I didn't catch this example. It's also worth noting if you throw away the other sample (make it a single sample VCF), the behavior is as I expected: the site collapses into a0/1
site.Can anyone confirm if this is expected behavior (and why), or if this is a bug? This was done with
bcftools v1.20
.The text was updated successfully, but these errors were encountered: