REHEADER gzip #127

Open
EladH1 opened this issue Jan 15, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@EladH1
Contributor

EladH1 commented Jan 15, 2025

Description of the bug

The current version of REHEADER requires vcf and not vcf.gz.
Also, I believe the test file HG002_GRCh38_CMRG_smallvar_v1.00.vcf.gz you have is gunzip and not gzip.

I think you could reproduce this by

  1. download the GIAB from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/
    or

gunzip HG002_GRCh38_CMRG_smallvar_v1.00.vcf.gz
gzip HG002_GRCh38_CMRG_smallvar_v1.00.vcf

In this commit to bcftools/reheader
https://github.com/nf-core/modules/commit/c32611ac6813055b9321d2827678e2f8aebcb394
they do

`
def create_cmd = extension.endsWith(".gz") ? "echo '' | gzip >" : "touch"

${create_cmd} ${prefix}.${extension}
`

not sure if this helps

`ERROR ~ Error executing process > 'NFCORE_VARIANTBENCHMARKING:VARIANTBENCHMARKING:PREPARE_VCFS_TRUTH:VCF_REHEADER_SAMPLENAME:BCFTOOLS_REHEADER (GIAB-NA24385)'

Caused by:
Process NFCORE_VARIANTBENCHMARKING:VARIANTBENCHMARKING:PREPARE_VCFS_TRUTH:VCF_REHEADER_SAMPLENAME:BCFTOOLS_REHEADER (GIAB-NA24385) terminated with an error exit status (255)

Command executed:

bcftools
reheader
--fai GRCh38_latest_genomic_final.fa.fai


--samples GIAB-NA24385.txt
--threads 2
HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
| bcftools view
--output-type z
--output HG002_GRCh38_1_22_v4.2.1_benchmark.rh.vcf.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_VARIANTBENCHMARKING:VARIANTBENCHMARKING:PREPARE_VCFS_TRUTH:VCF_REHEADER_SAMPLENAME:BCFTOOLS_REHEADER":
bcftools: $(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*$//')
END_VERSIONS

Command exit status:
255

Command output:
(empty)

Command error:
Error: cannot reheader gzip-compressed files, first convert with bcftools view --output-type to a supported format
Failed to read from standard input: unknown file type`
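
The error message above suggests converting the file with `bcftools view --output-type` before reheadering. A minimal sketch of that conversion, assuming the input is a VCF that bcftools can read (plain gzip or uncompressed); the function name `recompress_bgzf` and the `BCFTOOLS` override variable are hypothetical and not part of the pipeline:

```shell
# Hypothetical helper: rewrite a VCF as a BGZF-compressed VCF (--output-type z).
# $1: input VCF, $2: output path.
# BCFTOOLS can be overridden for a dry run; it defaults to the bcftools binary on PATH.
recompress_bgzf() {
    "${BCFTOOLS:-bcftools}" view --output-type z --output "$2" "$1"
}

# Usage (file name taken from the log above, output name is an assumption):
# recompress_bgzf HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz HG002_GRCh38_1_22_v4.2.1_benchmark.bgz.vcf.gz
```

Whether this resolves the failure in the pipeline is not confirmed in the thread; it only follows the hint printed by bcftools.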

Command used and terminal output

Relevant files

No response

System information

No response

EladH1 added the bug label Jan 15, 2025
@kubranarci
Contributor

@EladH1 Hey, can you explain this further? I think this is not from one of the test profiles? If that is the case, I would recommend applying any preprocessing before running the pipeline anyway. It is not easy to standardize preprocessing for each type.

@EladH1
Contributor Author

EladH1 commented Jan 17, 2025

I can try.

If you run 'file' on your TEST vcf, you will see this:
file HG002.strelka.variants.chr21.vcf.gz
HG002.strelka.variants.chr21.vcf.gz: gzip compressed data, extra field

When I run it on my ref or the GIAB files, I don't see this extra field.
If you do
gunzip HG002.strelka.variants.chr21.vcf.gz
gzip HG002.strelka.variants.chr21.vcf

and then re-run, I think you will get the error I got:

"cannot reheader gzip-compressed files"

REHEADER had a change to introduce working with gzip,
but this is not in the current version.
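
The "extra field" that `file` reports corresponds to the gzip FEXTRA flag, which BGZF files (as written by bgzip) set in order to carry a `BC` subfield. A minimal sketch for telling the two apart without `file`, assuming the BGZF header layout from the SAM specification (bytes `1f 8b 08 04`, then `B` `C` at offsets 12 and 13); the function name `is_bgzf` is hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) if the file starts with a BGZF block header.
is_bgzf() {
    # Dump the first 14 bytes as one contiguous lowercase hex string.
    hdr=$(head -c 14 "$1" | od -An -tx1 | tr -d ' \n')
    case "$hdr" in
        # gzip magic + deflate + FLG=FEXTRA, 8 bytes skipped (16 hex digits), then 'B' 'C'.
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (file name taken from the comment above):
# is_bgzf HG002.strelka.variants.chr21.vcf.gz && echo "BGZF" || echo "plain gzip or other"
```

A file recompressed with plain `gzip` would fail this check, which matches the missing "extra field" described above.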

@kubranarci
Copy link
Contributor

I am updating bcftools reheader to 1.2 in this recent pull request (https://github.com/nf-core/variantbenchmarking/pull/128/files). But the problem is still not clear to me. Why would changing the truth GIAB VCFs affect reheadering the test VCFs? Can you please add the full error files?
