Information about Running Exomiser Without HPO-IDs and Adding CADD-SV Scores #585

Open
poddarharsh15 opened this issue Jan 9, 2025 · 2 comments

@poddarharsh15

Hi @julesjacobsen

Thank you for maintaining Exomiser—it’s a great tool! I had a few questions regarding its usage:

Running Without HPO-IDs:
Is it possible to run Exomiser effectively without providing HPO terms? If yes, what configurations or settings would you recommend to ensure meaningful results?

Adding CADD-SV Scores:
How can I integrate CADD-SV scores into Exomiser to add more detail to structural variants? Are there specific steps or modifications needed to enable this? https://cadd-sv.bihealth.org/download (prescored files)

Improving Pathogenicity and Frequency Data:
When analyzing single proband or trio VCF files, I notice that some pathogenicity, frequency, and ClinVar data are often missing. I’m currently using database 2406. Could you suggest ways to refine the analysis for more comprehensive results?

I’ve attached example output files for your reference. Any advice on adjustments or alternative workflows would be greatly appreciated!

Thanks in advance for your help!

results.zip

@poddarharsh15 changed the title from "Information about adding CADD-SV scores and running exomiser without HPO-IDs" to "Information about Running Exomiser Without HPO-IDs and Adding CADD-SV Scores" on Jan 9, 2025
@julesjacobsen
Contributor

Hi @poddarharsh15, thanks for getting in touch.

Running Without HPO-IDs:

> Is it possible to run Exomiser effectively without providing HPO terms? If yes, what configurations or settings would you recommend to ensure meaningful results?

Maybe. It depends on what you're trying to do and what your input VCF is. Multi-sample VCFs won't be annotated correctly and the memory usage will be very high as this wasn't the way Exomiser was designed to be run. I presume you want the gnomAD, ClinVar and pathogenicity score annotations?
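
As a rough sketch of what a run without phenotype input might look like, the fragment below leaves `hpoIds` empty and omits the phenotype-driven prioritisers, following the comment in the analysis template quoted later in this thread ("Don't include any if you only want to filter the variants"). The paths and proband ID are placeholders copied from that posted configuration. That an empty `hpoIds` list is accepted by a given Exomiser version is an assumption, as is the reduced choice of sources and steps; this is not a maintainer-confirmed setup.

```yaml
# Sketch only: filter-and-annotate run with no HPO terms.
analysis:
    genomeAssembly: hg38
    vcf: examples/family.vcf.gz      # placeholder path from the template
    ped: examples/family.ped         # placeholder path from the template
    proband: UD_AN001_P              # placeholder sample ID
    hpoIds: []                       # assumption: an empty list is accepted
    inheritanceModes: {
        AUTOSOMAL_DOMINANT: 0.1,
        AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
        AUTOSOMAL_RECESSIVE_COMP_HET: 2.0
    }
    analysisMode: PASS_ONLY
    frequencySources: [GNOMAD_E_NFE, GNOMAD_G_NFE]
    pathogenicitySources: [REVEL, MVP]
    steps: [
        failedVariantFilter: {},
        frequencyFilter: {maxFrequency: 2.0},
        pathogenicityFilter: {keepNonPathogenic: true},
        inheritanceFilter: {},
        omimPrioritiser: {}
        # hiPhive, PhenIX and Phive prioritisers left out: no phenotype input to use
    ]
outputOptions:
    outputFormats: [HTML, TSV_VARIANT]
```

Without a phenotype prioritiser, ranking would rest on the variant filters and scores alone, so the output is better read as an annotated, filtered variant list than as a phenotype-aware ranking.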

Adding CADD-SV Scores:

> How can I integrate CADD-SV scores into Exomiser to add more detail to structural variants? Are there specific steps or modifications needed to enable this? https://cadd-sv.bihealth.org/download (prescored files)

You can't use these right now, but they will be good to add in a future release. I have opened a ticket - #587

Improving Pathogenicity and Frequency Data:

> When analyzing single proband or trio VCF files, I notice that some pathogenicity, frequency, and ClinVar data are often missing. I’m currently using database 2406. Could you suggest ways to refine the analysis for more comprehensive results?
>
> I’ve attached example output files for your reference. Any advice on adjustments or alternative workflows would be greatly appreciated!

Can you include some detail about which variants were missing which annotations, please?

@poddarharsh15
Author

Hi @julesjacobsen
Thank you for your response. Just to clarify: for my test runs I’ve been using a VCF file containing a trio (father, mother, and proband). My setup is based on the example files provided in the Exomiser v14.0.0 package. However, I’m not getting any pathogenicity or frequency scores in the output, even though I’ve tried multiple sources in the .yml configuration file, such as REVEL, MVP, SIFT, ALPHA_MISSENSE, SPLICE_AI, and POLYPHEN.

To provide more context, I’ve attached the HTML file for an overview, along with the top 50 lines of the TSV output I generated. Any advice on what might be going wrong or suggestions to refine my workflow would be greatly appreciated!

Thanks in advance for your help!
UD_AN001_2-PASS_ONLY.zip
Variants.zip

YML parameters:

> ## Exomiser Analysis Template for multi-sample VCF files
> # These are all the possible options for running exomiser. Use this as a template for
> # your own set-up.
> analysis:
>     # hg19 or hg38 - ensure that the application has been configured to run the specified assembly otherwise it will halt.
>     genomeAssembly: hg38
>     vcf: examples/family.vcf.gz
>     ped: examples/family.ped
>     proband: UD_AN001_P
>     hpoIds: ['HP:0001561', 'HP:0001276', 'HP:0002371', 'HP:0025313','HP:0033725','HP:0002197']
>     # These are the default settings, with values representing the maximum minor allele frequency in percent (%) permitted for an
>     # allele to be considered as a causative candidate under that mode of inheritance.
>     # If you just want to analyse a sample under a single inheritance mode, delete/comment-out the others. For AUTOSOMAL_RECESSIVE
>     # or X_RECESSIVE ensure *both* relevant HOM_ALT and COMP_HET modes are present.
>     # In cases where you do not want any cut-offs applied an empty map should be used e.g. inheritanceModes: {}
>     inheritanceModes: {
>             AUTOSOMAL_DOMINANT: 0.1,
>             AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
>             AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
>             X_DOMINANT: 0.1,
>             X_RECESSIVE_HOM_ALT: 0.1,
>             X_RECESSIVE_COMP_HET: 2.0,
>             MITOCHONDRIAL: 0.2
>     }
>     #FULL or PASS_ONLY
>     analysisMode: PASS_ONLY
>   # Possible frequencySources:
>   # UK10K - http://www.uk10k.org/ (UK10K)
>   # gnomAD - http://gnomad.broadinstitute.org/ (GNOMAD_E, GNOMAD_G)
>   # note that as of gnomAD v2.1 1000 genomes, ExAC are part of gnomAD
>   # as of gnomAD v4 TOPMed & ESP are also included in gnomAD
>     frequencySources: [
>         UK10K,
> 
>         GNOMAD_E_AFR,
>         GNOMAD_E_AMR,
>         GNOMAD_E_ASJ,
>         GNOMAD_E_EAS,
>         GNOMAD_E_FIN,
>         GNOMAD_E_NFE,
>         GNOMAD_E_OTH,
>         GNOMAD_E_SAS,
> 
>       #  GNOMAD_G_AFR,
>       #  GNOMAD_G_AMR,
>       #  GNOMAD_G_ASJ,
>       #  GNOMAD_G_EAS,
>       #  GNOMAD_G_FIN,
>       #   GNOMAD_G_NFE,
>       #  GNOMAD_G_OTH,
>       #  GNOMAD_G_SAS
>     ]
>   # Possible pathogenicitySources: (POLYPHEN, MUTATION_TASTER, SIFT), (REVEL, MVP), CADD, REMM, SPLICE_AI, ALPHA_MISSENSE
>   # REMM is trained on non-coding regulatory regions
>   # *WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
>   # and updated their location in the application.properties. Exomiser will not run without this.
>     pathogenicitySources: [ REVEL, MVP, SIFT, ALPHA_MISSENSE, SPLICE_AI, POLYPHEN ]
>   # this is the standard exomiser order.
>   # all steps are optional
>     steps: [
>       #intervalFilter: {interval: 'chr10:123256200-123256300'},
>       # or for multiple intervals:
>       #intervalFilter: {intervals: ['chr10:123256200-123256300', 'chr10:123256290-123256350']},
>       # or using a BED file - NOTE this should be 0-based, Exomiser otherwise uses 1-based coordinates in line with VCF
>       #intervalFilter: {bed: /full/path/to/bed_file.bed},
>       #genePanelFilter: {geneSymbols: ['FGFR1','FGFR2']},
>       # geneBlacklistFilter: { },
>         failedVariantFilter: { },
>       #qualityFilter: {minQuality: 50.0},
>         variantEffectFilter: {
>           remove: [
>               FIVE_PRIME_UTR_INTRON_VARIANT,
>               NON_CODING_TRANSCRIPT_EXON_VARIANT,
>               UPSTREAM_GENE_VARIANT,
>               INTERGENIC_VARIANT,
>                 REGULATORY_REGION_VARIANT,
>                 CODING_TRANSCRIPT_INTRON_VARIANT,
>                 NON_CODING_TRANSCRIPT_INTRON_VARIANT,
>                 DOWNSTREAM_GENE_VARIANT
>               ]
>         },
>         #knownVariantFilter: {}, #removes variants represented in the database
>         frequencyFilter: {maxFrequency: 2.0},
>         pathogenicityFilter: {keepNonPathogenic: true},
>         #inheritanceFilter and omimPrioritiser should always run AFTER all other filters have completed
>         #they will analyse genes according to the specified modeOfInheritance above- UNDEFINED will not be analysed.
>         inheritanceFilter: {},
>         #omimPrioritiser isn't mandatory.
>         omimPrioritiser: {},
>         #priorityScoreFilter: {minPriorityScore: 0.4},
>         #Other prioritisers: Only combine omimPrioritiser with one of these.
>         #Don't include any if you only want to filter the variants.
>         hiPhivePrioritiser: {},
>         # or run hiPhive in benchmarking mode:
>         #hiPhivePrioritiser: {runParams: 'mouse'},
>         #phivePrioritiser: {}
>         phenixPrioritiser: {}
>         #exomeWalkerPrioritiser: {seedGeneIds: [11111, 22222, 33333]}
>     ]
> outputOptions:
>     outputContributingVariantsOnly: false
>     #numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results
>     numGenes: 50
>     # Path to the desired output directory. Will default to the 'results' subdirectory of the exomiser install directory
>     #outputDirectory: results
>     # Filename for the output files. Will default to {input-vcf-filename}-exomiser
>     outputFileName: UD_AN001_2-PASS_ONLY
>     #out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML)
>     outputFormats: [HTML, TSV_GENE, TSV_VARIANT]
