Information about Running Exomiser Without HPO-IDs and Adding CADD-SV Scores #585

poddarharsh15 · 2025-01-09T11:23:08Z

Thank you for maintaining Exomiser—it’s a great tool! I had a few questions regarding its usage:

Running Without HPO-IDs:
Is it possible to run Exomiser effectively without providing HPO terms? If yes, what configurations or settings would you recommend to ensure meaningful results?

Adding CADD-SV Scores:
How can I integrate CADD-SV scores into Exomiser to add more detail to structural variants? Are there specific steps or modifications needed to enable this? https://cadd-sv.bihealth.org/download (prescored files)

Improving Pathogenicity and Frequency Data:
When analyzing single proband or trio VCF files, I notice that some pathogenicity, frequency, and ClinVar data are often missing. I’m currently using database 2406. Could you suggest ways to refine the analysis for more comprehensive results?

I’ve attached example output files for your reference. Any advice on adjustments or alternative workflows would be greatly appreciated!

Thanks in advance for your help!

results.zip

julesjacobsen · 2025-01-10T17:41:04Z

Hi @poddarharsh15, thanks for getting in touch.

Running Without HPO-IDs:

> Is it possible to run Exomiser effectively without providing HPO terms? If yes, what configurations or settings would you recommend to ensure meaningful results?

Maybe. It depends on what you're trying to do and what your input VCF is. Multi-sample VCFs won't be annotated correctly and the memory usage will be very high as this wasn't the way Exomiser was designed to be run. I presume you want the gnomAD, ClinVar and pathogenicity score annotations?
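Whether the input is a multi-sample VCF can be checked by reading the sample columns from the `#CHROM` header line. Below is a minimal Python sketch that assumes a plain or gzip/bgzip-compressed VCF. The helper names `samples_from_header_lines` and `vcf_sample_names` are hypothetical and are not part of Exomiser.

```python
import gzip
from typing import Iterable, List


def samples_from_header_lines(lines: Iterable[str]) -> List[str]:
    """Return the sample IDs from a VCF's '#CHROM' header line.

    In the VCF specification the first nine columns are fixed
    (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT); any further
    columns are per-sample genotype columns.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached variant records without seeing the column header
    raise ValueError("No '#CHROM' header line found")


def vcf_sample_names(path: str) -> List[str]:
    """Read sample IDs from a plain or gzip/bgzip-compressed VCF file."""
    # bgzip output is a series of gzip members, which gzip.open reads transparently.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return samples_from_header_lines(handle)
```

A result containing more than one ID means the file is multi-sample, which is the case the reply above warns about.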

Adding CADD-SV Scores:

> How can I integrate CADD-SV scores into Exomiser to add more detail to structural variants? Are there specific steps or modifications needed to enable this? https://cadd-sv.bihealth.org/download (prescored files)

You can't use these right now, but they will be good to add in a future release. I have opened a ticket: #587

Improving Pathogenicity and Frequency Data:

> When analyzing single proband or trio VCF files, I notice that some pathogenicity, frequency, and ClinVar data are often missing. I’m currently using database 2406. Could you suggest ways to refine the analysis for more comprehensive results?
>
> I’ve attached example output files for your reference. Any advice on adjustments or alternative workflows would be greatly appreciated!

Could you include some detail about which variants were missing which annotations, please?
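One way to collect that detail from a `TSV_VARIANT` output file is to list, for each annotation column, the variants whose cell is blank. Below is a minimal Python sketch. The column names (`CONTIG`, `START`, `REF`, `ALT`, `MAX_FREQ_SOURCE`, `MAX_PATH_SOURCE`, `CLINVAR_VARIANT_ID`) and the set of "missing" markers are assumptions about the Exomiser v14 variant TSV and should be checked against the header of the actual file. `summarise_missing` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List, Sequence, Set

# ASSUMPTION: column names expected in an Exomiser v14 TSV_VARIANT file.
# Check them against the header line of your own output before relying on this.
KEY_COLUMNS = ("CONTIG", "START", "REF", "ALT")
ANNOTATION_COLUMNS = ("MAX_FREQ_SOURCE", "MAX_PATH_SOURCE", "CLINVAR_VARIANT_ID")
# ASSUMPTION: cell values treated as "no annotation".
MISSING_VALUES = {"", ".", "NA"}


def summarise_missing(
    lines: Iterable[str],
    annotation_columns: Sequence[str] = ANNOTATION_COLUMNS,
    key_columns: Sequence[str] = KEY_COLUMNS,
    missing_values: Set[str] = MISSING_VALUES,
) -> Dict[str, List[str]]:
    """Map each annotation column to the variants (chrom-pos-ref-alt) lacking a value."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    absent = [c for c in (*key_columns, *annotation_columns) if c not in header]
    if absent:
        raise KeyError(f"Columns not found in TSV header: {absent}")

    missing: Dict[str, List[str]] = {col: [] for col in annotation_columns}
    for row in reader:
        # Short rows yield None for trailing fields, so normalise to "".
        key = "-".join((row[c] or "").strip() for c in key_columns)
        for col in annotation_columns:
            if (row[col] or "").strip() in missing_values:
                missing[col].append(key)
    return missing
```

Pass it an open file handle (for example `open(path, newline="")`) and print the length of each list to see which annotation is missing most often.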

poddarharsh15 · 2025-01-13T11:47:28Z

Hi @julesjacobsen
Thank you for your response. Just to clarify: for my test runs, I’ve been using a VCF file containing data from a trio (father, mother, and proband). My setup was based on the example files provided in the Exomiser v14.0.0 package. However, I’m encountering an issue: I’m not getting any pathogenicity or frequency scores in the output, even though I’ve tried multiple sources in the .yml configuration file, such as REVEL, MVP, SIFT, ALPHA_MISSENSE, SPLICE_AI, and POLYPHEN.
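For a trio run, the `proband` value, the individual IDs in the PED file and the sample names in the VCF should all refer to the same people. Below is a minimal Python sketch of a consistency check. It assumes a standard six-column PED file (family ID, individual ID, father ID, mother ID, sex, phenotype) and takes the VCF sample names as an already-read list. `check_trio_inputs` is a hypothetical helper, not an Exomiser command.

```python
from typing import Iterable, List


def check_trio_inputs(proband: str, ped_lines: Iterable[str], vcf_samples: List[str]) -> List[str]:
    """Return human-readable problems with the proband/PED/VCF IDs (empty if none)."""
    problems: List[str] = []
    ped_ids: List[str] = []
    for line in ped_lines:
        fields = line.split()  # PED files may be tab- or space-separated
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment/header lines
        if len(fields) < 6:
            problems.append(f"PED line has fewer than 6 columns: {line.strip()!r}")
            continue
        ped_ids.append(fields[1])  # column 2 is the individual ID

    if proband not in ped_ids:
        problems.append(f"Proband {proband!r} not found among PED individual IDs {ped_ids}")
    if proband not in vcf_samples:
        problems.append(f"Proband {proband!r} not found among VCF samples {vcf_samples}")
    for ped_id in ped_ids:
        if ped_id not in vcf_samples:
            problems.append(f"PED individual {ped_id!r} has no matching VCF sample")
    return problems
```

An empty list means the three sources agree. Any entry points at an ID mismatch worth fixing before looking at the annotation output.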

To provide more context, I’ve attached the HTML file for an overview, along with the top 50 lines of the TSV output I generated. Any advice on what might be going wrong or suggestions to refine my workflow would be greatly appreciated!

Thanks in advance for your help!
UD_AN001_2-PASS_ONLY.zip
Variants.zip

YML parameters

> ## Exomiser Analysis Template for multi-sample VCF files
> # These are all the possible options for running exomiser. Use this as a template for
> # your own set-up.
> analysis:
>     # hg19 or hg38 - ensure that the application has been configured to run the specified assembly otherwise it will halt.
>     genomeAssembly: hg38
>     vcf: examples/family.vcf.gz
>     ped: examples/family.ped
>     proband: UD_AN001_P
>     hpoIds: ['HP:0001561', 'HP:0001276', 'HP:0002371', 'HP:0025313','HP:0033725','HP:0002197']
>     # These are the default settings, with values representing the maximum minor allele frequency in percent (%) permitted for an
>     # allele to be considered as a causative candidate under that mode of inheritance.
>     # If you just want to analyse a sample under a single inheritance mode, delete/comment-out the others. For AUTOSOMAL_RECESSIVE
>     # or X_RECESSIVE ensure *both* relevant HOM_ALT and COMP_HET modes are present.
>     # In cases where you do not want any cut-offs applied an empty map should be used e.g. inheritanceModes: {}
>     inheritanceModes: {
>             AUTOSOMAL_DOMINANT: 0.1,
>             AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
>             AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
>             X_DOMINANT: 0.1,
>             X_RECESSIVE_HOM_ALT: 0.1,
>             X_RECESSIVE_COMP_HET: 2.0,
>             MITOCHONDRIAL: 0.2
>     }
>     #FULL or PASS_ONLY
>     analysisMode: PASS_ONLY
>   # Possible frequencySources:
>   # UK10K - http://www.uk10k.org/ (UK10K)
>   # gnomAD - http://gnomad.broadinstitute.org/ (GNOMAD_E, GNOMAD_G)
>   # note that as of gnomAD v2.1 1000 genomes, ExAC are part of gnomAD
>   # as of gnomAD v4 TOPMed & ESP are also included in gnomAD
>     frequencySources: [
>         UK10K,
> 
>         GNOMAD_E_AFR,
>         GNOMAD_E_AMR,
>         GNOMAD_E_ASJ,
>         GNOMAD_E_EAS,
>         GNOMAD_E_FIN,
>         GNOMAD_E_NFE,
>         GNOMAD_E_OTH,
>         GNOMAD_E_SAS,
> 
>       #  GNOMAD_G_AFR,
>       #  GNOMAD_G_AMR,
>       #  GNOMAD_G_ASJ,
>       #  GNOMAD_G_EAS,
>       #  GNOMAD_G_FIN,
>       #   GNOMAD_G_NFE,
>       #  GNOMAD_G_OTH,
>       #  GNOMAD_G_SAS
>     ]
>   # Possible pathogenicitySources: (POLYPHEN, MUTATION_TASTER, SIFT), (REVEL, MVP), CADD, REMM, SPLICE_AI, ALPHA_MISSENSE
>   # REMM is trained on non-coding regulatory regions
>   # *WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
>   # and updated their location in the application.properties. Exomiser will not run without this.
>     pathogenicitySources: [ REVEL, MVP, SIFT, ALPHA_MISSENSE, SPLICE_AI, POLYPHEN ]
>   # this is the standard exomiser order.
>   # all steps are optional
>     steps: [
>       #intervalFilter: {interval: 'chr10:123256200-123256300'},
>       # or for multiple intervals:
>       #intervalFilter: {intervals: ['chr10:123256200-123256300', 'chr10:123256290-123256350']},
>       # or using a BED file - NOTE this should be 0-based, Exomiser otherwise uses 1-based coordinates in line with VCF
>       #intervalFilter: {bed: /full/path/to/bed_file.bed},
>       #genePanelFilter: {geneSymbols: ['FGFR1','FGFR2']},
>       # geneBlacklistFilter: { },
>         failedVariantFilter: { },
>       #qualityFilter: {minQuality: 50.0},
>         variantEffectFilter: {
>           remove: [
>               FIVE_PRIME_UTR_INTRON_VARIANT,
>               NON_CODING_TRANSCRIPT_EXON_VARIANT,
>               UPSTREAM_GENE_VARIANT,
>               INTERGENIC_VARIANT,
>               REGULATORY_REGION_VARIANT,
>               CODING_TRANSCRIPT_INTRON_VARIANT,
>               NON_CODING_TRANSCRIPT_INTRON_VARIANT,
>               DOWNSTREAM_GENE_VARIANT
>               ]
>         },
>         #knownVariantFilter: {}, #removes variants represented in the database
>         frequencyFilter: {maxFrequency: 2.0},
>         pathogenicityFilter: {keepNonPathogenic: true},
>         #inheritanceFilter and omimPrioritiser should always run AFTER all other filters have completed
>         #they will analyse genes according to the inheritanceModes specified above - UNDEFINED will not be analysed.
>         inheritanceFilter: {},
>         #omimPrioritiser isn't mandatory.
>         omimPrioritiser: {},
>         #priorityScoreFilter: {minPriorityScore: 0.4},
>         #Other prioritisers: Only combine omimPrioritiser with one of these.
>         #Don't include any if you only want to filter the variants.
>         hiPhivePrioritiser: {},
>         # or run hiPhive in benchmarking mode:
>         #hiPhivePrioritiser: {runParams: 'mouse'},
>         #phivePrioritiser: {}
>         phenixPrioritiser: {}
>         #exomeWalkerPrioritiser: {seedGeneIds: [11111, 22222, 33333]}
>     ]
> outputOptions:
>     outputContributingVariantsOnly: false
>     #numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results
>     numGenes: 50
>     # Path to the desired output directory. Will default to the 'results' subdirectory of the exomiser install directory
>     #outputDirectory: results
>     # Filename for the output files. Will default to {input-vcf-filename}-exomiser
>     outputFileName: UD_AN001_2-PASS_ONLY
>     #out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML)
>     outputFormats: [HTML, TSV_GENE, TSV_VARIANT]
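The `intervalFilter` comments in this config note that a BED file should be 0-based, while Exomiser otherwise uses 1-based coordinates in line with VCF. Below is a minimal Python sketch of converting an `interval` string into a BED line. It assumes the `chrom:start-end` strings are 1-based and inclusive at both ends, and that BED intervals are 0-based and half-open. `interval_to_bed` is a hypothetical helper.

```python
import re

# ASSUMPTION: intervals look like 'chr10:123256200-123256300' (1-based, closed).
_INTERVAL = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def interval_to_bed(interval: str) -> str:
    """Convert a 1-based closed 'chrom:start-end' string to a 0-based half-open BED line."""
    match = _INTERVAL.match(interval.strip())
    if match is None:
        raise ValueError(f"Unrecognised interval: {interval!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"Invalid coordinates in interval: {interval!r}")
    # Only the start shifts: 1-based position p is 0-based p - 1, and the
    # inclusive end e becomes the exclusive end e in half-open coordinates.
    return f"{match['chrom']}\t{start - 1}\t{end}"
```

With this convention, `chr10:123256200-123256300` becomes the BED line `chr10	123256199	123256300`. Both forms cover the same 101 bases.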

poddarharsh15 changed the title ~~Information about adding CADD-SV scores and running exomiser without HPO-IDs~~ Information about Running Exomiser Without HPO-IDs and Adding CADD-SV Scores Jan 9, 2025

julesjacobsen mentioned this issue Jan 10, 2025

Users want to use Exomiser as an annotation tool... #498

