
2. Getting Started

Bob Dolin edited this page Nov 9, 2022 · 14 revisions

Data

This section describes the patient data and knowledge data used to drive the operations. You can experiment with predefined queries (see the Postman collection) or create your own queries based on the available data.

Patient Data

Representative patients are listed in the following table. Not all patients have genetic testing data. Several patients (e.g. HG00403, HG00406) have whole exome sequencing data; many (e.g. ABC456, HCC1143) have been studied for structural variants; some (e.g. HCC1143) have somatic data; and there are patients with PGx star alleles (e.g. XYZ123) and with HLA haplotypes (e.g. NB6TK328). Patients NA19238 (mother) and NA19239 (father) are the parents of patient NA19240. Some patient data is based on build 37 (GRCh37, e.g. HG02657) and some on build 38 (GRCh38, e.g. CA12345). The find-study-metadata operation shows which types of testing a patient has had.

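| Patient ID | Sex | Patient ID | Sex | Patient ID | Sex |
|---|---|---|---|---|---|
| ABC123 | M | m123 | M | NA19240 | F |
| ABC456 | M | NA18498 | M | NA19247 | F |
| ABC789 | F | NA18499 | F | NA19256 | M |
| CA12345 | M | NA18870 | F | NB6TK328 | F |
| HCC1143 | F | NA18871 | M | NB6TK329 | F |
| HG00403 | M | NA19190 | F | XYZ123 | F |
| HG00406 | M | NA19210 | M | XYZ234 | F |
| HG02657 | M | NA19238 | F | XYZ345 | F |
| huC30902 | M | NA19239 | M | | |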
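To see what kinds of testing one of the patients above has had, a client would call the find-study-metadata operation. The following is a minimal sketch of assembling such a request URL. The base URL, the operation path, and the `subject` parameter name are all assumptions (hypothetical placeholders), not confirmed details of this server; check the operation definition and the Postman collection for the real endpoint.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the server you are actually using.
BASE_URL = "https://example.org/fhir-genomics"
# Assumed operation path; check the operation definition for the real one.
STUDY_METADATA_PATH = "/subject-operations/metadata-operations/$find-study-metadata"


def study_metadata_url(patient_id: str, base_url: str = BASE_URL) -> str:
    """Build a GET URL asking which types of genetic testing a patient has had."""
    if not patient_id:
        raise ValueError("patient_id must be non-empty")
    # 'subject' is the assumed name of the patient parameter.
    query = urlencode({"subject": patient_id})
    return f"{base_url}{STUDY_METADATA_PATH}?{query}"


# Patients from the table with different kinds of data (exome, somatic, PGx, HLA).
for pid in ["HG00403", "HCC1143", "XYZ123", "NB6TK328"]:
    print(study_metadata_url(pid))
```

The sketch only builds and prints the URLs; it does not send the requests, so it runs without network access.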

Knowledge Data

Knowledge data is used to dynamically compute diagnostic and therapeutic implications of genetic variants. This reference implementation is piloting the draft GA4GH Variant Annotation (VA) knowledge structures distributed as part of the GA4GH Genomic Knowledge Pilot. In the future, we anticipate using GA4GH VA-encoded knowledge to drive automated knowledge updates of the reference implementation.

ClinVar

ClinVar knowledge is based on an Aug 2022 extract, using both variant summary data and submission summary data. The ClinVar snapshot is limited to ACMG genes. Conditions are coded with MedGen codes (codeSystem='https://www.ncbi.nlm.nih.gov/medgen').

PharmGKB

PharmGKB knowledge is based on a Dec 2021 extract. The PharmGKB snapshot is limited to CPIC Level A star alleles in CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP3A5, NUDT15, SLCO1B1, TPMT, and UGT1A1. Medications are coded with RxNorm ingredient codes (codeSystem='http://www.nlm.nih.gov/research/umls/rxnorm').

CIViC

CIViC knowledge is based on a Sep 2022 extract. The CIViC snapshot is limited to simple variants. Conditions are coded with Disease Ontology codes (codeSystem='https://disease-ontology.org'). Medications are coded with RxNorm ingredient codes (codeSystem='http://www.nlm.nih.gov/research/umls/rxnorm').
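The three knowledge sources use different code systems for conditions and medications, so a client reading a FHIR CodeableConcept may need to pick out the coding from one particular system. The following minimal sketch does that using the system URIs listed above. The dictionary and function names are hypothetical, and the code values in the example are placeholders rather than real MedGen, Disease Ontology, or RxNorm codes.

```python
from typing import Optional

# Code-system URIs as given in the knowledge-data descriptions above.
MEDGEN = "https://www.ncbi.nlm.nih.gov/medgen"
DISEASE_ONTOLOGY = "https://disease-ontology.org"
RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"

# Which system each knowledge source uses for conditions and medications.
KNOWLEDGE_CODE_SYSTEMS = {
    "ClinVar": {"condition": MEDGEN},
    "PharmGKB": {"medication": RXNORM},
    "CIViC": {"condition": DISEASE_ONTOLOGY, "medication": RXNORM},
}


def pick_code(codeable_concept: dict, system: str) -> Optional[str]:
    """Return the first code in a FHIR CodeableConcept from `system`, or None."""
    for coding in codeable_concept.get("coding", []):
        if coding.get("system") == system:
            return coding.get("code")
    return None


# Placeholder codes for illustration only.
concept = {
    "coding": [
        {"system": DISEASE_ONTOLOGY, "code": "example-doid"},
        {"system": RXNORM, "code": "example-rxcui"},
    ]
}
print(pick_code(concept, KNOWLEDGE_CODE_SYSTEMS["CIViC"]["condition"]))  # example-doid
print(pick_code(concept, MEDGEN))  # None
```

Matching on the exact system URI matters here, because the same concept may carry codings from several systems.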

Molecular Consequences

Variants in the reference implementation are enhanced with population allele frequencies and predicted molecular consequences. A software utility 'vcfPrepper' that implements our molecular consequence pipeline can be found here.

Population allele frequency

Population allele frequency data is obtained from gnomAD. gnomAD v2.1.1 contains data from 125,748 exomes, mapped to GRCh37; gnomAD v2 liftover contains gnomAD v2.1.1 data lifted over to GRCh38. Population allele frequencies are returned in the FHIR Genomics Variant profile, in component population-allele-frequency (LOINC 92821-8).
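A client that wants the gnomAD frequency from a returned variant has to locate the component coded with LOINC 92821-8. The following minimal sketch does this on a FHIR Observation parsed into a Python dict. It assumes the value is carried in `valueQuantity.value` and also accepts `valueDecimal` defensively; the function name and the example frequency are made up for illustration.

```python
from typing import Optional

LOINC = "http://loinc.org"
POPULATION_ALLELE_FREQUENCY = "92821-8"


def population_allele_frequency(variant: dict) -> Optional[float]:
    """Return the population allele frequency from a FHIR Variant Observation, or None.

    Assumes the value is in valueQuantity.value; valueDecimal is accepted too.
    """
    for component in variant.get("component", []):
        codings = component.get("code", {}).get("coding", [])
        if not any(
            c.get("system") == LOINC and c.get("code") == POPULATION_ALLELE_FREQUENCY
            for c in codings
        ):
            continue  # not the population-allele-frequency component
        quantity = component.get("valueQuantity", {})
        if "value" in quantity:
            return float(quantity["value"])
        if "valueDecimal" in component:
            return float(component["valueDecimal"])
    return None


# Illustrative fragment; the frequency value is made up.
observation = {
    "resourceType": "Observation",
    "component": [
        {
            "code": {"coding": [{"system": LOINC, "code": POPULATION_ALLELE_FREQUENCY}]},
            "valueQuantity": {"value": 0.0123},
        }
    ],
}
print(population_allele_frequency(observation))  # 0.0123
```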

Predicted molecular consequences

Utilities

This section describes additional APIs provided as part of the reference implementation that are not part of FHIR Genomics Operations.

get-feature-coordinates

This utility returns genomic feature coordinates and other annotations. All data are from NCBI Human Genome Resources. For chromosomes, the build 37 and build 38 reference sequences are returned. For genes, genomic coordinates are returned along with a list of transcripts, in which the MANE transcript is flagged. For transcripts, genomic coordinates are returned along with the gene name and the transcript's exons and their coordinates. For proteins, the corresponding transcript is returned.
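Because the utility answers questions about four kinds of features, a client sketch can enforce that exactly one feature is requested at a time. The following example builds such a request URL. The base URL and path are hypothetical placeholders, and using the feature types (`chromosome`, `gene`, `transcript`, `protein`) as query-parameter names is an assumption, not the documented API.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org/fhir-genomics"  # hypothetical placeholder
FEATURE_COORDINATES_PATH = "/utilities/get-feature-coordinates"  # assumed path
# Feature types described above; used here as assumed query-parameter names.
FEATURE_KINDS = ("chromosome", "gene", "transcript", "protein")


def feature_coordinates_url(base_url: str = BASE_URL, **feature: str) -> str:
    """Build a request for exactly one feature, e.g. feature_coordinates_url(gene="BRCA1")."""
    if len(feature) != 1:
        raise ValueError("specify exactly one of: " + ", ".join(FEATURE_KINDS))
    kind, value = next(iter(feature.items()))
    if kind not in FEATURE_KINDS:
        raise ValueError(f"unknown feature type: {kind}")
    return f"{base_url}{FEATURE_COORDINATES_PATH}?{urlencode({kind: value})}"


print(feature_coordinates_url(gene="BRCA1"))
```

Rejecting ambiguous requests on the client side keeps errors local; whether the server itself accepts several features at once is not stated here.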

find-the-gene

This utility returns all genes that intersect with a provided genomic region. Gene locations are from NCBI Human Genome Resources.
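Conceptually, finding the genes that intersect a region is an interval-overlap test. The following sketch illustrates that test on toy data. It is not the service's implementation, and it assumes 0-based, half-open coordinates; the convention the utility actually uses is not stated here. The `Gene` class, the function name, and the gene symbols and coordinates are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive


def genes_in_region(genes: Iterable[Gene], chrom: str, start: int, end: int) -> List[str]:
    """Return symbols of genes on `chrom` that overlap the half-open region [start, end)."""
    if start >= end:
        raise ValueError("region must satisfy start < end")
    # Two half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    return [g.symbol for g in genes
            if g.chrom == chrom and g.start < end and start < g.end]


# Toy data, not real gene coordinates.
GENES = [
    Gene("GENE_A", "chr1", 100, 200),
    Gene("GENE_B", "chr1", 150, 300),
    Gene("GENE_C", "chr1", 300, 400),
    Gene("GENE_D", "chr2", 100, 200),
]
print(genes_in_region(GENES, "chr1", 190, 300))  # ['GENE_A', 'GENE_B']
```

The linear scan is fine for a sketch; for a genome-wide gene set, an interval tree or a per-chromosome sorted index would avoid checking every gene.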
