Constraint for mitochondrial genes #1451

Open
wants to merge 16 commits into
base: main
42 changes: 42 additions & 0 deletions browser/help/topics/mitochondrial-constraint.md
@@ -0,0 +1,42 @@
---
id: mitochondrial-constraint
title: 'Mitochondrial gene constraint'
---

Variants in the mitochondrial genome (mtDNA) are available for 56,434 genome samples from gnomAD v4.1. Assessment of [constraint](/help/constraint) within the mtDNA requires a different approach than the nuclear genome, as its unique features make nuclear constraint models unsuitable. To measure intolerance to variation within the mtDNA, we developed a mitochondrial mutational model that predicts the level of variation expected to be seen in the gnomAD dataset for a given gene based on local sequence context. We then compare the expected values for each gene to the observed amount of variation and consider genes that are significantly depleted of their expected variation to be constrained, or intolerant of this variation. Our gene-level constraint metrics for the mtDNA are detailed below.

The sections below will review:

- [Methods](/help/mitochondrial-constraint#methods)
- [Observed/expected (oe) metric](/help/mitochondrial-constraint#oe)
- [Differences between mitochondrial and nuclear gene constraint metrics](/help/mitochondrial-constraint#differences-from-nuclear)

More details on these methods can be found in the main article and supplement of [Lake et al. Nature 2024](https://www.nature.com/articles/s41586-024-08048-x).

### <a id="methods"></a>Methods

#### Genes, transcripts, and variant classes included in the analyses

We provide gene constraint metrics for all protein-coding, ribosomal RNA (rRNA), and transfer RNA (tRNA) genes in the mitochondrial genome. Since each human mtDNA gene has only one transcript, distinction between canonical and non-canonical transcripts was not required. In the protein-coding genes, metrics are provided for (i) synonymous, (ii) missense and (iii) stop gain variants caused by single nucleotide changes. Note that splice site variants are not applicable to genes in mtDNA. In the rRNA and tRNA genes, metrics are provided for all single nucleotide variants.

#### Observed value

The observed value is the sum of the maximum observed heteroplasmy level (‘maximum heteroplasmy’) of every possible single nucleotide variant in the gene. Heteroplasmy refers to the proportion of mtDNA copies that carry the variant. Every possible variant is assigned a maximum heteroplasmy value between 0.0 and 1.0, representing the highest level at which the variant is observed across all individuals in gnomAD. Heteroplasmy is important to account for when detecting selection in mtDNA, as most pathogenic variants have maximum heteroplasmy levels below 1.0 due to selection, reflecting that individuals can carry pathogenic variants but be asymptomatic if heteroplasmy levels are low enough.
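As a rough illustration of the observed value, the sketch below sums maximum heteroplasmy over every possible SNV in a gene. The `PossibleVariant` type, the function name, and the variant IDs are hypothetical and are not taken from the gnomAD codebase; variants never observed in gnomAD are assumed to carry a maximum heteroplasmy of 0.

```typescript
// Hypothetical shape for one possible SNV; not a gnomAD type.
type PossibleVariant = {
  variantId: string
  // Highest heteroplasmy seen across all individuals, in [0, 1].
  // Assumed to be 0 when the variant was never observed.
  maxHeteroplasmy: number
}

// Observed value for a gene: the sum of maximum heteroplasmy over all possible SNVs.
const observedSumMaxHeteroplasmy = (variants: PossibleVariant[]): number =>
  variants.reduce((sum, v) => {
    if (v.maxHeteroplasmy < 0 || v.maxHeteroplasmy > 1) {
      throw new RangeError(`maxHeteroplasmy out of range for ${v.variantId}`)
    }
    return sum + v.maxHeteroplasmy
  }, 0)

// Made-up example values for illustration only.
const exampleVariants: PossibleVariant[] = [
  { variantId: 'M-100-A-G', maxHeteroplasmy: 1.0 }, // homoplasmic in at least one individual
  { variantId: 'M-100-A-C', maxHeteroplasmy: 0.25 }, // only seen at low heteroplasmy
  { variantId: 'M-100-A-T', maxHeteroplasmy: 0 }, // never observed
]

console.log(observedSumMaxHeteroplasmy(exampleVariants)) // 1.25
```

Summing heteroplasmy rather than counting variants means a variant seen only at low heteroplasmy contributes less to the observed value than one seen at homoplasmy.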

#### Expected value

We calculated the expected sum maximum heteroplasmy of single nucleotide variants in each gene using a mitochondrial mutational model that accounts for trinucleotide sequence context. While nuclear constraint models include corrections for coverage and methylation, we do not apply these given the high and even mtDNA coverage in gnomAD and lack of robust data on mtDNA methylation.

### <a id="oe"></a>Observed / expected (oe) metric

We calculated the ratio of observed to expected (oe) sum maximum heteroplasmy of variants in each gene in the mitochondrial genome and the 90% confidence interval (CI) around these ratios. These values provide an inference on the strength of selection against variation in each gene. Observed/expected (oe) ratios are a continuous measure of how tolerant a gene is to a certain class of variation (e.g. missense). Genes with lower oe values are under stronger selection pressure, while higher oe values indicate greater tolerance.

We calculated the 90% CI around each oe ratio using a beta distribution, adapting methods previously used for nuclear genome constraint. The CI captures uncertainty around the ratio estimate, which can vary depending on sample size. When evaluating how constrained a gene is, it is important to take the 90% CI into consideration. We suggest using the upper bound of this confidence interval, termed the OEUF (observed to expected upper bound fraction), which provides a conservative measure of constraint. A lower OEUF indicates stronger selection, while a higher value suggests greater tolerance.
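A minimal sketch of how the ratio and its upper bound might be used, assuming the observed and expected sums and the CI bounds have already been computed upstream. The beta-distribution CI itself is not reproduced here, and the `GeneOe` type, the helper names, and the example numbers are illustrative assumptions.

```typescript
// Hypothetical record; the CI bounds are taken as inputs, not computed here.
type GeneOe = {
  gene: string
  observed: number // sum of maximum heteroplasmy actually seen
  expected: number // sum predicted by the mutational model
  oeLower: number // lower bound of the 90% CI
  oeUpper: number // upper bound of the 90% CI (OEUF)
}

// Point estimate of the observed/expected ratio.
const oeRatio = ({ observed, expected }: GeneOe): number => {
  if (expected <= 0) {
    throw new RangeError('expected must be positive')
  }
  return observed / expected
}

// Rank genes from most to least constrained using the conservative upper bound.
const rankByOeuf = (genes: GeneOe[]): GeneOe[] =>
  [...genes].sort((a, b) => a.oeUpper - b.oeUpper)

// Made-up values for illustration only.
const genes: GeneOe[] = [
  { gene: 'GENE_A', observed: 5, expected: 10, oeLower: 0.4, oeUpper: 0.62 },
  { gene: 'GENE_B', observed: 2, expected: 10, oeLower: 0.1, oeUpper: 0.35 },
]

console.log(oeRatio(genes[0])) // 0.5
console.log(rankByOeuf(genes).map((g) => g.gene)) // [ 'GENE_B', 'GENE_A' ]
```

Sorting on the upper bound rather than the point estimate keeps genes with wide intervals from ranking as strongly constrained.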

### <a id="differences-from-nuclear"></a> Differences between mitochondrial and nuclear gene constraint metrics

The gene constraint metrics for the mitochondrial and nuclear genomes differ due to the unique characteristics of the mtDNA. These include its smaller size, lack of introns, high copy number, presence of heteroplasmy, distinct mutational mechanisms, and higher mutation rate. These features precluded the application of nuclear constraint models to the mtDNA, necessitating a mitochondrial genome constraint model. Key differences between the mitochondrial and nuclear constraint models include:

- Mutational Model: A mitochondrial mutational model was developed and applied, as nuclear mutational models could not be used due to distinct mutational mechanisms and signatures between the genomes.
- Observed/Expected Calculation: Nuclear constraint models assess the number of unique variants, while mitochondrial constraint models evaluate the sum of maximum heteroplasmy for variants.
- Confidence Interval: A beta distribution, rather than a Poisson distribution, is used to calculate confidence intervals for the observed/expected ratios based on maximum heteroplasmy values.
40 changes: 40 additions & 0 deletions browser/help/topics/mitochondrial-regional-constraint.md
@@ -0,0 +1,40 @@
---
id: mitochondrial-regional-constraint
title: 'Mitochondrial gene regional constraint'
---

Regional constraint identifies regions within each gene that are more constrained than the entire gene. Knowing if a variant falls within an interval of regional constraint can help prioritize variants most likely to have a deleterious functional impact. Assessment of regional constraint within the mitochondrial genome (mtDNA) requires a different approach than the nuclear genome, as its distinct features make [nuclear regional constraint](/help/regional-constraint) models unsuitable. Variants in the mtDNA are available for 56,434 genome samples from gnomAD v4.1. For the mitochondrial genome, we provide regional missense constraint for protein-coding genes and regional constraint for single nucleotide variants in ribosomal RNA (rRNA) genes. Our mtDNA regional constraint metrics are detailed below.

The sections below will review:

- [Methods](/help/mitochondrial-regional-constraint#methods)
- [Observed/expected (oe) metric](/help/mitochondrial-regional-constraint#oe)
- [Differences between mitochondrial and nuclear regional constraint metrics](/help/mitochondrial-regional-constraint#differences-from-nuclear)

More details on these methods can be found in the main article and supplement of [Lake et al. Nature 2024](https://www.nature.com/articles/s41586-024-08048-x).

### <a id="methods"></a>Methods

To identify regional missense constraint within protein-coding genes, we calculated the observed/expected (oe) missense ratio for all possible regions ≥ 30 bp within each gene. Regions with an oe ratio significantly lower than the gene’s overall oe ratio were identified using a beta distribution. A greedy algorithm was then applied to prioritize regions with the most significant p-values and produce a list of non-overlapping intervals. The false discovery rate (FDR) of each interval was estimated by applying the same method to 1,000 random permutations of each gene; only intervals with FDR < 0.1 were retained as high-confidence intervals. Regional constraint in the rRNA genes was evaluated using the same process with minor modifications. We provide the oe ratio of each interval of regional constraint and the 90% confidence interval around these ratios.
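The greedy selection step can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: candidate regions are assumed to arrive with precomputed p-values, coordinates are treated as inclusive, and the `Region` type and the `selectNonOverlapping` name are hypothetical.

```typescript
// Hypothetical candidate region with a precomputed p-value.
type Region = {
  start: number // inclusive
  end: number // inclusive
  pValue: number
}

// Two inclusive intervals overlap when each starts before the other ends.
const overlaps = (a: Region, b: Region): boolean => a.start <= b.end && b.start <= a.end

// Greedy pass: visit candidates from most to least significant and keep
// each one only if it does not overlap an interval that was already kept.
const selectNonOverlapping = (candidates: Region[]): Region[] => {
  const selected: Region[] = []
  const byPValue = [...candidates].sort((a, b) => a.pValue - b.pValue)
  for (const candidate of byPValue) {
    if (!selected.some((kept) => overlaps(kept, candidate))) {
      selected.push(candidate)
    }
  }
  // Return the kept intervals in genomic order.
  return selected.sort((a, b) => a.start - b.start)
}

// Made-up values for illustration only.
const candidates: Region[] = [
  { start: 1, end: 40, pValue: 0.01 },
  { start: 30, end: 80, pValue: 0.001 },
  { start: 90, end: 130, pValue: 0.02 },
]

console.log(selectNonOverlapping(candidates))
// keeps 30-80 and 90-130; 1-40 is dropped because it overlaps the more significant 30-80
```

The FDR filter would apply afterwards, on the intervals this step returns.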

#### Genes, transcripts, and variant classes included in the analyses

We provide regional constraint metrics for all protein-coding and ribosomal RNA (rRNA) genes in the mitochondrial genome. Since each human mtDNA gene has only one transcript, distinction between canonical and non-canonical transcripts was not required. For protein-coding genes, we measure regional intolerance to missense variants. For rRNA genes, we assess regional intolerance to all single nucleotide variants. Note that regional constraint metrics are not provided for transfer RNA (tRNA) genes due to their small size.

### <a id="oe"></a>Observed/expected (oe) metric

#### Observed values

The observed value is the sum of the maximum observed heteroplasmy level (‘maximum heteroplasmy’) of every possible single nucleotide variant in the gene (or of missense variants only for protein-coding genes). Heteroplasmy refers to the proportion of mtDNA copies that carry the variant. Every possible variant is assigned a maximum heteroplasmy value between 0.0 and 1.0, representing the highest level at which the variant is observed across all individuals in gnomAD. Heteroplasmy is important to account for when detecting selection in mtDNA, as most pathogenic variants have maximum heteroplasmy levels below 1.0 due to selection, reflecting that individuals can carry pathogenic variants but be asymptomatic if heteroplasmy levels are low enough.

#### Expected values

We calculated the expected sum maximum heteroplasmy of single nucleotide variants in each gene using a mitochondrial mutational model that accounts for trinucleotide sequence context. While nuclear constraint models include corrections for coverage and methylation, we do not apply these given the high and even mtDNA coverage in gnomAD and lack of robust data on mtDNA methylation.

### <a id="differences-from-nuclear"></a> Differences between mitochondrial and nuclear regional constraint metrics

The methods for identifying regional constraint differ between the mitochondrial and nuclear genomes due to the unique characteristics of mtDNA. These include the lack of exon-intron structure in mtDNA and the use of heteroplasmy rather than unique variant counts to calculate observed and expected values. These differences required the development of a specialized method for mtDNA regional constraint analysis. Key differences between mitochondrial and nuclear methods include:

- Observed/Expected Values: Nuclear models assess constraint based on the number of unique variants, whereas mitochondrial models use the sum of maximum heteroplasmy values for every variant.
- Statistical Model: A beta distribution is used to identify regions that are significantly more constrained than the gene, in contrast to the Poisson distribution used for nuclear models.
- Application to RNA Genes: While nuclear regional constraint methods focus on protein-coding genes, the mitochondrial approach extends to RNA genes in the mtDNA.
1 change: 1 addition & 0 deletions browser/package.json
Expand Up @@ -23,6 +23,7 @@
"@gnomad/ui": "2.0.0",
"@hot-loader/react-dom": "^17.0.0",
"@visx/axis": "^3.0.0",
"@visx/group": "^3.0.0",
"core-js": "3.5.0",
"css-loader": "^6.7.3",
"d3-array": "^1.2.4",
Expand Down
72 changes: 70 additions & 2 deletions browser/src/ConstraintTable/ConstraintTable.spec.tsx
Expand Up @@ -11,6 +11,10 @@ import { BrowserRouter } from 'react-router-dom'
import ConstraintTable from './ConstraintTable'
import { ExacConstraint } from './ExacConstraintTable'
import { GnomadConstraint } from './GnomadConstraintTable'
import {
ProteinMitochondrialGeneConstraint,
RNAMitochondrialGeneConstraint,
} from '../GenePage/GenePage'

const exacConstraintFactory = Factory.define<ExacConstraint>(() => ({
exp_lof: 0.123,
Expand Down Expand Up @@ -42,6 +46,34 @@ const gnomadConstraintFactory = Factory.define<GnomadConstraint>(() => ({
oe_syn_upper: 0.95,
}))

const proteinMitochondrialConstraintFactory = Factory.define<ProteinMitochondrialGeneConstraint>(
() => ({
exp_lof: 0.123,
exp_syn: 0.234,
exp_mis: 0.345,
oe_lof: 0.789,
oe_lof_lower: 0.6,
oe_lof_upper: 0.9,
oe_mis: 0.891,
oe_mis_lower: 0.8,
oe_mis_upper: 0.99,
oe_syn: 0.912,
oe_syn_lower: 0.8,
oe_syn_upper: 0.95,
obs_lof: 0.111,
obs_syn: 0.222,
obs_mis: 0.333,
})
)

const rnaMitochondrialConstraintFactory = Factory.define<RNAMitochondrialGeneConstraint>(() => ({
observed: 1.1,
expected: 22.2,
oe: 0.33,
oe_lower: 0.31,
oe_upper: 0.35,
}))

forAllDatasets('ConstraintTable with "%s" dataset selected', (datasetId) => {
describe('with a minimal gene', () => {
test('has no unexpected changes', () => {
Expand All @@ -65,20 +97,56 @@ forAllDatasets('ConstraintTable with "%s" dataset selected', (datasetId) => {
})
})

- describe('with a mitochondrial gene', () => {
+ describe('with a mitochondrial protein gene', () => {
test('has no unexpected changes', () => {
const constraint = proteinMitochondrialConstraintFactory.build()
const tree = renderer.create(
<BrowserRouter>
<ConstraintTable
datasetId={datasetId}
geneOrTranscript={geneFactory.build({
chrom: 'M',
mitochondrial_constraint: constraint,
})}
/>
</BrowserRouter>
)
expect(tree).toMatchSnapshot()
})
})

describe('with a mitochondrial RNA gene', () => {
test('has no unexpected changes', () => {
const constraint = rnaMitochondrialConstraintFactory.build()
const tree = renderer.create(
<BrowserRouter>
<ConstraintTable
datasetId={datasetId}
- geneOrTranscript={geneFactory.build({ chrom: 'M' })}
+ geneOrTranscript={geneFactory.build({
+ chrom: 'M',
+ mitochondrial_constraint: constraint,
+ })}
/>
</BrowserRouter>
)
expect(tree).toMatchSnapshot()
})
})

describe('with a mitochondrial gene missing constraint data', () => {
// Wrapped in test() so the snapshot assertion runs inside a registered test
test('has no unexpected changes', () => {
const tree = renderer.create(
<BrowserRouter>
<ConstraintTable
datasetId={datasetId}
geneOrTranscript={geneFactory.build({
chrom: 'M',
})}
/>
</BrowserRouter>
)
expect(tree).toMatchSnapshot()
})
})

describe('with a mitochondrial transcript', () => {
test('has no unexpected changes', () => {
const tree = renderer.create(
Expand Down
17 changes: 8 additions & 9 deletions browser/src/ConstraintTable/ConstraintTable.tsx
Expand Up @@ -8,6 +8,7 @@ import Link from '../Link'

import ExacConstraintTable from './ExacConstraintTable'
import GnomadConstraintTable from './GnomadConstraintTable'
import MitochondrialConstraintTable from './MitochondrialConstraintTable'

type Props = {
datasetId: DatasetId
Expand Down Expand Up @@ -65,18 +66,16 @@ const ConstraintTable = ({ datasetId, geneOrTranscript }: Props) => {
const { transcriptId, transcriptVersion, transcriptDescription } =
transcriptDetails(geneOrTranscript)

- const gnomadConstraint = geneOrTranscript.gnomad_constraint
- const exacConstraint = geneOrTranscript.exac_constraint

if (geneOrTranscript.chrom === 'M') {
- return (
- <p>
- Constraint is not available for mitochondrial{' '}
- {isGene(geneOrTranscript) ? 'genes' : 'transcripts'}
- </p>
- )
+ if (isGene(geneOrTranscript)) {
+ return <MitochondrialConstraintTable constraint={geneOrTranscript.mitochondrial_constraint} />
+ }
+ return <p>Constraint is not available for mitochondrial transcripts</p>
}

+ const gnomadConstraint = geneOrTranscript.gnomad_constraint
+ const exacConstraint = geneOrTranscript.exac_constraint

if (datasetId === 'exac') {
if (!exacConstraint) {
return (
Expand Down
136 changes: 136 additions & 0 deletions browser/src/ConstraintTable/MitochondrialConstraintTable.tsx
@@ -0,0 +1,136 @@
import React from 'react'
import {
MitochondrialGeneConstraint,
ProteinMitochondrialGeneConstraint,
RNAMitochondrialGeneConstraint,
} from '../GenePage/GenePage'
import { BaseTable } from '@gnomad/ui'

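// Type guard: protein-gene constraint carries per-class fields such as exp_lof,
// while RNA-gene constraint has a single expected/observed/oe set
const isProteinMitochondrialGeneConstraint = (
constraint: MitochondrialGeneConstraint
): constraint is ProteinMitochondrialGeneConstraint =>
Object.prototype.hasOwnProperty.call(constraint, 'exp_lof')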

const ConstraintRow = ({
category,
expected,
observed,
oe,
oeLower,
oeUpper,
}: {
category: string
expected: number
observed: number
oe: number
oeLower: number
oeUpper: number
}) => (
<tr>
<th scope="row">{category}</th>
<td>{expected < 10 ? expected.toFixed(2) : expected.toFixed(1)}</td>
<td>{observed < 10 ? observed.toFixed(2) : observed.toFixed(1)}</td>
<td>
o/e = {oe.toFixed(2)} ({oeLower.toFixed(2)} - {oeUpper.toFixed(2)})
</td>
</tr>
)

const ProteinConstraintMetrics = ({
constraint,
}: {
constraint: ProteinMitochondrialGeneConstraint
}) => {
const {
exp_lof,
exp_mis,
exp_syn,
obs_lof,
obs_mis,
obs_syn,
oe_lof,
oe_lof_lower,
oe_lof_upper,
oe_mis,
oe_mis_lower,
oe_mis_upper,
oe_syn,
oe_syn_lower,
oe_syn_upper,
} = constraint
return (
<tbody>
<ConstraintRow
category="Synonymous"
expected={exp_syn}
observed={obs_syn}
oe={oe_syn}
oeLower={oe_syn_lower}
oeUpper={oe_syn_upper}
/>
<ConstraintRow
category="Missense"
expected={exp_mis}
observed={obs_mis}
oe={oe_mis}
oeLower={oe_mis_lower}
oeUpper={oe_mis_upper}
/>
<ConstraintRow
category="pLoF"
expected={exp_lof}
observed={obs_lof}
oe={oe_lof}
oeLower={oe_lof_lower}
oeUpper={oe_lof_upper}
/>
</tbody>
)
}

const RNAConstraintMetrics = ({ constraint }: { constraint: RNAMitochondrialGeneConstraint }) => {
const { expected, observed, oe, oe_lower, oe_upper } = constraint
return (
<tbody>
<ConstraintRow
category="RNA variant"
expected={expected}
observed={observed}
oe={oe}
oeLower={oe_lower}
oeUpper={oe_upper}
/>
</tbody>
)
}

const MitochondrialConstraintTable = ({
constraint,
}: {
constraint: MitochondrialGeneConstraint | null
}) => {
// Treat undefined the same as null, in case the field is omitted rather than set to null
if (!constraint) {
return <p>Constraint is not available on this gene</p>
}

return (
// @ts-expect-error
<BaseTable>
<thead>
<tr>
<th scope="col">Category</th>
<th scope="col">Expected SNVs</th>
<th scope="col">Observed SNVs</th>
<th scope="col">Constraint metrics</th>
</tr>
</thead>
{isProteinMitochondrialGeneConstraint(constraint) ? (
<ProteinConstraintMetrics constraint={constraint} />
) : (
<RNAConstraintMetrics constraint={constraint} />
)}
</BaseTable>
)
}

export default MitochondrialConstraintTable
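
For readers who want to exercise the discrimination and formatting logic outside React, here is a standalone sketch. The two constraint types are reduced assumptions based on the test factories in this PR (the real definitions live in `GenePage` and are not shown in this diff), and `describeConstraint` and `formatValue` are hypothetical helpers.

```typescript
// Reduced stand-ins for the GenePage types; field lists are assumptions.
type ProteinConstraint = { exp_lof: number; exp_mis: number; exp_syn: number }
type RnaConstraint = {
  observed: number
  expected: number
  oe: number
  oe_lower: number
  oe_upper: number
}
type MitoConstraint = ProteinConstraint | RnaConstraint

// Same discrimination strategy as the component: look for a protein-only field.
const isProteinConstraint = (c: MitoConstraint): c is ProteinConstraint =>
  Object.prototype.hasOwnProperty.call(c, 'exp_lof')

// Same precision rule as ConstraintRow: two decimals below 10, one otherwise.
const formatValue = (value: number): string =>
  value < 10 ? value.toFixed(2) : value.toFixed(1)

// Handles missing data before attempting to discriminate.
const describeConstraint = (c: MitoConstraint | null | undefined): string => {
  if (!c) {
    return 'unavailable'
  }
  return isProteinConstraint(c) ? 'protein' : 'rna'
}

const rnaExample: RnaConstraint = {
  observed: 1.1,
  expected: 22.2,
  oe: 0.33,
  oe_lower: 0.31,
  oe_upper: 0.35,
}

console.log(describeConstraint(rnaExample)) // "rna"
console.log(formatValue(rnaExample.observed)) // "1.10"
console.log(formatValue(rnaExample.expected)) // "22.2"
```

Checking for missing data first keeps `hasOwnProperty.call` from being invoked on `null` or `undefined`, which would throw.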