The fibroblast growth factor receptor 2 (FGFR2) exists in multiple isoforms arising from alternative splicing. In particular, it is known from the literature that
- under physiological conditions, isoform b (FGFR2IIIb) is typically expressed at the epithelial level, while isoform c (FGFR2IIIc) is usually found at the stromal/mesenchymal level;
- expression of the c isoform at the epithelial level promotes carcinogenesis, with special reference to pancreatic ductal adenocarcinoma (PDAC).
It would be interesting to re-analyze at the isoform level the RNA-Seq data set from our 2022 study on the epithelial-to-mesenchymal transition (EMT) effects induced by either acute or prolonged exposure of PANC-1 cells to an acidic pH environment:
Audero, M.M. et al. Acidic Growth Conditions Promote Epithelial-to-Mesenchymal Transition to Select More Aggressive PDAC Cell Phenotypes In Vitro. Cancers 2023, 15, 2572. https://doi.org/10.3390/cancers15092572 (PMID 37174038)
The aim now is to investigate the expression patterns of FGFR2 isoforms in the PANC-1 cell line, used as an in vitro model for PDAC, when a more aggressive phenotype is triggered by acidosis.
The Homo_sapiens.GRCh38.110.gtf GTF file.
TPM and expected count expression matrices were eventually assembled for both genes and isoforms, using Ensembl ENSG and ENST IDs, respectively, as primary IDs. Expression data were then copied to the project's ./data/in subfolder with the following names:
- pH_CountMatrix_genes_expected_count.tsv
- pH_CountMatrix_genes_TPM.tsv
- pH_CountMatrix_isoforms_expected_count.tsv
- pH_CountMatrix_isoforms_TPM.tsv

together with a sample_metadata.tsv file, created for the purpose along the lines of the sample metadata tables used in the ENA database.
First, a list of all transcripts associated with the ENSG00000066468 gene (FGFR2) was obtained from the local copy of the GTF file (in order to be 100% consistent with the annotation version used for alignment and quantification) by using the following Bash line
cat Homo_sapiens.GRCh38.110.gtf | grep -oP "ENSG00000066468.*ENST.{11}" | grep -oP "ENST.*" | uniq > ~/all_FGFR2_transcripts.txt
The resulting list of 41 transcripts was then copied to the project's ./data/in subfolder.
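The grep pipeline above depends on all records of a gene being adjacent in the file, because uniq only collapses consecutive duplicates. As a minimal alternative sketch, assuming the standard Ensembl GTF attribute layout (gene_id "..."; transcript_id "...";), the attribute column can be parsed with awk and deduplicated with sort -u. The helper name list_gene_transcripts is hypothetical and not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): print the unique, unversioned
# transcript IDs annotated for one gene in an Ensembl-style GTF file.
# Usage: list_gene_transcripts ENSG00000066468 Homo_sapiens.GRCh38.110.gtf
list_gene_transcripts() {
    local gene_id="$1" gtf="$2"
    awk -F'\t' -v gene="$gene_id" '
        # skip header comments; keep records whose attributes name the gene
        $0 !~ /^#/ && $9 ~ ("gene_id \"" gene "\"") {
            if (match($9, /transcript_id "[^"]+"/)) {
                # drop the 15-character prefix and the closing quote
                print substr($9, RSTART + 15, RLENGTH - 16)
            }
        }' "$gtf" | sort -u
}
```

Piping the output through wc -l should give the number of annotated transcripts, which can be compared with the 41 entries reported above, provided the same annotation release is used.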
Consistently, searching for the human FGFR2 gene through the Ensembl genome browser (ENSG00000066468 > Summary > Show transcript table) returned the same 41 possible splice variants, 25 of which are protein-coding transcripts. Of these, only 8 were golden (merged) transcripts.
Tip
Gene/transcript color code: Species with both HAVANA and Ensembl gene annotation (i.e., Human, Mouse, Rat, and Zebrafish) undergo a merge of the two sets of gene models. A merged (or golden) protein-coding gene/transcript indicates that annotation was provided by both Ensembl and HAVANA. Where a coding transcript model is annotated only by Ensembl or HAVANA, it is displayed as an unmerged (or red) model. Non-coding transcripts are in blue.
Nevertheless, it was not obvious which of those ENST IDs corresponded to the two isoforms of interest, which are usually identified by their "common" names FGFR2IIIb and FGFR2IIIc.
Thankfully, the P21802 · FGFR2_HUMAN entry from the UniProt database featured a specific section about protein Sequence & Isoforms, also reporting all their names and aliases (synonyms). Based on this information, the correspondence between the common names of the isoforms and their official UniProtKB/Swiss-Prot IDs was found to be as follows:
- P21802-1 (canonical sequence) Synonyms: BEK, FGFR2IIIc
- P21802-3 Synonyms: BFR-1, FGFR2IIIb, KGFR
This in turn allowed the transcripts of interest to be identified in Ensembl by the UniProt Match column.
Common Name | FGFR2IIIb | FGFR2IIIc |
---|---|---|
Transcript ID | ENST00000457416.7 | ENST00000358487.10 |
Name | FGFR2-215 | FGFR2-206 |
Translation ID | ENSP00000410294.2 | ENSP00000351276.6 |
Biotype | Gold Protein-coding | Gold Protein-coding |
CCDS | CCDS7620 | CCDS31298 |
UniProt Match | P21802-3 | P21802-1 |
RefSeq Match | NM_022970.4 | NM_000141.5 |
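With the two transcript IDs in hand, the corresponding rows can be pulled out of an isoform-level matrix. The sketch below is only illustrative: it assumes a tab-separated matrix with a header row and the ENST ID in the first column (with or without a version suffix), and the helper name select_isoforms is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the header plus the rows of the requested transcripts.
# Assumptions: tab-separated matrix, transcript ID in column 1, optional
# ".version" suffix on the ID (stripped before matching).
# Usage: select_isoforms ./data/in/pH_CountMatrix_isoforms_TPM.tsv \
#            ENST00000457416 ENST00000358487
select_isoforms() {
    local matrix="$1"; shift
    local ids="$*"
    awk -F'\t' -v ids="$ids" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
        NR == 1 { print; next }                      # always keep the header
        { id = $1; sub(/\.[0-9]+$/, "", id); if (id in keep) print }
    ' "$matrix"
}
```

Rows are returned in matrix order, not in the order of the IDs given on the command line.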
Tip
You can get a schematic representation of the exon-intron structure for the isoforms of interest by following these few steps: Ensembl Gene Tab > Summary > Splice variants > Basic Gene Annotations from GENCODE 47 > left click on a transcript > Zoom on feature. Alternatively, you can directly select the Ensembl Location Tab > Region in detail menu item. In the Region Image pane, transcripts are drawn as boxes (exons) and lines connecting the boxes (introns). Filled boxes represent coding sequence and unfilled boxes (or portions of boxes) represent UnTranslated Regions (UTRs). Tracks above the blue bar (Contigs) are on the forward strand of the chromosome, while tracks under the blue bar are on the reverse strand.
Note
The FGFR2 gene lies on the reverse strand of chromosome 10, so its exon sequence must be read right to left.
Tip
You can get the sequences of spliced transcripts for a gene by following these steps: Ensembl Gene Tab > Summary > Transcript comparison > Select transcripts. Upon selection, the gene sequence will be shown above in brown, labelled with the gene name. Below, the transcript sequences are color-coded to indicate the spliced sequences. In particular, coding sequences are colored blue (we are talking only about the font color, not the background!), non-coding sequences are in black, and UTRs are colored orange. Introns are shown in grey. If you decide to locally Download sequence, you can choose what to download among: the spliced transcripts (as cDNA), just the coding sequences (CDS), 5' UTRs, 3' UTRs, the list of exons, the list of introns, the entire genomic sequences.
Provided that the input data and metadata matrices are correctly named and present in the ./data/in folder, this Kerblam! project provides 2 independent workflows:
- kerblam run dea can be used to rerun the same DEA published in PMID 37174038, but with the most up-to-date versions of the same software and R packages used there (in times before Kerblam!). It may be interesting to compare the resulting lists of DEGs and reflect on reproducibility...
- kerblam run iso is the workflow to be used for the isoform-level analysis of FGFR2 expression.
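Since both workflows depend on the exact file names listed earlier, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of the project: check_inputs is a hypothetical helper that only verifies that the five named files exist and are non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report any expected input file that is
# missing or empty. Returns 0 when all files are present, 1 otherwise.
# Usage: check_inputs ./data/in && kerblam run iso
check_inputs() {
    local dir="${1:-./data/in}" missing=0 f
    for f in \
        pH_CountMatrix_genes_expected_count.tsv \
        pH_CountMatrix_genes_TPM.tsv \
        pH_CountMatrix_isoforms_expected_count.tsv \
        pH_CountMatrix_isoforms_TPM.tsv \
        sample_metadata.tsv
    do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The check says nothing about the content or column layout of the files; it only guards against naming and placement mistakes.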
Expected counts (FGFR2 gene-level value vs. sum of its isoforms):

Sample | gene_level | isoform_sum |
---|---|---|
4-days-pH-6_6-12-21_09_21 | 48 | 48.00 |
4-days-pH-6_6-16 | 31 | 31.01 |
4-days-pH-6_6-7_1 | 11 | 11.00 |
control-12 | 36 | 36.01 |
control-13-21_09_21 | 27 | 27.00 |
control-7_4 | 23 | 23.00 |
control-7_6 | 22 | 22.00 |
pH-selected-p7_5 | 94 | 94.00 |
pH-selected-p7_6 | 59 | 59.00 |
pH-selected-p7_7 | 53 | 53.00 |

Group | Transcript | mean | SD |
---|---|---|---|
Ctrl | ENST00000358487 | 1.005 | 2.01 |
Ctrl | ENST00000457416 | 0.000 | 0.00 |
Acute | ENST00000358487 | 0 | 0 |
Acute | ENST00000457416 | 0 | 0 |
Selected | ENST00000358487 | 9.22 | 8.7795 |
Selected | ENST00000457416 | 0.00 | 0.0000 |
TPM (FGFR2 gene-level value vs. sum of its isoforms):

Sample | gene_level | isoform_sum |
---|---|---|
4-days-pH-6_6-12-21_09_21 | 0.59 | 0.59 |
4-days-pH-6_6-16 | 0.60 | 0.60 |
4-days-pH-6_6-7_1 | 0.15 | 0.15 |
control-12 | 0.30 | 0.30 |
control-13-21_09_21 | 1.02 | 1.00 |
control-7_4 | 0.56 | 0.56 |
control-7_6 | 0.31 | 0.31 |
pH-selected-p7_5 | 0.73 | 0.73 |
pH-selected-p7_6 | 0.43 | 0.43 |
pH-selected-p7_7 | 0.53 | 0.52 |

Group | Transcript | mean | SD |
---|---|---|---|
Ctrl | ENST00000358487 | 0.005 | 0.01 |
Ctrl | ENST00000457416 | 0.000 | 0.00 |
Acute | ENST00000358487 | 0 | 0 |
Acute | ENST00000457416 | 0 | 0 |
Selected | ENST00000358487 | 0.0533 | 0.0551 |
Selected | ENST00000457416 | 0.0000 | 0.0000 |
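The gene_level vs. isoform_sum comparison is a sanity check: summing the values of all transcripts of a gene should reproduce, sample by sample, the gene-level estimate. How the iso workflow computes it is not shown here; the sketch below is one possible way to do it in Bash, assuming a one-ID-per-line transcript list and a tab-separated isoform matrix with the ENST ID in the first column. The helper name sum_isoforms is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-sample sum over a set of transcripts.
# Assumptions: ID list with one unversioned ENST ID per line; tab-separated
# matrix with a header row and the transcript ID in column 1.
# Usage: sum_isoforms all_FGFR2_transcripts.txt pH_CountMatrix_isoforms_TPM.tsv
sum_isoforms() {
    local id_list="$1" matrix="$2"
    awk -F'\t' '
        NR == FNR { keep[$1] = 1; next }             # first file: wanted IDs
        FNR == 1  { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        { id = $1; sub(/\.[0-9]+$/, "", id) }        # strip optional version
        id in keep { for (i = 2; i <= NF; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%.2f\n", name[i], total[i] }
    ' "$id_list" "$matrix"
}
```

The output has one line per sample (name, tab, sum rounded to two decimals) and can be placed side by side with the FGFR2 row of the gene-level matrix.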
The FGFR2IIIb (ENST00000457416) isoform does not appear to be expressed in any experimental group (and this may be consistent with the fact that PANC-1 cells are a model of epithelioid carcinoma).
The FGFR2IIIc (ENST00000358487) isoform appears to be more highly expressed in the acidic-pH Selected group, consistent with the fact that this treatment (pH-selection by long-term acidic pressure followed by recovery to pH 7.4) favored EMT and correlated with a more aggressive tumor phenotype. However, considering the within-group standard deviations, and applying even the most "liberal" thresholds for lowly expressed genes (0.5 TPM or 10 counts), it is hard to support the claim that the FGFR2IIIc isoform is actually expressed (or more highly expressed) in pH-selected PANC-1 cells.
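The reasoning above (group mean, within-group spread, and a fixed expression threshold) can be reproduced for any set of replicate values with a few lines of Bash. This is a minimal sketch under stated assumptions: mean_sd is a hypothetical helper, the SD is the sample standard deviation (n - 1 denominator), and the threshold is whatever cutoff is passed in (for example 10 for counts or 0.5 for TPM).

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean, sample SD, and threshold flag for replicate values.
# Usage: mean_sd THRESHOLD VALUE [VALUE...]
#   e.g. mean_sd 0.5 <TPM of replicate 1> <TPM of replicate 2> ...
mean_sd() {
    local threshold="$1"; shift
    printf '%s\n' "$@" | awk -v thr="$threshold" '
        { x[NR] = $1; sum += $1 }
        END {
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) * (x[i] - mean)
            sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0   # sample SD, n - 1
            printf "mean=%.3f sd=%.3f %s\n", mean, sd,
                   (mean >= thr ? "above" : "below")
        }'
}
```

With only three or four replicates per group, an SD comparable to the mean (as in the Selected group) is already a sign that the group difference rests on very few reads.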
Ultimately, we have only (very) weak evidence that prolonged exposure of PANC-1 cells to an acidic pH environment induces the expression of isoform IIIc.
The reasons for this could be that (i) the receptor is poorly expressed by PANC-1 in general or (ii) the sequencing depth is not sufficient. However, considering the average depth of ~70 Mreads/sample and the fact that the FGFR2 gene is detected at very low, but still significant, levels (10-100 counts; ~0.5 TPMs) in every sample, I lean toward the first hypothesis and do not think that further increasing the sequencing depth would bring any benefit.