
TCP-Lab/2410-Fiorio-PDAC_pH_ICTR


2410-Fiorio-PDAC_pH_ICTR

FGFR2 isoforms in PDAC

Background

The fibroblast growth factor receptor 2 (FGFR2) exists in multiple isoforms arising from alternative splicing events. In particular, it is known from the literature that

  • under physiological conditions, the isoform b (FGFR2IIIb) is typically expressed at epithelium level, while the isoform c (FGFR2IIIc) is usually found at stromal/mesenchymal level;
  • expression of the c isoform at the epithelial level promotes carcinogenesis, with special reference to pancreatic ductal adenocarcinoma (PDAC).


Aim

It would be interesting to re-analyze at the isoform level the RNA-Seq data set from our 2023 study on epithelial-to-mesenchymal transition (EMT) effects induced by either acute or prolonged exposure of PANC-1 cells to an acidic pH environment:

Audero, M.M. et al. Acidic Growth Conditions Promote Epithelial-to-Mesenchymal Transition to Select More Aggressive PDAC Cell Phenotypes In Vitro. Cancers 2023, 15, 2572. https://doi.org/10.3390/cancers15092572 (PMID 37174038)

The aim now is to investigate the expression patterns of FGFR2 isoforms in the PANC-1 cell line, as an in vitro model for PDAC, when a more aggressive phenotype is triggered by acidosis.

Methods

From Reads to Counts

The $12 \times 2$ PE FASTQ files from PMID 37174038 went through a standard x.FASTQ pipeline for quality control, adapter and quality trimming, read alignment (by STAR), and transcript abundance quantification (by RSEM). For the last two steps, the GRCh38 (hg38) genome assembly was used, together with gene annotations from the Homo_sapiens.GRCh38.110.gtf GTF file. TPM and expected count expression matrices were eventually assembled for both genes and isoforms, using Ensembl ENSG and ENST IDs, respectively, as primary IDs. Expression data were then copied to the project's ./data/in subfolder with the following names:

pH_CountMatrix_genes_expected_count.tsv
pH_CountMatrix_genes_TPM.tsv
pH_CountMatrix_isoforms_expected_count.tsv
pH_CountMatrix_isoforms_TPM.tsv

together with a sample_metadata.tsv file, created for this purpose and modeled on the sample metadata tables used by the ENA database.

Identifying the isoforms of interest

First, a list of all transcripts associated with the ENSG00000066468 gene (FGFR2) was obtained from the local copy of the GTF file (in order to be fully consistent with the annotation version used for alignment and quantification) by using the following Bash one-liner:

grep -oP "ENSG00000066468.*ENST.{11}" Homo_sapiens.GRCh38.110.gtf | grep -oP "ENST.*" | sort -u > ~/all_FGFR2_transcripts.txt

The resulting list of 41 transcripts was then copied to the project's ./data/in subfolder.
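A quick sanity check on the extracted list can be sketched as follows. The helper name `count_transcripts` is hypothetical, and the list is assumed to hold one unversioned ENST ID (ENST followed by 11 digits) per line, which is what the one-liner above should produce.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a transcript list (one ENST ID per line)
# and print the number of distinct IDs it contains.
count_transcripts() {
    local list="$1"
    # Reject any line that is not exactly "ENST" followed by 11 digits.
    if grep -qvE '^ENST[0-9]{11}$' "$list"; then
        echo "unexpected line(s) in $list" >&2
        return 1
    fi
    # Count distinct IDs; tr strips the padding some wc implementations add.
    sort -u "$list" | wc -l | tr -d ' '
}
```

Calling `count_transcripts ~/all_FGFR2_transcripts.txt` should print 41 if the list matches the transcript count reported here.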

Accordingly, searching for the human FGFR2 gene in the Ensembl genome browser (ENSG00000066468 > Summary > Show transcript table) returned the same 41 possible splice variants, 25 of which are protein-coding transcripts. Of these, only 8 were golden (merged) transcripts.

Tip

Gene/transcript color code: Species with both HAVANA and Ensembl gene annotation (i.e., Human, Mouse, Rat, and Zebrafish) undergo a merge of the two sets of gene models. A merged (or golden) protein-coding gene/transcript indicates that annotation was provided by both Ensembl and HAVANA. Where a coding transcript model is annotated only by Ensembl or HAVANA, it is displayed as an unmerged (or red) model. Non-coding transcripts are in blue.


Nevertheless, it was not obvious which of those ENST IDs corresponded to the two isoforms of interest, known by the "common" names FGFR2IIIb and FGFR2IIIc.

Thankfully, the P21802 · FGFR2_HUMAN entry from the UniProt database featured a specific section about protein Sequence & Isoforms, also reporting all their names and aliases (synonyms). Based on this information, the correspondence between the common names of the isoforms and their official UniProtKB/Swiss-Prot IDs was found to be as follows:

  • P21802-1 (canonical sequence) Synonyms: BEK, FGFR2IIIc
  • P21802-3 Synonyms: BFR-1, FGFR2IIIb, KGFR

This in turn allowed the transcripts of interest to be identified in Ensembl by the UniProt Match column.

Common Name      FGFR2IIIb             FGFR2IIIc
Transcript ID    ENST00000457416.7     ENST00000358487.10
Name             FGFR2-215             FGFR2-206
Translation ID   ENSP00000410294.2     ENSP00000351276.6
Biotype          Gold Protein-coding   Gold Protein-coding
CCDS             CCDS7620              CCDS31298
UniProt Match    P21802-3              P21802-1
RefSeq Match     NM_022970.4           NM_000141.5

Tip

You can get a schematic representation of the exon-intron structure for the isoforms of interest by following these few steps: Ensembl Gene Tab > Summary > Splice variants > Basic Gene Annotations from GENCODE 47 > left click on a transcript > Zoom on feature. Alternatively, you can directly select the Ensembl Location Tab > Region in detail menu item. In the Region Image pane, transcripts are drawn as boxes (exons) and lines connecting the boxes (introns). Filled boxes represent coding sequence and unfilled boxes (or portions of boxes) represent UnTranslated Regions (UTRs). Tracks above the blue bar (Contigs) are on the forward strand of the chromosome, while tracks under the blue bar are on the reverse strand.

Note

The FGFR2 gene lies on the reverse strand of chromosome 10, so its exon sequence must be read right to left.

[Figure panels: Full transcripts; Zoom on swapped exons]

Tip

You can get the sequences of spliced transcripts for a gene by following these steps: Ensembl Gene Tab > Summary > Transcript comparison > Select transcripts. Upon selection, the gene sequence will be shown above in brown, labelled with the gene name. Below, the transcript sequences are color-coded to indicate the spliced sequences. In particular, coding sequences are colored blue (we are talking only about the font color, not the background!), non-coding sequences in black, and UTRs are colored orange. Introns are shown in grey.

If you decide to use Download sequence to save the data locally, you can choose what to download among: the spliced transcripts (as cDNA), just the coding sequences (CDS), 5' UTRs, 3' UTRs, the list of exons, the list of introns, or the entire genomic sequence.

Kerblam! Workflows

Provided that the input data and metadata matrices are correctly named and present in the ./data/in folder, this Kerblam! project runs 2 independent workflows.

kerblam run dea

can be used to rerun the same DEA published in PMID 37174038, but with the most up-to-date versions of the same software and R packages used there (in times before Kerblam!). It may be interesting to compare the resulting lists of DEGs and reflect on reproducibility...

kerblam run iso

is the workflow to be used for the isoform-level analysis of FGFR2 expression.
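Since both workflows rely on correctly named inputs, a pre-flight check along the following lines may help. This is only a sketch: the function name `check_inputs` is hypothetical, the transcript list is assumed to be stored as all_FGFR2_transcripts.txt, and whether each workflow actually needs every file listed is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm that the inputs named in this README
# exist (and are non-empty) in the given folder before launching a workflow.
check_inputs() {
    local dir="${1:-./data/in}"
    local missing=0
    local f
    for f in \
        pH_CountMatrix_genes_expected_count.tsv \
        pH_CountMatrix_genes_TPM.tsv \
        pH_CountMatrix_isoforms_expected_count.tsv \
        pH_CountMatrix_isoforms_TPM.tsv \
        sample_metadata.tsv \
        all_FGFR2_transcripts.txt   # assumed file name for the transcript list
    do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is `check_inputs ./data/in && kerblam run iso`, so that the workflow starts only when all files are in place.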

Results

RSEM Expected Counts

                          gene_level isoform_sum
4-days-pH-6_6-12-21_09_21         48       48.00
4-days-pH-6_6-16                  31       31.01
4-days-pH-6_6-7_1                 11       11.00
control-12                        36       36.01
control-13-21_09_21               27       27.00
control-7_4                       23       23.00
control-7_6                       22       22.00
pH-selected-p7_5                  94       94.00
pH-selected-p7_6                  59       59.00
pH-selected-p7_7                  53       53.00

>>> Ctrl
                    mean      SD
ENST00000358487    1.005    2.01
ENST00000457416    0.000    0.00

>>> Acute
                    mean      SD
ENST00000358487        0       0
ENST00000457416        0       0

>>> Selected
                    mean      SD
ENST00000358487     9.22  8.7795
ENST00000457416     0.00  0.0000

TPMs

                          gene_level isoform_sum
4-days-pH-6_6-12-21_09_21       0.59        0.59
4-days-pH-6_6-16                0.60        0.60
4-days-pH-6_6-7_1               0.15        0.15
control-12                      0.30        0.30
control-13-21_09_21             1.02        1.00
control-7_4                     0.56        0.56
control-7_6                     0.31        0.31
pH-selected-p7_5                0.73        0.73
pH-selected-p7_6                0.43        0.43
pH-selected-p7_7                0.53        0.52

>>> Ctrl
                    mean      SD
ENST00000358487    0.005    0.01
ENST00000457416    0.000    0.00

>>> Acute
                    mean      SD
ENST00000358487        0       0
ENST00000457416        0       0

>>> Selected
                    mean      SD
ENST00000358487   0.0533  0.0551
ENST00000457416   0.0000  0.0000
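The gene_level versus isoform_sum comparison in the tables above can be approximated outside the workflow with a short awk sketch. It is not the code used by `kerblam run iso`: the function name `isoform_sum` is hypothetical, and the matrix is assumed to be a TSV with a header of sample names, unversioned ENST IDs in the first column, and only numeric sample columns after it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum isoform-level values per sample over a list of
# transcript IDs. Assumes a TSV whose first column holds ENST IDs and whose
# remaining columns are numeric, one per sample, named in the header row.
isoform_sum() {
    local ids="$1" matrix="$2"
    awk -F'\t' '
        NR == FNR  { keep[$1] = 1; next }    # first file: load transcript IDs
        FNR == 1   { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }
        $1 in keep { for (i = 2; i <= n; i++) sum[i] += $i }
        END        { for (i = 2; i <= n; i++) printf "%s\t%.2f\n", name[i], sum[i] }
    ' "$ids" "$matrix"
}
```

For example, `isoform_sum all_FGFR2_transcripts.txt pH_CountMatrix_isoforms_TPM.tsv` prints one line per sample, which can then be compared with the FGFR2 row of the gene-level matrix. The transcript list must not be empty, otherwise awk would treat the matrix as the ID list.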

Discussion

The FGFR2IIIb (ENST00000457416) isoform does not appear to be expressed in any experimental group (which may be consistent with the fact that PANC-1 cells are a model of epithelioid carcinoma).

The FGFR2IIIc (ENST00000358487) isoform appears to be more highly expressed in the acidic-pH Selected group, consistent with the fact that this treatment (pH-selection by long-term acidic pressure followed by recovery to pH 7.4) favored EMT and correlated with a more aggressive tumor phenotype. However, considering the standard deviations within groups, and applying even the most "liberal" thresholds of 0.5 TPMs (or 10 counts) for lowly expressed genes, it is hard to support the claim that the FGFR2IIIc isoform is actually expressed (or more highly expressed) in pH-selected PANC-1 cells.
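The threshold argument can be made explicit with a minimal sketch. The helper `is_expressed` is hypothetical; the values passed to it are the Selected-group means for ENST00000358487 reported in the Results section, and the thresholds are the 0.5 TPMs and 10 counts mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a value reaches a detection threshold.
# awk does the comparison because Bash has no floating-point arithmetic.
is_expressed() {
    local value="$1" threshold="$2"
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# Selected-group means for FGFR2IIIc, as reported in Results.
is_expressed 0.0533 0.5 || echo "mean TPM is below the 0.5 TPM threshold"
is_expressed 9.22 10    || echo "mean expected count is below the 10-count threshold"
```

Both checks fail, so both messages are printed: even the group with the highest FGFR2IIIc signal stays under either cutoff on average.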

Ultimately, we have only (very) weak evidence that prolonged exposure of PANC-1 cells to an acidic pH environment induces the expression of isoform IIIc.

The reasons for this could be that (i) the receptor is poorly expressed by PANC-1 in general or (ii) the sequencing depth is not sufficient. However, considering the average depth of ~70 Mreads/sample and the fact that the FGFR2 gene is detected at very low, but still significant, levels (10-100 counts; ~0.5 TPMs) in every sample, I lean toward the first hypothesis and do not think that further increasing the sequencing depth would bring any benefit.

