best practices for parsing GTF files - gene_version field? #2040
-
Replies: 4 comments
-
Hi @jamesnemesh, I am not 100% sure (it's better to ask the Ensembl/GENCODE people), but I think the gene version is just used to track changes in a gene's annotation between annotation releases. Within a given release, each gene can only have one version.
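For reference, here is a minimal Python sketch of pulling the gene ID and version out of one GTF line. It assumes a tab-separated, 9-column GTF with Ensembl-style attributes, where the version is a separate `gene_version` attribute (GENCODE GTFs instead append it to the ID as a suffix). `parse_attributes` and `gene_key` are hypothetical helpers, not part of any library.

```python
from __future__ import annotations

import re

# One `key "value";` pair from GTF column 9. Quoted values are the norm in
# Ensembl/GENCODE files; bare values (e.g. `exon_number 2;`) are also accepted.
_ATTR_RE = re.compile(r'\s*([^\s";]+)\s+(?:"([^"]*)"|([^\s;]+))\s*;?')


def parse_attributes(field: str) -> dict[str, list[str]]:
    """Parse a GTF attribute string into key -> list of values.

    Keys such as `tag` can repeat on one line, so every value is kept.
    """
    attrs: dict[str, list[str]] = {}
    for m in _ATTR_RE.finditer(field):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs.setdefault(m.group(1), []).append(value)
    return attrs


def gene_key(line: str) -> tuple[str, str | None]:
    """Return (gene_id, gene_version) for one tab-separated GTF data line.

    gene_version is None when the attribute is absent (as in GENCODE GTFs,
    which embed the version in gene_id instead).
    """
    cols = line.rstrip("\n").split("\t")
    attrs = parse_attributes(cols[8])
    return attrs["gene_id"][0], attrs.get("gene_version", [None])[0]
```

Within a single release, keying on `gene_id` alone is usually enough, since each ID appears there with only one version.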
-
If each gene can have only one version, then how do you parse the two lines above, which come from the same GTF? Do you consider these to be truly different biological entities (genes) that happen to share a gene symbol? The source of this data is the cellranger source GTF file.
-
Since they have different gene_id values, I would consider them different genes.
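To check how often a symbol maps to more than one ID in a given GTF, one could group `gene_id` values by `gene_name`. The sketch below is a hypothetical helper (not from Cell Ranger or any library) that reads only `gene` rows and reports symbols shared by more than one ID; it assumes quoted `key "value"` attributes.

```python
import re
from collections import defaultdict

# Quoted `key "value"` pairs from GTF column 9.
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def genes_by_symbol(gtf_lines):
    """Map gene_name -> set of gene_id, using only `gene` feature rows."""
    by_name = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(_ATTR_RE.findall(cols[8]))
        gene_id = attrs["gene_id"]
        # Fall back to the ID itself when no symbol is annotated.
        by_name[attrs.get("gene_name", gene_id)].add(gene_id)
    return by_name


def shared_symbols(gtf_lines):
    """Return {symbol: sorted gene_ids} for symbols used by >1 gene_id."""
    return {
        name: sorted(ids)
        for name, ids in genes_by_symbol(gtf_lines).items()
        if len(ids) > 1
    }
```

Counting and indexing by `gene_id`, and treating `gene_name` only as a display label (made unique, e.g. by appending the ID), keeps such pairs from being silently merged.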
-
Hi! I just wanted to wrap this up and say I think you're absolutely right. The two ENSGs are distinct gene models that happen to share the same gene symbol. The V8 ENSG has 7 previous versions, as encoded by the […]. Filtering to one of these two ENSGs based on the version happens to work, but that's pure coincidence and no basis for a systematic decision. Thanks again for your thoughts!
-
> Since they have different gene_id, I would consider them different genes.
They are "read-through" fusion genes that combine two adjacent normal genes, so they are not normal genes in the usual sense.
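If read-through models should be reviewed or excluded, they can often be identified from transcript tags. This sketch assumes GENCODE-style annotation, which labels such transcripts with `tag "readthrough_transcript"`; whether the cellranger source GTF in question carries that tag is not confirmed here, and `readthrough_gene_ids` is a hypothetical helper.

```python
import re

# Quoted `key "value"` pairs from GTF column 9; `tag` may repeat per line.
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

# Assumption: GENCODE-style tag name; other annotation sources may differ.
READTHROUGH_TAG = "readthrough_transcript"


def readthrough_gene_ids(gtf_lines, tag=READTHROUGH_TAG):
    """Return gene_ids with at least one `transcript` row carrying `tag`."""
    flagged = set()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        pairs = _ATTR_RE.findall(cols[8])
        tags = {v for k, v in pairs if k == "tag"}
        if tag in tags:
            flagged.add(next(v for k, v in pairs if k == "gene_id"))
    return flagged
```

The helper only flags gene IDs rather than dropping them, so the decision to filter read-through genes stays explicit and separate from parsing.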