| License | CCZero |
WikiPathways is a database with machine-readable models of biological processes for human and multiple other species [Q21092742,Q102205677]. It comes with a SPARQL endpoint with a human-oriented interface at sparql.wikipathways.org [Q26261238].
WikiPathways RDF has two parts. The first is the GPMLRDF, an RDF representation of the Graphical Pathway Markup Language (GPML) in which the biological pathways are stored in the database. The second is the WPRDF, which captures the biological knowledge represented by those pathways [Q26261238,Q111656837]. This chapter focuses on the WPRDF only.
Figure of simplified RDF schema:
The RDF contains all pathways, their datanodes (genes, proteins, metabolites, etc.), author information, molecular descriptors, and more. The main classes are:
- Pathway: a biological pathway
- GeneProduct: can be a gene, a strand of RNA, or a protein.
- Rna: RNA, e.g. miRNA.
- Protein: a protein. Post-translational modifications can be indicated with states
- Metabolite: metabolites, ions, and other small molecules. It includes peptides.
- Interaction: covers many things, such as translocation, inhibition, and metabolic conversion (see [Q111656837]).
In all cases, the specific meaning is not clearly defined. Each of the above types is roughly defined by the database identifiers linked to the entity. For example, a UniProt identifier linked to a GeneProduct suggests the entity is actually a protein.
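To get a feel for how these classes are populated, one could count the entities of each type. The snippet below is a minimal sketch that builds such a query as a Python string. It assumes the classes listed above live in the `wp:` vocabulary namespace `http://vocabularies.wikipathways.org/wp#`; the helper name `class_count_query` is hypothetical.

```python
# Class names taken from the list above; the wp: namespace IRI is an assumption.
WP_CLASSES = ["Pathway", "GeneProduct", "Rna", "Protein", "Metabolite", "Interaction"]


def class_count_query(class_names=WP_CLASSES):
    """Return a SPARQL query that counts the entities of each given class."""
    values = " ".join(f"wp:{name}" for name in class_names)
    return f"""
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

SELECT ?type (COUNT(DISTINCT ?entity) AS ?count) WHERE {{
  VALUES ?type {{ {values} }}
  ?entity a ?type .
}}
GROUP BY ?type
ORDER BY DESC(?count)
"""


print(class_count_query())
```

The printed query can be pasted into the SPARQL endpoint interface. Because an entity may carry more than one type, the counts can overlap.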
Because the WikiPathways RDF contains many properties for each subject (such as a pathway), we can request these directly in the SPARQL query. For example, to extract the pathway title, we add `?pathway dc:title ?pathwaytitle` to the query and add `?pathwaytitle` to the `SELECT` list. The table returned by the query will get wider, so you might need to scroll to the right to see it all.
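As a minimal sketch of this change, the snippet below holds a query with the extra title statement as a Python string. The prefix IRIs for `wp` and `dc` are assumptions not stated in this chapter, and the `LIMIT 5` is only there to keep the result short.

```python
# Query sketch: list pathways together with their titles.
# Assumed prefixes (not given in the text): wp and dc.
PATHWAY_TITLE_QUERY = """
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?pathway ?pathwaytitle WHERE {
  ?pathway a wp:Pathway .
  ?pathway dc:title ?pathwaytitle .
}
LIMIT 5
"""

print(PATHWAY_TITLE_QUERY)
```

Each additional property follows the same pattern: one more statement in the `WHERE` block and one more variable in the `SELECT` list.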
The simplest SPARQL query to explore the RDF retrieves the full list of subjects of a particular type. The type is usually given with the predicate `rdf:type` or its shorthand `a`, which can be used interchangeably. See the example below, which lists all pathways.
pathways
The list is long; these are the first five:
pathways
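A listing query of this kind can also be sent to the endpoint from a script. The following is a minimal sketch using only the Python standard library. The endpoint path `/sparql`, the `wp:` prefix IRI, and the helper names `build_request` and `run_query` are assumptions, and the `LIMIT 5` merely mimics looking at the first five rows.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; the text only names the host sparql.wikipathways.org.
ENDPOINT = "https://sparql.wikipathways.org/sparql"

LIST_PATHWAYS_QUERY = """
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

SELECT ?pathway WHERE {
  ?pathway a wp:Pathway .
}
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return json.load(response)["results"]["bindings"]


# Example use (not executed here because it needs network access):
#   for row in run_query(LIST_PATHWAYS_QUERY):
#       print(row["pathway"]["value"])
```

Replacing `a` with `rdf:type` (and declaring the `rdf:` prefix) should give the same result, since the two forms are equivalent.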
In this exercise, the RDF is explored a little more extensively. By combining statements in the SPARQL query, we can link multiple subjects and filter for the content we want to get back from the service. Important: when filtering for a literal (gene label, organism, etc.), the literal should have the following format: `"text"^^xsd:string`. For example, the next query returns the title of the pathway with ID `WP4846`:
pathwayWP4846
This returns the following title:
pathwayWP4846
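The typed-literal filter can be sketched as follows. This is an illustration under assumptions rather than the chapter's own query: it assumes the pathway ID is stored with `dcterms:identifier` and the title with `dc:title`, and the helper name `pathway_title_query` is hypothetical.

```python
import re


def pathway_title_query(pathway_id):
    """Return a SPARQL query for the title of one pathway, e.g. 'WP4846'."""
    # Guard against malformed input before inserting it into the query text.
    if not re.fullmatch(r"WP\d+", pathway_id):
        raise ValueError(f"not a WikiPathways ID: {pathway_id!r}")
    # Assumed predicates: dcterms:identifier for the ID, dc:title for the title.
    return f"""
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?pathway ?title WHERE {{
  ?pathway a wp:Pathway ;
           dcterms:identifier "{pathway_id}"^^xsd:string ;
           dc:title ?title .
}}
"""


print(pathway_title_query("WP4846"))
```

Note the `^^xsd:string` suffix on the literal. Depending on how the data was loaded, a plain `"WP4846"` may not match.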
As another example, we can ask for a list of pathways describing the biology of oxygenated hydrocarbons (`LMFA12`):
lipidPathways
This gives:
lipidPathways
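One way such a lipid query might be written is sketched below. It rests on several assumptions that should be checked against the actual RDF: that metabolites are linked to LIPID MAPS identifiers with `wp:bdbLipidMaps`, that they are linked to their pathway with `dcterms:isPartOf`, and that the class code `LMFA12` appears in the identifier IRI.

```python
# LIPID MAPS class code for oxygenated hydrocarbons, as named in the text.
LIPID_CLASS = "LMFA12"

# Assumed predicates: wp:bdbLipidMaps and dcterms:isPartOf (not confirmed here).
LIPID_PATHWAYS_QUERY = f"""
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?pathway ?title WHERE {{
  ?metabolite a wp:Metabolite ;
              wp:bdbLipidMaps ?lipid ;
              dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway ;
           dc:title ?title .
  FILTER(CONTAINS(STR(?lipid), "{LIPID_CLASS}"))
}}
"""

print(LIPID_PATHWAYS_QUERY)
```

The `DISTINCT` keyword keeps a pathway from being listed once per matching lipid.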
This final example adds an extra level of difficulty by linking the WikiPathways RDF with another database through SPARQL (this is called a federated SPARQL query). In this exercise we will explore the connection between WikiPathways and AOP-Wiki (see this chapter).
The SPARQL query will need to contain a `SERVICE` clause, and the final query will have the following structure:
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT [variables] WHERE {
  [query WikiPathways]
  SERVICE <https://aopwiki.rdf.bigcat-bioinformatics.org/sparql> {
    [query AOP-Wiki]
  }
}
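The structure above can be filled in programmatically. The following minimal sketch assembles a federated query from a local and a remote graph pattern. The helper name `federated_query` and the `wp:` prefix label are assumptions, and the remote pattern is deliberately left as a SPARQL comment placeholder because the AOP-Wiki predicates are not covered in this section.

```python
from textwrap import indent

AOPWIKI_SERVICE = "https://aopwiki.rdf.bigcat-bioinformatics.org/sparql"
WP_PREFIX = "PREFIX wp: <http://vocabularies.wikipathways.org/wp#>"


def federated_query(variables, local_patterns, remote_patterns,
                    service=AOPWIKI_SERVICE, prefixes=(WP_PREFIX,)):
    """Assemble a query whose inner SERVICE block runs on a remote endpoint."""
    lines = list(prefixes)
    lines.append(f"SELECT {' '.join(variables)} WHERE {{")
    lines.append(indent(local_patterns.strip(), "  "))
    lines.append(f"  SERVICE <{service}> {{")
    lines.append(indent(remote_patterns.strip(), "    "))
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


query = federated_query(
    ["?pathway"],
    "?pathway a wp:Pathway .",   # evaluated by the WikiPathways endpoint
    "# [query AOP-Wiki]",        # placeholder for the remote graph pattern
)
print(query)
```

For the two halves to be joined, the local and remote patterns need to share at least one variable, for example an identifier that occurs in both databases.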