This is a guide and introduction to querying the beta.gss-data.org.uk site with a new prototype GraphQL service.
This document contains example queries, with links which demonstrate them in the GraphiQL playground environment.
This proof of concept prototype exists primarily to prove the following concepts:
- That GraphQL can serve as an easier and more familiar way for developers to access some of the data we hold, for targeted use cases.
- That a GraphQL schema can serve as a better interface to the data than SPARQL for some use cases, and be more easily aligned and optimised towards user and platform needs. SPARQL is an incredibly flexible query language, but that generality brings a performance cost: SPARQL has to target all possible use cases, whereas a given GraphQL schema can target and optimise performance for specific needs. We can then hide optimised services behind GraphQL schemas and resolvers.
- That we can seamlessly bridge the divide between Linked Data and GraphQL.
This prototype does not replace SPARQL. Internally it dynamically generates SPARQL queries to perform catalog searches and ultimately provide an API for faceted search.
We developed a prototype schema which we believe, with some further refinement, could support the proposed designs for our faceted catalog search.
We also believe the schema can be evolved in the future to support other needs such as querying and filtering datasets. For the time being these use cases are left out.
We provide a GraphiQL environment to assist in the creation of GraphQL queries conforming to our schema, at the following address:
http://graphql-prototype.gss-data.org.uk/ide
This interface is wired up to query the public beta.gss-data.org.uk service.
The GraphiQL interface provides automated tab-completion based upon our GraphQL schema, which means that most queries can be written by auto-completing and selecting the appropriate field or argument from the drop-down.
This is one of the main reasons GraphQL is easier for users to write than SPARQL.
The simplest 'useful' query we currently support is:
{
  endpoint {
    catalog {
      id
      label
    }
  }
}
This returns some basic metadata about the default catalog. If you're not familiar with GraphQL, note that the shape of the query directly mirrors the shape of the results:
{
  "data": {
    "endpoint": {
      "catalog": {
        "id": "http://gss-data.org.uk/catalog/datasets",
        "label": "Datasets"
      }
    }
  }
}
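Because the response mirrors the query, client code can reach a value by following the same field names that appear in the query. A minimal Python sketch, using the response shown above as a literal; the helper name `dig` is hypothetical and not part of any API:

```python
import json

# The JSON response shown above, pasted in as a string.
RESPONSE = """
{"data": {"endpoint": {"catalog": {
  "id": "http://gss-data.org.uk/catalog/datasets",
  "label": "Datasets"}}}}
"""


def dig(result, *path):
    """Follow the query's field names down through the response (hypothetical helper)."""
    node = result["data"]
    for field in path:
        node = node[field]
    return node


result = json.loads(RESPONSE)
catalog_label = dig(result, "endpoint", "catalog", "label")
print(catalog_label)  # prints "Datasets"
```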
What is endpoint and why is it in the schema?
endpoint represents the endpoint we're currently connected to. It exists to allow appropriately authenticated users to query things out of specific draftsets.
We plan to eventually support queries like the following, which will ask the same question of the appropriate draftset, at which point you will also be able to access appropriate metadata from the draft.
Be aware, though, that we may wish to make some breaking schema changes around this (#18).
{
  endpoint(draftset_id: "3703f62a-7c33-4d89-be60-d47aedbb9d1f") {
    catalog {
      id
      label
    }
  }
}
The following query will list all datasets in the catalog and return their id/uri and title:
{
  endpoint {
    catalog {
      catalog_query {
        datasets {
          id
          title
          # <-- Put cursor to the left of this and press `CTRL-<space>` to auto complete additional metadata fields
        }
      }
    }
  }
}
Note that if you put the cursor at the appropriate point in GraphiQL, you can complete additional metadata fields such as publisher, creator, theme and modified.
We have only added a subset of the possible fields to the prototype's GraphQL schema, and can introduce more as needed.
We also have a proposal for providing access to arbitrary RDF predicates without modifying the GraphQL schema should it be required for us to support arbitrary extensibility.
Searching datasets by text is easily done by providing a search_string argument to the catalog_query field:
{
  endpoint {
    catalog {
      catalog_query(search_string: "climate change") {
        datasets {
          id
          title
        }
      }
    }
  }
}
The prototype currently implements this with a basic Snowball token stemmer over the title and description fields. There is currently no attempt to order results by relevance through a vector space algorithm such as TF-IDF. Switching to a Lucene-based index would be an obvious way to provide this.
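To illustrate what ordering by relevance with TF-IDF would mean, here is a small self-contained Python sketch. It is not what the prototype does and not how Lucene scores documents: it assumes plain whitespace tokenisation with no stemming, a simple `log(N / df) + 1` weighting, and made-up dataset titles.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplification: lower-case and split on whitespace, no stemming.
    return text.lower().split()


def tf_idf_rank(query, documents):
    """Return document indices, best match first, scored by summed TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(doc)) * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical titles, for illustration only.
titles = [
    "Climate change indicators",
    "Balance of payments",
    "Regional trade and climate statistics",
]
ranking = tf_idf_rank("climate change", titles)
print([titles[i] for i in ranking])
```

The title matching both query terms ranks first, the one matching only "climate" second, and the unrelated title last.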
NOTE: We plan to offer a faceted search query interface via the GraphQL API. We have a basic schema already, which should support this, though the implementation is not yet finished/correct.
We have designed the schema for this feature, and describe it here. It is based upon Gareth's initial wireframes for what this might look like in the UI:
{
  endpoint {
    catalog {
      catalog_query(search_string: "climate change") {
        datasets {
          id
          title
          publisher
          creator
          theme
        }
        facets {
          themes {
            id
            label
            enabled
          }
          publishers {
            id
            label
            enabled
          }
          creators {
            id
            label
            enabled
          }
        }
      }
    }
  }
}
The facets fields, themes, publishers and creators, will each return information on the enabled state of all available facets, given the current query constraints.
Each facet has an id (a URI), a label and an enabled state. A facet will have enabled set to true if selecting that facet would return results. The purpose of this is to assist users in making selections that only result in data. For example, given a query on climate change you would likely not want to let users see or select the balance of payments, as selecting it in combination with a climate change search string would (for the purposes of this example at least) be a dead-end selection with no results.
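A client could use the enabled flag to decide which facets to offer. The Python sketch below assumes a `facets` value shaped like the query above; the ids and labels are invented for illustration, and `selectable_facets` is a hypothetical helper name.

```python
# Hypothetical facet data in the shape requested under `facets`;
# the ids and labels below are made up for illustration.
facets = {
    "themes": [
        {"id": "http://example.org/theme/energy", "label": "Energy", "enabled": True},
        {"id": "http://example.org/theme/bop", "label": "Balance of payments", "enabled": False},
    ],
    "publishers": [
        {"id": "http://example.org/publisher/a", "label": "Publisher A", "enabled": True},
    ],
    "creators": [],
}


def selectable_facets(facets):
    """Keep only facets whose selection would still return results."""
    return {
        group: [facet for facet in items if facet["enabled"]]
        for group, items in facets.items()
    }


offered = selectable_facets(facets)
print([facet["label"] for facet in offered["themes"]])  # prints ['Energy']
```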
A query for "climate change" within the energy theme facet would then look like this:
{
  endpoint {
    catalog {
      catalog_query(search_string: "climate change"
                    themes: ["http://gss-data.org.uk/def/gdp#energy"]) {
        datasets {
          id
          title
          publisher
          creator
          theme
        }
        facets {
          themes {
            id
            label
            enabled
          }
          publishers {
            id
            label
            enabled
          }
          creators {
            id
            label
            enabled
          }
        }
      }
    }
  }
}
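When facet selections come from a UI, the query text has to be assembled from the current selection. The following Python sketch builds a trimmed-down version of the query above; `build_catalog_query` is a hypothetical helper. It relies on the assumption that `json.dumps` output is acceptable as a GraphQL string literal, which holds for plain ASCII values such as these URIs.

```python
import json


def build_catalog_query(search_string, themes=()):
    """Assemble a catalog_query with an optional list of theme URIs."""
    # json.dumps quotes and escapes each value as a string literal.
    args = [f"search_string: {json.dumps(search_string)}"]
    if themes:
        uris = ", ".join(json.dumps(uri) for uri in themes)
        args.append(f"themes: [{uris}]")
    return (
        "{ endpoint { catalog { catalog_query(" + ", ".join(args) + ") {"
        " datasets { id title }"
        " facets { themes { id label enabled } }"
        " } } } }"
    )


query = build_catalog_query(
    "climate change", ["http://gss-data.org.uk/def/gdp#energy"]
)
print(query)
```

Where the schema allows it, passing such values as query variables avoids building query text by hand.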
Below is an example query posted to the /api endpoint.
GraphQL queries must be wrapped in a JSON object conforming to the GraphQL over HTTP specification; at a minimum, that object must have a query key:
curl 'http://graphql-prototype.gss-data.org.uk/api' \
  -X POST \
  -H 'content-type: application/json' \
  --data '{"query": "{
    endpoint {
      catalog(id:\"http://gss-data.org.uk/catalog/datasets\") {
        catalog_query(search_string: \"climate change\") {
          datasets {
            id
            title
          }
        }
      }
    }
  }"}'
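The same request can be prepared from a script, where a JSON library takes care of escaping the quotes and newlines in the query. This is a minimal sketch using only the Python standard library; `build_request` is a hypothetical helper, and the call that actually sends the request is left commented out because it needs network access to the prototype.

```python
import json
import urllib.request

API_URL = "http://graphql-prototype.gss-data.org.uk/api"

QUERY = """
{
  endpoint {
    catalog(id: "http://gss-data.org.uk/catalog/datasets") {
      catalog_query(search_string: "climate change") {
        datasets { id title }
      }
    }
  }
}
"""


def build_request(query, url=API_URL):
    """Wrap a GraphQL query in a JSON body with a "query" key."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(QUERY)

# To send it (requires network access and the prototype being available):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```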
A common use case is to have a static shape of query which needs to be parameterised by one or more query variables. This can be done by supplying additional query variables as JSON.
Firstly we need to name the query and declare the parameters it takes.
We define the $query_string variable with the type String, and we bind this to the appropriate part of our schema:
query textQuery($query_string: String) {
  endpoint {
    catalog(id: "http://gss-data.org.uk/catalog/datasets") {
      id
      catalog_query(search_string: $query_string) {
        datasets {
          id
          label
          publisher
        }
      }
    }
  }
}
We can then supply this parameter in an accompanying JSON map:
{"query_string": "rain"}
In the GraphQL over HTTP specification, this map is then supplied under the top-level variables key:
curl 'http://graphql-prototype.gss-data.org.uk/api' \
  -X POST \
  -H 'content-type: application/json' \
  --data '{"query": "query textQuery($query_string: String) {
    endpoint {
      catalog(id:\"http://gss-data.org.uk/catalog/datasets\") {
        id
        catalog_query(search_string: $query_string) {
          datasets {
            id
            label
            publisher
          }
        }
      }
    }
  }",
  "variables": {"query_string": "rain"}}'
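The named query and the variables map described above combine into a single JSON body, with the query text under query and the map under variables. A minimal Python sketch of building that body; `build_body` is a hypothetical helper name:

```python
import json

TEXT_QUERY = """
query textQuery($query_string: String) {
  endpoint {
    catalog(id: "http://gss-data.org.uk/catalog/datasets") {
      id
      catalog_query(search_string: $query_string) {
        datasets { id label publisher }
      }
    }
  }
}
"""


def build_body(query, variables=None):
    """Build a GraphQL over HTTP request body, with optional variables."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return json.dumps(payload)


body = build_body(TEXT_QUERY, {"query_string": "rain"})
print(body)
```

The query text stays fixed between requests; only the variables map changes.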
For more information on parameterised queries see the GraphQL tutorial.
The following features are currently unsupported, but could be supported in future iterations of the prototype or a production implementation of it:
- Faceted search result sparseness detection
- Sorting results
  - by relevance
  - by modified time
  - by title
- Paginated results
- Global Object Identification
- Querying arbitrary RDF metadata
- Dynamically building a JSON-LD context for GraphQL results