GraphQL Prototype Guide

This is a guide and introduction to querying the beta.gss-data.org.uk site with a new prototype GraphQL service.

This document contains example queries, with links which demonstrate them in the GraphiQL playground environment.

Rationale

This proof-of-concept prototype exists primarily to demonstrate the following:

  1. That GraphQL can serve as an easier and more familiar way for developers to access some of the data we hold, for targeted use cases.
  2. That a GraphQL schema can serve as a better interface to the data than SPARQL for some use cases, and can be more easily aligned with and optimised for user and platform needs. SPARQL is an incredibly flexible query language, but that generality brings a performance cost: SPARQL has to serve all possible use cases, whereas a given GraphQL schema can target, and optimise performance for, specific needs. Optimised services can then be hidden behind GraphQL schemas and resolvers.
  3. That we can seamlessly bridge the divide between Linked Data and GraphQL.

This prototype does not replace SPARQL. Internally, it dynamically generates SPARQL queries to search the catalog, and is ultimately intended to provide an API for faceted search.

Motivating Use Case: Catalog Search

We developed a prototype schema which, we believe, could with some further refinement support the proposed designs for our faceted catalog search.

We also believe the schema can be evolved in the future to support other needs such as querying and filtering datasets. For the time being these use cases are left out.

Using the GraphiQL playground to build queries

We provide a GraphiQL environment at the following address. It helps you write GraphQL queries that conform to our schema:

http://graphql-prototype.gss-data.org.uk/ide

This interface is wired up to query the public beta.gss-data.org.uk service.

The GraphiQL interface provides automated tab-completion based upon our GraphQL schema. This means that most queries can be written by auto-completion, selecting the appropriate field or argument from the drop-down.

This is one of the main reasons GraphQL queries are easier for users to write than SPARQL.

Querying the catalog

The simplest 'useful' query we currently support is:

{
  endpoint {
    catalog {
      id
      label
    }
  }
}

Load Query into GraphiQL

This returns some basic metadata about the default catalog. If you're not familiar with GraphQL, notice that the shape of the query directly mirrors the shape of the results:

{
  "data": {
    "endpoint": {
      "catalog": {
        "id": "http://gss-data.org.uk/catalog/datasets",
        "label": "Datasets"
      }
    }
  }
}
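Because the response mirrors the query, client code can read results by following the same path of field names. The following is a minimal Python sketch, assuming the response body has already been received as JSON text. The helper name `get_catalog` is hypothetical. Per the GraphQL specification, a response may carry a top-level `errors` list, so the sketch checks for that first.

```python
import json

# The example response above, as the raw JSON text a client would receive.
RAW_RESPONSE = """
{
  "data": {
    "endpoint": {
      "catalog": {
        "id": "http://gss-data.org.uk/catalog/datasets",
        "label": "Datasets"
      }
    }
  }
}
"""


def get_catalog(response: dict) -> dict:
    """Return the catalog object by following the query's shape: data -> endpoint -> catalog."""
    # GraphQL responses report failures in a top-level "errors" list.
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    return response["data"]["endpoint"]["catalog"]


catalog = get_catalog(json.loads(RAW_RESPONSE))
print(catalog["id"], catalog["label"])
```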

What is endpoint, and why is it in the schema? endpoint represents the endpoint we're currently connected to. It exists to allow appropriately authenticated users to query things out of specific draftsets.

We plan to eventually support queries like the one below. These ask the same question of the given draftset, and at that point you will also be able to access the appropriate metadata from the draft.

Be aware, though, that we may wish to make some breaking schema changes around this (#18).

{
  endpoint(draftset_id: "3703f62a-7c33-4d89-be60-d47aedbb9d1f") {
    catalog {
      id
      label
    }
  }
}

Listing all datasets

The following query lists all datasets in the catalog and returns their id (URI) and title:

{
  endpoint {
    catalog {
      catalog_query {
        datasets {
          id
          title
          # <-- Put cursor to the left of this and press `CTRL-<space>` to auto complete additional metadata fields
        }
      }
    }
  }
}

Load Query into GraphiQL

Note that if you place the cursor at the marked point in GraphiQL, you can auto-complete additional metadata fields such as publisher, creator, theme and modified.

We have added only a subset of the possible metadata fields to the prototype's GraphQL schema, and can introduce more as needed.

We also have a proposal for providing access to arbitrary RDF predicates without modifying the GraphQL schema should it be required for us to support arbitrary extensibility.
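A client that needs a different set of metadata fields per request can assemble the selection set from a list of field names. The following is a minimal Python sketch. The function name `datasets_query` is hypothetical. The field names used are the ones mentioned above (`id`, `title`, `publisher`, `creator`, `theme`, `modified`), so check them against the schema in GraphiQL before relying on them.

```python
import re
from typing import Iterable

# GraphQL names are ASCII letters, digits and underscores, not starting with a digit.
_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def datasets_query(fields: Iterable[str]) -> str:
    """Build the 'list all datasets' query, selecting the given dataset fields."""
    names = list(fields)
    if not names:
        raise ValueError("at least one field must be selected")
    for name in names:
        # Refuse anything that is not a plain field name rather than splicing it into the query.
        if not _NAME.fullmatch(name):
            raise ValueError(f"invalid GraphQL field name: {name!r}")
    selection = " ".join(names)
    return "{ endpoint { catalog { catalog_query { datasets { " + selection + " } } } } }"


print(datasets_query(["id", "title", "publisher", "modified"]))
```

Validating names up front keeps the builder from producing malformed or injected query text when field lists come from user input.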

Finding datasets by keyword search

This is easily done by providing a search_string argument to the catalog_query field:

{
  endpoint {
    catalog {
      catalog_query(search_string: "climate change") {
        datasets {
          id
          title
        }
      }
    }
  }
}

Load Query into GraphiQL

The prototype currently implements this with a basic Snowball token stemmer over the title and description fields. There is currently no attempt to order results by relevance using a vector space model such as tf-idf. Switching to a Lucene-based index would be an obvious way to provide this.
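For reference, tf-idf relevance ordering scores each document by how often the query terms occur in it. Terms that appear in many documents are weighted down. The prototype does not do this. The Python sketch below only illustrates the technique over a hypothetical in-memory list of titles. It uses naive lowercase whitespace tokenisation in place of a real analyser, with no stemming, and a smoothed idf, `log(1 + N/df)`, so that every matching term contributes a positive score.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    # Naive analyser: lowercase and split on whitespace (no stemming, no stop words).
    return text.lower().split()


def rank_by_tfidf(query: str, docs: list) -> list:
    """Return (score, doc) pairs for docs sharing at least one term with the query, best first."""
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    query_terms = set(tokenize(query))
    scored = []
    for doc, tokens in zip(docs, doc_tokens):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)        # term frequency within this doc
                idf = math.log(1 + n_docs / df[term])  # smoothed idf, always > 0
                score += tf * idf
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


titles = [
    "Climate change indicators",
    "Energy consumption and climate",
    "Balance of payments",
]
for score, title in rank_by_tfidf("climate change", titles):
    print(f"{score:.3f}  {title}")
```

A Lucene-based index computes a refined variant of this kind of score (BM25 by default in recent versions) for you, along with stemming and other analysis.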

Faceted Search Queries

NOTE: We plan to offer a faceted search query interface via the GraphQL API. We have a basic schema already, which should support this, though the implementation is not yet finished/correct.

We have designed the schema for this feature, and describe it here. It is based upon Gareth's initial wireframes for what this might look like in the UI:

{
  endpoint {
    catalog {
      catalog_query(search_string:"climate change") {
        datasets {
          id
          title
          publisher
          creator
          theme
        }
        facets {
          themes {
            id
            label
            enabled
          }
          publishers {
            id
            label
            enabled
          }
          creators {
            id
            label
            enabled
          }
        }
      }
    }
  }
}

Load Query into GraphiQL

The facet fields (themes, publishers and creators) each return the enabled state of every available facet, given the current query constraints.

Each facet has an id (a URI), a label and an enabled state. A facet has enabled set to true if selecting it would return results. The purpose is to help users make only selections that lead to data. For example, given a query on "climate change", you would likely not want users to see or select the balance of payments theme. Combining it with a climate change search string would, for the purposes of this example at least, be a dead-end selection with no results.
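The enabled rule can be stated as follows: a facet value is enabled if adding it to the current constraints would still match at least one dataset. The Python sketch below computes this over a hypothetical in-memory list of datasets, using a plain substring test in place of the real text search. It illustrates the semantics described above, not how the prototype's generated SPARQL implements them. It also assumes that multiple selected themes must all apply (AND). The prototype's actual rule for combining selections is not specified here. Only the energy theme URI comes from this guide; the other URIs are hypothetical.

```python
from dataclasses import dataclass, field

# Theme URI taken from the example query in this guide; the other two are hypothetical.
ENERGY = "http://gss-data.org.uk/def/gdp#energy"
ENVIRONMENT = "http://example.org/theme/environment"
TRADE = "http://example.org/theme/trade"


@dataclass
class Dataset:
    title: str
    themes: set = field(default_factory=set)


def matches(ds: Dataset, search_string: str, themes: set) -> bool:
    # Stand-in for the real search: case-insensitive substring match on the title,
    # and (an assumption) the dataset must carry every selected theme.
    return search_string.lower() in ds.title.lower() and themes <= ds.themes


def theme_facets(datasets: list, search_string: str, selected: set) -> dict:
    """Map each theme URI to True if adding it to the current selection still yields results."""
    all_themes = set().union(*(ds.themes for ds in datasets))
    return {
        theme: any(matches(ds, search_string, selected | {theme}) for ds in datasets)
        for theme in sorted(all_themes)
    }


catalog = [
    Dataset("Climate change and energy use", {ENERGY}),
    Dataset("Climate change emissions by sector", {ENERGY, ENVIRONMENT}),
    Dataset("Balance of payments", {TRADE}),
]
print(theme_facets(catalog, "climate change", set()))
```

With the search string "climate change", the trade theme comes back disabled. This matches the balance of payments example above.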

A query for "climate change" within the energy theme facet would then look like this:

{
  endpoint {
    catalog {
      catalog_query(search_string: "climate change",
                    themes: ["http://gss-data.org.uk/def/gdp#energy"]) {
        datasets {
          id
          title
          publisher
          creator
          theme
        }
        facets {
          themes {
            id
            label
            enabled
          }
          publishers {
            id
            label
            enabled
          }
          creators {
            id
            label
            enabled
          }
        }
      }
    }
  }
}

Querying programmatically

Below is an example query posted to the /api endpoint.

GraphQL queries must be wrapped in a JSON object conforming to the GraphQL over HTTP specification. At a minimum, the object needs a query key. JSON strings cannot contain raw line breaks, so the query is written on a single line:

curl 'http://graphql-prototype.gss-data.org.uk/api' \
  -X POST \
  -H 'content-type: application/json' \
  --data '{"query": "{ endpoint { catalog(id: \"http://gss-data.org.uk/catalog/datasets\") { catalog_query(search_string: \"climate change\") { datasets { id title } } } } }"}'
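The same request can be made with Python's standard library. The following is a minimal sketch. `build_request` and `run_query` are hypothetical helpers, and the body uses the same `{"query": ...}` shape as the curl example. Building the body with `json.dumps` takes care of escaping the quotes and line breaks inside the query, which is easy to get wrong by hand in a shell string.

```python
import json
import urllib.request

API_URL = "http://graphql-prototype.gss-data.org.uk/api"

QUERY = """
{
  endpoint {
    catalog(id: "http://gss-data.org.uk/catalog/datasets") {
      catalog_query(search_string: "climate change") {
        datasets {
          id
          title
        }
      }
    }
  }
}
"""


def build_request(query: str, url: str = API_URL) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON body and prepare a POST request (nothing is sent yet)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str) -> dict:
    """Send the query and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return json.load(resp)
```

For example, `run_query(QUERY)["data"]["endpoint"]["catalog"]["catalog_query"]["datasets"]` would give the list of matching datasets.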

Parameterised Queries

A common use case is a query with a fixed shape that needs to be parameterised by one or more query variables. This is done by supplying the variable values separately, as JSON.

First, we name the query and declare the parameters it takes. Here we define the $query_string variable with the type String, and bind it to the search_string argument of catalog_query:

query textQuery($query_string: String) {
  endpoint {
    catalog(id: "http://gss-data.org.uk/catalog/datasets") {
      id
      catalog_query(search_string: $query_string) {
        datasets {
          id
          label
          publisher
        }
      }
    }
  }
}

We can then supply a value for this parameter in an accompanying JSON map:

{"query_string": "rain"}

Following the GraphQL over HTTP specification, this map is supplied under the top-level variables key of the request body, alongside the query:

curl 'http://graphql-prototype.gss-data.org.uk/api' \
  -X POST \
  -H 'content-type: application/json' \
  --data '{"query": "query textQuery($query_string: String) { endpoint { catalog(id: \"http://gss-data.org.uk/catalog/datasets\") { id catalog_query(search_string: $query_string) { datasets { id label publisher } } } } }", "variables": {"query_string": "rain"}}'
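In code, it is simplest to keep the query text fixed and pass the variables as a separate dictionary, letting the JSON encoder handle quoting. The following is a minimal Python sketch of building the request body. `graphql_body` is a hypothetical helper. Including `operationName` is optional when the document contains a single operation, as it does here.

```python
import json
from typing import Optional

TEXT_QUERY = """
query textQuery($query_string: String) {
  endpoint {
    catalog(id: "http://gss-data.org.uk/catalog/datasets") {
      id
      catalog_query(search_string: $query_string) {
        datasets {
          id
          label
          publisher
        }
      }
    }
  }
}
"""


def graphql_body(query: str, variables: Optional[dict] = None,
                 operation_name: Optional[str] = None) -> bytes:
    """Encode a GraphQL over HTTP request body: query, plus optional variables and operationName."""
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    if operation_name is not None:
        payload["operationName"] = operation_name
    return json.dumps(payload).encode("utf-8")


# The same query text is reused; only the variables change between requests.
rain_body = graphql_body(TEXT_QUERY, {"query_string": "rain"}, "textQuery")
```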

For more information on parameterised queries see the GraphQL tutorial.

Unsupported Features

The following features are currently unsupported, but could be supported in future iterations of the prototype or a production implementation of it: