Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve N+1 issue for remote relationships to connector #7

Open
sordina opened this issue Jul 31, 2024 · 2 comments
Open

Solve N+1 issue for remote relationships to connector #7

sordina opened this issue Jul 31, 2024 · 2 comments
Assignees

Comments

@sordina
Copy link
Contributor

sordina commented Jul 31, 2024

When a remote relationship to this connector the classic N+1 issue arrises where N queries need to be performed for N records in a relationship.

For example if you distributed the chinook dataset across two postgres instances with one fronted by the PG connector and one fronted by a HasuraV2 instance and the GraphQL connector:

  • Subgraph 1
    • PG
      • Albums
    • PG Connector
  • Subgraph 2
    • PG
      • Artists
    • HasuraV2
    • GraphQL Connector

Then querying across the relationship like so, if there were 100 albums:

query {
    sg1_albums {
        title
        artist {
            name
        }
    }
}

would query the graphQL connector 100 times.

The solution to this issue for other connectors has been to implement the ForEach capability which then allows a query against the full set of records in the relationship at once, deferring the choice of what to do to the connector.

The current implementation of the GraphQL Connector does not support the ForEach capability and associated variables. However we can't simply add this and implement an optimised query because unlike SQL databases, upstream GraphQL schemas don't share singular and plural records behind the same interface (tables), and don't indicate relationships involved between fields.

In this example, imagine that the NDC query was sent to the connector:

{
    "query": {
         ... "table": "artist",
         ... "fields": ["name"]
    },
    "variables": [{
        "artist_id": 1,
        ...
    }]
}

We would hope that the upstream query would be something like:

query {
    artists(where: {artist_id: {in: [1,...]}}) {
        artist_id,
        name
    }
}

but how does the connector know that artists corresponds to artist and that the artist_id variable set should be put into the where: {artist_id: {in: filter.

In addition to this, once the query is performed it has to be decomposed into a resultset rather than just forward the results of the naive query.

So,

  • The core issue is how to represent the relationships between fields and variables.
  • A secondary issue is where to perform the translation logic.

There are many options for this, however my preference is:

  • We should ideally be able to encode these relationships in the graphQL schema directly with directives
  • This can then be extracted for use in queries, however it can also be added in the connector config independently of the schema
  • The translation and decomposition logic should be put into the engine so that it can be reused across other connectors without each connector having to implement it
  • Connectors can deliver this information to the engine via the NDC schema
  • If the engine implementation isn't feasible the it can be implemented via ForEach in the GraphQL connector to solve the immediate issue and act as a proof of concept

See Also

@sordina sordina self-assigned this Jul 31, 2024
@BenoitRanque
Copy link
Collaborator

Compromise proposal:

We implement the foreach capability for queries.

We generate queries that look like this:

query (n1_where: table_bool_expr, n2_where: table_bool_expr) {
  n1: table(where: $n1_where) { foo bar }
  n2: table(where: $n2_where) { foo bar }
}

Where each variable set gets a root field.
This reduces network calls from engine -> connector and connector -> underlying GraphQL server from N to 1.
The underlying GraphQL server is free to optimize from there, although most (including HGE v2) probably won't besides parallelizing execution.

I believe this is how v2 works for remote schemas.


The above proposal is admittedly a compromise, but GraphQL is poorly suited to solve this issue. V2 hasura does expose an api with collections where you could build giant where clauses and then split up the results, but that means we would have to

  • compose the where clause (easy, but v2 hasura specific)
  • split up the result, for which we would need to
  • be able to evaluate the where clauses, to split the results, for which we would need to
  • request all the fields needed by the where clause

This doesn't get into the specifics of relationships and other issues.

To me this feels like a lot of added complexity for minimal benefit, plus the added cost of making the entire thing more brittle.
IMHO we should favor a robust compromise solution. Not perfect but good enough.

@kenstott
Copy link

kenstott commented Jul 31, 2024

@sordina I'm not sure I understand the proposal. But I'll dive in anyway.

If, by custom directive, you mean to require the remote GraphQL schema to provide details on the remote relationship capability, then I think that's not a strong idea. Adding constraints or requirements on the remote GraphQL endpoint would reduce adoption. (Ignore if I misunderstood the proposal.) In addition, It might be better to make it a more general solution that could work across REST/OpenAPI, GraphQL, gRPC, etc.

Adding the details in the connector config (which defines the pattern for the in: style predicate or an equivalent API endpoint) seems like a better answer, which you could then adapt to any API style.

Meaning...

but how does the connector know that artists corresponds to artist and that the artist_id variable set should be put into the where: {artist_id: {in: filter.}}

The pattern required to do that has to be defined in the config, and it would be in some new, expanded properties in the existing remote relationship definition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants