Change algorithm of UNION-based bind join #344

hartig · 2024-07-05T14:31:28Z

Our current UNION-based bind join algorithm (see PhysicalOpBindJoinWithUNION and ExecOpBindJoinSPARQLwithUNION) creates a UNION pattern with FILTERs added into each UNION part. That's unnecessarily complex for the accessed SPARQL endpoint because the given graph pattern before the FILTER is repeated within each of the UNION parts.
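For illustration, here is a rough string-level sketch of the kind of query the FILTER-based variant produces. It is not the code of ExecOpBindJoinSPARQLwithUNION; the function name `union_with_filters`, the example pattern, and the example IRIs are all made up for this sketch, and the exact shape of the FILTER expression is an assumption.

```python
def union_with_filters(pattern, solutions):
    """Build a UNION query in which every branch repeats the full pattern plus a FILTER.

    `solutions` is a list of dicts that map join-variable names (without '?')
    to RDF terms already serialized in SPARQL syntax (hypothetical representation).
    """
    branches = []
    for sol in solutions:
        # One equality test per join variable bound in this input solution.
        conds = " && ".join(f"?{var} = {term}" for var, term in sorted(sol.items()))
        branches.append(f"{{ {pattern} FILTER ( {conds} ) }}")
    return "SELECT * WHERE { " + " UNION ".join(branches) + " }"


pattern = "?s <http://example.org/knows> ?o ."
solutions = [
    {"s": "<http://example.org/alice>"},
    {"s": "<http://example.org/bob>"},
]
query = union_with_filters(pattern, solutions)
print(query)
```

Note how the unchanged pattern shows up once per input solution, which is the repetition the endpoint has to deal with.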

A better idea is to create a UNION pattern where each part of the UNION is a version of the given graph pattern in which the join variables have been substituted by applying one of the solutions of the current input batch. To be able to figure out which of the UNION parts a retrieved solution mapping comes from and, thus, to be able to figure out which of the input solutions the retrieved solution has to be joined with, each UNION part needs to be extended with a BIND clause of the form BIND( x AS ?cnt ) where ?cnt is a new variable (needs to be the same in all UNION parts) and x is an integer that is different for each UNION part.
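A minimal sketch of this substitution-plus-BIND idea, again on plain strings rather than on the query algebra that the real implementation would work with. The helper names (`substitute`, `union_with_bind`, `join_partner`), the example pattern, and the dict-based representation of solution mappings are assumptions made for the sketch; it also assumes that `?cnt` does not occur in the pattern and that no IRI or literal in the pattern contains a `?`.

```python
import re

VAR = re.compile(r"\?(\w+)")


def substitute(pattern, solution):
    """Replace every join variable in the pattern by its value from the solution."""
    return VAR.sub(lambda m: solution.get(m.group(1), m.group(0)), pattern)


def union_with_bind(pattern, solutions, cnt_var="cnt"):
    """One UNION branch per input solution, each tagged with BIND( i AS ?cnt )."""
    branches = [
        f"{{ {substitute(pattern, sol)} BIND ( {i} AS ?{cnt_var} ) }}"
        for i, sol in enumerate(solutions)
    ]
    return "SELECT * WHERE { " + " UNION ".join(branches) + " }"


def join_partner(solutions, retrieved, cnt_var="cnt"):
    """Use the integer bound to ?cnt to look up the input solution to join with."""
    return solutions[int(retrieved[cnt_var])]


pattern = "?s <http://example.org/knows> ?o ."
solutions = [
    {"s": "<http://example.org/alice>"},
    {"s": "<http://example.org/bob>"},
]
query = union_with_bind(pattern, solutions)
print(query)

# A solution mapping as it might come back from the endpoint (made-up data).
retrieved = {"o": "<http://example.org/carol>", "cnt": "1"}
partner = join_partner(solutions, retrieved)
# Merge the retrieved bindings (without ?cnt) into the input solution.
merged = {**partner, **{k: v for k, v in retrieved.items() if k != "cnt"}}
print(merged)
```

Using the position in the input batch as the integer keeps the lookup of the join partner a simple index access.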


hartig · 2024-11-21T10:43:28Z

Actually, instead of adding BIND clauses, one of the variables that remain after the substitution (and that, thus, is not a join variable) should be renamed in each of the UNION parts such that it is a different variable in each UNION part (e.g., ?x is renamed to ?x1, ?x2, etc.; but careful: none of these new names may already be used by another variable). Then, depending on which of these new variables is bound in a given solution mapping retrieved from the SPARQL endpoint, it is possible to figure out which of the input solution mappings is the corresponding join partner. This idea is called bound join in the FedX paper and, in comparison to both unions with filters and unions with BIND clauses, it reduces both the size of the request queries and the size of the responses to these queries (i.e., the amount of data that is shipped both ways between HeFQUIN and the SPARQL endpoint).
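The renaming variant could look roughly like the following string-level sketch. The names `bound_join_query` and `join_partner`, the choice of `?o` as the variable to rename, and the example data are assumptions for illustration only; the sketch also assumes that the renamed variable is bound in every solution of the pattern (so it should not be picked from an OPTIONAL part) and that no IRI or literal in the pattern contains a `?`.

```python
import re

VAR = re.compile(r"\?(\w+)")


def bound_join_query(pattern, solutions, marker):
    """Return (query, fresh_names); branch i uses fresh_names[i] instead of ?marker."""
    # Names that are already in use and, thus, must not be reused.
    taken = set(VAR.findall(pattern))
    for sol in solutions:
        taken |= set(sol)

    fresh, branches, n = [], [], 0
    for sol in solutions:
        n += 1
        while f"{marker}{n}" in taken:  # skip candidates that would collide
            n += 1
        name = f"{marker}{n}"
        fresh.append(name)
        # Substitute the join variables and rename the marker in a single pass.
        mapping = {**sol, marker: "?" + name}
        body = VAR.sub(lambda m, mp=mapping: mp.get(m.group(1), m.group(0)), pattern)
        branches.append("{ " + body + " }")
    return "SELECT * WHERE { " + " UNION ".join(branches) + " }", fresh


def join_partner(fresh, retrieved):
    """Index of the input solution, found via which renamed variable is bound."""
    for i, name in enumerate(fresh):
        if name in retrieved:
            return i
    raise ValueError("none of the renamed variables is bound")


pattern = "?s <http://example.org/knows> ?o ."
solutions = [
    {"s": "<http://example.org/alice>"},
    {"s": "<http://example.org/bob>"},
]
query, fresh = bound_join_query(pattern, solutions, marker="o")
print(query)

# A solution mapping as it might come back from the endpoint (made-up data).
retrieved = {"o2": "<http://example.org/carol>"}
i = join_partner(fresh, retrieved)
# Restore the original variable name while merging with the join partner.
merged = {**solutions[i], "o": retrieved[fresh[i]]}
print(merged)
```

Compared to the BIND variant, no extra binding has to be shipped back per solution mapping; the name of the bound variable alone identifies the join partner.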

hartig added the "good first issue" label Jul 5, 2024
