Change algorithm of UNION-based bind join #344

Open
hartig opened this issue Jul 5, 2024 · 1 comment
Labels
good first issue Good for newcomers

hartig commented Jul 5, 2024

Our current UNION-based bind join algorithm (see PhysicalOpBindJoinWithUNION and ExecOpBindJoinSPARQLwithUNION) creates a UNION pattern in which a FILTER is added to each UNION part. That is unnecessarily complex for the accessed SPARQL endpoint because the given graph pattern is repeated, unchanged, before the FILTER in every UNION part.
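To make the problem concrete, here is a minimal string-based sketch of what such a request could look like. The exact FILTER expressions that HeFQUIN generates are not shown in this issue, so the conjunction of equality tests below is an assumption, as are the function name `union_with_filters` and the representation of solutions as dicts from variable names to SPARQL terms.

```python
def union_with_filters(pattern, solutions):
    """Build one UNION part per input solution; each part repeats the
    unchanged graph pattern and restricts it with a FILTER.
    ASSUMPTION: the FILTER is a conjunction of equality tests."""
    parts = []
    for sol in solutions:
        conds = " && ".join(
            f"?{var} = {term}" for var, term in sorted(sol.items())
        )
        parts.append(f"{{ {pattern} FILTER ( {conds} ) }}")
    return " UNION ".join(parts)


# Hypothetical batch of two input solutions for the join variable ?s.
batch = [{"s": "<http://example.org/a>"}, {"s": "<http://example.org/b>"}]
print(union_with_filters("?s foaf:knows ?o .", batch))
```

Note how the full pattern `?s foaf:knows ?o .` appears once per input solution, so the endpoint has to evaluate it for each UNION part and then filter.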

A better idea is to create a UNION pattern where each UNION part is a version of the given graph pattern in which the join variables have been substituted by applying one of the solutions of the current input batch. To determine which UNION part a retrieved solution mapping comes from and, thus, which input solution it has to be joined with, each UNION part needs to be extended with a BIND clause of the form BIND( x AS ?cnt ), where ?cnt is a new variable (the same in all UNION parts) and x is an integer that is different for each UNION part.
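A rough sketch of the BIND-based idea, working on query strings only. This is not HeFQUIN code: the helper names (`substitute`, `union_with_bind`, `join_with_partner`) are hypothetical, solutions are assumed to be dicts from variable names to SPARQL terms, and a real implementation would operate on the query algebra rather than on text.

```python
import re

# Matches SPARQL variables such as ?x (simplified; ignores the $x syntax).
VAR = re.compile(r"\?([A-Za-z_][A-Za-z0-9_]*)")


def substitute(pattern, solution):
    """Replace every variable bound in `solution` by its term (one pass)."""
    return VAR.sub(lambda m: solution.get(m.group(1), m.group(0)), pattern)


def union_with_bind(pattern, solutions, cnt_var="cnt"):
    """One UNION part per input solution, tagged with BIND( i AS ?cnt )."""
    if cnt_var in VAR.findall(pattern):
        raise ValueError(f"?{cnt_var} is already used in the pattern")
    parts = [
        f"{{ {substitute(pattern, sol)} BIND ( {i} AS ?{cnt_var} ) }}"
        for i, sol in enumerate(solutions)
    ]
    return " UNION ".join(parts)


def join_with_partner(result, solutions, cnt_var="cnt"):
    """Use the ?cnt value of a retrieved solution to find its join partner.
    ASSUMPTION: the value has already been parsed into a plain integer string."""
    merged = dict(solutions[int(result[cnt_var])])
    merged.update({v: t for v, t in result.items() if v != cnt_var})
    return merged


# Hypothetical batch of two input solutions for the join variable ?s.
batch = [{"s": "<http://example.org/a>"}, {"s": "<http://example.org/b>"}]
print(union_with_bind("?s foaf:knows ?o .", batch))
```

With this shape, a retrieved solution such as `{"o": "<http://example.org/c>", "cnt": "1"}` is joined with the second input solution of the batch.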

hartig added the "good first issue" label Jul 5, 2024
hartig commented Nov 21, 2024

Actually, instead of adding BIND clauses, one of the variables that remain after the substitution (and which, thus, is not a join variable) should be renamed in each of the UNION parts such that it is a different variable in each UNION part (e.g., ?x is renamed to ?x1, ?x2, etc.; but careful: none of these new variable names may already be used by another variable). Then, depending on which of these new variables is bound in a given solution mapping retrieved from the SPARQL endpoint, it is possible to figure out which of the input solution mappings is the corresponding join partner. This idea is called bound join in the FedX paper and, in comparison to both unions with filters and unions with BIND clauses, it reduces both the size of the request queries and the size of the responses to these queries (i.e., the amount of data that is shipped both ways between HeFQUIN and the SPARQL endpoint).
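A minimal sketch of the renaming approach, again on query strings rather than on the query algebra. The function names and the dict-based representation of solution mappings are hypothetical and not part of HeFQUIN. The sketch also assumes that the renamed variable is bound in every retrieved solution, which holds for basic graph patterns but would need extra care with OPTIONAL.

```python
import re

# Matches SPARQL variables such as ?x (simplified; ignores the $x syntax).
VAR = re.compile(r"\?([A-Za-z_][A-Za-z0-9_]*)")


def bound_join_query(pattern, solutions, join_vars):
    """Return (query, marker, base): one UNION part per input solution, with
    the join variables substituted and one remaining variable (`marker`)
    renamed to ?<base>1, ?<base>2, ... in the respective parts."""
    all_vars = set(VAR.findall(pattern))
    remaining = sorted(all_vars - set(join_vars))
    if not remaining:
        raise ValueError("no non-join variable left to rename")
    marker = remaining[0]
    # Pick a base name whose numbered versions collide with no existing variable.
    base = marker
    while any(f"{base}{i}" in all_vars for i in range(1, len(solutions) + 1)):
        base += "_"
    parts = []
    for i, sol in enumerate(solutions, start=1):
        def repl(m):
            name = m.group(1)
            if name == marker:
                return f"?{base}{i}"
            if name in join_vars:
                return sol[name]
            return m.group(0)
        parts.append("{ " + VAR.sub(repl, pattern) + " }")
    return " UNION ".join(parts), marker, base


def join_with_partner(result, solutions, marker, base):
    """Find the join partner via the renamed variable that is bound in
    `result`, and map that variable back to its original name."""
    for i, sol in enumerate(solutions, start=1):
        renamed = f"{base}{i}"
        if renamed in result:
            merged = dict(sol)
            for var, term in result.items():
                merged[marker if var == renamed else var] = term
            return merged
    raise ValueError("none of the renamed variables is bound")


# Hypothetical batch of two input solutions for the join variable ?s.
batch = [{"s": "<http://example.org/a>"}, {"s": "<http://example.org/b>"}]
query, marker, base = bound_join_query("?s foaf:knows ?o .", batch, ["s"])
print(query)
```

Here neither the request nor the response carries an extra variable: a retrieved solution that binds `?o2` belongs to the second input solution, and `?o2` is mapped back to `?o` when merging.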
