Query Processing in HeFQUIN
Queries to be processed by the HeFQUIN engine are SPARQL queries. Given such a query, the query processor component of the engine first invokes the query planner, which consists of a source planner and a query optimizer. The source planner creates a logical query plan that represents a source assignment for the given query, and the query optimizer then selects a physical query plan for this logical plan, with the aim of finding a physical plan from which the query result can be produced efficiently. Once the query processor component has obtained the selected physical plan, it invokes the plan compiler, which converts the physical plan into an executable plan that can then be passed to the query execution component. This component takes care of running the executable plan and having it write the query result to a given result sink, from which the result can be consumed.
The remainder of this page describes the steps of the whole process and the corresponding types of query plans in more detail, and it also includes pointers to the corresponding components in the source code of HeFQUIN.
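As a rough illustration of the pipeline outlined above, the following Java sketch chains the four stages in the order in which the query processor invokes them. All type and method names in it (SourcePlannerSketch, OptimizerSketch, CompilerSketch, ExecutablePlanSketch, QueryProcessorSketch) are hypothetical stand-ins rather than the actual HeFQUIN interfaces, and plans are modeled as plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins for the stages described above; these are NOT
// the actual HeFQUIN interfaces, and plans are modeled as plain strings.
interface SourcePlannerSketch { String createLogicalPlan(String query); }
interface OptimizerSketch { String selectPhysicalPlan(String logicalPlan); }
interface CompilerSketch { ExecutablePlanSketch compile(String physicalPlan); }
interface ExecutablePlanSketch { void run(Consumer<String> resultSink); }

final class QueryProcessorSketch {
    private final SourcePlannerSketch sourcePlanner;
    private final OptimizerSketch optimizer;
    private final CompilerSketch compiler;

    QueryProcessorSketch(SourcePlannerSketch sp, OptimizerSketch opt, CompilerSketch c) {
        this.sourcePlanner = sp;
        this.optimizer = opt;
        this.compiler = c;
    }

    // Runs the stages in the order given in the text:
    // source planning, optimization, compilation, execution.
    void processQuery(String query, Consumer<String> resultSink) {
        String logicalPlan = sourcePlanner.createLogicalPlan(query);
        String physicalPlan = optimizer.selectPhysicalPlan(logicalPlan);
        ExecutablePlanSketch execPlan = compiler.compile(physicalPlan);
        execPlan.run(resultSink);
    }

    // Wires up toy stages that only tag the plan with their own name
    // and collects whatever the executable plan writes to the sink.
    static List<String> demo(String query) {
        QueryProcessorSketch qp = new QueryProcessorSketch(
            q -> "logical(" + q + ")",
            lp -> "physical(" + lp + ")",
            pp -> sink -> sink.accept("result of " + pp));
        List<String> collected = new ArrayList<>();
        qp.processQuery(query, collected::add);
        return collected;
    }
}
```

The result sink is modeled here as a simple Consumer so that the caller decides what happens to the results; this is a simplification made for the sketch and says nothing about how the sink is typed in HeFQUIN itself.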
HeFQUIN's query processor component as a whole is captured by an implementation of the QueryProcessor interface. Every SPARQL query to be processed is passed to this component in the form of a SPARQLGraphPattern object, which currently is just a wrapper of an Apache Jena Op object that captures an algebraic representation of the SPARQL query pattern.
The query processor passes the given query to its query planner (QueryPlanner), which in turn passes it on to the source planner (SourcePlanner). The purpose of the source planner is to produce a logical query plan that represents a source assignment for the given query. However, actual source planning is not implemented yet; instead, we assume that the source assignment is already specified in the given query by means of the SERVICE feature of SPARQL. Therefore, the only task of the source planner for the moment is to create a logical plan from such a query, which is currently implemented in the SourcePlannerImpl class.
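To make the current role of the source planner more concrete, here is a minimal Java sketch of a planner that performs no source selection and only turns SERVICE clauses into a logical plan. The types ServiceClause, PlanNode, RequestNode, JoinNode and ServiceBasedSourcePlanner are hypothetical, and combining the requests into a left-deep chain of binary joins is an assumption made for this example; the text above does not say how SourcePlannerImpl arranges the operators.

```java
import java.util.List;

// Assumed input shape, written here as a SPARQL query for orientation:
//   SELECT * WHERE {
//     SERVICE <http://example.org/a> { ?s ?p ?o }
//     SERVICE <http://example.org/b> { ?o ?q ?z }
//   }
// Toy query representation: every pattern is already wrapped in a SERVICE
// clause that names the federation member to contact (hypothetical type).
record ServiceClause(String endpoint, String pattern) {}

// Toy logical plan nodes (hypothetical; not the HeFQUIN classes).
interface PlanNode { String describe(); }

record RequestNode(String endpoint, String pattern) implements PlanNode {
    public String describe() { return "req[" + endpoint + "](" + pattern + ")"; }
}

record JoinNode(PlanNode left, PlanNode right) implements PlanNode {
    public String describe() {
        return "join(" + left.describe() + ", " + right.describe() + ")";
    }
}

final class ServiceBasedSourcePlanner {
    // Turns each SERVICE clause into a request and joins the requests
    // left to right; no source selection takes place here.
    static PlanNode createLogicalPlan(List<ServiceClause> clauses) {
        if (clauses.isEmpty())
            throw new IllegalArgumentException("no SERVICE clause given");
        PlanNode plan = null;
        for (ServiceClause c : clauses) {
            PlanNode req = new RequestNode(c.endpoint(), c.pattern());
            plan = (plan == null) ? req : new JoinNode(plan, req);
        }
        return plan;
    }
}
```

The point of the sketch is that the source assignment is read off the query rather than computed: each request inherits its endpoint directly from the SERVICE clause it came from.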
Logical query plans are captured by the LogicalPlan interface and consist of logical operators.
Each such logical plan has a root operator, which can be one of the operators defined by our FedQPL language (see our FedQPL paper for the details). The main interface for these operators is called LogicalOperator, and some of the implementing classes are LogicalOpRequest, LogicalOpTPAdd, LogicalOpBGPAdd, LogicalOpJoin, LogicalOpUnion, LogicalOpMultiwayJoin, and LogicalOpMultiwayUnion (the latter two represent operators to join/union multiple inputs, not just two).
Additionally, logical plans may have subplans, each of which is a logical plan that produces the intermediate result for one of the inputs of the root operator. Consequently, the number of subplans that a logical plan has depends on the arity of the root operator and, thus, we distinguish the following subtypes of the LogicalPlan interface: LogicalPlanWithNullaryRoot, LogicalPlanWithUnaryRoot, LogicalPlanWithBinaryRoot, and LogicalPlanWithNaryRoot.
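The relationship between the arity of the root operator and the number of subplans can be sketched in Java as follows. RootArity and ArityPlan are hypothetical names introduced for this example only; HeFQUIN expresses the same distinction through separate subtypes of LogicalPlan rather than through a runtime check, and letting an n-ary root accept any number of subplans is an assumption.

```java
import java.util.List;

// Hypothetical, simplified model of a plan whose number of subplans is
// fixed by the arity of its root operator (not the HeFQUIN interfaces).
enum RootArity {
    NULLARY, UNARY, BINARY, NARY;

    boolean accepts(int numberOfSubplans) {
        switch (this) {
            case NULLARY: return numberOfSubplans == 0;
            case UNARY:   return numberOfSubplans == 1;
            case BINARY:  return numberOfSubplans == 2;
            default:      return true; // n-ary: any number (assumption)
        }
    }
}

final class ArityPlan {
    final String rootName;
    final RootArity arity;
    final List<ArityPlan> subplans;

    ArityPlan(String rootName, RootArity arity, ArityPlan... subplans) {
        // Reject plans whose subplan count does not match the root's arity.
        if (!arity.accepts(subplans.length))
            throw new IllegalArgumentException(
                rootName + " cannot have " + subplans.length + " subplan(s)");
        this.rootName = rootName;
        this.arity = arity;
        this.subplans = List.of(subplans);
    }

    // Total number of operators in the plan, counted recursively.
    int countOperators() {
        int n = 1;
        for (ArityPlan sub : subplans) n += sub.countOperators();
        return n;
    }
}
```

For instance, assuming that a request takes no input, a tpAdd takes one and a join takes two, a join over a tpAdd-over-request and a second request would be built as new ArityPlan("join", RootArity.BINARY, tpAdd, req2), whereas passing a single subplan to the same join would be rejected.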
Physical query plans are captured by the PhysicalPlan interface and, exactly as described for the logical plans, each physical plan has a root operator and may have subplans.
The main interface for the physical operators in this case is the PhysicalOperator interface. In contrast to the logical operators, each type of physical operator has a concrete algorithm for producing its output from its input(s). This algorithm is captured in the form of an ExecutableOperator that can be obtained via the createExecOp method of the PhysicalOperator interface and that will be used when creating an executable plan (see below).
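The idea that a physical operator commits to one concrete algorithm and hands it out as an executable operator can be illustrated with the following Java sketch. ExecOpSketch, PhysOpSketch and NestedLoopsJoinSketch are hypothetical and heavily simplified; in particular, the createExecOp method here takes no arguments and solution mappings are modeled as plain maps from variable names to values, neither of which should be read as the actual HeFQUIN signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified counterparts of PhysicalOperator and
// ExecutableOperator; the actual HeFQUIN signatures are different.
interface ExecOpSketch {
    List<Map<String, String>> run(List<Map<String, String>> left,
                                  List<Map<String, String>> right);
}

interface PhysOpSketch {
    // Returns the object that carries out this operator's algorithm.
    ExecOpSketch createExecOp();
}

// A physical join operator that commits to the nested-loops algorithm.
final class NestedLoopsJoinSketch implements PhysOpSketch {
    public ExecOpSketch createExecOp() {
        return (left, right) -> {
            List<Map<String, String>> out = new ArrayList<>();
            for (Map<String, String> l : left) {
                for (Map<String, String> r : right) {
                    if (compatible(l, r)) {
                        // Merge the two compatible solution mappings.
                        Map<String, String> merged = new HashMap<>(l);
                        merged.putAll(r);
                        out.add(merged);
                    }
                }
            }
            return out;
        };
    }

    // Two solution mappings are compatible if they agree on every
    // variable that both of them bind.
    private static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) return false;
        }
        return true;
    }
}
```

A second class implementing PhysOpSketch with, say, a hash-based algorithm would produce the same join result from the same inputs, which is what allows several physical operators to stand in for one logical operator.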
While, in principle, there does not need to be a direct mapping between the physical operators and the logical operators for which they are used, in HeFQUIN at the moment we only have physical operators that map directly to one of the logical operators (which we have captured by means of a sub-interface called PhysicalOperatorForLogicalOperator). The actual mapping from the logical operators to the respective default type of physical operator is implemented in a helper class called LogicalToPhysicalOpConverter. Another relevant helper class in this context is PhysicalPlanFactory, which provides convenience methods for creating physical plans based on any given logical operator; this includes methods not only for creating physical plans with the corresponding default physical operator as root, but also for physical plans with any other possible physical operator as root. Finally, the conversion of a complete logical plan into a corresponding physical plan is provided by an implementation of the LogicalToPhysicalPlanConverter interface, which uses the PhysicalPlanFactory with the default mapping options.
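The conversion of a complete logical plan via a default operator mapping can be sketched in Java as a bottom-up traversal. LPlan, PPlan and DefaultPlanConverterSketch are hypothetical names, operators are identified by plain strings, and the entries of the default mapping are made up for illustration; they are not the defaults that LogicalToPhysicalOpConverter actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of converting a logical plan into a physical plan
// by replacing every logical operator with a default physical operator.
record LPlan(String logicalOp, List<LPlan> subplans) {}
record PPlan(String physicalOp, List<PPlan> subplans) {}

final class DefaultPlanConverterSketch {
    // Illustrative default mapping; the operator names on the right-hand
    // side are made up and not taken from HeFQUIN.
    static final Map<String, String> DEFAULTS = Map.of(
        "request", "sendRequest",
        "tpAdd", "bindJoin",
        "join", "hashJoin",
        "union", "binaryUnion");

    static PPlan convert(LPlan lp) {
        String physicalOp = DEFAULTS.get(lp.logicalOp());
        if (physicalOp == null)
            throw new IllegalArgumentException("no default for " + lp.logicalOp());
        // Convert the subplans first (bottom-up), then build the new root.
        List<PPlan> subs = new ArrayList<>();
        for (LPlan sub : lp.subplans()) subs.add(convert(sub));
        return new PPlan(physicalOp, List.copyOf(subs));
    }
}
```

Because the mapping is applied operator by operator, the resulting physical plan has the same shape as the logical plan; choosing a non-default physical operator for some node would amount to overriding a single lookup in this traversal.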
[[ TODO: refer to the different optimizers (implementations of the QueryOptimizer interface) and emphasize that many of them rely on a cost model (see the CostModel interface) and on rewriting rules (see the RewritingRule interface and the collection of rewriting rules that we currently have in the RuleInstances class) ]]
[[ TODO: refer to the plan compiler (QueryPlanCompiler), executable plans and executable operators (relevant interfaces: ExecutablePlan and ExecutableOperator), and the query execution component (ExecutionEngine) ]]