Research Project Pathfinder
XQuery Join Graph Isolation Print

Join graphs in databases

SQL query optimizers strive to produce query plans whose primary components are join graphs—bundles of relations interconnected by join predicates—while a secondary, peripheral plan tail performs further filtering, grouping, and sorting. Plans of this particular type are subject to effective optimization strategies that, taking into account the available indexes and applicable join methods, derive equivalent join trees, ideally with a left-deep profile to enable pipelining. For more than 30 years now, relational query processing infrastructure has been tuned to excel at the evaluation of plans of this shape.

The approach

Based on the loop lifting compilation scheme, we compile XQuery queries into logical algebra plans. Both the compositional language and compilation scheme lead to stacked plans whose shapes differ considerably from the ideal join graph + plan tail. Instead, join operators occur in sections distributed all over the plan. A similar distribution can be observed for the blocking operators δ and ρ (duplicate elimination and row ranking). This is quite unlike the algebraic plans produced by SQL SELECT-FROM-WHERE block compilation.

The omnipresence of blocking operators obstructs join operator movement and planning and leads industrial-strength optimizers, e.g., IBM DB2 UDB V9, to execute the plan in stages that read and then again materialize temporary tables.

Here, we propose a plan rewriting procedure that derives join graphs from the initial plans. The resulting plan may be expressed as a single SELECT-DISTINCT-FROM-WHERE-ORDERBY block to be submitted for execution by an off-the-shelf RDBMS. The database system then evaluates this query over a schema-oblivious tabular encoding of XML documents to compute the encoding of the resulting XML node sequence (which may then be serialized to yield the expected XML text). Alternatively the isolated join graph plans represent the input for ROX: Run-time Optimization of XQueries.

Join graph isolation

The join graph isolation operates on a logical table algebra, with two non-standard operators ρ and # (row ranking and row numbering). The row ranking operator ρ enables the correct representation of XQuery’s pervasive sequence order notion on a logical algebra level. In any SQL:1999 database ρ and # can be expressed by means of the RANK() and ROW_NUMBER() window functions.

Example of Join Graph Isolation

We account for the unusual tall shape and substantial size (of the order of 100 operators and beyond for typical benchmark-type queries) of the initial plan DAGs by a peep-hole-style rewriting process. For all operators op, a property inference collects relevant information about the plan vicinity of op. The applicability of a rewriting step may then decided by inspection of the properties of a single operator (and its closer neighborhood) at a time.

Properties

For all operators op in the plan, we infer the following four properties:

  • icols This property records the set of input columns strictly required to evaluate op and its upstream plan. The icols columns are inferred during a top-down DAG walk. Among other uses, property icols facilitates projection pushdown
  • const A set with elements of the form a = c, indicating that all rows in the table output by op hold value c in column a. This property is seeded at the plan leaves and inferred bottom-up.
  • key The set of candidate keys generated by op (bottom-up).
  • set Boolean property set communicates whether the output rows of op will undergo duplicate elimination in the upstream plan. Inferred top-down.

Rewrites

The rewrite rules operate on either a single or at most two-adjacent operators and take into account the inferred properties. The rewrite process is guided by three goals, that:

  • establish a single ρ operator in the plan tail,
  • push-down and remove join operators, and
  • establish a single δ operator in the plan tail.

For the above example query doc("auction.xml")/descendant::open_auction[bidder] a rewrite movie (1.3 MB) demonstrates the join graph isolation process.

The effect of join graph isolation

Once cast into the shape of join graphs, we have found off-the-shelf relational query optimizers—the B-tree indexing subsystem and join tree planner, in particular—to cope and even be autonomously capable of “reinventing” advanced processing strategies that have originally been devised specifically for the XQuery domain. In DB2 we, e.g., observed concepts like:

  • element tag streams,
  • branching nodes (discussed in the context of holistic twig joins),
  • XPath step reordering and axis reversal, and
  • existential semantics.

Related material