The XPath accelerator paper, describing the efficient evaluation of XPath axes based on a pre/post encoding, marks the beginning of the research project Pathfinder in 2002. Since then a lot has happened. The article Pathfinder: XQuery Off the Relational Shelf gives a good overview of the ideas that lead to the efficient evaluation of XQuery queries on top of SQL database systems.
In 2005 one important milestone was the first release of MonetDB/XQuery. By then it was based on the loop lifting XQuery compilation technique, the staircase join algorithms, and a first syntactic join recognition.
While the earlier versions of MonetDB/XQuery lacked the logical algebra representation, various additional efforts were based on it. Among these are support for XQuery updates, XRPC, the integration of the full text engine Tijah, and Stand-Off path steps.
With the intermediate logical algebra in place, we integrated the algebraic optimization of XQuery queries into Pathfinder. The optimizations led among others to the removal of a large share of XQueries inherent order constraints, a stable join detection, and the isolation of join graphs. These optimizations turned SQL database systems into efficient XQuery back-end and also enabled the runtime query optimization of join graphs. |
Join graphs in databases
SQL query optimizers strive to produce query plans whose primary components are join graphs—bundles of relations interconnected by join predicates—while a secondary, peripheral plan tail performs further filtering, grouping, and sorting. Plans of this particular type are subject to effective optimization strategies that, taking into account the available indexes and applicable join methods, derive equivalent join trees, ideally with a left-deep profile to enable pipelining. For more than 30 years now, relational query processing infrastructure has been tuned to excel at the evaluation of plans of this shape.
The approach
Based on the loop lifting compilation scheme, we compile XQuery queries into logical algebra plans. Both the compositional language and compilation scheme lead to stacked plans whose shapes differ considerably from the ideal join graph + plan tail. Instead, join operators occur in sections distributed all over the plan. A similar distribution can be observed for the blocking operators δ and ρ (duplicate elimination and row ranking). This is quite unlike the algebraic plans produced by SQL SELECT-FROM-WHERE block compilation.
The omnipresence of blocking operators obstructs join operator movement and planning and leads industrial-strength optimizers, e.g., IBM DB2 UDB V9, to execute the plan in stages that read and then again materialize temporary tables.
Here, we propose a plan rewriting procedure that derives join graphs from the initial plans. The resulting plan may be expressed as a single SELECT-DISTINCT-FROM-WHERE-ORDERBY block to be submitted for execution by an off-the-shelf RDBMS. The database system then evaluates this query over a schema-oblivious tabular encoding of XML documents to compute the encoding of the resulting XML node sequence (which may then be serialized to yield the expected XML text). Alternatively the isolated join graph plans represent the input for ROX: Run-time Optimization of XQueries.
|
|
Read more...
|
|
|
|
|
|
|