We present Habitat, a declarative observational debugger for SQL. Habitat facilitates true language-level (not: plan-level) debugging of, probably flawed, SQL queries that yield unexpected results. Users may mark arbitrary SQL subexpressions—ranging from literals, over fragments of predicates, to entire subquery blocks—to observe whether these evaluate as expected.
From the marked SQL text, Habitat’s algebraic compiler derives a new query whose result represents the values of the desired observations. These observations are generated by the target SQL database host itself. Prior data extraction or extra debugging middleware is not required.
Habitat merges multiple observations into a single (nested) tabular display, letting a user explore the relationship of various observations. Filter predicates furthermore ease the interpretation of large results.
The Basic Idea: Mark & Observe
We built the observational SQL debugger Habitat that helps users to identify errors, or “bugs”, buried in queries. With Habitat, we pursue language-level debugging of logical flaws—that lead SQL queries to yield unexpected results or even runtime errors—and do not consider query engine or performance debugging.
Habitat enables users to mark arbitrary suspect (or interesting) SQL subexpressions of a buggy query. Given such markings, Habitat crafts new SQL queries that let the target RDBMS compute the value of the suspect subexpressions. Users observe and correlate these values, then narrow or widen their markings to hunt down the bug in an iterative, interactive process.
Habitat uses a context-free grammar for SQL to automatically extend arbitrary user-marked query text fragments to minimal syntactically complete subexpressions. The current version of Habitat accepts a SQL dialect depicted by the grammar below. Whereas non-terminal FS, defining a SQL fullselect, is the start symbol of this grammar, any subexpression that is derivable from the non-terminals FS, SC, P, or TBL is considered observable by the debugger. Non-terminal SC, for example, derives any scalar SQL expression, ranging from parenthesized nested fullselects to individual column references or literals.
FS ::= SELECT [ DISTINCT ] SC AS ID [ , SC AS ID ]*
FROM TBL [ AS ID] [ , TBL [ AS ID] ]*
[ GROUP BY COL [ , COL ]*
[ HAVING P ]?
[ ORDER BY SC [ , SC ]*
SC ::= COL
| SC + SC | SC * SC | ...
| CASE [ WHEN P THEN SC ]+ [ ELSE SC ] END
| CASE SC [ WHEN SC THEN SC ]+ [ ELSE SC ] END
| COUNT(*) | MAX(SC) | ...
P ::= (P)
| P AND P
| P OR P
| SC CMP SC
| SC [ NOT ] IN (FS)
| EXISTS (FS)
| SC CMP ANY (FS)
| SC CMP ALL (FS)
TBL ::= ID
| (FS) AS ID
CMP ::= < | <= | = | >= | > | <>
COL ::= [ ID. ] ID
ID ::= SQL identifier
Habitat makes use of a fine-grained query translation strategy very similar to the query compilation in the projects Pathfinder and Ferry. Instead of extracting queries from the original SQL query we compile the query into a very detailed algebraic query plan. This plan encodes any possible observation. A fragment of the algebra plan might be used to answer a single observation. To issue a SQL query on the database host we simplify the extracted algebra plans and generate a new SQL query.
Fabian Kliebhan and Hans-Joachim Ruscheweyh—two of our students—designed and built the first version of Habitat as a Java prototype. It compiles the markings into algebra plans. Subsequently, Pathfinder consumes these plans, optimizes them, and generates SQL code for them. Habitat then reads back the generated SQL queries, dispatches the queries to the database system, and renders the resulting observation display.
Up until now we haven't prepared a convenient package to get started with Habitat.
If you are willing to invest some time to first install Pathfinder and then Habitat please contact Jan Rittinger. For all others we hope to provide a more convenient way to try out Habitat in the months to come.