Extract and Sanitize PostgreSQL Query Parse Trees

Most of the research projects in our group analyze, transform, or compile SQL queries. These projects vary widely, but all of them rely on an internal representation of the incoming SQL query text.

PostgreSQL comes with a sophisticated SQL parser that also performs a large number of semantic checks, type inference, and query simplifications and normalizations. The output of this parser would be the ideal input for the research projects just mentioned.

The goal of this student project is to extract and sanitize the PostgreSQL query parser output (available in PostgreSQL log files) and transform it into a rich and well-structured SQL query representation. The resulting software will be a crucial piece of infrastructure for our ongoing research efforts.

Students will

  • work with the members of the DB group to design a suitable query representation,
  • work with parsing tools (parser generators and/or combinator libraries),
  • create elaborate (tree-shaped) data structures, and
  • get to know some PostgreSQL internals.

Implementation languages: Python or Haskell.
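
To give a flavor of the task, here is a minimal sketch in Python of what the extraction step might look like. It assumes the parse tree is taken from the textual node dump that PostgreSQL writes to its log when debug_print_parse is enabled. That dump nests nodes as {TAG :field value ...}, writes lists as ( ... ), and uses <> for a missing value. The Node class, the parse function, and the SAMPLE string are hypothetical and hand-written for illustration. Real dumps contain further cases (quoted strings, typed lists, multi-token constants) that this sketch does not interpret.

```python
import re
from dataclasses import dataclass, field

# Tokens: a single brace or parenthesis, or a run of other non-space characters.
TOKEN = re.compile(r"[{}()]|[^\s{}()]+")


@dataclass
class Node:
    """One parse tree node: a tag plus named fields (hypothetical design)."""
    tag: str
    fields: dict = field(default_factory=dict)


def parse(text):
    """Parse one node dump into nested Node objects, lists, and strings."""
    tokens = TOKEN.findall(text)
    value, pos = _value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return value


def _value(tokens, pos):
    tok = tokens[pos]
    if tok == "{":
        return _node(tokens, pos + 1)
    if tok == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = _value(tokens, pos)
            items.append(item)
        return items, pos + 1
    if tok == "<>":
        return None, pos + 1
    return tok, pos + 1  # atoms are kept as uninterpreted strings


def _node(tokens, pos):
    node = Node(tag=tokens[pos])
    pos += 1
    while tokens[pos] != "}":
        name = tokens[pos]
        if not name.startswith(":"):
            raise ValueError(f"expected a field name, got {name!r}")
        pos += 1
        # A field's value runs until the next field name or the end of the node.
        values = []
        while tokens[pos] != "}" and not tokens[pos].startswith(":"):
            value, pos = _value(tokens, pos)
            values.append(value)
        node.fields[name[1:]] = values[0] if len(values) == 1 else values
    return node, pos + 1


# Hand-written, heavily abridged fragment (an assumption, not real log output).
SAMPLE = (
    "{QUERY :commandType 1 :hasAggs false "
    ":rtable ({RTE :relid 16384 :alias <>}) "
    ":jointree {FROMEXPR :quals <>}}"
)

tree = parse(SAMPLE)
print(tree.tag, sorted(tree.fields))
```

The sketch keeps every atom as a string and stops at the first malformed token. A sanitized representation, as the project envisions, would go further: typing the fields, dropping irrelevant ones, and giving each node kind its own structure.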

Contact

Torsten Grust