DSH — Database-Supported Haskell
Database-Supported Haskell, DSH for short, is a Haskell library for database-supported program execution. Using the DSH library, a relational database management system (RDBMS) can be used as a coprocessor for the Haskell programming language, especially for those program fragments that carry out data-intensive and data-parallel computations. Rather than embedding a relational language into Haskell, DSH turns idiomatic Haskell programs into SQL queries.
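The sketch below gives a feel for the kind of idiomatic program meant here. It is ordinary Haskell over in-memory lists and does not use the DSH API. The Post type, the forumSummary function and the sample data are hypothetical. The result is nested: each forum carries its own list of distinct authors, and flat SQL results have no direct counterpart for that shape.

```haskell
import Data.List (nub, sort)

-- Hypothetical schema: one row per forum post.
data Post = Post
  { forum     :: String
  , author    :: String
  , wordCount :: Int
  } deriving (Show, Eq)

-- For every forum: its name, the sorted distinct authors, and the total word count.
-- Plain list processing; this is NOT DSH code.
forumSummary :: [Post] -> [(String, [String], Int)]
forumSummary posts =
  [ ( f
    , sort (nub [author p | p <- posts, forum p == f])
    , sum [wordCount p | p <- posts, forum p == f]
    )
  | f <- sort (nub (map forum posts))
  ]

-- Hypothetical sample data.
samplePosts :: [Post]
samplePosts =
  [ Post "haskell" "alice" 120
  , Post "haskell" "bob" 80
  , Post "sql" "alice" 40
  , Post "haskell" "alice" 10
  ]
```

With DSH, a query of this shape would range over a database table rather than an in-memory list. The package documentation on Hackage describes the actual combinators and connection functions.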
DSH in the Real World.
We have used DSH for large-scale data analysis. Specifically, in collaboration with researchers working in the social and economic sciences, we used DSH to analyse the entire history of Wikipedia (terabytes of data) and a number of online forum discussions (gigabytes of data).
Because of the scale of the data, it would be unthinkable to conduct this analysis in Haskell without the database-supported program execution technology featured in DSH. We also formulated several of these queries directly in SQL and found the equivalent DSH queries much more concise and easier to write and maintain, mostly owing to DSH’s support for nesting, Haskell’s abstraction facilities and the monad comprehension notation (see below).
One long-term goal is to allow researchers who are not necessarily expert programmers or database engineers to conduct large-scale data analysis themselves.
Towards a New Compilation Strategy.
As of today, DSH relies on a query compilation strategy known as loop-lifting. Loop-lifting comes with important and desirable properties (e.g., the number of SQL queries issued for a given DSH program depends only on the static type of the program’s result). The strategy, however, relies on a rather complex and monolithic mapping of programs to relational algebra. To remedy this, we are currently exploring a new strategy based on the flattening transformation conceived by Guy Blelloch. Flattening was originally designed to implement the data-parallel declarative language NESL. We revisit it in the context of query compilation, which targets database kernels, one particular kind of data-parallel execution environment. Initial results are promising, and DSH may switch to the new strategy in the near future. We hope to further improve query quality and also to address the formal correctness of DSH’s program-to-queries mapping.
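In Blelloch's formulation for NESL, flattening represents a nested sequence as two parts: a flat vector of all elements and a segment descriptor that records the inner lengths. Operations on the inner sequences then become bulk operations on the flat vector. The Haskell sketch below illustrates only this representation idea. The names Segmented, flatten, unflatten, segIds, mapLifted and sumLifted are hypothetical, and this is not DSH's actual intermediate representation. The per-element segment identifiers mirror a relational encoding, where they would be stored as a column. A per-segment sum would then correspond to a grouped aggregate over that column.

```haskell
-- Hypothetical flattened representation of a nested list [[a]].
data Segmented a = Segmented
  { segLengths :: [Int]  -- segment descriptor: length of each inner list
  , segValues  :: [a]    -- all elements, concatenated in order
  } deriving (Show, Eq)

flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Rebuild the nested list by cutting the flat vector at the recorded lengths.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens vals) = go lens vals
  where
    go [] _ = []
    go (n : ns) vs = let (seg, rest) = splitAt n vs in seg : go ns rest

-- Segment identifier of every element (think: a column in a relational table).
segIds :: [Int] -> [Int]
segIds lens = concat [replicate n i | (i, n) <- zip [0 ..] lens]

-- Lifted map: one pass over the flat vector; the descriptor is reused unchanged.
mapLifted :: (a -> b) -> Segmented a -> Segmented b
mapLifted f (Segmented lens vals) = Segmented lens (map f vals)

-- Lifted sum: one result per segment, grouping the flat vector by segment id.
-- Empty segments yield 0.
sumLifted :: Num a => Segmented a -> [a]
sumLifted (Segmented lens vals) =
  [ sum [v | (j, v) <- tagged, j == i] | i <- [0 .. length lens - 1] ]
  where
    tagged = zip (segIds lens) vals
```

For example, flatten [[1, 2], [3], []] yields the descriptor [2, 1, 0] and the flat vector [1, 2, 3]. Mapping or summing then works on that flat vector without iterating over the inner lists one at a time.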
Related Work.
Motivated by DSH, we reintroduced the monad comprehension notation into GHC and extended it to support parallel and SQL-like comprehensions. The extension is available in GHC 7.2.
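The following small module illustrates the notation in GHC itself. It does not involve DSH. It uses the GHC extensions MonadComprehensions, ParallelListComp and TransformListComp, together with the, groupWith and sortWith from GHC.Exts. The example function names and data are hypothetical, and the code assumes a reasonably recent GHC.

```haskell
{-# LANGUAGE MonadComprehensions #-}
{-# LANGUAGE ParallelListComp #-}
{-# LANGUAGE TransformListComp #-}

import GHC.Exts (groupWith, sortWith, the)

-- Monad comprehension in the Maybe monad: Nothing if either key is missing.
addLookups :: [(String, Int)] -> String -> String -> Maybe Int
addLookups env a b = [ x + y | x <- lookup a env, y <- lookup b env ]

-- A guard in a monad comprehension desugars to guard; Maybe supports it.
safeDiv :: Int -> Int -> Maybe Int
safeDiv x y = [ x `div` y | y /= 0 ]

-- Parallel (zip) comprehension: branches separated by | are zipped, not nested.
pairUp :: [Int] -> String -> [(Int, Char)]
pairUp xs cs = [ (x, c) | x <- xs | c <- cs ]

-- SQL-like comprehension, roughly GROUP BY dept with a SUM aggregate.
-- groupWith sorts by the key before grouping, so the result is ordered by dept.
salaryByDept :: [(String, String, Int)] -> [(String, Int)]
salaryByDept emps =
  [ (the dept, sum salary)
  | (_name, dept, salary) <- emps
  , then group by dept using groupWith
  ]

-- SQL-like comprehension, roughly ORDER BY salary DESC: keep only the names.
namesBySalaryDesc :: [(String, String, Int)] -> [String]
namesBySalaryDesc emps =
  [ name
  | (name, _dept, salary) <- emps
  , then sortWith by negate salary
  ]
```

After `then group by dept using groupWith`, every variable bound earlier in the comprehension denotes a list per group. That is why the example applies the to dept and sum to salary.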
Get DSH
The DSH library and the FerryCore package it uses are available on Hackage (http://hackage.haskell.org/package/DSH). If you have cabal installed on your system you can also install DSH by typing "cabal install DSH" in your terminal.
Publications
- Bringing Back Monad Comprehensions (also available: Slides and Video).
George Giorgidze, Torsten Grust, Nils Schweinsberg, and Jeroen Weijers. In Proceedings of the ACM SIGPLAN Haskell Symposium (Haskell 2011), Tokyo, Japan. ACM, 2011.
- Haskell Boards the Ferry: Database-Supported Program Execution for Haskell (also available: Slides).
George Giorgidze, Torsten Grust, Tom Schreiber, and Jeroen Weijers. In Proceedings of the 22nd Symposium on Implementation and Application of Functional Languages (IFL 2010), Alphen aan den Rijn, Netherlands, September 2010. Springer LNCS, to appear in 2011. Best Paper Award.