Nautilus

News

  • 11/07/25: Paper accepted at the QDB Workshop, co-located with VLDB 2011.
  • 11/04/04: Tim Belhomme has joined the Nautilus team as a research assistant.
  • 11/03/01: Hanno Eichelberger has joined the Nautilus team as a student member.
  • 10/12/13: Project receives funding from the Baden-Württemberg Stiftung.
  • 10/06/14: Paper accepted at the VLDB 2010 conference.

 

 

"The deepest parts of [data transformation] are totally unknown to us. No soundings have been able to reach them. What goes on in those distant depths? What [data and transformations] inhabit, or could inhabit, those regions [...]? What is the constitution of these [...]? It's almost beyond conjecture." Slightly altered excerpt from Jules Verne's 20,000 Leagues Under the Sea, Chapter II.

Overview

Database developers may have a thought similar to the one above when designing or modifying database applications that derive data by applying queries or, more generally, transformations to source data. With declarative languages such as SQL, developers often cannot properly inspect or debug their query or transformation code. All they see is the tip of the iceberg: the result data once it has been computed. If the result does not meet their expectations, they usually go through one or more tedious and mostly manual analyze-fix-test cycles until the expected result appears. The goal of Nautilus is to support developers throughout this process with a suite of algorithms and tools.

The high-level architecture of Nautilus, depicted below, shows the main components that support each phase of the analyze-fix-test development cycle. Users interact with Nautilus via a graphical user interface (GUI). Nautilus is currently implemented as an Eclipse plugin. In the following, we briefly discuss the goal of each component within the analyze-fix-test cycle.

Nautilus Architecture

 

Analyze (Explanation Manager)

Given a data transformation, Nautilus allows developers to analyze what is going on below the surface, and thus serves as a debugging tool for queries or data transformation processes. To this end, Nautilus generates so-called explanations (explanation generator). Intuitively, explanations describe why result data of a transformation exists (related to data lineage/data provenance), or why expected data is missing. Explanations are computed from the data transformation itself as well as from the actual instance data.

Currently, Nautilus allows developers to analyze SQL queries by asking why data they expected is missing from the result of a query. The explanations Nautilus produces using the Artemis algorithm [1] explain missing data in terms of the source data (instance-based explanations). In the future, we also plan to provide explanations that consist of data transformation operators (query-based explanations), and explanations that combine both (hybrid explanations).
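
To make the idea of an instance-based explanation concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not the Artemis algorithm; it only checks, for one fixed join query, which source tuple prevents an expected answer from appearing. The tables employee and department, the function why_not, and all data are hypothetical and chosen purely for illustration.

```python
import sqlite3

def why_not(conn, expected_name):
    """Explain why an expected employee is missing from the join result.

    A toy instance-based check for one fixed query, not the Artemis algorithm.
    """
    cur = conn.cursor()
    row = cur.execute(
        "SELECT dept_id FROM employee WHERE name = ?", (expected_name,)
    ).fetchone()
    if row is None:
        return f"no employee tuple with name={expected_name!r} exists"
    dept_id = row[0]
    if dept_id is None:
        return f"employee {expected_name!r} has dept_id NULL, so the join drops it"
    match = cur.execute(
        "SELECT 1 FROM department WHERE id = ?", (dept_id,)
    ).fetchone()
    if match is None:
        return f"no department tuple with id={dept_id} exists to join with"
    return "the tuple should be in the result"

# Hypothetical source instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee (name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO employee VALUES ('Ada', 1), ('Bob', 2), ('Cy', NULL);
""")

# The query under analysis: Bob and Cy are silently dropped by the inner join.
QUERY = """
    SELECT e.name, d.title
    FROM employee e JOIN department d ON e.dept_id = d.id
"""
result = conn.execute(QUERY).fetchall()
explanation_bob = why_not(conn, "Bob")
explanation_cy = why_not(conn, "Cy")
```

In this sketch, each explanation names a source tuple that would have to exist or change for the missing answer to appear, which is the sense in which it is instance-based.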

Explanations returned to the user are ranked so that, intuitively, the most interesting explanations are displayed first (explanation ranker). The developer can then mark explanations as relevant or irrelevant (explanation annotator). Based on these annotations, Nautilus infers further annotations to avoid computing and displaying unnecessary explanations, or to improve the ranking (explanation annotation analyzer).

 

Analysis configuration screenshot. Input configuration: what queries and missing tuples should be explained by instance-based explanations?
Explanation Navigator screenshot. Explanation navigator: through different views, navigate through the set of returned instance-based explanations.

 

 

 

 

 

Fix (Query Modification Manager)

Explanations allow developers to better understand the queries and transformations they formulate. But Nautilus will go even further as we plan to generate sensible suggestions to repair the analyzed transformations, thus supporting the fixing phase of the previously manual analyze-fix-test cycle.

The computation of query modifications (modification generator) is based on explanations and their annotations, as determined during the analysis phase. Given a set of explanations and annotations, many modifications are possible in principle, but some make more sense than others. For instance, changing a join into an outer join is more likely to be correct than replacing a join with a cross product. Based on such heuristics, Nautilus ranks query modifications (modification ranker) before displaying them to the user. As with explanations, a developer can review and annotate query modifications (modification annotator). Based on these annotations, further knowledge is inferred (modification annotation analyzer).
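
The ranking heuristic mentioned above can be sketched as follows. This is an illustrative Python example, not the Nautilus modification ranker: the Modification class, the kind labels, and the numeric plausibility scores are all assumptions. Only the relative order of the two kinds (outer join before cross product) comes from the text.

```python
from dataclasses import dataclass

# Hypothetical plausibility scores; higher means the rewrite is more likely
# to be what the developer intended. Only their relative order matters here.
PLAUSIBILITY = {
    "join_to_outer_join": 0.8,
    "join_to_cross_product": 0.1,
}

@dataclass(frozen=True)
class Modification:
    kind: str          # key into PLAUSIBILITY
    description: str   # human-readable summary shown to the developer

def rank_modifications(mods):
    """Return modifications ordered from most to least plausible.

    Unknown kinds get score 0.0 and therefore sort last.
    """
    return sorted(mods, key=lambda m: PLAUSIBILITY.get(m.kind, 0.0), reverse=True)

candidates = [
    Modification("join_to_cross_product", "replace JOIN ... ON with CROSS JOIN"),
    Modification("join_to_outer_join", "replace JOIN with LEFT OUTER JOIN"),
]
ranked = rank_modifications(candidates)
```

A static score table is the simplest possible design. A ranker that also takes the developer's annotations into account would need to adjust these scores over time.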

 

Test (Development Cycle Manager)

Based on suggested modifications, the developer devises a new query that becomes the new version of the data transformation. However, further errors may remain in the query, or the last modification may in fact not have been correct. Therefore, Nautilus has to keep track of changes that occur during the development cycle and needs to notify users of the impact their query modifications have (modification impact analyzer). The developer can then decide whether the observed impact is acceptable or not (modification impact annotator).
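
One simple way to picture the impact of a modification is to compare the results of the old and the new query version on the same source data. The Python sketch below does this with the standard sqlite3 module. It is an assumption about what an impact report could contain, not a description of the Nautilus modification impact analyzer, and the schema, data, and function name modification_impact are hypothetical.

```python
import sqlite3
from collections import Counter

def modification_impact(conn, old_query, new_query):
    """Compare two query versions as bags of rows on the same instance."""
    old_rows = Counter(conn.execute(old_query).fetchall())
    new_rows = Counter(conn.execute(new_query).fetchall())
    # Counter subtraction keeps only positive counts, so duplicates are
    # handled correctly. Sorting by repr tolerates NULL (None) values.
    return {
        "added": sorted((new_rows - old_rows).elements(), key=repr),
        "removed": sorted((old_rows - new_rows).elements(), key=repr),
    }

# Hypothetical source instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee (name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO employee VALUES ('Ada', 1), ('Bob', 2);
""")

OLD_QUERY = """
    SELECT e.name, d.title
    FROM employee e JOIN department d ON e.dept_id = d.id
"""
# Modified version: the inner join becomes a left outer join.
NEW_QUERY = """
    SELECT e.name, d.title
    FROM employee e LEFT JOIN department d ON e.dept_id = d.id
"""
impact = modification_impact(conn, OLD_QUERY, NEW_QUERY)
```

Here the modification adds the row for Bob with a NULL department title and removes nothing. The developer would then judge whether this change is acceptable.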

Based on the annotations a developer provides during the development process, the initially stated problem, that is, the debugging scenario, may change over time (debugging scenario manager). Also, as a developer follows the analyze-fix-test process, Nautilus may acquire new global knowledge that influences different components (AFT-inference engine).

 

Publications

  1. Transformation Lifecycle Management with Nautilus
    Melanie Herschel, Torsten Grust.
    In Proceedings of the 9th International Workshop on Quality in Databases (QDB 2011), co-located with VLDB 2011. Seattle, USA.
  2. Explaining Missing Answers to SPJUA Queries (also available: slides)
    Melanie Herschel, Mauricio A. Hernández.
    Proceedings of the VLDB Endowment, Volume 3, September 2010.
  3. Artemis: A System for Analyzing Missing Answers (also available: poster)
    Melanie Herschel, Mauricio A. Hernández, Wang Chiew Tan.
    Proceedings of the VLDB Endowment, Volume 2, August 2009.
    This work was done while Melanie Herschel was a post-doc researcher at the IBM Almaden Research Center.

 

Talks

  • Transformation Lifecycle Management with Nautilus (Slides)
    • QDB 2011 in conjunction with VLDB 2011, Seattle, 29. August 2011
  • TLM - Transformation Lifecycle Management (Slides)
    • Technical University Dresden, 30. May 2011
    • Leo Seminar, INRIA Saclay, 28. March 2011
  • Transformation Lifecycle Management mit Nautilus (Slides in German), IBM Böblingen, 17. Feb. 2011
  • Query Analysis with Nautilus (Movie*)
    • Max-Planck-Institute for Biological Cybernetics, Tübingen, 25. Nov. 2009
    • DIMA Kolloquium, Technical University Berlin, 23. Nov. 2009

*Movies are QuickTime exports of the presentation slides and include the original slide animations. When viewed with the QuickTime Player, viewers can advance through the slideshow by clicking the mouse or pressing Play (in the QuickTime controls), or by pressing the Space bar on the keyboard.

 

Team

 

Research on Nautilus is supported by
the Baden-Württemberg Stiftung