We explore new ways to derive the provenance (or lineage) of data items that flow through programs or queries. Once this provenance information has been derived, we know
exactly which input items led the program (or query) to emit which output items (Why and Where Provenance), as well as
which program parts were involved in the computation of each individual item (How Provenance).
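To make the distinction between these kinds of provenance concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the technique developed in this project. It assumes cell-level provenance, where each input cell is identified by a `(row, column)` pair. It reads "where" as the cells whose values flow into an output (data dependencies) and "why" as the cells inspected to decide control flow, here the filter predicate (control dependencies). It reads "how" as labels of the query clauses that were involved. The names `Tagged`, `cell`, and `sum_where` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value carrying hypothetical provenance annotations."""
    value: object
    where: frozenset = frozenset()  # input cells whose values flowed into this value
    why: frozenset = frozenset()    # input cells inspected to decide control flow
    how: frozenset = frozenset()    # labels of the program parts that were involved


def cell(table, row, col):
    """Read input cell (row, col) and tag it with its own location."""
    return Tagged(table[row][col], where=frozenset({(row, col)}))


def sum_where(table, pred_col, pred_val, val_col):
    """Sketch of: SELECT SUM(val_col) FROM table WHERE pred_col = pred_val."""
    total = Tagged(0, how=frozenset({"SUM"}))
    for r in range(len(table)):
        test = cell(table, r, pred_col)
        # The predicate cell decides whether row r contributes (control dependency).
        why = total.why | test.where
        how = total.how | {"WHERE"}
        if test.value == pred_val:
            v = cell(table, r, val_col)
            # The value cell flows into the sum (data dependency).
            total = Tagged(total.value + v.value, total.where | v.where, why, how)
        else:
            total = Tagged(total.value, total.where, why, how)
    return total


orders = [
    {"cat": "A", "price": 10},
    {"cat": "B", "price": 20},
    {"cat": "A", "price": 5},
]
res = sum_where(orders, "cat", "A", "price")
print(res.value)          # 15
print(sorted(res.where))  # [(0, 'price'), (2, 'price')]
print(sorted(res.why))    # [(0, 'cat'), (1, 'cat'), (2, 'cat')]
print(sorted(res.how))    # ['SUM', 'WHERE']
```

In this sketch every inspected predicate cell counts toward "why". Rejecting row 1 also shaped the sum, so its `cat` cell appears even though its `price` cell does not. Other definitions of why-provenance, such as witness-based ones, would exclude it. This is a design choice of the sketch, not a claim about the project's semantics.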
Our exploration started with the analysis and instrumentation of Python programs used in scientific data processing (in the context of the ScienceCampus Tübingen). We are now adapting the resulting techniques to derive data provenance for relational queries, in particular SQL. This opens up the potential to derive very fine-grained provenance information for substantially larger SQL dialects than have been considered so far.
Proceedings of the 14th International Workshop on Theory and Practice of Provenance (TaPP 2022), co-located with SIGMOD 2022, Philadelphia, PA, USA, June 2022. To be published.
Proceedings of the 38th IEEE Int'l Conference on Data Engineering (ICDE 2022), Kuala Lumpur, Malaysia, May 2022. To be published.
PhD Thesis, Universität Tübingen, 2020.
Proceedings of the 44th Int'l Conference on Very Large Data Bases. PVLDB 11(11), pages 1536–1549. Rio de Janeiro, Brazil, August 2018.
10th USENIX Workshop on Theory and Practice of Provenance (TaPP 2018), London, UK, July 2018.
Proceedings of the VLDB 2016 PhD Workshop, New Delhi, India, September 2016.
Proceedings of the 19th Int'l Conference on Extending Database Technology (EDBT 2016), Bordeaux, France, March 2016.
Proceedings of the 41st Int'l Conference on Very Large Data Bases (VLDB 2015), Kohala Coast, Hawaii, USA, August 2015.
Proceedings of the 27th GI-Workshop Grundlagen von Datenbanken, Gommern, Germany, May 26–29, 2015.