Development of a Data Provenance Analysis Tool for Python Bytecode

Data provenance describes the origins of data. In this thesis, we propose an approach to computing data provenance when the data of interest is created by a Python program, and we present a tool that implements this approach. The tool works in two steps. In the first step, the program is instrumented to log data. In the second step, the code is analyzed with respect to the logged data, and provenance information is computed. A novel aspect of this work is that both instrumentation and analysis are performed on the bytecode of the given Python program with the aim of computing data provenance.
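
To make the two-step idea more concrete, the following is a minimal sketch and not the tool described in this thesis. It assumes a tiny straight-line program (the string SOURCE), uses plain execution as a stand-in for instrumentation, and uses the standard dis module to read the bytecode. The helper name direct_dependencies and the variable names are hypothetical and chosen only for illustration.

```python
import dis

# Hypothetical example program; module-level names compile to
# LOAD_NAME / STORE_NAME instructions.
SOURCE = "a = 1\nb = 2\nc = a + b\n"


def direct_dependencies(code):
    """Map each stored name to the names loaded since the previous store.

    Assumption: straight-line code without branches, loops, or calls.
    A real analysis would have to model the stack and control flow.
    """
    deps = {}
    loaded = []
    for instr in dis.get_instructions(code):
        if instr.opname == "LOAD_NAME":
            loaded.append(instr.argval)
        elif instr.opname == "STORE_NAME":
            deps[instr.argval] = sorted(set(loaded))
            loaded = []
    return deps


code = compile(SOURCE, "<demo>", "exec")

# Step 1 (stand-in for instrumentation): run the program and keep the
# resulting values as the "log".
namespace = {}
exec(code, namespace)
logged = {k: v for k, v in namespace.items() if k != "__builtins__"}

# Step 2: analyze the bytecode and relate it to the logged data.
deps = direct_dependencies(code)
for name, sources in deps.items():
    print(name, "=", logged[name], "derived from", sources)
```

In this sketch, the value logged for c is reported together with the names a and b that the bytecode loads before storing c. It only illustrates how logged data and bytecode analysis can be combined; it does not rewrite the bytecode itself.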

Contact

Tobias Müller