Eliminating NULLs with Subsumption and Complementation

Jens Bleiholder • Melanie Herschel • Felix Naumann

Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, September 2011, Vol. 34, No. 3.

In a data integration process, an important step after schema matching and duplicate detection is data fusion: the combination, or merging, of different representations of one real-world object into a single, consistent representation. Many different conflict resolution strategies can be applied to resolve potential data conflicts. In particular, some representations might contain missing values (NULL values) where others provide a non-NULL value. A common strategy for handling such NULL values is to replace them with the existing values from other representations. This increases the conciseness of the representation without losing information.
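As an illustration of the NULL-replacement strategy, the following is a minimal sketch, assuming duplicate detection has already grouped the representations of one object and that all of them share the same schema. NULL is modeled as Python's `None`, and the helper name `fill_nulls` is hypothetical. The sketch only fills NULLs; when two non-NULL values disagree, it raises an error instead of applying one of the other conflict resolution strategies.

```python
from typing import Optional, Sequence, Tuple

# A representation is a tuple of attribute values; None stands for NULL.
Row = Tuple[Optional[object], ...]


def fill_nulls(reps: Sequence[Row]) -> Row:
    """Fuse representations of the same real-world object (hypothetical helper).

    Each NULL is replaced by a non-NULL value that another representation
    provides for the same attribute. Two different non-NULL values for one
    attribute are a genuine conflict, which this sketch does not resolve.
    """
    if not reps:
        raise ValueError("need at least one representation")
    fused = []
    for i in range(len(reps[0])):
        # Collect the distinct non-NULL values for attribute i.
        values = {r[i] for r in reps if r[i] is not None}
        if len(values) > 1:
            raise ValueError(f"conflicting values for attribute {i}: {values}")
        fused.append(values.pop() if values else None)
    return tuple(fused)


print(fill_nulls([("Alice", None, "Berlin"), ("Alice", 34, None)]))
```

Here the two partial representations fuse into `("Alice", 34, "Berlin")`. No value is lost, and the result has no remaining NULLs.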

Two examples of relational operators that implement such a strategy are minimum union and complement union, together with their unary building blocks, subsumption and complementation. In this paper, we define and motivate the use of these operators in data integration, consider them as database primitives, and show how to optimize query plans in the presence of subsumption and complementation using rule-based plan transformations.
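The following minimal sketch illustrates the two unary operators under set semantics. It assumes that the inputs were already padded to a common schema by an outer union, and it models NULL as `None`. A tuple t1 subsumes t2 if the two differ and t1 agrees with t2 on every attribute where t2 is non-NULL. Two tuples complement each other if they never carry different non-NULL values for the same attribute, are not equal, and neither subsumes the other. Minimum union is then sketched as union followed by subsumption. All function names are illustrative.

The `complementation` function below is a simple greedy fixpoint: it merges one complementing pair at a time and then removes subsumed tuples. When a tuple complements several mutually conflicting tuples, its result can depend on merge order, so the paper's precise definition may treat that case differently.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Row = Tuple[Optional[object], ...]  # None stands for NULL


def subsumes(t1: Row, t2: Row) -> bool:
    """t1 subsumes t2: they differ, and t1 matches t2 wherever t2 is non-NULL."""
    return t1 != t2 and all(b is None or a == b for a, b in zip(t1, t2))


def subsumption(rel: List[Row]) -> List[Row]:
    """Drop duplicates and every tuple subsumed by another tuple."""
    unique = list(dict.fromkeys(rel))  # removes duplicates, keeps order
    return [t for t in unique if not any(subsumes(s, t) for s in unique)]


def minimum_union(r: List[Row], s: List[Row]) -> List[Row]:
    """Union of two relations (assumed already on a common schema) plus subsumption."""
    return subsumption(r + s)


def complements(t1: Row, t2: Row) -> bool:
    """No conflicting non-NULL values, not equal, and neither subsumes the other."""
    no_conflict = all(a is None or b is None or a == b for a, b in zip(t1, t2))
    return (no_conflict and t1 != t2
            and not subsumes(t1, t2) and not subsumes(t2, t1))


def merge(t1: Row, t2: Row) -> Row:
    """Fill the NULLs of t1 with the corresponding values of t2."""
    return tuple(a if a is not None else b for a, b in zip(t1, t2))


def complementation(rel: List[Row]) -> List[Row]:
    """Greedily merge complementing pairs until none remain (illustrative variant)."""
    result = subsumption(rel)
    changed = True
    while changed:
        changed = False
        for t1, t2 in combinations(result, 2):
            if complements(t1, t2):
                # Replace the pair by its merge. Each step shrinks the relation,
                # so the loop terminates.
                result = [t for t in result if t not in (t1, t2)]
                result.append(merge(t1, t2))
                result = subsumption(result)
                changed = True
                break
    return result


print(minimum_union([("Alice", None, "Berlin")], [("Alice", 34, "Berlin")]))
print(complementation([("Alice", 34, None), ("Alice", None, "Berlin")]))
```

Subsumption removes the less informative `("Alice", None, "Berlin")` because `("Alice", 34, "Berlin")` carries the same information plus the age. Complementation goes further and combines two partial tuples, neither of which subsumes the other, into a single `("Alice", 34, "Berlin")`. Tuples with a real conflict, such as ages 34 and 35, are left untouched by both operators.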