Data Cleaning: Problems and Current Approaches, Erhard Rahm and Hong Hai Do, 2000. IEEE Bulletin of the Technical Committee on Data Engineering, Vol. 23 (IEEE Computer Society) - Classifies data cleaning problems (single-source vs. multi-source, schema-level vs. instance-level) and surveys approaches for handling them, including how to identify and resolve duplicate records; establishes a conceptual framework for the field.
Data Quality and Entity Resolution, Larisa Gheorghe and Erhard Rahm, 2019. Encyclopedia of Big Data Technologies (Springer, Cham). DOI: 10.1007/978-3-319-76840-5_37-1 - Provides an updated academic perspective on data quality and entity resolution, focusing on methods for detecting duplicates in large datasets.