A Survey of Synthetic Data Generation Methods for Tabular Data, Viktoriia Borisov, Tobias Leemann, Jörg Huber, Marc Fischer, Tobias Böhm, Michele F. Sutter, and Stefan Feuerriegel, 2022. IEEE Transactions on Knowledge and Data Engineering (IEEE). DOI: 10.1109/TKDE.2022.3214876 - This comprehensive survey provides a detailed overview of synthetic data generation methods for tabular data, a primary focus of the course. It covers various approaches and discusses their strengths and weaknesses.
What makes good synthetic data? Statistical and data utility metrics for generative models, Kyle M. O’Donoghue, Ryan D. Hays, Andrew A. Kress, and Daniel E. Casey, 2022. Journal of the American Medical Informatics Association, Vol. 29 (Oxford University Press). DOI: 10.1093/jamia/ocac014 - This paper specifically examines and categorizes various metrics for evaluating the quality of synthetic data, focusing on both statistical fidelity and practical utility for machine learning tasks.
A Survey on Privacy-Preserving Synthetic Data Generation, Sangmin Kim, Tuan Anh Nguyen, and Yansong Li, 2021. ACM Computing Surveys, Vol. 54 (Association for Computing Machinery). DOI: 10.1145/3478546 - This survey reviews methods and considerations for generating synthetic data while preserving privacy, including anonymization techniques and more advanced privacy mechanisms.
Synthetic Data Generation for Machine Learning: A Practical Guide, Alistair Jordon, Paul van der Putten, Jinsung Yoon, Mihaela van der Schaar, and Richard Schmarzo, 2022 (IBM Research) - This practical guide from IBM Research offers a clear introduction to synthetic data generation for machine learning, explaining fundamental concepts and their applications.