Synthetic Data Generation: A Review, Shanshan Wang, Yuze Li, Haifeng Wang, Xing Xie, Yong Li, 2021, IEEE Transactions on Big Data, Vol. 9 (IEEE), DOI: 10.1109/TBDATA.2021.3134976 - This review article details various synthetic data generation methods and evaluates their limitations, including data fidelity, privacy concerns, and potential biases.
The Reality Gap in Synthetic Data Generation, Avi Frid, Miri Dagan, 2021, IEEE Transactions on Knowledge and Data Engineering, Vol. 34 (IEEE), DOI: 10.1109/TKDE.2021.3129486 - This paper focuses on the 'reality gap' challenge, where synthetic data fails to fully capture the subtle features of real data, potentially affecting downstream model performance.
How to Measure the Quality of Synthetic Data: A Survey, Eunsun Kim, Jinsuk Kim, Seungjoo Lee, Minsun Kim, 2021, Applied Sciences, Vol. 11, DOI: 10.3390/app112110186 - This survey offers an overview of metrics and methods for assessing the fidelity and utility of synthetic data, directly addressing the complexities of its validation.