GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter, 2017, Advances in Neural Information Processing Systems, Vol. 30 - Introduces the Fréchet Inception Distance (FID), a widely used metric for evaluating the quality and diversity of samples generated by generative models, often adapted for conditional assessment. A minimal computational sketch is given below.
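To make the metric concrete: FID fits a Gaussian (mean and covariance) to Inception features of real and generated images and reports the Fréchet distance between the two Gaussians. The sketch below assumes the feature matrices have already been extracted; the function and variable names are illustrative, not taken from the paper's code.

```python
# Minimal FID sketch, assuming real_feats and gen_feats are Inception feature
# matrices of shape (N, 2048) for real and generated images respectively.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    # Fit a Gaussian to each feature set: mean vector and covariance matrix.
    mu_r, cov_r = real_feats.mean(axis=0), np.cov(real_feats, rowvar=False)
    mu_g, cov_g = gen_feats.mean(axis=0), np.cov(gen_feats, rowvar=False)

    # FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 * (cov_r cov_g)^{1/2})
    diff = mu_r - mu_g
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```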
Learning Transferable Visual Models From Natural Language Supervision, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, 2021, Proceedings of the 38th International Conference on Machine Learning (ICML), Vol. 139 - Presents CLIP, a contrastively trained image-text model whose joint embedding space is the basis for the CLIP score used to measure conditional consistency in text-to-image generation; see the sketch below.
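For context on how a CLIP score is typically computed: it is the (often scaled and clipped) cosine similarity between CLIP's image embedding of a generated sample and its text embedding of the conditioning prompt. The sketch below is a minimal illustration using the Hugging Face `transformers` API; the checkpoint name and the exact scaling convention are assumptions, not the paper's own evaluation code.

```python
# Minimal CLIP-score sketch: cosine similarity between CLIP embeddings of a
# generated image and its conditioning prompt.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; any CLIP checkpoint with the same interface would work.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, prompt: str) -> float:
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    # Normalize, take cosine similarity; a common convention scales by 100
    # and clips negative values to 0 when reporting a "CLIP score".
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return max(0.0, 100.0 * float((img_emb * txt_emb).sum()))
```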
Generative Adversarial Networks: A Survey and Taxonomy, Junxian Gui, Zhentao Liu, Jia-Wang Bian, Zi-Yi Dou, Jun-Yan He, Xun Gong, Xiao-Fei Zhang, Wen-Da Qiu, Yong-Wei De, Yi-Hang Shen, Ding-Han Shen, Jian-Jun Li, Zhi-Yan Liu, Li-Fu Zheng, Rui-Shan Liu, De-Wei Kong, Xiao-Jie Jin, Wei-Ming Li, Jing-Hao Zhou, Rui Xu, 2022, ACM Computing Surveys, Vol. 54, DOI: 10.1145/3459676 - A comprehensive survey of Generative Adversarial Networks whose dedicated section on evaluation metrics provides broad context for assessing sample quality and diversity, including in conditional models.
Perceptual Losses for Real-Time Style Transfer and Super-Resolution, Justin Johnson, Alexandre Alahi, Li Fei-Fei, 2016, Computer Vision – ECCV 2016, Amsterdam, The Netherlands (Springer, Cham), DOI: 10.1007/978-3-319-46475-6_43 - Introduces perceptual loss functions, which use features from pre-trained deep neural networks to measure image similarity. This approach is relevant for evaluating conditional generation tasks such as style transfer or image-to-image translation, where structural and stylistic fidelity are key; a sketch follows.
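To make the idea concrete: a perceptual (feature reconstruction) loss compares two images in the activation space of a frozen pre-trained network rather than in pixel space. The sketch below is a minimal PyTorch version of that reading; the VGG-16 cut-off at relu3_3, the class name, and the single-layer formulation are assumptions rather than the paper's exact configuration, which combines several feature and style layers.

```python
# Minimal perceptual-loss sketch: L2 distance between frozen VGG-16 activations.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Keep VGG-16 layers up to relu3_3 (assumed cut-off) and freeze them.
        self.features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Inputs are (N, 3, H, W) tensors, normalized with ImageNet statistics.
        return F.mse_loss(self.features(generated), self.features(target))
```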