Explaining and Harnessing Adversarial Examples. Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. 2014. arXiv preprint arXiv:1412.6572. DOI: 10.48550/arXiv.1412.6572 - Introduces the Fast Gradient Sign Method (FGSM) and the linearity hypothesis, providing an early explanation for why adversarial examples exist and why they transfer between models (a minimal FGSM sketch follows this list).
Practical Black-Box Attacks against Machine Learning. Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami. 2017. Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security (AsiaCCS '17), ACM. DOI: 10.1145/3052973.3053009 - Details the black-box attack methodology that crafts adversarial examples on a locally trained substitute model and relies on their transferability to fool the target, a core method described in the section.
Ensemble Adversarial Training: Attacks and Defenses. Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel. 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1705.07204 - Explores the transferability of adversarial examples, particularly in the context of improving adversarial training and evaluating defense mechanisms.
Adversarial Examples Are Not Bugs, They Are Features. Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry. 2019. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). DOI: 10.48550/arXiv.1905.02175 - Provides a perspective on the existence and transferability of adversarial examples by linking them to non-robust yet predictive features learned by models.
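For orientation, here is a minimal sketch of the FGSM update x_adv = x + eps * sign(grad_x J(theta, x, y)) named in the Goodfellow et al. entry above, written in PyTorch. The `model`, `x`, `y`, and `eps` names are hypothetical placeholders under the usual assumptions (a differentiable classifier, inputs scaled to [0, 1]); this is not code from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step FGSM: perturb x by eps in the direction of the sign
    of the input gradient of the loss (Goodfellow et al., 2014)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # J(theta, x, y)
    loss.backward()                        # populates x.grad
    x_adv = x + eps * x.grad.sign()        # x + eps * sign(grad_x J)
    return x_adv.clamp(0.0, 1.0).detach()  # keep inputs in a valid range
```

Because the same gradient-sign perturbation often fools independently trained models, this one-step attack is also a common starting point for transferability experiments of the kind studied in the Papernot et al. and Tramèr et al. entries.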