Towards Deep Learning Models Resistant to Adversarial Attacks, Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu, 2018, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1706.06083 - This seminal paper formalizes adversarial training as a minimax optimization problem and introduces Projected Gradient Descent (PGD)-based adversarial training, establishing it as a highly effective defense strategy.
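For intuition, here is a minimal PyTorch-style sketch of the PGD inner maximization described in this paper, assuming a classifier trained with cross-entropy on inputs scaled to [0, 1]; the function name `pgd_attack` and the default hyperparameters are illustrative choices, not taken from the paper's code release.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Approximate the inner maximization of the minimax objective:
    # find a perturbation with ||delta||_inf <= eps that maximizes the loss at x + delta.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start inside the eps-ball
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                    # signed gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)                   # keep pixels in a valid range
    return x_adv.detach()
```

In adversarial training, each model update is then taken on `pgd_attack(model, x, y)` instead of the clean batch, which approximates the outer minimization over model parameters.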
Explaining and Harnessing Adversarial Examples, Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, 2014, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1412.6572 - Introduces the Fast Gradient Sign Method (FGSM) for generating adversarial examples and proposes FGSM adversarial training, a simpler and faster, though often less robust, alternative to PGD-based adversarial training (PGD-AT).
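A minimal sketch of FGSM example generation under the same assumptions as above (PyTorch classifier, cross-entropy loss, inputs in [0, 1]); the function name and default epsilon are illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8/255):
    # Single ascent step: perturb x by eps in the direction of the sign
    # of the loss gradient with respect to the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```

Compared with the multi-step PGD sketch above, this uses a single gradient computation per example, which is what makes FGSM adversarial training cheaper but typically weaker.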
Theoretically Principled Trade-off between Robustness and Accuracy, Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan, 2019, International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1901.08573 - Proposes TRADES, a method that explicitly addresses the accuracy-robustness trade-off in adversarial training by introducing a regularized loss function based on KL divergence.
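A rough sketch of the TRADES-style objective, assuming the perturbed input `x_adv` has already been generated (in TRADES, by maximizing the same KL term within an epsilon-ball); the function name, the `beta` default, and the reduction choice are illustrative rather than the paper's reference implementation.

```python
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, beta=6.0):
    # Natural cross-entropy plus a KL term that penalizes disagreement between
    # the model's predictions on the clean input x and the perturbed input x_adv.
    logits_nat = model(x)
    logits_adv = model(x_adv)
    natural = F.cross_entropy(logits_nat, y)
    robust = F.kl_div(F.log_softmax(logits_adv, dim=1),
                      F.softmax(logits_nat, dim=1),
                      reduction='batchmean')
    return natural + beta * robust
```

The scalar `beta` controls the trade-off: larger values weight the robustness regularizer more heavily relative to clean accuracy.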