Improved Training of Wasserstein GANs, Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville, 2017. Advances in Neural Information Processing Systems (NeurIPS) (Curran Associates, Inc.). DOI: 10.5555/3157382.3157640 - The foundational paper introducing the gradient penalty as a method to enforce the Lipschitz constraint in Wasserstein GANs, significantly improving training stability over weight clipping.
Wasserstein GAN, Martin Arjovsky, Soumith Chintala, Léon Bottou, 2017. International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1701.07875 - This paper introduced the Wasserstein GAN, which uses the Earth Mover's distance as the training objective. It highlights the issues with the original GAN loss and proposes weight clipping to enforce the Lipschitz constraint, an approach that WGAN-GP later improved upon.
Autograd mechanics, PyTorch Team, 2024 (PyTorch) - Essential documentation for understanding how automatic differentiation, including higher-order gradients required for the gradient penalty, works in PyTorch.