Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering optimization algorithms in deep learning, with a detailed discussion of Nesterov Momentum.
torch.optim.SGD, PyTorch Contributors, 2024 (PyTorch Foundation) - Official documentation for the Stochastic Gradient Descent optimizer in PyTorch, detailing the nesterov parameter for enabling NAG.
Lecture 6: Optimization Part 1, Fei-Fei Li, Yunzhu Li, Ruohan Gao, 2023 (Stanford University) - Lecture notes from a highly regarded deep learning course, offering clear explanations and visualizations of optimization algorithms, including Nesterov Accelerated Gradient.
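The update rule these references describe can be sketched in a few lines of plain Python. This is a minimal illustrative example, not code from any of the sources above; the quadratic objective, the `nag_minimize` helper, and the hyperparameters are assumptions chosen for the demo. PyTorch's `torch.optim.SGD` with `nesterov=True` implements a reformulated variant of the same idea.

```python
# Minimal sketch of Nesterov Accelerated Gradient (NAG): evaluate the
# gradient at the look-ahead point x + mu*v, then update velocity and
# position. Objective and hyperparameters are illustrative assumptions.

def nag_minimize(grad, x0, lr=0.1, mu=0.9, steps=500):
    """Minimize a 1-D function given its gradient, using NAG."""
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad(x + mu * v)   # gradient at the look-ahead point
        v = mu * v - lr * g    # velocity update
        x = x + v              # position update
    return x

# Example: f(x) = (x - 3)^2 has gradient 2*(x - 3) and a minimum at x = 3.
x_min = nag_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The look-ahead gradient evaluation is what distinguishes NAG from classical momentum, which evaluates the gradient at the current position `x` instead.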