Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - An authoritative textbook covering fundamental concepts of neural networks, including layers, activation functions, loss functions, and optimizers.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba, 2014, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer, a widely adopted algorithm for training neural networks with adaptive learning rates.