Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering fundamental concepts of deep learning, including various loss functions, optimization algorithms like SGD, Adam, and RMSProp, and the underlying mathematics of gradient-based learning.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015, 3rd International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - The original paper introducing the Adam optimizer, widely used in deep learning because it computes per-parameter adaptive learning rates from estimates of the first and second moments of the gradients.
Flux.jl Documentation: Losses, Flux.jl Contributors, 2025 - Official documentation for loss functions available in Flux.jl, including mse, binarycrossentropy, and crossentropy, with usage examples.
Flux.jl Documentation: Optimizers, Flux.jl Contributors, 2023 - Official documentation for optimizers provided by Flux.jl, detailing their functionality and usage within the framework, including SGD, Adam, and RMSProp.
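To tie these references together, below is a minimal illustrative sketch of how a Flux.jl loss function (Flux.mse) and the Adam optimizer are typically combined in a training loop. The toy data, the Dense(3 => 1) model, and the learning rate of 0.01 are arbitrary choices for illustration, and the sketch assumes a recent Flux.jl version that provides Flux.setup and Flux.update!; consult the documentation above for the authoritative API.

```julia
using Flux

# Hypothetical toy regression data: 3 input features, 1 target, 100 samples
X = rand(Float32, 3, 100)
Y = rand(Float32, 1, 100)

model = Dense(3 => 1)                       # single dense layer as a stand-in model
opt_state = Flux.setup(Adam(0.01), model)   # Adam optimizer (Kingma & Ba, 2015)

for epoch in 1:100
    # Compute gradients of the mean squared error loss w.r.t. the model parameters
    grads = Flux.gradient(model) do m
        Flux.mse(m(X), Y)
    end
    # Apply the Adam update to the model parameters
    Flux.update!(opt_state, model, grads[1])
end
```

Swapping Flux.mse for Flux.binarycrossentropy or Flux.crossentropy, or Adam for another optimizer such as Descent (SGD) or RMSProp, follows the same pattern described in the Flux.jl documentation cited above.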