Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This comprehensive textbook covers neural network theory, including why non-linearity is necessary and the properties of common activation functions such as ReLU, sigmoid, and tanh.
Deep Sparse Rectifier Neural Networks, Xavier Glorot, Antoine Bordes, and Yoshua Bengio, 2011, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 15 (Microtome Publishing), DOI: 10.55982/aistats.2011.27 - A seminal paper that established the Rectified Linear Unit (ReLU) as an effective activation function, demonstrating that rectifier networks train well at depth and learn sparse representations.
Activations, Keras team, 2024 - Official Keras documentation providing details and usage examples for the activation functions available in the Keras API; a minimal usage sketch follows below.
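
As a concrete illustration of the activations these references discuss, here is a minimal sketch assuming TensorFlow's bundled Keras (`tf.keras`). The layer sizes and input shape are arbitrary placeholders chosen for illustration, not values taken from the references.

```python
# A minimal sketch of the activations discussed above, assuming
# TensorFlow's bundled Keras (tf.keras). Layer sizes and the input
# shape are arbitrary illustrative choices.
import tensorflow as tf

# The rectifier studied by Glorot et al. (2011): ReLU(x) = max(0, x).
x = tf.constant([-2.0, 0.0, 3.0])
print(tf.keras.activations.relu(x).numpy())     # [0.  0.  3.]
print(tf.keras.activations.sigmoid(x).numpy())  # [~0.119  0.5   ~0.953]
print(tf.keras.activations.tanh(x).numpy())     # [~-0.964 0.0   ~0.995]

# Activations are typically attached to layers by name, as shown in
# the Keras "Activations" documentation.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),     # non-linear hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])
model.summary()
```

Passing the string name (e.g. "relu") is equivalent to passing the corresponding function object (tf.keras.activations.relu), so either form can be used when building layers.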