Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive coverage of activation functions, their theoretical basis, common types, and role in deep networks.
Deep Sparse Rectifier Neural Networks, Xavier Glorot, Antoine Bordes, and Yoshua Bengio, 2011, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 15 (PMLR) - Influential paper that popularized Rectified Linear Units (ReLU) in deep networks and demonstrated their benefits.