Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A definitive textbook on deep learning, with detailed discussion of regularization techniques for neural networks, including L1 and L2 parameter norm penalties (the L2 penalty is commonly called weight decay).
Regression Shrinkage and Selection via the Lasso, Robert Tibshirani, 1996, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 58 (Royal Statistical Society), DOI: 10.1111/j.2517-6161.1996.tb02080.x - The seminal paper introducing the Least Absolute Shrinkage and Selection Operator (LASSO), demonstrating that an L1 penalty performs both regularization and automatic feature selection.