Regression Shrinkage and Selection via the Lasso, Robert Tibshirani, 1996, Journal of the Royal Statistical Society, Series B (Methodological), Vol. 58, No. 1, pp. 267–288 (Wiley for the Royal Statistical Society), DOI: 10.1111/j.2517-6161.1996.tb02080.x - This foundational paper introduces the lasso (least absolute shrinkage and selection operator), providing the theoretical basis for L1 regularization and its sparsity-inducing properties.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A standard deep learning textbook. Chapter 7, "Regularization for Deep Learning", covers parameter norm penalties, including L1 regularization and its role in shrinking weights and inducing sparsity.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2009 (Springer) - Section 3.4 treats L1 regularization (the lasso) in detail, including its geometric interpretation and a comparison with L2 regularization (ridge regression).
CS229 Lecture Notes, Part V: Regularization and Model Selection, Andrew Ng and Tengyu Ma, 2023 (Stanford University) - These lecture notes give an accessible introduction to regularization, explaining L1 and L2 penalties, their geometric properties, and their effects on model weights.