Greedy Function Approximation: A Gradient Boosting Machine, Jerome H. Friedman, 2001, The Annals of Statistics, Vol. 29 (Institute of Mathematical Statistics), DOI: 10.1214/aos/1013203451 - This foundational paper introduces the Gradient Boosting Machine (GBM) algorithm, detailing its mathematical derivation and the general framework of functional gradient descent.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman, 2009 (Springer, 2nd edition) - A comprehensive textbook providing an in-depth statistical explanation of various machine learning algorithms, including a detailed chapter on gradient boosting and its theoretical underpinnings.
Lecture Notes: Boosting (CS229, Machine Learning), Andrew Ng, 2009 (Stanford University) - These lecture notes provide a clear and concise explanation of boosting algorithms, including a pedagogical derivation of Gradient Boosting, suitable for readers who want an introductory yet rigorous treatment.