XGBoost: A Scalable Tree Boosting System, Tianqi Chen and Carlos Guestrin, 2016, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16) (Association for Computing Machinery), DOI: 10.1145/2939672.2939785 - This foundational paper introduces the XGBoost algorithm, detailing the sparsity-aware split finding mechanism as a core optimization for handling missing data efficiently.
Handling Missing Values, XGBoost Contributors, 2023 - The official documentation provides practical details and current explanations of how XGBoost implements sparsity-aware split finding and handles missing values.
Classification and Regression Trees, Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone, 1984 (Chapman and Hall/CRC), DOI: 10.1201/9781315139470 - This foundational book introduces the CART algorithm, including early methods like surrogate splits for handling missing values in decision trees, providing context for XGBoost's unique approach.