While XGBoost offers a significant improvement over standard gradient boosting, other specialized libraries have emerged to address its remaining performance bottlenecks. This chapter focuses on two prominent frameworks: LightGBM, which optimizes training speed on large datasets, and CatBoost, which provides native handling of categorical data.
We will first examine LightGBM and its two core techniques for accelerating training on large datasets: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). Next, we will cover CatBoost and its internal mechanisms for processing categorical features, including its ordered boosting strategy, which helps prevent target leakage, and its use of symmetric trees. The chapter concludes with a direct comparison of the performance characteristics of XGBoost, LightGBM, and CatBoost, followed by a practical exercise in which you will implement models with both libraries.
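To ground the discussion before the detailed sections, the sketch below previews the surface-level APIs of both libraries on a small synthetic dataset. It is a minimal illustration, not the chapter's reference implementation: it assumes the lightgbm and catboost packages are installed, and the data, column names (age, income, city), and parameter values are placeholders chosen for this sketch.

import numpy as np
import pandas as pd
import lightgbm as lgb
from catboost import CatBoostClassifier

# Hypothetical synthetic data; the columns and sizes are placeholders.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(50000.0, 15000.0, size=n),
    "city": rng.choice(["tokyo", "berlin", "austin"], size=n),
})
y = (df["income"] + rng.normal(scale=5000.0, size=n) > 50000.0).astype(int)

# LightGBM expects categorical columns as pandas 'category' dtype (or listed
# via the categorical_feature parameter). EFB is applied automatically;
# GOSS is an opt-in sampling strategy discussed in Section 5.1.
df_lgb = df.copy()
df_lgb["city"] = df_lgb["city"].astype("category")
lgb_model = lgb.LGBMClassifier(n_estimators=100)
lgb_model.fit(df_lgb, y)

# CatBoost consumes raw string categories directly; we only name the columns.
cb_model = CatBoostClassifier(iterations=100, verbose=False)
cb_model.fit(df, y, cat_features=["city"])

print("LightGBM accuracy:", lgb_model.score(df_lgb, y))
print("CatBoost accuracy:", cb_model.score(df, y))

Note the asymmetry this chapter will explain: LightGBM requires categorical columns to be declared (here via the category dtype), while CatBoost accepts raw string categories once the columns are named. Sections 5.5 and 5.6 return to these APIs in depth and compare the resulting models against XGBoost.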
5.1 Introduction to LightGBM: Gradient-based One-Side Sampling
5.2 LightGBM's Exclusive Feature Bundling
5.3 Introduction to CatBoost: Handling Categorical Features
5.4 CatBoost's Ordered Boosting and Symmetric Trees
5.5 Performance Comparison: XGBoost vs. LightGBM vs. CatBoost
5.6 Hands-on Practical: Implementing LightGBM and CatBoost