Preprocessing data, scikit-learn developers, 2023 (scikit-learn) - Official guide to scikit-learn's preprocessing module and its use in pipelines, illustrating correct usage patterns for transforming data without introducing data leakage.
An Introduction to Statistical Learning: With Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2021 (Springer) - Foundational text on statistical learning methods, covering the principles of data splitting and the need for independent evaluation to avoid overly optimistic model assessment.