Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, 2006 (The MIT Press) - This foundational textbook provides comprehensive coverage of Gaussian Processes, including a detailed derivation and interpretation of the marginal likelihood and its gradient, as used for hyperparameter optimization.
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) DOI: 10.1007/978-0-387-44938-4 - A widely used textbook that offers a thorough introduction to Bayesian machine learning. It covers Gaussian Processes and the principle of evidence (marginal likelihood) in the broader context of model selection.
GPflow Documentation, The GPflow Team, 2023 (The GPflow Team) - Provides practical guidance and examples for implementing Gaussian Processes and optimizing their hyperparameters using the GPflow library in Python, illustrating the computational aspects discussed.