Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive treatment of autoencoders and common loss functions like MSE and BCE, including their probabilistic interpretations.
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer Science+Business Media, LLC) DOI: 10.1007/bpa2825 - Classic textbook providing a strong statistical foundation for understanding maximum likelihood estimation and the probabilistic basis of loss functions.