A Fast Learning Algorithm for Deep Belief Nets, Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh, 2006, Neural Computation, Vol. 18 (MIT Press), DOI: 10.1162/neco.2006.18.7.1527 - This paper introduced a groundbreaking greedy layer-wise training approach for deep belief networks, which set the stage for effectively training deep neural architectures like stacked autoencoders.
Greedy Layer-Wise Training of Deep Networks, Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle, 2007, Advances in Neural Information Processing Systems 19, Vol. 19 - This work further elaborated on and generalized the greedy layer-wise training methodology, demonstrating its effectiveness for pre-training various deep network architectures, including stacked autoencoders.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This book provides a comprehensive treatment of deep learning, offering detailed explanations of stacked autoencoders, greedy layer-wise pre-training, and their historical context and significance.