Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017Advances in Neural Information Processing SystemsDOI: 10.48550/arXiv.1706.03762 - 提出了Transformer架构并描述了其中Dropout的应用。
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - 全面解释了正则化技术,包括Dropout、层归一化和权重衰减。