Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI: 10.3115/v1/D14-1179 - Introduces the Gated Recurrent Unit (GRU) architecture, detailing the reset and update gates and the candidate hidden state computation.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering GRUs, including the candidate hidden state formulation, within the broader context of recurrent neural networks.
Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola, 2024 (Cambridge University Press) - An interactive online textbook with clear explanations and diagrams of GRU components, including the candidate hidden state.