Finding structure in time, Jeffrey L. Elman, 1990 (Cognitive Science, Vol. 14, DOI: 10.1207/s15516709cog1402_1) - A seminal paper introducing Simple Recurrent Networks (often called Elman networks). These networks use a 'context layer' that holds a copy of the previous hidden state, which lets the network process sequential information.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook on the theoretical and mathematical foundations of deep learning. It includes a dedicated chapter (Chapter 10) on recurrent neural networks and how they work.