Efficient Estimation of Word Representations in Vector Space, Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, 2013, International Conference on Learning Representations (ICLR) Workshop. DOI: 10.48550/arXiv.1301.3781 - This paper introduces Word2Vec, a method for learning distributed word representations that capture semantic and syntactic relationships, forming a basis for vector embeddings.
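As a concrete illustration of the distributed representations this paper describes, here is a minimal sketch of training skip-gram vectors and running the paper's well-known analogy test. The use of the gensim library, the toy corpus, and all hyperparameter values are illustrative assumptions, not the paper's original setup.

```python
# Minimal skip-gram sketch with gensim (an assumption for illustration;
# the paper itself ships its own C implementation, not gensim).
from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of tokens.
# Real use requires a large corpus for meaningful vectors.
corpus = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# sg=1 selects the skip-gram architecture from the paper;
# vector_size is the dimensionality of the learned embeddings.
model = Word2Vec(
    sentences=corpus, vector_size=50, window=2,
    min_count=1, sg=1, epochs=100,
)

# The paper's analogy test: vector("king") - vector("man") + vector("woman")
# should land near vector("queen") when trained on enough data.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```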
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - This seminal paper presents the Transformer architecture, a core component of many advanced embedding models used today, built around the self-attention mechanism it introduces.
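The paper's central building block, scaled dot-product attention, is defined as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V. The NumPy sketch below implements that formula directly; the shapes and random test values are illustrative assumptions.

```python
# Minimal NumPy sketch of the paper's scaled dot-product attention:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(d_k)
    # to keep the softmax in a well-behaved range (as in the paper).
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Illustrative shapes: 3 query tokens attending over 4 key/value tokens,
# embedding dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```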
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016, MIT Press - A comprehensive textbook covering the theoretical foundations and practical aspects of deep learning, including how neural networks learn meaningful data representations (embeddings).
Stanford CS224N: Natural Language Processing with Deep Learning, Diyi Yang and Tatsunori Hashimoto, Winter 2025 - A university course offering lecture materials and resources on modern NLP, covering word vectors, neural networks, and transformer models for generating embeddings.