Graph Attention Networks, Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio, 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.10903 - This paper introduces the Graph Attention Network (GAT) architecture, describing the original proposal and its use of multi-head attention to aggregate features from a node's neighbors.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. DOI: 10.48550/arXiv.1706.03762 - This seminal paper introduces the Transformer model and the multi-head attention mechanism, which strongly influenced the design of attention-based models across many domains, including GNNs such as GAT.