Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the scaled dot-product attention mechanism, which serves as the basis for Graph Transformers.
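A minimal numpy sketch of the scaled dot-product attention this paper introduces, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; the function name and toy shapes are illustrative, not from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)     # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
    return weights @ V                               # attention-weighted sum of values

# Toy example: 4 queries attending over 6 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)          # shape (4, 8)
```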
Graph Attention Networks, Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio, 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.10903 - Presents a localized attention mechanism for graph neural networks, useful for understanding the distinction between local and global attention on graphs.
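To make the local/global contrast concrete, here is a minimal single-head sketch of GAT-style attention in numpy: unlike the global attention above, scores are masked so each node attends only to its graph neighbors. It assumes `adj` includes self-loops, and the additive attention e_ij = LeakyReLU(aᵀ[Wh_i ∥ Wh_j]) is split into source and target terms:

```python
import numpy as np

def gat_attention(H, adj, W, a, alpha=0.2):
    """One single-head GAT layer: attention restricted to graph neighbors."""
    Wh = H @ W                                        # project node features, (N, F')
    f = Wh.shape[1]
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), decomposed into source + target terms
    e = (Wh @ a[:f])[:, None] + (Wh @ a[f:])[None, :]
    e = np.where(e > 0, e, alpha * e)                 # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                 # mask: only neighbors can attend
    w = np.exp(e - e.max(axis=1, keepdims=True))      # softmax over each neighborhood
    w /= w.sum(axis=1, keepdims=True)
    return w @ Wh                                     # aggregate neighbor features

# Toy example: 5 nodes, input dim 4, output dim 3; adjacency with self-loops.
rng = np.random.default_rng(1)
H = rng.normal(size=(5, 4))
adj = (rng.random((5, 5)) < 0.4).astype(float)
np.fill_diagonal(adj, 1.0)                            # self-loops, as in the paper
out = gat_attention(H, adj, rng.normal(size=(4, 3)), rng.normal(size=(6,)))
```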
Do Transformers Really Perform Bad for Graph Representation?, Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu, 2021. Advances in Neural Information Processing Systems (NeurIPS), Vol. 34. DOI: 10.48550/arXiv.2106.05234 - Introduces Graphormer, a prominent Graph Transformer that injects graph structure through several encodings, including spatial and centrality biases.
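A rough numpy sketch of the two encodings named in the annotation: a centrality encoding (a learnable embedding added to node features by degree) and a spatial encoding (a learnable scalar bias added to attention scores, indexed by shortest-path distance). The argument names (`spd`, `deg`, `spatial_bias`, `deg_embed`) are illustrative stand-ins for Graphormer's learned lookup tables, not the paper's notation:

```python
import numpy as np

def graphormer_scores(X, Wq, Wk, spd, spatial_bias, deg, deg_embed):
    """Attention scores with Graphormer-style centrality and spatial encodings."""
    X = X + deg_embed[deg]                            # centrality encoding: add degree embedding
    Q, K = X @ Wq, X @ Wk
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # standard scaled dot-product term
    scores = scores + spatial_bias[spd]               # spatial encoding: bias per
                                                      # shortest-path-distance bucket
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)          # global attention over all node pairs

# Toy example: 4 nodes, feature dim 6, distances capped at 3 hops, degrees up to 4.
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 6))
spd = rng.integers(0, 4, size=(4, 4))                 # pairwise shortest-path distances
deg = rng.integers(0, 5, size=4)                      # node degrees
w = graphormer_scores(X, rng.normal(size=(6, 6)), rng.normal(size=(6, 6)),
                      spd, rng.normal(size=4), deg, rng.normal(size=(5, 6)))
```

Note the contrast with the GAT sketch above: attention here remains global (every node pair gets a score), and graph structure enters only through the added biases.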