Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Nils Reimers and Iryna Gurevych, 2019. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI: 10.48550/arXiv.1908.10084 - Introduces a method for generating semantically meaningful sentence embeddings, which is directly relevant to how LLMs process and compare the meaning of different prompts and to the technique of 'exploiting the LLM's own semantic space.'
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe, 2022. arXiv preprint arXiv:2203.02155. DOI: 10.48550/arXiv.2203.02155 - Describes InstructGPT and the use of Reinforcement Learning from Human Feedback (RLHF) to align LLMs with human instructions and preferences, serving as a foundational reference for the advanced safety mechanisms that semantic evasion attempts to circumvent.