To effectively secure Large Language Models, we first need a clear understanding of how they can be targeted. This chapter focuses on identifying and analyzing the various attack surfaces present in LLMs and their surrounding systems. An attack surface represents any point where an unauthorized user, an attacker, can try to input data to or extract data from an environment, or otherwise interact with the system in an unintended way.
You will learn about several common attack vectors. We will cover techniques such as prompt injection, where malicious instructions are embedded within user inputs to manipulate the model's behavior. We will also look at data poisoning, which involves corrupting the training data to introduce vulnerabilities or biases. Further topics include model evasion, jailbreaking methods to bypass safety controls, and how attackers might attempt to extract sensitive information. We will also consider issues like denial of service, the generation of misinformation, and vulnerabilities within LLM APIs. By the end of this chapter, you will be able to recognize these key attack surfaces and understand their potential impact on LLM security.
2.1 Prompt Injection: Direct and Indirect Techniques
2.2 Data Poisoning: Training Data and Fine-tuning Attacks
2.3 Model Evasion and Obfuscation Tactics
2.4 Jailbreaking and Role-Playing Attacks
2.5 Extracting Sensitive Information from LLMs
2.6 Denial of Service and Resource Exhaustion in LLMs
2.7 Over-reliance and Misinformation Generation
2.8 Identifying Attack Vectors in LLM APIs and Interfaces
2.9 Practice: Analyzing LLM APIs for Potential Weaknesses
© 2025 ApX Machine Learning