Now that you've learned about the foundational elements of LLM red teaming, including its lifecycle, the importance of setting clear objectives, and understanding an attacker's perspective, it's time to put that knowledge into practice. This hands-on exercise will guide you through defining the scope for a mock LLM red team operation. A well-defined scope is fundamental for a successful and focused engagement, ensuring that everyone involved understands the targets, limitations, and goals.
The Scenario: "GenieQuery" - Innovatech Corp's Internal Assistant
Imagine you're part of the newly formed AI red team at Innovatech Corp, a mid-sized technology company. Innovatech has recently deployed "GenieQuery," an LLM-powered internal assistant.
- Purpose: GenieQuery is designed to help employees by answering questions based on a vast repository of internal company documents. This includes HR policies, project documentation (past and present), technical guides, and meeting summaries.
- Access: Employees access GenieQuery via a web-based chat interface.
- Underlying Technology: It uses a proprietary LLM fine-tuned by Innovatech's AI team on their internal documents. The system has an API that the web interface calls.
- Data Sensitivity: The documents GenieQuery can access range from publicly shareable HR benefits information to highly confidential material: details about unreleased products ("Project Chimera"), internal financial forecasts, and snippets of sensitive employee information that may appear in meeting notes.
- Management's Concerns:
- Confidentiality Breach: The primary concern is the leakage of sensitive information, especially details about "Project Chimera" or internal financials, to unauthorized employees.
- Misinformation: Providing incorrect or misleading information regarding HR policies or critical project details.
- Abuse: Employees attempting to jailbreak the system for non-work purposes or to uncover information they aren't privy to.
- Reputational Damage (Internal): If the system is easily manipulated or provides harmful outputs, it could erode trust in AI initiatives within the company.
Your Task: Draft an Initial Scope Document
Your task is to draft an initial scope document for a red team engagement against GenieQuery. This document will serve as a foundational agreement on what will be tested, how, and within what boundaries. Remember the principles discussed in "Setting Objectives and Scope for LLM Red Teaming" and "LLM Vulnerabilities: An Introduction" earlier in this chapter.
Main Elements for Your Scope Document
Structure your scope document around the following key elements. Think critically about each one in the context of GenieQuery.
- Objectives of the Red Team Engagement
- What are the primary goals of this assessment? Be specific.
- What questions are you trying to answer for Innovatech management?
- Example thinking: Given management's concern about "Project Chimera," a key objective might be: "Assess the risk of GenieQuery inadvertently disclosing confidential information related to 'Project Chimera' through targeted prompting techniques."
- Target System Definition
- Clearly define the boundaries of the "GenieQuery" system.
- In-Scope Components: List all parts of the GenieQuery system that ARE part of this engagement.
- Consider: Web UI, API endpoints, the LLM model itself, any specific databases or document repositories it directly interfaces with for its knowledge.
- Out-of-Scope Components: List specific systems, infrastructure, or areas that ARE NOT part of this engagement.
- Consider: The general corporate network, employee workstations, physical security of the data center, the underlying cloud provider's infrastructure (unless specific misconfigurations of Innovatech's services on it are relevant).
- Critical Assets to Protect
- Identify the most important assets related to GenieQuery that the red team will try to impact.
- Consider:
- Confidentiality of specific datasets (e.g., "Project Chimera" documents, PII).
- Integrity of information provided by the LLM (e.g., accuracy of HR policy responses).
- Availability of the GenieQuery service (though usually, disruptive testing is limited).
- Reputation of the system and the AI team.
- Threats to Investigate (Attack Vectors)
- Based on the LLM vulnerabilities discussed earlier (e.g., prompt injection, jailbreaking, data poisoning, sensitive information extraction), list the types of attacks or threat scenarios that will be explored.
- Tailor this to GenieQuery. For example, data poisoning of its training data might be out of scope if you're only testing the deployed system, but attempting to influence its behavior through its immediate input (prompt injection) would be in scope.
- Example: "Investigate susceptibility to direct and indirect prompt injection attacks aimed at exfiltrating information about unannounced projects."
- Example: "Test for jailbreaking techniques that bypass safety filters to elicit inappropriate or non-work-related responses."
- Rules of Engagement (Constraints & Limitations)
- Timeframe: Specify a realistic duration for the active testing phase (e.g., "2 weeks, from YYYY-MM-DD to YYYY-MM-DD").
- Allowed Techniques: What methods are permissible? Are there any restrictions? For instance, "No denial-of-service (DoS) attacks that could significantly impact GenieQuery's availability for regular employees." "Social engineering of Innovatech employees is out of scope for this engagement."
- Testing Accounts/Access: Will the red team use standard employee accounts, specially provisioned test accounts, or attempt unauthenticated attacks?
- Incident Handling: If a critical vulnerability is discovered, what is the immediate reporting protocol?
- Data Handling: How will any sensitive data discovered by the red team be handled, stored, and reported?
- Assumptions
- List any assumptions made during the scope definition.
- Example: "The red team assumes the provided test environment is a faithful representation of the production GenieQuery system."
- Example: "It is assumed that the core LLM model will not be updated during the engagement period."
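The elements above can also be captured as structured data, which makes gaps in a draft easy to spot. The following is a minimal sketch, not a prescribed format: the class names `ScopeDocument` and `RulesOfEngagement`, their field names, and the helper `missing_sections` are all hypothetical, and the sample values are illustrative picks from the "Consider" hints above.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class RulesOfEngagement:
    """Hypothetical container for the constraints agreed with stakeholders."""
    start: date
    end: date
    prohibited_techniques: list[str] = field(default_factory=list)
    test_accounts: str = ""
    incident_protocol: str = ""
    data_handling: str = ""


@dataclass
class ScopeDocument:
    """Hypothetical structure mirroring the elements of the scope document."""
    objectives: list[str] = field(default_factory=list)
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    critical_assets: list[str] = field(default_factory=list)
    threats: list[str] = field(default_factory=list)
    rules: Optional[RulesOfEngagement] = None
    assumptions: list[str] = field(default_factory=list)


def missing_sections(scope: ScopeDocument) -> list[str]:
    """Return the names of sections that are still empty (or unset)."""
    return [f.name for f in fields(scope) if not getattr(scope, f.name)]


# A partially completed draft: only the first three sections are filled in.
draft = ScopeDocument(
    objectives=["Assess the risk of disclosing 'Project Chimera' information"],
    in_scope=["Web UI", "API endpoints", "LLM", "document repositories"],
    out_of_scope=["general corporate network", "employee workstations"],
)

print(missing_sections(draft))
# ['critical_assets', 'threats', 'rules', 'assumptions']
```

A check like this only confirms that each section exists. Whether the content of a section is adequate still requires review by the team and stakeholders.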
Visualizing Scope Boundaries
Everyone involved needs a shared understanding of what is in and out of scope. A simple diagram often makes these boundaries clear to all stakeholders.
The diagram shows the main components of the GenieQuery system considered in-scope for the red team assessment, such as its web interface, API, the LLM, and the document database it uses. It also delineates elements like employee laptops and general corporate infrastructure as out-of-scope. Both regular employees and red team operators interact with the system, typically via its UI or API.
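The same boundary can be expressed as a small lookup that an operator consults before touching a component. This is a sketch under assumptions: the component identifiers and the function `classify_target` are hypothetical labels for the items named in the diagram description, not names used by Innovatech.

```python
# Hypothetical identifiers for the components named in the diagram description.
IN_SCOPE = {"web-ui", "api", "llm", "document-database"}
OUT_OF_SCOPE = {"employee-laptops", "corporate-infrastructure"}


def classify_target(component: str) -> str:
    """Classify a component as in-scope, out-of-scope, or unlisted."""
    key = component.strip().lower()
    # Check exclusions first so an explicit exclusion always wins.
    if key in OUT_OF_SCOPE:
        return "out-of-scope"
    if key in IN_SCOPE:
        return "in-scope"
    # Anything not named in the scope document needs clarification
    # from stakeholders before it is tested.
    return "unlisted"


for target in ["API", "employee-laptops", "hr-payroll-db"]:
    print(target, "->", classify_target(target))
```

Returning a separate "unlisted" result, rather than treating unknown components as in scope, reflects a cautious default: anything the scope document does not mention is raised with stakeholders instead of being tested.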
Putting It All Together: Your Turn
Now, take the scenario details and the elements above and draft your own scope document for the GenieQuery red team engagement. Don't worry about making it perfect; the goal is to practice the thought process. Focus on being clear and specific.
For example, when defining Objectives, you might write:
- Objective 1: Identify and document vulnerabilities in GenieQuery that could lead to the unauthorized disclosure of confidential information pertaining to "Project Chimera."
- Objective 2: Assess GenieQuery's susceptibility to prompt injection attacks aimed at bypassing safety mechanisms or generating responses that violate Innovatech's internal communication policies.
- Objective 3: Determine if GenieQuery can be manipulated to provide verifiably false or misleading information regarding HR policies, and evaluate the potential impact of such misinformation.
Continue this for all sections.
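One way to sanity-check drafted objectives is to trace each one back to the management concerns listed in the scenario. The sketch below assumes a hand-written mapping; the pairing of objectives to concerns and the function `uncovered_concerns` are illustrative assumptions, not part of the exercise materials.

```python
# The four concerns raised by Innovatech management in the scenario.
CONCERNS = [
    "confidentiality breach",
    "misinformation",
    "abuse",
    "reputational damage",
]

# Hypothetical mapping from the three example objectives to the concerns
# they address. A real team would agree on this mapping with stakeholders.
OBJECTIVE_TO_CONCERNS = {
    "Objective 1": ["confidentiality breach"],
    "Objective 2": ["abuse"],
    "Objective 3": ["misinformation"],
}


def uncovered_concerns(concerns: list[str], mapping: dict[str, list[str]]) -> list[str]:
    """Return the concerns that no objective is mapped to."""
    covered = {concern for linked in mapping.values() for concern in linked}
    return [concern for concern in concerns if concern not in covered]


print(uncovered_concerns(CONCERNS, OBJECTIVE_TO_CONCERNS))
# ['reputational damage']
```

Under this mapping, reputational damage has no dedicated objective. The team could add one, or record explicitly that it is addressed indirectly through the other three.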
A Note on Iteration
Remember, a scope document is often a living document, especially in the early stages. It might be drafted, then discussed with stakeholders (like the AI development team, management, and legal/compliance if necessary), and then refined based on feedback or new information gathered during initial, non-intrusive reconnaissance.
Final Check
Before you consider your mock scope definition complete, review it. Is your defined scope:
- Specific: Are the objectives, targets, and constraints clearly defined?
- Measurable: Can you determine if the objectives have been met?
- Achievable: Is the scope realistic given potential constraints (like time, resources, allowed methods)?
- Relevant: Does the scope address the primary risks and concerns of Innovatech?
- Time-bound: Is there a clear timeframe for the engagement?
This exercise provides a solid foundation for planning real-world LLM red team operations. As you progress through this course, you'll learn the techniques to execute the activities defined within such a scope.