Large Language Models (LLMs) sometimes generate responses that sound plausible but are factually incorrect, nonsensical, or unrelated to the provided context. These fabrications are often called "hallucinations." As we've established, the prompt heavily influences the LLM's output, and careful prompt engineering is a primary defense against these inaccurate generations. While completely eliminating hallucinations is challenging, several techniques implemented within your Python code can significantly reduce their frequency and impact.
One of the most effective ways to prevent hallucinations is to provide the LLM with the specific information it needs to answer a query, rather than relying solely on its internal training data. This technique, known as grounding, often involves Retrieval-Augmented Generation (RAG), which we explored in Chapter 7.
The core idea is to retrieve relevant documents or data snippets based on the user's query and include them directly within the prompt. This gives the model explicit source material to base its answer on.
# Assume 'retrieve_relevant_docs' fetches text based on the query
# Assume 'llm_client' is an initialized client for an LLM API
def answer_query_with_context(query: str, llm_client) -> str:
    """Answers a query using retrieved context to reduce hallucinations."""
    context_docs = retrieve_relevant_docs(query)  # Fetch relevant info
    context_str = "\n\n".join(context_docs)
    prompt = f"""
Based *only* on the following context, please answer the question.
Do not use any prior knowledge. If the answer is not found in the context, state that you cannot answer based on the provided information.

Context:
{context_str}

Question: {query}

Answer:
"""
    # Make the API call (details depend on the specific client library)
    response = llm_client.generate(prompt=prompt, max_tokens=150)
    return response.text
# Example Usage (Illustrative)
# query = "What is the capital of Flobnar?"
# Assuming retrieve_relevant_docs finds no info on "Flobnar":
# answer = answer_query_with_context(query, my_llm_client)
# print(answer)
# Expected Output might be: "I cannot answer based on the provided information."
By explicitly instructing the model to use only the provided context, you steer it away from inventing answers.
Flow diagram illustrating how Retrieval-Augmented Generation (RAG) grounds an LLM response by incorporating relevant context retrieved based on the user query, thereby reducing the likelihood of hallucination compared to a direct query.
Beyond providing context, you can explicitly tell the model how to behave regarding factual accuracy.
# Example using LangChain's PromptTemplate for structure
from langchain_core.prompts import PromptTemplate
template_str = """
Based strictly on the context below, answer the user's question.
If the information is not available in the context, respond with "Information not available".
Do not add any information that is not explicitly stated in the text.
Context:
{context}
Question: {question}
Answer: """
prompt_template = PromptTemplate(
input_variables=["context", "question"],
template=template_str
)
# You would then format this prompt with actual context and question
# formatted_prompt = prompt_template.format(context="...", question="...")
These instructions constrain the model's tendency to fill gaps with fabricated details.
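To see the template in action, the short sketch below fills it with an illustrative context and question; the llm_client object and its generate method are hypothetical placeholders carried over from the earlier example, not part of LangChain.
# Fill the template with concrete values before calling the model.
# 'llm_client' and its 'generate' method are hypothetical placeholders.
formatted_prompt = prompt_template.format(
    context="Project Alpha focuses on renewable energy storage.",
    question="What is the budget allocated to project Alpha?"
)

# response = llm_client.generate(prompt=formatted_prompt, max_tokens=100)
# print(response.text)  # Expected: "Information not available"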
Most LLM APIs provide parameters to control the randomness and creativity of the output. The temperature parameter is particularly relevant for managing hallucinations: lower values make the model favor its highest-probability tokens, yielding more deterministic and conservative responses, while higher values encourage more varied, and potentially more speculative, output. For tasks requiring high factual accuracy, setting a low temperature is generally recommended.
# Example using the OpenAI Python client
# Ensure you have the 'openai' library installed and API key configured
# from openai import OpenAI
# client = OpenAI()  # Assumes OPENAI_API_KEY is set in environment

# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[
#         {"role": "system", "content": "Answer based only on provided context."},
#         {"role": "user", "content": "Context: The sky is blue. Question: What color is the sky?"}
#     ],
#     temperature=0.1  # Low temperature for fact-based response
# )
# print(response.choices[0].message.content)

# Expected Output: The sky is blue.
Experiment with different temperature values to find the right balance for your specific application.
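As a starting point for such experiments, the sketch below sends the same grounded prompt at several temperature settings and prints the responses side by side. It assumes the openai package is installed, OPENAI_API_KEY is set in the environment, and that the model name used above is available; adjust these to your setup.
from openai import OpenAI

client = OpenAI()  # Assumes OPENAI_API_KEY is set in the environment

grounded_messages = [
    {"role": "system", "content": "Answer based only on the provided context."},
    {"role": "user", "content": "Context: The sky is blue. Question: What color is the sky?"},
]

# Compare how the same grounded prompt behaves as temperature increases.
for temp in [0.0, 0.3, 0.7, 1.0]:
    response = client.chat.completions.create(
        model="gpt-4o",  # Illustrative model name; substitute your own
        messages=grounded_messages,
        temperature=temp,
        max_tokens=50,
    )
    print(f"temperature={temp}: {response.choices[0].message.content}")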
As discussed previously, providing examples within the prompt (few-shot learning) can guide the model's behavior. You can use this technique specifically to discourage hallucinations by including examples where the model correctly identifies missing information.
prompt = """
Answer the question based *only* on the provided text snippet. If the answer isn't there, say "Information not found".
Text: The report discusses project Alpha and project Beta. Project Alpha focuses on renewable energy.
Question: What is the focus of project Alpha?
Answer: Project Alpha focuses on renewable energy.
Text: The manual covers installation and troubleshooting for model X1.
Question: What is the warranty period for model X1?
Answer: Information not found.
Text: {provided_text}
Question: {user_question}
Answer:"""
# Fill {provided_text} and {user_question} before sending to the LLM
These examples demonstrate the desired behavior: answer factually when possible, and explicitly state when information is missing.
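Because the template above uses plain {provided_text} and {user_question} placeholders, a single str.format call is enough to fill it before sending it to the model. The sketch below shows that step with an illustrative snippet and question; llm_client again stands in for whichever client object your application uses.
# Fill the few-shot template; 'llm_client' remains a hypothetical client object.
filled_prompt = prompt.format(
    provided_text="The device ships with a two-year warranty and a USB-C cable.",
    user_question="What is the battery capacity of the device?"
)

# response = llm_client.generate(prompt=filled_prompt, max_tokens=100)
# print(response.text)  # Expected: "Information not found."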
Instruct the model to cite its sources, even if the "sources" are just segments of the context you provided. This forces the model to link its assertions back to specific pieces of information.
prompt = f"""
Read the following text segments and answer the question. For each statement in your answer, cite the number of the text segment it came from in square brackets, like [1]. If the information is not present, say so.
[1] The Peregrine Falcon is the fastest animal, reaching speeds over 240 mph during its dive.
[2] Cheetahs are the fastest land animals, capable of bursts up to 70 mph.
[3] The headquarters is located in Springfield.
Question: What is the fastest animal and where is the headquarters located?
Answer: The fastest animal is the Peregrine Falcon [1]. The headquarters is located in Springfield [3].
"""
While the model might still occasionally misattribute information, this technique adds a layer of accountability and makes hallucinations easier to spot during review or automated checking.
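The automated checking mentioned above can be as simple as confirming that every citation in the answer refers to a segment that was actually supplied. The sketch below does this with a regular expression; the answer string is an illustrative example, and the [1], [2], ... numbering convention follows the prompt above.
import re

def find_invalid_citations(answer: str, num_segments: int) -> list[int]:
    """Return cited segment numbers that were never provided in the prompt."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if n < 1 or n > num_segments)

# Illustrative answer containing a citation to a segment that does not exist.
answer = "The fastest animal is the Peregrine Falcon [1]. Revenue grew 10% [4]."
invalid = find_invalid_citations(answer, num_segments=3)
if invalid:
    print(f"Warning: answer cites unknown segments: {invalid}")  # -> [4]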
Requesting the output in a structured format like JSON can sometimes implicitly reduce hallucinations, especially for information extraction tasks. By defining a schema, you constrain the model's output space. If the model cannot find information for a required field, it's more likely to omit it or use a null value (if instructed) rather than inventing a value.
prompt = """
Extract the requested information from the text into a JSON object.
If a piece of information is not found, use `null` as the value.
Text: The event will be held on October 26th, 2024, at the Grand Hall. Contact person is Jane Doe.
Required JSON format:
{
"event_date": "YYYY-MM-DD format or null",
"location": "string or null",
"contact_email": "string or null"
}
Extracted Information:
"""
# Expected LLM output (if temperature is low and instructions are followed):
# {
# "event_date": "2024-10-26",
# "location": "Grand Hall",
# "contact_email": null
# }
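Once the model returns output like this, parse and validate it before trusting it downstream; a malformed response or an unexpected field should be treated as a failed extraction rather than silently accepted. The sketch below uses only the standard library and assumes the raw model output is held in a variable named llm_output.
import json

# 'llm_output' is assumed to hold the raw text returned by the model.
llm_output = '{"event_date": "2024-10-26", "location": "Grand Hall", "contact_email": null}'

expected_fields = {"event_date", "location", "contact_email"}

try:
    data = json.loads(llm_output)
except json.JSONDecodeError:
    data = None  # Not valid JSON; treat as a failed extraction

if data is not None:
    missing = expected_fields - data.keys()
    unexpected = data.keys() - expected_fields
    if missing or unexpected:
        print(f"Schema mismatch. Missing: {missing}, unexpected: {unexpected}")
    else:
        print(data)  # Safe to use; absent values remain explicit as None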
Reducing hallucinations is not a one-time fix but an ongoing process that continues as you develop and refine your application.
Remember, these techniques significantly reduce the likelihood of hallucinations but may not eliminate them entirely, especially with complex queries or ambiguous contexts. Continuous monitoring and a robust evaluation strategy remain important components of building reliable LLM applications.