While Large Language Models excel at generating human-readable text, applications often require data in a predictable, structured format. An LLM might return a beautifully written paragraph describing a user, but your application likely needs that information as a dictionary or a custom object with distinct fields like name, age, and location. This is where Output Parsers become indispensable. They are the final component in the standard invocation sequence, responsible for transforming the model's raw string output into a structured format that your code can work with directly.
The fundamental workflow, which we've been building towards, is Prompt→Model→Parser. The parser plays a dual role in this sequence. First, it provides instructions on how the output should be formatted, which are then inserted into the prompt sent to the model. Second, after the model generates its response, the parser takes that raw text and converts it into the desired Python data structure.
The data flow from user input to structured output. The Output Parser provides formatting instructions to the prompt and then parses the LLM's string response.
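To make the parser's dual role concrete, here is a minimal, hypothetical sketch (not LangChain's actual implementation) of a parser that both emits format instructions for the prompt and converts the model's raw string reply into a Python dictionary:

```python
import json

class SimpleJSONParser:
    """Toy parser illustrating the two jobs of an output parser."""

    def get_format_instructions(self) -> str:
        # Job 1: produce instructions that get embedded in the prompt.
        return 'Respond ONLY with a JSON object like {"name": "...", "age": 0}.'

    def parse(self, text: str) -> dict:
        # Job 2: convert the model's raw string output into a structured value.
        return json.loads(text)

parser = SimpleJSONParser()
# A model's raw reply, hard-coded here in place of a real LLM call:
raw_reply = '{"name": "Elara", "age": 27}'
data = parser.parse(raw_reply)
print(data["name"])  # Elara
```

Real LangChain parsers follow this same shape, with more robust handling of malformed output.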
For applications, you often need to parse the output into custom data classes with specific types. The PydanticOutputParser is an excellent tool for this, as it integrates with Pydantic, a popular Python data validation library. This allows you to define your desired data structure using a Pydantic model, and the parser handles both generating the format instructions and parsing the output into an instance of that model.
Let's say we want to extract information about a fictional character from a story. We can define our desired structure with Pydantic:
from typing import List
from pydantic import BaseModel, Field
class Character(BaseModel):
    name: str = Field(description="The name of the character.")
    attributes: List[str] = Field(description="A list of the character's defining attributes or traits.")
    role: str = Field(description="The character's role in the story (e.g., protagonist, antagonist, supporting).")
Now, we can create an instance of PydanticOutputParser from this model. The parser can automatically generate detailed formatting instructions for the LLM based on the Pydantic model's schema, including field names, types, and descriptions.
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
# Set up a parser
parser = PydanticOutputParser(pydantic_object=Character)
# Define the prompt template
template = """
Extract information about a character from the following passage.
Format your response as a JSON object with the keys "name", "attributes", and "role".
{format_instructions}
Passage:
{passage}
"""
prompt = PromptTemplate(
    template=template,
    input_variables=["passage"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
# Initialize the model and create the chain
model = ChatOpenAI(temperature=0)
chain = prompt | model | parser
# Invoke the chain
passage_text = "Elara was the brave knight, known for her unwavering loyalty and skill with a blade. She was the hero of the kingdom."
result = chain.invoke({"passage": passage_text})
print(result)
Running this code will produce not just a dictionary, but an actual Character object:
name='Elara' attributes=['brave', 'unwavering loyalty', 'skill with a blade'] role='protagonist'
Notice that the parser correctly extracted the information and populated an instance of our Character class. Your application can now access the data with type safety, for example, result.name (a string) and result.attributes (a list of strings). This integration with Pydantic makes the process of defining and enforcing output structures clean and reliable.
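The `prompt | model | parser` expression works because each component's output feeds into the next one's input. As a toy sketch (an assumption-laden simplification, not LangChain's real `Runnable` machinery), the `|` operator can be thought of as plain function composition:

```python
import json

class Runnable:
    """Minimal stand-in for a chainable component."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: run self first, then feed the result to `other`.
        return Runnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Hypothetical stand-ins for the three stages:
prompt = Runnable(lambda d: f"Extract from: {d['passage']}")
model = Runnable(lambda p: '{"name": "Elara"}')  # fake LLM reply
parser = Runnable(lambda s: json.loads(s))

chain = prompt | model | parser
print(chain.invoke({"passage": "..."}))  # {'name': 'Elara'}
```

This is why the parser must sit last in the chain: it is the step that turns the model's string into the structured value your code receives.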
LangChain provides a variety of pre-built parsers for different use cases. While PydanticOutputParser is very flexible, sometimes a simpler parser is all you need.
For example, the CommaSeparatedListOutputParser instructs the model to return its answer as comma-separated values, such as item1, item2, item3, and parses the response into a Python list: ['item1', 'item2', 'item3']. Similarly, the DatetimeOutputParser instructs the model to respond with a timestamp in a specific format (such as YYYY-MM-DD HH:MM:SS) and parses the output string into a Python datetime object.

By choosing the right parser, you can ensure that the output from your language model is immediately usable in your application's logic. This step is significant for building predictable and dependable systems, moving from simple text generation to creating applications that can process and act on information. In the upcoming practical exercise, you will apply these techniques to build your own structured data extractor.
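To see how little machinery a simple parser needs, comma-separated parsing can be sketched in a few lines of plain Python (this mirrors the behavior described above, not LangChain's internal code):

```python
def parse_comma_separated(text: str) -> list[str]:
    # Split the raw model output on commas and trim surrounding whitespace,
    # dropping any empty fragments (e.g. from a trailing comma).
    return [part.strip() for part in text.split(",") if part.strip()]

print(parse_comma_separated("item1, item2, item3"))
# ['item1', 'item2', 'item3']
```

The value of the pre-built parser is not the split itself but the matching format instructions it injects into the prompt, which make the model likely to produce parseable output in the first place.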