This practical exercise focuses on building a cohesive LLM application by combining models and prompts to structure data. It demonstrates a common pattern in LLM development: taking unstructured text as input and producing structured, machine-readable data as output. The goal is to build an application that can read a short biography and extract specific details such as a person's name, title, and company. This process uses the model's native structured output capabilities, effectively combining generation and parsing.

```dot
digraph G {
  rankdir=TB;
  graph [fontname="Arial"];
  node [shape=box, style="rounded,filled", fillcolor="#e9ecef", fontname="Arial"];
  edge [fontname="Arial"];
  input [label="Unstructured Text\n(e.g., 'Sarah is a lead...')", fillcolor="#a5d8ff"];
  prompt [label="Prompt Template"];
  model [label="LLM (ChatOpenAI)\n+ Structured Output", fillcolor="#bac8ff"];
  output [label="Structured Data\n(PersonProfile Object)", shape=ellipse, fillcolor="#b2f2bb"];
  input -> prompt [label="query"];
  prompt -> model;
  model -> output [label="PersonProfile"];
}
```

*The data extraction workflow. Unstructured text is formatted by a prompt, and the model is configured to return a structured Python object directly.*

### Step 1: Define the Data Schema with Pydantic

Before we can extract information, we must first define the structure of the data we want. A schema acts as a contract for our output, ensuring consistency and predictability. The Pydantic library is the de facto standard for data validation in Python and integrates well with LangChain.

Let's define a `PersonProfile` schema that includes a name, job title, and company. We can also add descriptions to guide the LLM in correctly identifying each piece of information.

```python
from pydantic import BaseModel, Field
from typing import Optional

class PersonProfile(BaseModel):
    """A structured representation of a person's professional profile."""
    name: str = Field(description="The full name of the person.")
    title: str = Field(description="The professional title or role of the person.")
    company: str = Field(description="The name of the company the person works for.")
    years_of_experience: Optional[int] = Field(
        None, description="The total number of years of professional experience."
    )
```

By creating this class, we have established a clear target format for our LLM. The descriptions within `Field` help the model understand the semantics of each attribute.

### Step 2: Configure the Model and Prompt

With our data schema defined, we can now set up the extractor. Modern LLMs like OpenAI's `gpt-4o-mini` support structured output natively, which is more reliable than prompt-based parsing.

We will use the `.with_structured_output()` method to bind our Pydantic schema to the model. This tells the model to conform its output to the `PersonProfile` class. We also define a simple prompt template to pass the input text to the model.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Define the prompt template
prompt = ChatPromptTemplate.from_template(
    "Extract information from the following text.\nText: {query}"
)

# Initialize the model
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")

# Configure the model for structured output
structured_llm = llm.with_structured_output(PersonProfile)
```

This configuration simplifies the pipeline by handling the schema injection and parsing logic internally.
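Before wiring up the full chain, you can sanity-check the structured model on its own. The following is a minimal sketch, not part of the original pipeline: it assumes a valid `OPENAI_API_KEY` is set in the environment, and the sample sentence is invented for illustration.

```python
# Hypothetical sanity check: the structured model accepts a plain string
# (coerced to a human message) and returns a PersonProfile instance directly.
profile = structured_llm.invoke("Maria Chen is the CTO of Quantum Dynamics.")

print(profile.title)    # e.g. 'CTO'
print(profile.company)  # e.g. 'Quantum Dynamics'
```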
### Step 3: Combine Components into a Chain

Now we connect our components into a processing pipeline using the LangChain Expression Language (LCEL). The pipe symbol (`|`) connects the elements, creating a sequence where the output of one step becomes the input to the next.

```python
# Create the chain
extractor_chain = prompt | structured_llm
```

Running this chain is straightforward. We only need to provide the `query` variable.

### Step 4: Run the Extractor

Let's test our extractor with a sample piece of text. We will invoke the chain and inspect the output.

```python
# Input text
text_input = """
Alex Thompson is the Senior Data Scientist at InnovateCorp, where he has
been leading the AI research division for the past 5 years.
"""

# Invoke the chain
result = extractor_chain.invoke({"query": text_input})

# Print the structured output
print(result)
print(f"\nType of result: {type(result)}")
```

The expected output is a `PersonProfile` object, not a plain string or dictionary:

```
name='Alex Thompson' title='Senior Data Scientist' company='InnovateCorp' years_of_experience=5

Type of result: <class '__main__.PersonProfile'>
```

Success. The chain correctly processed the unstructured sentence and returned a Pydantic object. We can now access the data reliably using standard object attributes, such as `result.name` or `result.company`. This example highlights how using a model's structured output capability creates a dependable bridge from unstructured language to structured data.
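Because the result is a Pydantic model, it also plugs directly into standard serialization workflows. Here is a minimal sketch, assuming Pydantic v2 (which provides `model_dump` and `model_dump_json`):

```python
# Attribute access is validated against the schema defined in Step 1
print(result.name)     # 'Alex Thompson'
print(result.company)  # 'InnovateCorp'

# Convert to a plain dict or a JSON string for downstream systems
record = result.model_dump()
print(record["years_of_experience"])  # 5
print(result.model_dump_json())
```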