Once your LLM agent's tool successfully calls an external API, the journey isn't over. APIs return data in various formats, and this raw data is often not in an ideal state for an LLM to directly use. It might be too verbose, contain irrelevant information, or have a structure that's difficult for the model to interpret efficiently. This is where parsing and transforming the API response becomes a critical step in building effective API-based tools. Your goal is to convert the API's output into a clean, concise, and structured format that the LLM can readily understand and act upon.
Before you can transform data, you first need to parse it, which means interpreting its structure. Most modern web APIs use a few standard formats for data exchange. The `Content-Type` header in the API's HTTP response usually tells you which format to expect.
JSON is the de facto standard for many web APIs because it is lightweight and easy to parse in most programming languages. JSON data is represented as key-value pairs, similar to Python dictionaries, and can include nested objects and arrays (lists).
For example, a weather API might return:
```json
{
  "location": "San Francisco, CA",
  "timestamp": "2023-10-27T10:00:00Z",
  "data": {
    "temperature": 18,
    "unit": "Celsius",
    "condition": "Partly Cloudy",
    "humidity": 65
  },
  "source": "WeatherProviderX"
}
```
In Python, the built-in `json` module is your primary tool. If you're using a library like `requests` to make HTTP calls, the response object often has a handy `.json()` method that parses the JSON response directly into a Python dictionary. Otherwise, `json.loads(response_text)` converts a JSON string into a Python dictionary or list.
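To make this concrete, the snippet below parses the weather payload shown above with `json.loads`. The string literal stands in for a real response body; with `requests`, calling `response.json()` would yield the same dictionary.

```python
import json

# Stand-in for the raw body of an HTTP response (the weather example above)
response_text = """
{
  "location": "San Francisco, CA",
  "timestamp": "2023-10-27T10:00:00Z",
  "data": {"temperature": 18, "unit": "Celsius", "condition": "Partly Cloudy", "humidity": 65},
  "source": "WeatherProviderX"
}
"""

weather = json.loads(response_text)  # JSON object -> Python dict

# Nested JSON objects become nested dicts, so values are reached key by key
location = weather["location"]
temperature = weather["data"]["temperature"]  # int 18
condition = weather["data"]["condition"]      # "Partly Cloudy"

print(f"{location}: {temperature} {weather['data']['unit']}, {condition}")
# prints: San Francisco, CA: 18 Celsius, Partly Cloudy
```

JSON numbers arrive as Python `int` or `float` values, so no manual conversion is needed here, unlike the XML case.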
XML is another format you'll encounter, particularly with older or enterprise-level APIs. It uses tags to define elements and structure data hierarchically.
An XML equivalent of the weather data might look like:
```xml
<weatherReport>
  <location>San Francisco, CA</location>
  <timestamp>2023-10-27T10:00:00Z</timestamp>
  <data>
    <temperature units="Celsius">18</temperature>
    <condition>Partly Cloudy</condition>
    <humidity>65</humidity>
  </data>
  <source>WeatherProviderX</source>
</weatherReport>
```
Python's standard library includes `xml.etree.ElementTree` for parsing XML. You can parse an XML string using `ET.fromstring(xml_string)` to get the root element, and then navigate the tree using methods like `find()` (to find the first matching sub-element) and `findall()` (to find all matching sub-elements), reading element attributes via `.attrib` and text content via `.text`.
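Applied to the weather XML above, these methods look like the sketch below. `find()` accepts simple paths such as `"data/temperature"` (ElementTree supports a limited subset of XPath), and because XML text is always a string, numeric values have to be converted explicitly.

```python
import xml.etree.ElementTree as ET

xml_string = """<weatherReport>
  <location>San Francisco, CA</location>
  <timestamp>2023-10-27T10:00:00Z</timestamp>
  <data>
    <temperature units="Celsius">18</temperature>
    <condition>Partly Cloudy</condition>
    <humidity>65</humidity>
  </data>
  <source>WeatherProviderX</source>
</weatherReport>"""

root = ET.fromstring(xml_string)  # root is the <weatherReport> element

location = root.find("location").text
temp_elem = root.find("data/temperature")  # path relative to the root
temperature = int(temp_elem.text)          # element text is a string: convert it
units = temp_elem.attrib.get("units")      # attributes live in the .attrib dict
humidity = int(root.find("data/humidity").text)

# findall("*") returns every child element; here, the children of <data>
data_fields = [child.tag for child in root.find("data").findall("*")]
# data_fields == ["temperature", "condition", "humidity"]
```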
While JSON and XML cover the majority of cases, you might occasionally encounter APIs returning data as CSV (Comma-Separated Values), plain text, or even YAML. Python has a `csv` module for handling CSV data, and plain text often requires custom string manipulation or regular expressions. For YAML, third-party libraries like `PyYAML` are available.
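A minimal sketch of choosing a parser from the `Content-Type` header, using only the standard library. The `parse_response_body` name is hypothetical, the media types it checks are common ones rather than an exhaustive list, and YAML is left out because `PyYAML` is not part of the standard library. Some APIs send a missing or inaccurate header, so a fallback branch is worth keeping.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_response_body(content_type, body):
    """Hypothetical helper: pick a parser based on the Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize case
    media_type = content_type.split(";")[0].strip().lower()

    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)                          # dict or list
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return ET.fromstring(body)                       # root Element
    if media_type == "text/csv":
        # DictReader treats the first row as column names; values stay strings
        return list(csv.DictReader(io.StringIO(body)))
    # Fallback: treat anything else as plain text
    return body.strip()


rows = parse_response_body("text/csv; charset=utf-8", "city,temp\nLondon,15\nParis,19\n")
# rows == [{"city": "London", "temp": "15"}, {"city": "Paris", "temp": "19"}]
```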
Once you've identified the format, the next step is to parse the raw response into a Python data structure. Then, you can apply transformations to prepare it for the LLM.
For JSON responses, parsing is typically straightforward:
```python
import json

api_response_text = '{"city": "London", "details": {"temp": 15, "desc": "Cloudy"}}'

try:
    data = json.loads(api_response_text)
    city = data.get("city")
    temperature = data.get("details", {}).get("temp")  # Using .get() for safety
except json.JSONDecodeError:
    print("Error: Could not decode JSON response.")
    data = None

# On success, 'data' is a Python dictionary (city is 'London', temperature is 15);
# on a decoding failure it is None
```
Using `data.get("key", default_value)` is generally safer than direct key access (`data["key"]`) because it returns a default when a key is missing instead of raising a `KeyError` exception.
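One caveat: the default in `.get("details", {})` applies only when the key is absent. If the API sends `"details": null`, `.get()` returns `None` and the chained `.get("temp")` fails with `AttributeError`. A small helper, sketched here under the hypothetical name `get_nested`, guards against missing keys, null values, and unexpected types in one place:

```python
import json


def get_nested(data, *keys, default=None):
    """Hypothetical helper: follow a chain of keys, returning `default` if any
    key is missing, any value along the way is null, or a level is not a dict."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current


# "details" is present but null: data.get("details", {}).get("temp")
# would raise AttributeError here
data = json.loads('{"city": "London", "details": null}')

temperature = get_nested(data, "details", "temp", default="unknown")  # "unknown"
city = get_nested(data, "city")                                        # "London"
```

Note that only `None` triggers the default, so legitimate falsy values such as `0` or `""` are returned unchanged.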
For XML, you'll navigate the tree structure:
```python
import xml.etree.ElementTree as ET

xml_response_text = '<product><id>123</id><name>Wireless Mouse</name><price>25.99</price></product>'

try:
    root = ET.fromstring(xml_response_text)
    product_id = root.findtext("id")
    product_name = root.findtext("name")
    price_str = root.findtext("price")
    price = float(price_str) if price_str else None
except ET.ParseError:
    print("Error: Could not parse XML response.")
    root = None

# product_id is '123', product_name is 'Wireless Mouse', price is 25.99
```
`findtext()` is a convenient method that directly returns the text content of the first matching sub-element, or `None` (or a `default` you pass in) if no such element exists.
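One subtlety worth checking before converting values: `findtext()` distinguishes a missing element from an empty one. This small sketch, using a made-up product snippet, shows the possible outcomes:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<product><id>123</id><name></name></product>")

id_text = root.findtext("id")                              # '123'
name_text = root.findtext("name")                          # '' (element exists, no text)
price_text = root.findtext("price")                        # None (element does not exist)
price_or_default = root.findtext("price", default="N/A")   # 'N/A'
```

This is why the earlier example tests `if price_str` rather than `is not None`: an empty `<price></price>` element yields `""`, which `float()` would reject.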
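Raw parsed data, even as Python objects, might still be suboptimal for an LLM. Transformation aims to make it more concise, strip out irrelevant information, and give it a structure the model can interpret easily.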
Common transformation techniques include:
Field Selection: Extract only the specific pieces of information the LLM needs. If the API returns 50 fields but the LLM only needs 3, your tool should filter out the rest.
```python
# Assuming 'api_data' is a parsed JSON dictionary from a complex user profile
transformed_user_info = {
    "username": api_data.get("username"),
    "email": api_data.get("user_contact", {}).get("primary_email"),
    "last_login": api_data.get("activity", {}).get("last_seen_at"),
}
# transformed_user_info is now a smaller dictionary with only relevant fields
```
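When several tools or endpoints need the same kind of trimming, it can help to describe the selection as data rather than code. Below is a minimal sketch with a hypothetical `select_fields` helper and a made-up sample profile; the nested field names mirror the example above and are assumptions, not a real API schema.

```python
def select_fields(record, field_map):
    """Hypothetical helper: build a small dict from `record` using
    field_map = {output_name: (key, subkey, ...)}. Missing values become None."""
    result = {}
    for out_name, path in field_map.items():
        value = record
        for key in path:
            # Stop descending (yield None) as soon as a level is not a dict
            value = value.get(key) if isinstance(value, dict) else None
        result[out_name] = value
    return result


# Which fields the LLM sees is decided in one place
USER_FIELDS = {
    "username": ("username",),
    "email": ("user_contact", "primary_email"),
    "last_login": ("activity", "last_seen_at"),
}

# Hypothetical sample record; a real profile would carry many more fields
api_data = {
    "username": "jdoe",
    "internal_id": 98213,
    "user_contact": {"primary_email": "jdoe@example.com", "fax": None},
    "activity": {"last_seen_at": "2023-10-26T18:42:00Z", "sessions": 311},
}

transformed_user_info = select_fields(api_data, USER_FIELDS)
# {'username': 'jdoe', 'email': 'jdoe@example.com', 'last_login': '2023-10-26T18:42:00Z'}
```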
Data Simplification and Flattening: API responses can have deeply nested structures. If the nesting isn't relevant, flatten it. Convert lists of objects with a single key-value pair into a simple list of values if that's more useful.
```python
api_data = {"items": [{"name": "Apple"}, {"name": "Banana"}, {"name": "Cherry"}]}

# Simplified: keep just the names, skipping items that lack one
item_names = [item.get("name") for item in api_data.get("items", []) if item.get("name")]
# item_names will be: ["Apple", "Banana", "Cherry"]
```
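For deeper structures, a small recursive helper can do the flattening generically. This hypothetical `flatten` function joins nested keys with dots and leaves lists untouched; whether dotted keys read more clearly than the original nesting depends on your data, so treat it as one option rather than a rule.

```python
def flatten(obj, parent_key="", sep="."):
    """Hypothetical helper: collapse nested dicts into a single level with
    dotted keys. Lists and other values are kept as-is."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))  # recurse into sub-objects
        else:
            flat[new_key] = value
    return flat


weather = {
    "location": "San Francisco, CA",
    "data": {"temperature": 18, "unit": "Celsius", "condition": "Partly Cloudy"},
}

flat_weather = flatten(weather)
# {'location': 'San Francisco, CA', 'data.temperature': 18,
#  'data.unit': 'Celsius', 'data.condition': 'Partly Cloudy'}
```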
Summarization or Truncation: For long text fields returned by an API (e.g., article content, product descriptions), you might truncate them or extract key sentences. Be cautious here, as sophisticated summarization is often a task for the LLM itself. Simple truncation or providing the first N characters/words can be done by the tool.
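Truncation is easy to get slightly wrong, for example by cutting mid-word or by leaving stray newlines that waste tokens. A minimal sketch, with a hypothetical `truncate_text` helper that limits characters (not model tokens) and backs up to a word boundary:

```python
def truncate_text(text, max_chars=200, suffix="..."):
    """Hypothetical helper: shorten text to at most max_chars characters
    (including the suffix) without splitting a word. Assumes max_chars > len(suffix)."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= max_chars:
        return text
    cut = text[: max_chars - len(suffix)]
    # If the cut lands in the middle of a word, back up to the previous space
    if text[len(cut)] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + suffix


description = "The quick brown fox jumps over the lazy dog"
short = truncate_text(description, max_chars=20)
# short == "The quick brown..."
```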
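Formatting for LLM Readability: Sometimes a short natural-language string is easier for the LLM to use than raw structured data. For example, instead of returning `{"temperature": 22, "unit": "C", "condition": "Sunny"}`, you might return: "Current weather: 22°C, Sunny."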
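A minimal sketch of such a formatter for the weather dictionary in the example. The `format_weather` name and the placeholder wording for missing fields are illustrative assumptions:

```python
def format_weather(weather):
    """Hypothetical helper: render a weather dict as a one-line sentence.
    Missing fields fall back to placeholder text instead of raising KeyError."""
    temp = weather.get("temperature")
    unit = weather.get("unit", "")
    condition = weather.get("condition", "unknown conditions")
    temp_text = f"{temp}°{unit}" if temp is not None else "unknown temperature"
    return f"Current weather: {temp_text}, {condition}."


summary = format_weather({"temperature": 22, "unit": "C", "condition": "Sunny"})
# summary == "Current weather: 22°C, Sunny."
```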
This can be very effective if the LLM's role is to communicate this information to a user.

Things can go wrong. The API might return an unexpected structure, a field might be missing, or data might be of an incorrect type. Your tool must handle these situations gracefully.
```python
import json


def process_product_api_response(response_text):
    try:
        data = json.loads(response_text)

        # Ensure 'product_info' and 'name' keys exist
        product_info = data.get("product_info")
        if not product_info or "name" not in product_info:
            return "Error: Essential product information (name) is missing in API response."
        name = product_info["name"]

        # Safely get price, convert to float, handle potential missing or invalid price
        # ('is not None' so that a legitimate price of 0 is not treated as missing)
        price_str = product_info.get("price_details", {}).get("amount")
        price = None
        if price_str is not None:
            try:
                price = float(price_str)
            except ValueError:
                return f"Warning: Product '{name}' has an invalid price format: {price_str}. Price not available."

        return {"name": name, "price": price if price is not None else "Not available"}
    except json.JSONDecodeError:
        return "Error: API response was not valid JSON."
    except Exception as e:  # Catch any other unexpected errors
        return f"Error: An unexpected error occurred during processing: {e}"


# Example usage:
# result = process_product_api_response(api_response_string)
# if isinstance(result, str):
#     # Error or warning message: pass it to the LLM as-is
# else:
#     # Use the processed dictionary
```
Returning informative error messages or sensible defaults allows the LLM agent to understand the issue and potentially try a different approach or inform the user, rather than simply failing.
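The function above returns a dictionary on success and a string on failure, so callers have to tell the two apart with `isinstance`. One alternative design, sketched below with hypothetical `to_tool_result` and `lookup_price` names, is to always return the same JSON envelope, so the LLM sees a `status` field it can check every time. Returning a JSON string is an assumption about how your agent framework passes tool output to the model; if it accepts dictionaries directly, you can return the payload instead.

```python
import json


def to_tool_result(value=None, error=None):
    """Hypothetical helper: wrap a tool's outcome in one consistent JSON shape,
    so the LLM always receives either data or an explanatory error."""
    if error is not None:
        payload = {"status": "error", "message": error}
    else:
        payload = {"status": "ok", "data": value}
    return json.dumps(payload, ensure_ascii=False)


def lookup_price(response_text):
    """Hypothetical tool: extract name and price, reporting failures explicitly."""
    try:
        product = json.loads(response_text)["product_info"]
        price = float(product["price_details"]["amount"])
        return to_tool_result({"name": product["name"], "price": price})
    except json.JSONDecodeError:  # must come before ValueError (it is a subclass)
        return to_tool_result(error="API response was not valid JSON.")
    except (KeyError, TypeError, ValueError) as e:
        return to_tool_result(error=f"Unexpected response structure: {e!r}")
```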
The journey from a raw API response to data an LLM can effectively use involves several steps.
This diagram shows the flow of API response data. It starts with the raw response, goes through validation and parsing, then transformation, ultimately producing data ready for the LLM or an error message if issues arise.
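The flow just described can be sketched as a single function. This is a minimal illustration under several assumptions: the hypothetical `handle_api_response` receives the status code, `Content-Type` value, and body text separately (with `requests`, these would come from `response.status_code`, `response.headers.get("Content-Type", "")`, and `response.text`), and it expects the weather JSON shape used earlier in this section.

```python
import json


def handle_api_response(status_code, content_type, body):
    """Hypothetical sketch: validate -> parse -> transform, returning either
    LLM-ready text or an error message."""
    # 1. Validate the response before trying to parse it
    if not 200 <= status_code < 300:
        return f"Error: API request failed with HTTP status {status_code}."
    if "json" not in content_type.lower():
        return f"Error: Expected JSON but received '{content_type}'."

    # 2. Parse
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "Error: API response was not valid JSON."

    # 3. Transform: select the needed fields and format them for the LLM
    details = data.get("data") if isinstance(data, dict) else None
    if not isinstance(details, dict) or "temperature" not in details:
        return "Error: Temperature is missing from the API response."
    unit = details.get("unit", "")
    condition = details.get("condition", "unknown conditions")
    location = data.get("location", "Unknown location")
    return f"{location}: {details['temperature']} {unit}, {condition}."
```

Every branch returns a string, so the agent always gets something it can read: either the formatted result or a message explaining what went wrong.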
By carefully considering how your tools parse and transform API responses, you significantly enhance their reliability and the overall effectiveness of your LLM agent. The aim is always to provide the LLM with information in the most direct, unambiguous, and useful form possible for the task at hand. This often involves an iterative process: design your parsing and transformation logic, test how the LLM uses the output, and refine as needed.
© 2025 ApX Machine Learning