After successfully sending a request to an LLM API, the next critical step is processing the response. Simply assuming the request worked and the data is ready is a common source of bugs. Robust applications must gracefully handle both successful responses and various potential errors.
The first thing to check is the HTTP status code returned by the API server. Web standards define ranges for these codes:

- **1xx**: informational responses
- **2xx**: success (e.g., 200 OK)
- **3xx**: redirection
- **4xx**: client errors (e.g., 400 Bad Request, 401 Unauthorized, 404 Not Found, 429 Too Many Requests)
- **5xx**: server errors (e.g., 500 Internal Server Error, 503 Service Unavailable)
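As a quick illustration (the helper name here is ours, not part of any library), these ranges map directly to code:

```python
def status_class(code):
    """Return the standard category for an HTTP status code."""
    if 100 <= code <= 199:
        return "informational"
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "unknown"

print(status_class(200))  # success
print(status_class(404))  # client error
print(status_class(503))  # server error
```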
The `requests` library makes checking for success straightforward. The `Response` object has a `status_code` attribute and a boolean `ok` attribute, which is `True` if the status code is less than 400 (i.e., success or redirect) and `False` otherwise.
```python
import requests
import os
import json

# Assume API_ENDPOINT and API_KEY are set appropriately
API_ENDPOINT = "YOUR_LLM_API_ENDPOINT"
API_KEY = os.getenv("LLM_API_KEY", "YOUR_API_KEY")  # Get the key securely from the environment

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
data = {
    "prompt": "Explain the difference between HTTP status codes 200 and 404.",
    "max_tokens": 50
}

try:
    # response = requests.post(API_ENDPOINT, headers=headers, json=data, timeout=10)  # Real call

    # For demonstration, simulate a response object instead
    class MockResponse:
        def __init__(self, status_code, text_content, is_json=True):
            self.status_code = status_code
            self.text = text_content
            self._is_json = is_json

        @property
        def ok(self):
            return self.status_code < 400

        def json(self):
            if not self._is_json:
                raise requests.exceptions.JSONDecodeError("Expecting value", "dummy", 0)
            try:
                return json.loads(self.text)
            except json.JSONDecodeError as e:
                raise requests.exceptions.JSONDecodeError(e.msg, e.doc, e.pos)

    # Simulate a successful response
    response = MockResponse(
        200,
        '{"completion": "200 OK means success, 404 Not Found means the resource doesn\'t exist."}',
        is_json=True,
    )

    # Always check the status first
    if response.ok:
        print(f"Request successful (Status Code: {response.status_code})")
        # Proceed to parse the body
    else:
        print(f"Request failed (Status Code: {response.status_code})")
        print(f"Response Body: {response.text}")
        # Handle specific error codes (4xx, 5xx) here

except requests.exceptions.RequestException as e:
    print(f"An error occurred during the request: {e}")
```
Always check `response.ok` or `response.status_code` before attempting to process the response body. Trying to parse JSON from an error response (such as a 404 page, which might be HTML) will likely raise an exception.
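To see why the check matters, here is a small self-contained sketch (using a minimal mock in place of a live `requests.Response`, so it runs offline) of what happens when an HTML error page is parsed as JSON:

```python
import json

class MockErrorResponse:
    # Minimal stand-in for requests.Response, for demonstration only
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    @property
    def ok(self):
        return self.status_code < 400

    def json(self):
        return json.loads(self.text)

# A 404 response is often an HTML page, not JSON
response = MockErrorResponse(404, "<html><body>Not Found</body></html>")

if response.ok:
    data = response.json()  # only reached on success
else:
    print(f"Request failed (Status Code: {response.status_code})")

# Skipping the check and parsing anyway raises an exception
try:
    response.json()
except json.JSONDecodeError:
    print("Parsing the HTML body as JSON raised JSONDecodeError")
```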
If the status code indicates success (usually 200 OK), the LLM's output is typically contained in the response body, often formatted as JSON. The `requests` library provides the `response.json()` method to conveniently parse this.
```python
# Continuing from the successful response simulation above...
if response.ok:
    try:
        # Parse the JSON response
        response_data = response.json()
        print("Successfully parsed JSON response:")
        # print(json.dumps(response_data, indent=2))  # Pretty print

        # Extract the relevant information (this structure varies by API)
        if "completion" in response_data:
            llm_output = response_data["completion"]
            print(f"LLM Output: {llm_output}")
        elif "choices" in response_data and len(response_data["choices"]) > 0:
            # Handling a structure like OpenAI's chat completions
            first_choice = response_data["choices"][0]
            if "message" in first_choice and "content" in first_choice["message"]:
                llm_output = first_choice["message"]["content"]
                print(f"LLM Output: {llm_output}")
            else:
                print("Could not find expected message content in the first choice.")
        else:
            print("Response JSON does not contain expected 'completion' or 'choices' field.")
            print("Full Response:", response_data)

    except requests.exceptions.JSONDecodeError:
        # Handle cases where the status was OK but the body wasn't valid JSON
        print(f"Failed to decode JSON, even though status was {response.status_code}.")
        print(f"Raw Response Text: {response.text}")
    except KeyError as e:
        # Handle cases where the JSON is valid but missing expected keys
        print(f"Missing expected key {e} in the JSON response.")
        print("Full Response:", response_data)
else:
    # Handle non-OK status codes as before
    print(f"Request failed (Status Code: {response.status_code})")
    print(f"Response Body: {response.text}")
```
Notice the nested `try...except` block. It's possible to get a 200 OK status but receive a response body that isn't valid JSON, or valid JSON that doesn't contain the keys you expect (e.g., `completion`, `choices`, `message`). Your code needs to anticipate these possibilities. The exact structure of the JSON response varies significantly between LLM providers and API endpoints, so consult the documentation for the specific API you are using.
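Because the shape varies by provider, it can help to centralize extraction in one small helper. The function below is a sketch covering just the two shapes used in the examples above, not any provider's official schema:

```python
def extract_text(response_data):
    """Return the model's text from either of the two JSON shapes shown
    above, or None if neither is present."""
    # Flat shape: {"completion": "..."}
    if "completion" in response_data:
        return response_data["completion"]
    # Chat-style shape: {"choices": [{"message": {"content": "..."}}]}
    choices = response_data.get("choices")
    if choices:
        message = choices[0].get("message", {})
        return message.get("content")
    return None

print(extract_text({"completion": "Hello"}))                        # Hello
print(extract_text({"choices": [{"message": {"content": "Hi"}}]}))  # Hi
print(extract_text({"unexpected": True}))                           # None
```

Using `.get()` with defaults keeps the helper from raising `KeyError` on unexpected shapes; callers only need to handle the `None` case.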
Sometimes, an API might return a 200 OK status code but still indicate an error within the JSON payload itself. For example, the API might accept your request structurally but report an issue like an invalid parameter value, a prompt being flagged by a content filter, or exceeding a specific quota.
```json
{
  "error": {
    "code": "invalid_prompt",
    "message": "The provided prompt was rejected by the content filter.",
    "type": "validation_error"
  }
}
```
It's important to check for such error structures within the JSON after successfully parsing it, even if the HTTP status was 200 OK.
```python
# Assuming response.ok was True and response_data = response.json() succeeded
if "error" in response_data:
    error_info = response_data["error"]
    print("API returned an error within the JSON payload:")
    print(f"  Code: {error_info.get('code', 'N/A')}")
    print(f"  Message: {error_info.get('message', 'No message provided.')}")
    # Implement specific handling based on the error code or type
else:
    # Process the successful data as shown previously
    print("No API-specific error found in payload. Processing data...")
    # ... extract completion, choices, etc. ...
```
Let's summarize common errors and suggest basic handling:
- **Network/Connection Errors** (`requests.exceptions.RequestException`): These occur before you even get a response (timeout, DNS failure, connection refused). Handling: use a `try...except requests.exceptions.RequestException` block around your `requests.post` or `requests.get` call. Log the error. Consider retrying the request after a short delay, potentially with exponential backoff (waiting longer after each failed attempt).
- **Authentication Errors (401 Unauthorized, 403 Forbidden)**: Your API key might be invalid, expired, or lack permissions for the requested operation. Handling: verify the key and its permissions; retrying with the same credentials will not help.
- **Rate Limit Errors (429 Too Many Requests)**: You've sent too many requests in a given time window. Handling: check for a `Retry-After` header indicating how long to wait (in seconds). If not present, implement exponential backoff (e.g., wait 1s, then 2s, 4s, 8s, etc.) before retrying.
- **Client Errors (other 4xx, e.g., 400 Bad Request)**: Your request might be malformed (invalid JSON, missing required parameters). Handling: log the request and response bodies and fix the request; resending it unchanged will fail again.
- **Server Errors (5xx, e.g., 500 Internal Server Error, 503 Service Unavailable)**: The API provider is having temporary issues. Handling: retry after a delay, ideally with exponential backoff.
- **JSON Decode Errors** (`requests.exceptions.JSONDecodeError`): The response body couldn't be parsed as JSON. Handling: log the raw `response.text`. This might indicate an unexpected response format (like HTML from an error page) or a genuine server issue producing malformed JSON.
- **API-Specific Errors (in JSON Payload)**: Errors reported within a 200 OK response. Handling: after parsing, check for an `error` field (or your provider's equivalent) and branch on its code or type.
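The retry advice above can be combined into a single wrapper. This is a sketch under stated assumptions: the function name is ours, the endpoint and payload are placeholders, and it retries only the failures listed as transient (network errors, 429, 5xx), honoring `Retry-After` when the server provides one. The `post` and `sleep` parameters are injectable purely so the sketch can be exercised without a live API:

```python
import time
import random
import requests

def post_with_retries(url, headers, payload, max_attempts=4, timeout=10,
                      post=None, sleep=time.sleep):
    """POST with retries on transient failures: network errors, 429, and 5xx.
    `post` and `sleep` default to requests.post and time.sleep."""
    post = post or requests.post
    for attempt in range(1, max_attempts + 1):
        try:
            response = post(url, headers=headers, json=payload, timeout=timeout)
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt}: network error: {e}")
        else:
            if response.status_code == 429:
                # Prefer the server's own guidance on how long to wait
                retry_after = response.headers.get("Retry-After")
                delay = float(retry_after) if retry_after is not None else 2 ** (attempt - 1)
                sleep(delay)
                continue
            if response.status_code >= 500:
                print(f"Attempt {attempt}: server error {response.status_code}")
            else:
                # Success, or a non-retryable 4xx client error: return as-is
                return response
        if attempt < max_attempts:
            # Exponential backoff with a little jitter before the next attempt
            sleep(2 ** (attempt - 1) + random.uniform(0, 0.5))
    raise RuntimeError(f"Request failed after {max_attempts} attempts")
```

Note that 4xx errors other than 429 are returned immediately rather than retried, since resending a malformed or unauthorized request will fail the same way.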
*A typical flow for handling API responses and potential errors.*
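That flow, from status check through in-payload error check to data extraction, can be sketched as a single function. The name and return shape here are ours, chosen for illustration, and the demo mock stands in for a live `requests.Response`:

```python
import json

def handle_response(response):
    """Sketch of the full flow: status check, JSON parsing, in-payload
    error check, then data extraction. Works with any object exposing the
    ok / status_code / text / json() members used earlier."""
    if not response.ok:
        return {"ok": False, "reason": f"HTTP {response.status_code}", "body": response.text}
    try:
        payload = response.json()
    except ValueError:  # requests' JSONDecodeError is a ValueError subclass
        return {"ok": False, "reason": "invalid JSON", "body": response.text}
    if "error" in payload:
        return {"ok": False, "reason": payload["error"].get("message", "API error")}
    return {"ok": True, "data": payload}

class DemoResponse:
    # Minimal stand-in for requests.Response, for offline demonstration
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    @property
    def ok(self):
        return self.status_code < 400

    def json(self):
        return json.loads(self.text)

print(handle_response(DemoResponse(200, '{"completion": "All good."}')))
print(handle_response(DemoResponse(503, "Service Unavailable")))
print(handle_response(DemoResponse(200, '{"error": {"message": "Prompt rejected."}}')))
```

Returning a uniform dictionary from every branch keeps the calling code to a single `if result["ok"]:` check, regardless of which failure mode occurred.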
Effectively handling API responses and errors is essential for building reliable LLM applications. By checking status codes, carefully parsing expected data formats, and anticipating various failure modes, you can create more resilient Python code that interacts predictably with LLM services. Remember to consult the specific API documentation for details on response structures and error codes.
© 2025 ApX Machine Learning