Interacting with external services like LLM APIs introduces uncertainties. Network issues, server-side problems, invalid requests, or simply sending too many requests too quickly can lead to failures. Robust applications anticipate these issues and include mechanisms to handle them gracefully. Failing to do so can result in application crashes, poor user experience, and potentially lost data.
When you make an API call using a library like Python's `requests`, the response object contains valuable information, most notably the HTTP status code. These codes signal the outcome of your request. While you ideally want a `200 OK` status, you need to be prepared for others:
4xx Client Errors: These indicate a problem with your request.

- `400 Bad Request`: Often means your request payload is malformed, missing required fields, or contains invalid parameter values (e.g., an unsupported model name or an invalid temperature setting). Check the API documentation and your request structure carefully. The response body usually contains specific details about the error.
- `401 Unauthorized`: Your API key is missing, invalid, or expired. Double-check how you're providing the authentication token or key, and ensure it hasn't been revoked or changed.
- `403 Forbidden`: You are authenticated, but you don't have permission to access the requested resource or perform the action. This might relate to your subscription tier or specific API endpoint access rules.
- `404 Not Found`: The requested endpoint or resource doesn't exist. Verify the API endpoint URL.
- `429 Too Many Requests`: You've exceeded the allowed number of requests in a given time window. This is a rate limit error, discussed in more detail below.

5xx Server Errors: These indicate a problem on the API provider's side.

- `500 Internal Server Error`: A generic error indicating something went wrong on the server. There's usually nothing you can do immediately except retry later.
- `502 Bad Gateway` / `503 Service Unavailable` / `504 Gateway Timeout`: These suggest the server is temporarily overloaded, down for maintenance, or unable to communicate with an upstream service. Retrying after a delay is often the best approach.

Here's a basic Python example using the `requests` library to check the status code:
```python
import os
import requests

# Assume API_ENDPOINT and API_KEY are defined elsewhere
# API_ENDPOINT = "https://api.example-llm-provider.com/v1/completions"
# API_KEY = os.getenv("LLM_API_KEY")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
data = {
    "model": "some-model-name",
    "prompt": "Translate 'hello' to French:",
    "max_tokens": 10
}

try:
    response = requests.post(API_ENDPOINT, headers=headers, json=data, timeout=30)  # Set a timeout

    # Check if the request was successful (status code 2xx)
    if response.status_code == 200:
        result = response.json()
        print("Success:", result)
    # Handle specific client errors
    elif response.status_code == 400:
        print(f"Error 400: Bad Request. Response: {response.text}")
        # Log the error details for debugging
    elif response.status_code == 401:
        print("Error 401: Unauthorized. Check your API key.")
        # Potentially stop the application or notify an admin
    elif response.status_code == 429:
        print("Error 429: Rate limit exceeded. Try again later.")
        # Implement retry logic (see below)
    # Handle server errors
    elif response.status_code >= 500:
        print(f"Error {response.status_code}: Server error. Response: {response.text}")
        # Implement retry logic
    # Handle other unexpected status codes
    else:
        print(f"Unexpected error: {response.status_code}. Response: {response.text}")
        response.raise_for_status()  # Raise an exception for other non-2xx codes
except requests.exceptions.RequestException as e:
    print(f"Network or request error occurred: {e}")
    # Handle network issues, timeouts, etc.
```
Always check the response body for detailed error messages, especially for `400 Bad Request` errors, as the API provider often includes specific information about what went wrong.
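Error payload formats vary by provider, but many return a JSON object with an `error` field containing a `message`. The helper below is a sketch under that assumption (the `error` and `message` keys are not guaranteed; check your provider's documentation), falling back to the raw body when parsing fails:

```python
import json

def extract_error_message(response_text):
    """Try to pull a human-readable message out of an error response body.

    Assumes a payload shaped like {"error": {"message": "...", ...}}; this
    shape varies by provider, so fall back to the raw text if it doesn't fit.
    """
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return response_text  # Not JSON; return the raw body as-is
    error = payload.get("error", {})
    if isinstance(error, dict):
        return error.get("message", response_text)
    return str(error)

# Hypothetical 400 response body
body = '{"error": {"message": "Unknown model: some-model-name", "type": "invalid_request_error"}}'
print(extract_error_message(body))  # → Unknown model: some-model-name
```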
API providers implement rate limits to ensure fair usage, prevent abuse, and maintain service stability for all users. These limits restrict the number of requests you can make or the amount of data (tokens) you can process within a specific time period (e.g., requests per minute, tokens per day).
Exceeding a rate limit typically results in a `429 Too Many Requests` error. The API response might also include headers indicating when you can retry (e.g., a `Retry-After` header specifying seconds to wait, or `X-RateLimit-Reset` indicating the time when the limit resets).
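When a `Retry-After` header is present, honoring it directly is usually better than guessing a delay. A minimal sketch (assuming the numeric-seconds form of the header; it can also be an HTTP date, which this sketch does not handle):

```python
import time

def wait_for_retry(headers, default_delay=5.0):
    """Sleep for the duration suggested by a Retry-After header, if present.

    Falls back to default_delay when the header is missing or not numeric.
    Returns the number of seconds waited.
    """
    raw = headers.get("Retry-After")
    try:
        delay = float(raw)
    except (TypeError, ValueError):
        delay = default_delay  # Header absent or in HTTP-date form
    time.sleep(delay)
    return delay

# After receiving a 429 response:
# waited = wait_for_retry(response.headers)
```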
Simply failing when an error occurs is rarely acceptable. You need strategies to make your application more resilient.
As shown in the Python example, the first step is always to check the HTTP status code of the response. Based on the code, you can decide on the appropriate action. Log errors, including the status code, response body (if available), and the request that caused the error. This information is invaluable for debugging.
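A small helper can standardize what gets logged for every failure. This is one reasonable convention, not a required format; truncating bodies keeps logs readable:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_client")

def log_api_error(response, request_data):
    """Record status code, response body, and the originating request.

    Bodies are truncated to 500 characters so a large payload cannot
    flood the logs. Returns the formatted message for convenience.
    """
    message = (
        f"API call failed: status={response.status_code} "
        f"body={response.text[:500]} request={str(request_data)[:500]}"
    )
    logger.error(message)
    return message
```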
For transient errors like `5xx` server issues or `429` rate limits, automatically retrying the request after a delay is a common and effective strategy. However, retrying immediately is often counterproductive, especially for rate limits.
Exponential Backoff: A widely used retry strategy. Instead of retrying after a fixed delay, you increase the waiting time exponentially after each failed attempt, which avoids overwhelming an API service that is already struggling.
Here's the basic logic:
Flowchart illustrating a common retry logic with exponential backoff for handling API errors.
Here's a conceptual Python implementation of exponential backoff:
```python
import time
import random
import requests

def call_llm_api_with_retry(api_endpoint, headers, data, max_retries=5, base_delay=1.0):
    """Calls the LLM API with exponential backoff on retryable errors."""
    retries = 0
    delay = base_delay
    while retries < max_retries:
        try:
            response = requests.post(api_endpoint, headers=headers, json=data, timeout=30)

            if response.status_code == 200:
                return response.json()  # Success

            # Check for retryable errors
            if response.status_code == 429 or response.status_code >= 500:
                print(f"Retryable error {response.status_code}. Retrying in {delay:.2f} seconds...")
                time.sleep(delay)
                retries += 1
                delay = base_delay * (2 ** retries) + random.uniform(0, 1)  # Exponential backoff with jitter
                continue  # Retry the loop
            else:
                # Non-retryable client error (4xx except 429) or unexpected code
                print(f"Non-retryable error: {response.status_code}. Response: {response.text}")
                response.raise_for_status()  # Raise an exception
                return None  # Reached only for non-error codes other than 200
        except requests.exceptions.HTTPError:
            raise  # Non-retryable HTTP error from raise_for_status(); don't retry it
        except requests.exceptions.RequestException as e:
            print(f"Network error: {e}. Retrying in {delay:.2f} seconds...")
            time.sleep(delay)
            retries += 1
            delay = base_delay * (2 ** retries) + random.uniform(0, 1)  # Exponential backoff with jitter

    print(f"Max retries ({max_retries}) exceeded. Failed to call API.")
    return None  # Indicate failure after all retries

# Example usage (assuming headers and data are defined as before)
# result = call_llm_api_with_retry(API_ENDPOINT, headers, data)
# if result:
#     print("API call successful after retries:", result)
# else:
#     print("API call failed permanently.")
```

Note the dedicated `except requests.exceptions.HTTPError` clause: because `HTTPError` is a subclass of `RequestException`, the exception raised by `raise_for_status()` for a non-retryable error would otherwise be caught by the broader handler and retried, defeating the distinction between retryable and non-retryable failures.
Pay attention to specific rate limit headers in the API response, such as:

- `Retry-After`: Specifies the number of seconds to wait before making the next request (often sent with 429 or 503 errors).
- `X-RateLimit-Limit`: The maximum number of requests allowed in the time window.
- `X-RateLimit-Remaining`: The number of requests remaining in the current window.
- `X-RateLimit-Reset`: The time (often in Unix timestamp format) when the rate limit window resets.

You can use these headers to implement more intelligent rate limiting logic, proactively slowing down requests as you approach the limit, or waiting precisely the duration specified by `Retry-After`.
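Proactive throttling can be sketched as follows. This assumes `X-RateLimit-Remaining` is an integer count and `X-RateLimit-Reset` is a Unix timestamp; exact header names and formats vary by provider, so adapt to what yours documents:

```python
import time

def throttle_if_needed(headers, threshold=2, now=None):
    """Sleep until the rate limit window resets when few requests remain.

    Assumes X-RateLimit-Remaining is an integer and X-RateLimit-Reset is a
    Unix timestamp (provider-specific; verify against your API's docs).
    Returns the number of seconds waited.
    """
    if now is None:
        now = time.time()
    try:
        remaining = int(headers.get("X-RateLimit-Remaining", ""))
        reset_at = float(headers.get("X-RateLimit-Reset", ""))
    except ValueError:
        return 0.0  # Headers absent or unparseable; don't throttle
    if remaining <= threshold:
        wait = max(0.0, reset_at - now)
        time.sleep(wait)
        return wait
    return 0.0

# After each successful response:
# throttle_if_needed(response.headers)
```

Calling this after every response slows the client down just before it would hit the limit, trading a short pause for avoiding `429` errors and the retries they trigger.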
Monitor your application's API usage, costs, and error rates. Most API providers offer dashboards to track usage. Setting up alerts for high error rates or approaching rate limits can help you identify and address problems proactively.
By implementing robust error checking, intelligent retry strategies like exponential backoff, and paying attention to rate limits, you can build more reliable applications that gracefully handle the inevitable issues that arise when interacting with third-party APIs.
© 2025 ApX Machine Learning