Once you've successfully sent a request to an LLM API, the next essential step is processing the response returned by the model. LLM APIs typically return data structured in JSON format, which is machine-readable and relatively easy to work with in most programming languages, including Python. Understanding the structure and content of these responses is fundamental for integrating the LLM's output into your application.
A typical response from a completion or chat API contains several pieces of information. Let's examine a common structure you might encounter (details vary slightly between providers like OpenAI, Anthropic, Google, etc., but the core concepts are similar):
{
  "id": "resp-a1b2c3d4e5",
  "object": "text_completion",
  "created": 1680000000,
  "model": "model-name-v1.0",
  "choices": [
    {
      "index": 0,
      "text": "The capital of France is Paris.",
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 7,
    "total_tokens": 17
  }
}
Or for a chat-based API:
{
  "id": "chatresp-f6g7h8i9j0",
  "object": "chat.completion",
  "created": 1680000100,
  "model": "chat-model-v2.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nThe capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 9,
    "total_tokens": 24
  }
}
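The practical difference between the two shapes is simply the path you follow to reach the generated text. A quick sketch, parsing trimmed-down versions of the example payloads above:

import json

completion_resp = json.loads(
    '{"choices": [{"index": 0, "text": "The capital of France is Paris.", "finish_reason": "stop"}]}'
)
chat_resp = json.loads(
    '{"choices": [{"index": 0, "message": {"role": "assistant", "content": "The capital of France is Paris."}, "finish_reason": "stop"}]}'
)

# Completion-style: the text sits directly on the choice.
print(completion_resp["choices"][0]["text"])

# Chat-style: the text is nested inside a message object.
print(chat_resp["choices"][0]["message"]["content"])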
Let's break down the important fields:

- id: A unique identifier for the API response. Useful for logging and tracking.
- object: The type of object returned (e.g., text_completion, chat.completion).
- created: A Unix timestamp indicating when the response was generated.
- model: The specific LLM version used to generate the response. This is important for reproducibility and for understanding potential behavioral differences between model versions.
- choices: This is usually the most critical part. It's an array containing one or more possible completions or messages generated by the model.
  - index: The position of this choice in the array (usually starts at 0).
  - text or message.content: The actual text generated by the LLM. This is the core output you'll typically use in your application. Note the slight difference in structure between completion and chat APIs: chat APIs often return a message object containing role (assistant) and content.
  - finish_reason: Explains why the model stopped generating text. Common values include:
    - stop: The model naturally concluded its response or encountered a predefined stop sequence.
    - length: The generation reached the maximum number of tokens specified in the request (max_tokens). The output might be cut off.
    - content_filter: The generated content was flagged by the provider's safety filters.
    - tool_calls: (In newer APIs) The model decided to call a function or tool you provided. This requires specific handling, discussed later.
    - null: Generation is still in progress (relevant for streaming).
  - logprobs: (Optional) Provides log probabilities for the generated tokens, useful for advanced analysis but often null by default.
- usage: Provides information about token consumption for the request.
  - prompt_tokens: Number of tokens in your input prompt.
  - completion_tokens: Number of tokens generated in the response.
  - total_tokens: The sum of prompt and completion tokens. This is directly related to the cost of the API call (a rough cost estimate is sketched after this list).
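Since total_tokens drives billing, it is worth logging and, if you like, converting to a dollar estimate. Below is a minimal sketch; the per-token prices are placeholder assumptions for illustration, not any provider's real rates:

# Placeholder prices in USD per 1,000 tokens -- check your provider's
# pricing page; these numbers are assumptions, not real rates.
PROMPT_PRICE_PER_1K = 0.0005
COMPLETION_PRICE_PER_1K = 0.0015

def estimate_cost(usage):
    """Estimate the dollar cost of one call from its 'usage' block."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K + \
           (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K

# Usage block from the chat example above:
usage = {"prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24}
print(f"Estimated cost: ${estimate_cost(usage):.6f}")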
Assuming you have received the response using a library like requests, you first need to parse the JSON body. Most LLM client libraries or SDKs handle this automatically, returning a Python dictionary or object. If using requests directly:
import requests
import json
import os

# Assume API_ENDPOINT, HEADERS, and PAYLOAD are defined correctly
# API_ENDPOINT = "YOUR_LLM_API_ENDPOINT"
# HEADERS = {"Authorization": f"Bearer {os.getenv('LLM_API_KEY')}", "Content-Type": "application/json"}
# PAYLOAD = {"model": "your-model-name", "prompt": "What is the capital of France?", "max_tokens": 50}

# Simulated response for demonstration - replace with an actual API call
simulated_response_json = """
{
  "id": "chatresp-simulated",
  "object": "chat.completion",
  "created": 1680000100,
  "model": "chat-model-v2.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\\n\\nThe capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 9,
    "total_tokens": 24
  }
}
"""

try:
    # In a real application:
    # response = requests.post(API_ENDPOINT, headers=HEADERS, json=PAYLOAD)
    # response.raise_for_status()  # Check for HTTP errors (4xx, 5xx)
    # response_data = response.json()

    # Using simulated data:
    response_data = json.loads(simulated_response_json)

    # Basic validation: check that 'choices' exists and is not empty
    if 'choices' in response_data and len(response_data['choices']) > 0:
        first_choice = response_data['choices'][0]

        # Chat-style structure: choices[0] -> message -> content
        if 'message' in first_choice and 'content' in first_choice['message']:
            generated_content = first_choice['message']['content'].strip()
            finish_reason = first_choice.get('finish_reason', 'unknown')  # .get() for safety
            print(f"Generated Content: {generated_content}")
            print(f"Finish Reason: {finish_reason}")

            # Extract usage information if available
            if 'usage' in response_data:
                usage = response_data['usage']
                print(f"Tokens Used - Prompt: {usage.get('prompt_tokens', 'N/A')}, "
                      f"Completion: {usage.get('completion_tokens', 'N/A')}, "
                      f"Total: {usage.get('total_tokens', 'N/A')}")
            else:
                print("Usage data not found in response.")

        # Completion-style structure: choices[0] -> text
        elif 'text' in first_choice:
            generated_content = first_choice['text'].strip()
            finish_reason = first_choice.get('finish_reason', 'unknown')
            print(f"Generated Content: {generated_content}")
            print(f"Finish Reason: {finish_reason}")
            # Extract usage... (same pattern as above)
        else:
            print("Could not find 'message'/'content' or 'text' in the first choice.")
    else:
        print("Error: 'choices' array not found or is empty in the response.")
        # Log the full response_data for debugging if needed
        # print(f"Full response: {response_data}")

except json.JSONDecodeError:
    print("Error: Failed to decode JSON response from the API.")
# except requests.exceptions.RequestException as e:
#     print(f"Error during API request: {e}")  # Handle request errors
except (KeyError, IndexError, TypeError) as e:
    print(f"Error parsing expected structure from API response: {e}")
    # Potentially log response_data here too for debugging unexpected structures
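A few defensive patterns in this snippet are worth keeping: .get() with a default avoids a KeyError when an optional field is missing, the explicit check on choices prevents an IndexError on an empty array, and catching json.JSONDecodeError separately from the structural errors (KeyError, IndexError, TypeError) makes failures easier to diagnose from logs.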
While most requests ask for a single completion (n=1), you can request multiple choices. If len(response_data['choices']) > 1, you'll need to iterate through the array or select a choice based on some criteria (e.g., index 0 is often the default or highest-probability choice), as sketched below.
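As a small illustration, here is how you might handle a multi-choice response. The response_data below is a hypothetical n=2 result following the chat structure shown earlier:

response_data = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Paris."},
         "finish_reason": "stop"},
        {"index": 1,
         "message": {"role": "assistant", "content": "The capital of France is Paris."},
         "finish_reason": "stop"},
    ]
}

# Inspect every candidate...
for choice in response_data["choices"]:
    content = choice["message"]["content"].strip()
    print(f"Choice {choice['index']}: {content}")

# ...or simply take the first, which is typically the default choice:
best = response_data["choices"][0]["message"]["content"].strip()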
The finish_reason
is important for application logic.
length
, the response might be incomplete. You might need to inform the user, truncate gracefully, or potentially make another API call to continue the generation (though this requires careful state management).content_filter
, you should handle this appropriately, perhaps by displaying a generic message instead of the filtered content.tool_calls
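Here is a minimal sketch of that branching, assuming a chat-style choice dictionary. The display and warn_user helpers are hypothetical placeholders for your application's real UI and logging code:

def display(text):
    print(text)

def warn_user(message):
    print(f"[warning] {message}")

def handle_choice(choice):
    """Branch on finish_reason; handlers are placeholder prints."""
    reason = choice.get("finish_reason")
    content = choice.get("message", {}).get("content", "")

    if reason == "stop":
        display(content)  # model finished normally
    elif reason == "length":
        display(content)
        warn_user("Response may be cut off (max_tokens reached).")
    elif reason == "content_filter":
        display("Sorry, this response was withheld by a safety filter.")
    elif reason == "tool_calls":
        # Tool handling is covered later; placeholder for now.
        warn_user("Model requested a tool call; see the tool-use discussion.")
    else:
        warn_user(f"Unexpected finish_reason: {reason!r}")

# Example, using a chat-style choice like the ones above:
handle_choice({
    "message": {"role": "assistant", "content": "The capital of France is Paris."},
    "finish_reason": "stop",
})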
Processing the API response involves more than just extracting the text. It requires parsing the structured data, understanding metadata like the finish reason and token usage, and handling potential variations or missing fields defensively. This careful processing ensures your application can reliably use the output generated by the LLM.