Now that we understand the general structure of LLM APIs, let's get practical. How do we actually send a request containing our prompt from Python and receive the LLM's response? One of the most straightforward and common ways is the widely used Python requests library. This library provides a simple interface for making HTTP requests, which is exactly what we need to communicate with most web-based LLM APIs.

While many LLM providers offer dedicated Python client libraries (which we'll explore later), using requests directly is valuable for several reasons:
- You see exactly what is sent and received over HTTP, which helps when debugging or reading provider documentation.
- It works with any provider that exposes an HTTP API, even one without an official Python client.
- It clarifies what the higher-level client libraries do under the hood, since most of them wrap these same HTTP calls.
To make an API call, we typically need four main pieces of information:

- Endpoint URL: The specific web address the request is sent to.
- HTTP method: Usually POST when sending a prompt for generation.
- Headers: Metadata about the request. Two are especially important:
  - Content-Type: Specifies the format of the data being sent (e.g., application/json).
  - Authorization: Contains credentials to authenticate the request, often an API key (e.g., Bearer YOUR_API_KEY). Remember to handle API keys securely, as discussed in Chapter 2.
- Payload (body): The data itself, typically a JSON object containing the prompt and generation parameters.

The requests library makes sending a POST request simple using the requests.post() function. Let's illustrate with an example. Assume we have an API endpoint https://api.example-llm-provider.com/v1/completions and our API key.
import requests
import os
import json

# Best practice: Load API key from environment variables or a secure config
# Assume the key is set in your environment (see Chapter 2)
api_key = os.getenv("EXAMPLE_LLM_API_KEY")
if not api_key:
    raise ValueError("API key not found. Please set the EXAMPLE_LLM_API_KEY environment variable.")

# 1. Define the API endpoint URL
api_url = "https://api.example-llm-provider.com/v1/completions"  # Replace with actual endpoint

# 2. Set up the request headers
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# 3. Prepare the payload (data) for the request
# Parameter names (e.g., 'model', 'prompt', 'max_tokens') vary by provider
payload = {
    "model": "model-name-123",  # Specify the desired LLM
    "prompt": "Translate the following English text to French: 'Hello, world!'",
    "max_tokens": 50,  # Limit the response length
    "temperature": 0.7  # Control randomness (lower = more deterministic, higher = more creative)
}

# 4. Send the POST request
try:
    # json= serializes the payload and sets Content-Type if not already set
    response = requests.post(api_url, headers=headers, json=payload)

    # 5. Check the response status and process the result
    response.raise_for_status()  # Raises an HTTPError for bad responses (4xx or 5xx)

    # Assuming the API returns JSON
    result = response.json()
    print("API Response:")
    print(json.dumps(result, indent=2))  # Pretty-print the JSON response

    # Extract the generated text (structure depends on the specific API)
    # This is a hypothetical structure; check your provider's documentation
    if "choices" in result and len(result["choices"]) > 0:
        generated_text = result["choices"][0].get("text", "No text found")
        print("\nGenerated Text:")
        print(generated_text.strip())
    else:
        print("\nCould not find generated text in the expected format.")

except requests.exceptions.RequestException as e:
    print(f"An error occurred during the API request: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
In this example:

1. We import the necessary libraries (requests, os, json).
2. We define api_url, headers, and the payload dictionary. Note the use of an f-string for inserting the API key into the Authorization header.
3. We send the request with requests.post(api_url, headers=headers, json=payload). Passing the dictionary via the json parameter tells requests to serialize it to a JSON string and set the Content-Type header to application/json automatically (though we also set it explicitly for clarity); it is equivalent to passing data=json.dumps(payload) and setting the header by hand.
4. response.raise_for_status() is a crucial step. It checks whether the HTTP status code indicates success (like 200 OK). If the API returned an error code (e.g., 401 Unauthorized, 429 Too Many Requests, 500 Internal Server Error), this method raises a requests.exceptions.HTTPError. (See the retry sketch after this list.)
5. response.json() parses the JSON content from the response body into a Python dictionary.
6. We extract the generated text (here, result["choices"][0]["text"]). The exact structure varies significantly between APIs, so always consult the provider's documentation.
7. The try...except block catches potential network issues (requests.exceptions.RequestException) or other problems.
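Since raise_for_status() turns error status codes into exceptions, you can also inspect the status code directly and react to specific errors, for example backing off when you are rate limited. The sketch below shows one possible pattern; the function name, retry count, and backoff policy are illustrative choices, not requirements of any particular API.

import time
import requests

def post_with_retry(url, headers, payload, max_retries=3):
    """POST with a simple retry on 429 responses (illustrative sketch)."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 429:
            # Many APIs include a Retry-After header (in seconds); fall back
            # to exponential backoff if it is absent or not a plain number.
            try:
                wait = float(response.headers.get("Retry-After", ""))
            except ValueError:
                wait = float(2 ** attempt)
            print(f"Rate limited (429). Retrying in {wait:.1f}s...")
            time.sleep(wait)
            continue
        response.raise_for_status()  # Any other 4xx/5xx becomes an HTTPError
        return response.json()
    raise RuntimeError("Giving up: rate limit retries exhausted.")

# Usage, reusing api_url, headers, and payload from the example above:
# result = post_with_retry(api_url, headers, payload)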
The payload dictionary is where you control the LLM's behavior. While specific parameter names differ between providers, common ones include:

- model: (String) The identifier for the specific language model you want to use (e.g., gpt-3.5-turbo, claude-2.1).
- prompt or messages: (String or List) The input text. Some APIs use prompt for simple text completion, while others use a messages list for chat-like interactions, often including roles (user, assistant, system); see the sketch after this list.
- max_tokens: (Integer) The maximum number of tokens (roughly, words or parts of words) the model should generate in the response. Helps control cost and response length.
- temperature: (Float, typically 0.0 to 2.0) Controls the randomness of the output. Lower values (e.g., 0.2) make the output more focused and deterministic, while higher values (e.g., 0.8) make it more creative and diverse.
- top_p: (Float, 0.0 to 1.0) An alternative to temperature for controlling randomness (nucleus sampling). The model samples only from the smallest set of tokens whose cumulative probability reaches top_p.
- stop: (String or List of Strings) Sequences of characters where the API should stop generating further tokens.

Always refer to the documentation of the specific LLM provider you are using to understand the available parameters, their names, expected values, and default behaviors.
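To illustrate the messages style mentioned above, a chat-oriented API might accept a payload like the following. The model name, endpoint path, and field layout are hypothetical; check your provider's documentation for the real ones.

# Hypothetical chat-style payload; field names vary by provider
chat_payload = {
    "model": "chat-model-456",  # Hypothetical chat model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful translator."},
        {"role": "user", "content": "Translate to French: 'Hello, world!'"}
    ],
    "max_tokens": 50,
    "temperature": 0.7
}

# Sent exactly like the completion payload earlier:
# response = requests.post("https://api.example-llm-provider.com/v1/chat",
#                          headers=headers, json=chat_payload)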
Using the requests library provides a fundamental way to interact with LLM APIs. It gives you direct control over the HTTP communication and is a valuable skill even when using more specialized libraries, which often build upon these basic principles. In the following sections, we'll take a closer look at handling different types of API responses and errors, and then explore the official client libraries offered by major LLM providers.