Okay, you've learned that Application Programming Interfaces (APIs) are the bridges that let our code talk to remote services, like those hosting powerful Large Language Models. Now, let's roll up our sleeves and see how to actually send a request across that bridge. This is where the interaction becomes tangible.
Think of making an API request like sending a very specific, structured letter to the LLM service. This "letter" needs a destination address, instructions on how it should be handled, proof of who you are (authentication), and the actual message (your prompt).
Most interactions with LLM APIs use the standard protocols of the web, specifically HTTP requests. A typical request involves these pieces:

- Endpoint URL: The address you send the request to, e.g., https://api.example-llm-provider.com/v1/generate.
- HTTP method: Usually POST, since you are sending data (your prompt and settings) to the server.
- Content-Type header: Tells the server the format of the data you're sending in the request body. This is usually application/json.
- Authorization header: Provides your credentials, typically an API key, to prove you have permission to use the service. More on this shortly.

Before you can send requests, you almost always need an API key. You typically get this key by signing up for an account with the LLM provider.
Important: Treat your API key like a password. Keep it secret and secure. Never embed it directly in code that might be shared publicly (like in version control systems). Environment variables or secure secret management tools are better ways to handle keys in applications.
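For example, in Python you might read the key from an environment variable rather than writing it into your source code (the variable name OPENAI_API_KEY below is a common convention, not a requirement):

import os

# Reads the key from the process environment; returns None if the variable is unset
api_key = os.getenv("OPENAI_API_KEY")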
API keys are usually included in the Authorization header. A common format is Bearer YOUR_API_KEY, where YOUR_API_KEY is replaced with the actual key you received from the provider.
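Putting these pieces together, an HTTP request is ultimately just structured text sent over the network. Here is an illustrative sketch (the endpoint, key, and body are placeholders, and some headers such as Content-Length are omitted for brevity):

POST /v1/generate HTTP/1.1
Host: api.example-llm-provider.com
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{"prompt": "Hello, world!", "max_tokens": 50}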
The request body is where you put your instructions for the LLM. It's a JSON object containing key-value pairs. Common elements include:

- prompt: (String) The text input you want the LLM to process or respond to.
- model: (String) Specifies which particular LLM you want to use (e.g., example-model-v2 or llama-3-8b-instruct). Providers often offer multiple models with different capabilities and sizes.
- max_tokens: (Integer) The maximum number of tokens (roughly, words or parts of words) you want the generated response to contain. This helps control length and cost.
- temperature: (Float, often between 0 and 1) Controls the randomness of the output. Lower values (e.g., 0.2) make the output more focused and deterministic, while higher values (e.g., 0.8) make it more creative and diverse.

Here's an example of what a simple JSON payload might look like:
{
  "model": "example-model-v1-instruct",
  "prompt": "Translate the following English text to French: 'Hello, world!'",
  "max_tokens": 50,
  "temperature": 0.7
}
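If you are working in Python, the same payload is naturally written as a dictionary and serialized to JSON, which is the conversion HTTP libraries typically perform for you behind the scenes:

import json

payload = {
    "model": "example-model-v1-instruct",
    "prompt": "Translate the following English text to French: 'Hello, world!'",
    "max_tokens": 50,
    "temperature": 0.7
}

# json.dumps produces the JSON string that goes into the request body
print(json.dumps(payload, indent=2))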
curl

curl is a command-line tool for transferring data with URLs. It's great for quickly testing API endpoints. Here's how you might send a request like the one above, using OpenAI's Chat Completions endpoint as a concrete example (replace YOUR_API_KEY with your actual key, and adjust the URL and payload for your provider). Note that this endpoint expects a messages list rather than a single prompt field; each provider defines its own payload schema:
# Replace YOUR_API_KEY with your actual key
# Replace gpt-3.5-turbo with your desired model if different
curl https://api.openai.com/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Translate the following English text to French: '\''Hello, world!'\''"
      }
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'
Let's break down this command:

- curl https://api.openai.com/v1/chat/completions: Specifies the tool (curl) and the endpoint URL.
- -X POST: Sets the HTTP method to POST.
- -H "Content-Type: application/json": Adds the content type header.
- -H "Authorization: Bearer YOUR_API_KEY": Adds the authorization header (remember to use your real key).
- -d '{...}': Specifies the data (payload) to send in the request body. Note the single quotes around the JSON; inside the prompt string, each literal single quote is written as '\'' (close the quoted string, insert an escaped quote, reopen it) so the shell passes it through correctly.

In practice, you'll often make API calls from within a script or application. Python's requests library is a popular choice for this.
First, ensure you have requests installed (pip install requests).
import requests
import json
import os  # Recommended for handling API keys

# --- Configuration ---
# Best practice: load the API key from an environment variable or secure storage
api_key = os.getenv("OPENAI_API_KEY")
# api_key = "YOUR_API_KEY"  # Hardcoding works for a quick test but is not recommended

if not api_key:
    raise ValueError("OpenAI API key not found. Set the OPENAI_API_KEY environment variable.")

api_endpoint = "https://api.openai.com/v1/chat/completions"

# --- Request Headers ---
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

# --- Request Payload (OpenAI Chat Completions Format) ---
payload = {
    "model": "gpt-3.5-turbo",  # Specify the model you want to use
    "messages": [
        {
            "role": "user",  # Role can be 'system', 'user', or 'assistant'
            "content": "Translate the following English text to French: 'Hello, world!'"  # The user's prompt
        }
        # You can add more messages here for conversation history, e.g.:
        # {"role": "assistant", "content": "Bonjour le monde!"},
        # {"role": "user", "content": "Now translate 'goodbye'"}
    ],
    "max_tokens": 50,   # Maximum number of tokens to generate
    "temperature": 0.7  # Controls randomness (0.0 to 2.0 for this API)
}

# --- Send the Request ---
try:
    response = requests.post(api_endpoint, headers=headers, json=payload)

    # --- Check for a successful response ---
    response.raise_for_status()  # Raises an HTTPError for bad responses (4xx or 5xx)

    # If successful, proceed to handle the response
    print("Request successful!")
    response_data = response.json()

    # Example of accessing the generated text (structure might vary slightly)
    # print(json.dumps(response_data, indent=2))  # Print the full response nicely
    if response_data.get("choices"):
        print("Generated Text:", response_data["choices"][0]["message"]["content"])
    else:
        print("No choices found in response.")

except requests.exceptions.RequestException as e:
    print(f"An error occurred during the API request: {e}")
    if hasattr(e, 'response') and e.response is not None:
        print(f"Status Code: {e.response.status_code}")
        try:
            # Try to parse and print the error details from the API if available
            error_details = e.response.json()
            print(f"Response Body: {json.dumps(error_details, indent=2)}")
        except json.JSONDecodeError:
            # If the response body is not JSON, print the raw text
            print(f"Response Body: {e.response.text}")
    else:
        print("No response received from server.")
except Exception as ex:
    print(f"An unexpected error occurred: {ex}")
This Python code does essentially the same thing as the curl command: it defines the URL, headers, and payload, then sends a POST request. We've also added basic error checking using a try...except block and response.raise_for_status(), which is good practice. Using requests.post with the json= argument automatically converts the Python dictionary payload into a JSON string and sets the Content-Type header correctly.
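To make that last point concrete, the two calls below send an equivalent request; the json= form simply performs the serialization for you (this sketch reuses the api_endpoint, headers, and payload defined above):

# Manual serialization: convert the dict to a JSON string yourself
requests.post(api_endpoint, headers=headers, data=json.dumps(payload))

# Shorthand: requests serializes the dict and sets the
# Content-Type: application/json header automatically
requests.post(api_endpoint, headers=headers, json=payload)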
To help visualize this, consider the path your request takes: your client sends an HTTP POST request containing the necessary details (URL, headers, body) to the API server. The server validates the request, passes the instructions to the actual LLM, receives the generated text, and sends it back to your client in an HTTP response.
You've now seen the structure of an API request and how to formulate and send one using common tools. The next step is understanding what comes back from the LLM service.