While databases provide access to internally managed data, many organizations and services expose their data externally through Application Programming Interfaces (APIs). APIs act as standardized contracts that allow different software systems to communicate. For data scientists, web APIs are a significant source of structured, often real-time, data ranging from social media activity and financial markets to weather forecasts and government statistics.
Unlike scraping websites, which involves parsing HTML designed for humans, APIs typically provide data in machine-readable formats like JSON (JavaScript Object Notation) or XML, making data extraction much cleaner and more reliable. We'll focus on REST (Representational State Transfer) APIs, a common architectural style for web APIs that uses standard HTTP methods.
Interacting with a REST API generally follows a request-response pattern:
Client Request: Your Python script (the client) sends an HTTP request to a specific URL (the endpoint) on the API server. This request includes:
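  HTTP method: Usually GET for retrieving data. Other methods like POST, PUT, and DELETE are used for creating, updating, or deleting resources, but are less common for simple data acquisition.
  Headers: Metadata about the request, such as the desired response format (Accept: application/json) or authentication credentials (Authorization: Bearer YOUR_API_KEY).
  Query parameters: Key-value pairs appended to the URL to filter or customize the results (e.g., ?city=London&units=metric).
  Body: Data sent with POST or PUT; not usually required for GET requests retrieving data.
Server Response: The API server processes the request and sends back an HTTP response, containing:
  Status code: A number indicating the outcome of the request (e.g., 200 OK, 404 Not Found, 401 Unauthorized, 500 Internal Server Error).
  Body: The requested data, typically in JSON format.
Figure: A simplified view of the client-server interaction when fetching data from a web API using an HTTP GET request.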
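To make the parts of a request concrete, here is a minimal sketch that builds a GET request without sending it, so the method, URL, headers, and body can be inspected. It relies on requests.Request and its prepare() method; the endpoint https://api.example.com/weather and the parameter values are placeholders chosen for illustration, not a real API.

```python
import requests

# Hypothetical endpoint and values, used only for illustration
request = requests.Request(
    method="GET",
    url="https://api.example.com/weather",
    params={"city": "London", "units": "metric"},
    headers={"Accept": "application/json"},
)

# prepare() assembles the final request but does not send anything
prepared = request.prepare()

print(prepared.method)             # HTTP method
print(prepared.url)                # endpoint plus encoded query string
print(prepared.headers["Accept"])  # requested response format
print(prepared.body)               # None: this GET request carries no body
```

Because nothing is sent over the network, this is a convenient way to check what a script is about to request before pointing it at a real service.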
requests Library
The requests library is the de facto standard in Python for making HTTP requests. If you don't have it installed, you can typically install it using pip:
pip install requests
Let's see how to use it to fetch data. We'll use the JSONPlaceholder API (https://jsonplaceholder.typicode.com), a free fake online REST API for testing and prototyping.
Basic GET Request
To fetch a list of posts, you can make a GET request to the /posts endpoint:
import requests
import pandas as pd

# Define the API endpoint URL
url = "https://jsonplaceholder.typicode.com/posts"

try:
    # Send the GET request; the timeout keeps the script from hanging indefinitely
    response = requests.get(url, timeout=10)
    # Raise an exception for bad status codes (4xx or 5xx)
    response.raise_for_status()
    # No exception was raised, so the request succeeded
    print("Request successful!")
    # Parse the JSON response body
    data = response.json()  # Returns a list of dictionaries
    # Optionally, convert to a Pandas DataFrame
    df_posts = pd.DataFrame(data)
    print(f"Successfully retrieved {len(df_posts)} posts.")
    print(df_posts.head())
except requests.exceptions.HTTPError as e:
    # Handle specific HTTP errors (like 404 Not Found, 401 Unauthorized).
    # HTTPError is a subclass of RequestException, so it must be caught first.
    print(f"HTTP error occurred: {e}")
    print(f"Status Code: {e.response.status_code}")
    # You might want to inspect e.response.text for more details from the API
except requests.exceptions.JSONDecodeError:
    # Handle cases where the response body isn't valid JSON
    # (also a subclass of RequestException, so it is caught before it)
    print("Failed to decode JSON response.")
    print("Response text:", response.text)  # Log the raw text
except requests.exceptions.RequestException as e:
    # Handle everything else: connection errors, timeouts, etc.
    print(f"Request failed: {e}")
Adding URL Parameters
Many APIs allow you to filter or customize results using query parameters appended to the URL (e.g., ?userId=1). The requests library makes this easy using the params argument, which takes a dictionary.
import requests
import pandas as pd

# Fetch posts only for userId = 5
params = {"userId": 5}
url = "https://jsonplaceholder.typicode.com/posts"

try:
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # Check for HTTP errors
    data = response.json()
    df_user5_posts = pd.DataFrame(data)
    print(f"\nRetrieved {len(df_user5_posts)} posts for userId=5:")
    print(df_user5_posts.head())
except requests.exceptions.HTTPError as e:
    # Caught before RequestException, which is its parent class
    print(f"HTTP error occurred: {e} (Status: {e.response.status_code})")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
requests automatically encodes the dictionary and appends it to the URL correctly (e.g., https://jsonplaceholder.typicode.com/posts?userId=5).
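One way to confirm this encoding, sketched below, is to prepare a request without sending it and look at the resulting URL. The second request uses made-up parameters (q and tag, which JSONPlaceholder does not define) purely to show how a value containing a space and a list of values might be handled.

```python
from urllib.parse import urlsplit, parse_qs

import requests

url = "https://jsonplaceholder.typicode.com/posts"

# Build (but do not send) the request to see the final URL
prepared = requests.Request("GET", url, params={"userId": 5}).prepare()
print(prepared.url)

# Hypothetical parameters: a value with a space and a list of values
tricky = requests.Request(
    "GET", url, params={"q": "data science", "tag": ["python", "api"]}
).prepare()
print(tricky.url)

# Decode the query string again to confirm nothing was lost
query = parse_qs(urlsplit(tricky.url).query)
print(query)
```

Passing a dictionary rather than concatenating strings by hand avoids mistakes with special characters, since the library takes care of escaping them.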
Custom Headers and Authentication
APIs often require authentication, typically via an API key. You might also need to set other headers like User-Agent or Accept. Headers are passed as a dictionary to the headers argument.
Common authentication patterns include:
  API key in a header: Sent in the Authorization header (e.g., Authorization: Bearer YOUR_KEY or Authorization: ApiKey YOUR_KEY) or a custom header (e.g., X-API-Key: YOUR_KEY).
  API key in a query parameter: Appended to the URL (e.g., ?apiKey=YOUR_KEY).
Let's simulate adding an API key and a custom User-Agent header (though JSONPlaceholder doesn't require them):
import requests

# Replace 'YOUR_ACTUAL_API_KEY' with a real key if needed for another API
api_key = "YOUR_ACTUAL_API_KEY"
headers = {
    "Authorization": f"Bearer {api_key}",
    "User-Agent": "MyDataScienceApplication/1.0",
    "Accept": "application/json",
}
url = "https://jsonplaceholder.typicode.com/todos/1"  # Example: fetch a single 'todo' item

try:
    # Pass the headers dictionary
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    todo_item = response.json()
    print("\nRetrieved single todo item (with simulated headers):")
    print(todo_item)
except requests.exceptions.HTTPError as e:
    # Caught before RequestException, which is its parent class
    print(f"HTTP error occurred: {e} (Status: {e.response.status_code})")
    # If you get a 401 or 403 error, it's likely an authentication issue.
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
Always consult the specific API's documentation for the correct authentication method and required headers.
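When many requests share the same credentials, one possible approach is to set the headers once on a requests.Session and to read the key from an environment variable instead of hard-coding it. The sketch below assumes a variable named MY_API_KEY, which is a hypothetical name, and it only prepares the request so the merged headers can be inspected without contacting the server.

```python
import os

import requests

# Hypothetical variable name; fall back to a placeholder so the sketch still runs
api_key = os.environ.get("MY_API_KEY", "YOUR_ACTUAL_API_KEY")

session = requests.Session()
# Headers set here are sent with every request made through this session
session.headers.update({
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
})

# Build a request through the session without sending it,
# to inspect the headers the session would attach
request = requests.Request("GET", "https://jsonplaceholder.typicode.com/todos/1")
prepared = session.prepare_request(request)
print(prepared.headers["Accept"])
print("Authorization" in prepared.headers)

# To actually send it: response = session.send(prepared, timeout=10)
```

Keeping the key out of the source file makes it less likely to end up in version control or shared notebooks.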
Successfully making the request is only half the process. You need to interpret the response correctly.
  Check the status code: Inspect response.status_code or call response.raise_for_status() to ensure the request was successful (typically status code 200). A 4xx code indicates a client error (bad request, missing authentication, resource not found), while a 5xx code indicates a server error.
  Parse the body: If the response contains JSON, use response.json() to parse it into a Python dictionary or list. Be prepared to handle requests.exceptions.JSONDecodeError if the response isn't valid JSON.
  Handle errors: Use try...except blocks to gracefully handle network issues (requests.exceptions.RequestException), HTTP errors (requests.exceptions.HTTPError), and JSON parsing errors. Because HTTPError and JSONDecodeError are subclasses of RequestException, their except clauses must come first. Logging the status code and response text (response.text) upon errors is helpful for debugging.
APIs are shared resources. Most APIs enforce rate limits, restricting the number of requests you can make within a certain time window (e.g., 100 requests per minute). Exceeding these limits might result in temporary blocks (e.g., status code 429 Too Many Requests).
Always:
  Pace your requests: When making many calls, add a delay between them (e.g., using time.sleep()).
  Handle 429 Errors: If you receive a 429 status code, your script should wait (often the Retry-After header in the response indicates how long) before trying again.
Respecting API terms ensures continued access and responsible data acquisition.
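The waiting behaviour described above could be sketched roughly as follows. The helper name get_with_retry, its parameters, and the fallback delay of 1, 2, 4... seconds are all assumptions made for this example. It only understands a Retry-After value given in seconds, although servers may also send an HTTP date.

```python
import time

import requests


def get_with_retry(url, session=None, max_attempts=3, sleep=time.sleep, **kwargs):
    """GET a URL, waiting and retrying while the server answers 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if session is None:
        session = requests.Session()
    kwargs.setdefault("timeout", 10)

    for attempt in range(1, max_attempts + 1):
        response = session.get(url, **kwargs)
        # Return anything that is not a rate-limit response,
        # or the last 429 once the attempts are used up
        if response.status_code != 429 or attempt == max_attempts:
            return response
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = int(retry_after)    # the server told us how long to wait
        else:
            delay = 2 ** (attempt - 1)  # assumed fallback: 1, 2, 4, ... seconds
        sleep(delay)
```

A typical call might look like response = get_with_retry(url, params=params), followed by response.raise_for_status() to surface any remaining error. The sleep argument exists only so the waiting can be replaced in tests.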
By mastering interaction with web APIs using libraries like requests, you gain access to a wide range of dynamic datasets, significantly expanding the data you can bring into your data science projects. This forms a critical part of the data acquisition toolkit, complementing data retrieved from databases and files.
© 2026 ApX Machine Learning