Validating input arguments is only the first step in the tool execution lifecycle. Once the arguments satisfy the Pydantic model, the server invokes the registered handler function. This function acts as the bridge between the JSON-RPC protocol and external systems. In the context of the Model Context Protocol (MCP), this bridge usually involves an asynchronous network request to a third-party REST or GraphQL API.
Implementing these handlers requires careful management of asynchronous contexts, data transformation, and network latency. Unlike local file operations, external API calls introduce unpredictability. The server must handle these interactions efficiently to maintain a responsive experience for the user interacting with the Large Language Model (LLM).
Most MCP servers implemented in Python use asyncio to handle concurrent requests. When defining a tool handler, you should use the async def syntax. This allows the server to yield control while waiting for the external API to respond, preventing the entire server from blocking during a network request.
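If a dependency only offers a blocking, synchronous interface, one option is to push that call onto a worker thread with asyncio.to_thread so the event loop stays responsive. A minimal sketch, where blocking_get_price is a stand-in for such a call:

import asyncio
import time

def blocking_get_price(symbol: str) -> float:
    # Stand-in for a synchronous SDK call that would otherwise block the loop
    time.sleep(2)
    return 42.0

async def lookup_price(symbol: str) -> str:
    # Runs the blocking call in a worker thread; other requests keep flowing
    price = await asyncio.to_thread(blocking_get_price, symbol)
    return f"{symbol}: {price}"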
When using the Python MCP SDK, the tool execution logic lives in a function registered with the @app.call_tool() decorator, which routes incoming tool-call requests to your handler. Inside this function, you generally employ an asynchronous HTTP client, such as httpx or aiohttp.
The following diagram illustrates the execution flow when an LLM invokes a tool that wraps an external API.
Flow of a tool request from the client, through validation and external execution, and back to the client as text.
The core of the implementation involves constructing the HTTP request using the arguments provided by the tool invocation. It is best practice to instantiate a single HTTP client instance for the lifecycle of the server or use a context manager for individual requests to ensure connections are closed properly.
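A long-lived client shared across handlers might look like this (a sketch; the names are illustrative):

import httpx

# One client for the whole server: connections are pooled and reused
# across tool calls instead of being re-established on every request.
http_client = httpx.AsyncClient(timeout=httpx.Timeout(10.0, connect=5.0))

async def close_http_client() -> None:
    # Invoke this during server shutdown to release pooled connections.
    await http_client.aclose()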
Consider a scenario where we want to provide a tool called get_crypto_price that fetches the current price of a cryptocurrency. The tool accepts a ticker symbol (e.g., "BTC") and a currency (e.g., "USD").
Here is how the implementation looks using httpx:
import httpx
import mcp.types as types
from mcp.server import Server

app = Server("crypto-server")

# CoinGecko's /simple/price endpoint expects coin IDs (e.g., "bitcoin"),
# not ticker symbols, so common tickers are mapped to IDs first.
TICKER_TO_ID = {"btc": "bitcoin", "eth": "ethereum", "sol": "solana"}

async def fetch_crypto_price(ticker: str, currency: str) -> str:
    url = "https://api.coingecko.com/api/v3/simple/price"
    coin_id = TICKER_TO_ID.get(ticker.lower(), ticker.lower())
    params = {
        "ids": coin_id,
        "vs_currencies": currency.lower()
    }

    async with httpx.AsyncClient() as client:
        # Set a timeout to prevent hanging the LLM generation
        response = await client.get(url, params=params, timeout=10.0)
        response.raise_for_status()
        data = response.json()

        # Extract the specific value
        price = data.get(coin_id, {}).get(currency.lower())
        if price is not None:
            return f"The current price of {ticker} is {price} {currency.upper()}."
        else:
            return f"Could not retrieve price for {ticker} in {currency}."

@app.call_tool()
async def handle_tool_call(
    name: str,
    arguments: dict
) -> list[types.TextContent]:
    if name == "get_crypto_price":
        # Arguments are already validated by the schema at this point
        ticker = arguments.get("ticker")
        currency = arguments.get("currency", "usd")
        result = await fetch_crypto_price(ticker, currency)
        return [
            types.TextContent(
                type="text",
                text=result
            )
        ]
    raise ValueError(f"Unknown tool: {name}")
In this implementation, the fetch_crypto_price function encapsulates the external logic. It handles the URL construction, query parameters, and the network call. The handle_tool_call function acts as the router, unpacking the dictionary arguments and delegating to the specific logic function.
External APIs often return verbose JSON objects containing metadata, timestamps, and extensive details that may not be relevant to the user's query. Large Language Models have finite context windows. Dumping a 50KB raw JSON response into the conversation history can degrade the model's performance and increase costs (if paying per token).
A significant responsibility of the Tool implementation is filtering and transforming this raw data into a concise, human-readable format or a minimized JSON structure.
For example, a weather API might return:
{
"coord": { "lon": -0.13, "lat": 51.51 },
"weather": [{ "id": 300, "main": "Drizzle", "description": "light intensity drizzle", "icon": "09d" }],
"base": "stations",
"main": { "temp": 280.32, "pressure": 1012, "humidity": 81, "temp_min": 279.15, "temp_max": 281.15 },
"visibility": 10000,
"wind": { "speed": 4.1, "deg": 80 },
"clouds": { "all": 90 },
"dt": 1485789600,
"sys": { "type": 1, "id": 5091, "message": 0.0103, "country": "GB", "sunrise": 1485762037, "sunset": 1485794875 },
"id": 2643743,
"name": "London",
"cod": 200
}
Passing this entire object is inefficient. The tool should parse this and return: "Current weather in London: Drizzle, Temperature: 280.32K, Humidity: 81%."
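A transformation helper for the payload above might look like this (a sketch; summarize_weather is an illustrative name, and the field names follow the sample response):

def summarize_weather(data: dict) -> str:
    # Keep only what answers the user's question; coordinates, station
    # metadata, and timestamps are dropped.
    name = data.get("name", "unknown location")
    condition = (data.get("weather") or [{}])[0].get("main", "unknown")
    main = data.get("main", {})
    return (
        f"Current weather in {name}: {condition}, "
        f"Temperature: {main.get('temp')}K, Humidity: {main.get('humidity')}%."
    )

In practice you would likely also convert the Kelvin temperature to Celsius or Fahrenheit before returning the string, since that is what users expect to read.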
The following chart compares the token usage impact of raw API responses versus transformed responses. Minimizing payload size is critical for maintaining long-running conversations.
Comparison of estimated token consumption between a raw API response and a curated text response.
When a tool accepts input that influences a network request, specifically URLs or Hostnames, it introduces a risk of Server-Side Request Forgery (SSRF). In an SSRF attack, an attacker (or a hallucinating model) manipulates the server into making requests to internal resources that should not be accessible from the outside.
For instance, if you implement a tool fetch_url_content(url: str), a user might ask the LLM to retrieve http://localhost:8080/admin. If your MCP server is running on a developer's machine or a cloud container with internal services, the tool might expose sensitive internal configuration data.
To mitigate this when implementing external calls:
- Allowlist destinations: restrict outbound requests to an explicit set of approved hosts (for example, https://wikipedia.org).
- Validate the URL scheme: permit only http or https, blocking schemes like file:// or gopher:// which could access the local filesystem.
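A simple guard along these lines (a sketch; the allowlist contents are illustrative) can run before any request is issued:

from urllib.parse import urlparse

# Illustrative allowlist; populate with the domains your tools actually need
ALLOWED_HOSTS = {"en.wikipedia.org", "api.coingecko.com"}

def validate_outbound_url(url: str) -> None:
    parsed = urlparse(url)
    # Block non-HTTP(S) schemes such as file:// or gopher://
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Blocked URL scheme: {parsed.scheme!r}")
    # Block hosts outside the allowlist, including localhost and
    # cloud metadata endpoints such as 169.254.169.254
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"Host not allowed: {parsed.hostname!r}")

Note that hostname allowlisting alone does not close every SSRF vector; a permitted hostname can still resolve to an internal address, so stricter deployments also validate the resolved IP before connecting.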
LLMs operate in real time for the user. If a tool takes 30 seconds to execute, the user might assume the generation has failed. Python's httpx library does apply a default timeout, but relying on library defaults is fragile; you should explicitly set aggressive timeouts for your external calls.

A 5 to 10-second timeout is generally appropriate for MCP tools. If the external API is slower than this, the tool should return a message stating that the operation is taking longer than expected or that the service is unavailable, rather than holding the connection open indefinitely.
In the httpx example provided earlier, timeout=10.0 ensures that if the CoinGecko API hangs, the tool handler raises an exception (which must be caught) rather than freezing the MCP server.
# Recommended timeout configuration
timeout_config = httpx.Timeout(10.0, connect=5.0)

async with httpx.AsyncClient(timeout=timeout_config) as client:
    # ... perform request
    ...
This configuration allows 5 seconds to establish a connection and applies a 10-second limit to each subsequent phase of the request (read, write, and pool acquisition), ensuring the server recovers quickly from network issues.
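When the timeout fires, httpx raises httpx.TimeoutException, which the handler should catch and convert into a readable result rather than letting the tool call fail opaquely. A sketch of the relevant fragment, reusing the fetch_crypto_price helper from the earlier example:

try:
    result = await fetch_crypto_price(ticker, currency)
except httpx.TimeoutException:
    # Surface the failure to the model instead of propagating the exception
    result = "The price service is taking too long to respond; please try again later."
except httpx.HTTPStatusError as exc:
    # raise_for_status() lands here on 4xx/5xx responses
    result = f"The price service returned an error (HTTP {exc.response.status_code})."

Returning these messages as ordinary text content lets the LLM explain the failure to the user, or retry with different arguments, instead of the conversation stalling.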