You've likely encountered basic Python decorators: functions that wrap other functions to add behavior without modifying the wrapped function's code directly. They provide a clean syntax for common tasks like logging or access control. Now, let's explore more sophisticated applications of decorators, particularly how they can help you build flexible and maintainable machine learning systems. These advanced patterns support configuration, state management, and seamless integration into larger frameworks.
Often, you need a decorator that can be configured. For example, you might want a logging decorator where you can specify the log level, or a timing decorator where you set a threshold for triggering a warning. To achieve this, you need to create a decorator factory: a function that takes arguments and returns the actual decorator function.
Consider a scenario where you want to time the execution of critical functions in your ML pipeline (like feature computation or model prediction) and log a warning only if the execution time exceeds a certain limit.
import time
import functools
import logging

# Configure basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def timing_threshold(threshold_seconds):
    """
    Decorator factory: returns a decorator that logs a warning
    if the wrapped function takes longer than 'threshold_seconds' to execute.
    """
    def decorator(func):
        @functools.wraps(func)  # Preserve original function metadata
        def wrapper(*args, **kwargs):
            start_time = time.perf_counter()
            result = func(*args, **kwargs)
            end_time = time.perf_counter()
            duration = end_time - start_time
            if duration > threshold_seconds:
                logging.warning(
                    f"Function '{func.__name__}' took {duration:.4f} seconds, "
                    f"exceeding the threshold of {threshold_seconds} seconds."
                )
            else:
                logging.info(
                    f"Function '{func.__name__}' executed in {duration:.4f} seconds."
                )
            return result
        return wrapper
    return decorator
# Example Usage
@timing_threshold(0.5)  # Apply the decorator with a 0.5 second threshold
def complex_feature_engineering(data):
    """Simulates a potentially time-consuming operation."""
    # Simulate work
    time.sleep(0.7)
    # In a real scenario, this would perform complex calculations
    processed_data = data * 2  # Placeholder operation
    return processed_data

# Call the decorated function
data_input = list(range(5))
processed = complex_feature_engineering(data_input)
# Output (will show a warning because 0.7 > 0.5):
# 2023-10-27 10:30:00,123 - WARNING - Function 'complex_feature_engineering' took 0.7005 seconds, exceeding the threshold of 0.5 seconds.

@timing_threshold(1.0)  # Apply with a different threshold
def quick_data_loading(filepath):
    """Simulates a faster operation."""
    time.sleep(0.2)
    logging.info(f"Data loaded from {filepath}")
    return {"data": [1, 2, 3]}

loaded = quick_data_loading("path/to/data.csv")
# Output (will show info because 0.2 < 1.0):
# 2023-10-27 10:30:01,324 - INFO - Data loaded from path/to/data.csv
# 2023-10-27 10:30:01,325 - INFO - Function 'quick_data_loading' executed in 0.2003 seconds.
In this pattern, timing_threshold(0.5) is called first. It returns the actual decorator function, which is then applied to complex_feature_engineering. The wrapper function inside decorator contains the timing logic and uses the threshold_seconds value captured from the outer scope (a closure). Notice the use of functools.wraps(func): it preserves the original function's name (__name__), docstring (__doc__), and other metadata, which is essential for debugging and introspection tools.
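You can verify what functools.wraps preserves by inspecting the decorated function directly. A quick check, assuming the definitions above are in scope:

print(complex_feature_engineering.__name__)  # 'complex_feature_engineering'
print(complex_feature_engineering.__doc__)   # 'Simulates a potentially time-consuming operation.'
# Without @functools.wraps, __name__ would report 'wrapper' and __doc__ would be None.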
Sometimes decorators need to maintain state between calls. Imagine needing to count how many times a specific prediction endpoint is hit or implementing a simple cache. While you could use global variables (generally discouraged), a much cleaner approach is to implement the decorator as a class.
When a class is used as a decorator, its __init__ method receives the function being decorated (similar to how a simple decorator function receives it). To make the decorated function callable, the class must implement the __call__ method, which executes each time the decorated function is invoked.
Here's an example of a stateful decorator that counts function calls:
import functools

class CallCounter:
    """
    A stateful decorator implemented as a class to count function calls.
    """
    def __init__(self, func):
        functools.update_wrapper(self, func)  # Preserve metadata
        self.func = func
        self.call_count = 0

    def __call__(self, *args, **kwargs):
        self.call_count += 1
        print(f"Call {self.call_count} to function '{self.func.__name__}'")
        return self.func(*args, **kwargs)

@CallCounter
def predict_sentiment(text):
    """Analyzes text and returns a sentiment score."""
    # Simulate prediction
    score = len(text) / 100.0  # Placeholder logic
    print(f"  Predicting sentiment for: '{text[:20]}...' -> Score: {score:.2f}")
    return score

# Example Usage
predict_sentiment("This is a wonderful example!")
predict_sentiment("Another call to the same function.")
predict_sentiment("Metaprogramming is powerful.")

# Output:
# Call 1 to function 'predict_sentiment'
#   Predicting sentiment for: 'This is a wonderful ...' -> Score: 0.28
# Call 2 to function 'predict_sentiment'
#   Predicting sentiment for: 'Another call to the ...' -> Score: 0.34
# Call 3 to function 'predict_sentiment'
#   Predicting sentiment for: 'Metaprogramming is p...' -> Score: 0.28
Here, each instance of CallCounter maintains its own call_count. functools.update_wrapper is used similarly to functools.wraps, but it is the appropriate choice for class decorators because it copies metadata from func onto the decorator instance self.
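Because the name predict_sentiment now refers to the CallCounter instance rather than the original function, the accumulated state is available as an ordinary attribute:

# Inspect the decorator's state after the three calls above:
print(predict_sentiment.call_count)  # 3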
You can apply multiple decorators to a single function. The order matters: decorators are applied from bottom to top (closest to the function definition first).
import functools

def log_args(func):
    """Decorator to log function arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args: {args}, kwargs: {kwargs}")
        return func(*args, **kwargs)
    return wrapper

def validate_input_shape(expected_dim):
    """Decorator factory for validating input array dimensions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data_array, *args, **kwargs):
            if hasattr(data_array, 'ndim') and data_array.ndim == expected_dim:
                print(f"Input shape validation passed for {func.__name__}.")
                return func(data_array, *args, **kwargs)
            else:
                raise ValueError(
                    f"Function '{func.__name__}' expected input with "
                    f"{expected_dim} dimensions, got {getattr(data_array, 'ndim', 'N/A')}"
                )
        return wrapper
    return decorator

# Example Usage
import numpy as np

@log_args                 # Applied second (outer)
@validate_input_shape(2)  # Applied first (inner)
def process_matrix(matrix):
    """Processes a 2D numpy array."""
    print(f"  Processing matrix of shape: {matrix.shape}")
    # Simulate processing
    return matrix.sum()

matrix_2d = np.array([[1, 2], [3, 4]])
matrix_1d = np.array([1, 2, 3])

print("Processing 2D matrix:")
process_matrix(matrix_2d)

print("\nProcessing 1D matrix (will raise error):")
try:
    process_matrix(matrix_1d)
except ValueError as e:
    print(f"Caught expected error: {e}")

# Output:
# Processing 2D matrix:
# Calling process_matrix with args: (array([[1, 2], [3, 4]]),), kwargs: {}
# Input shape validation passed for process_matrix.
#   Processing matrix of shape: (2, 2)
#
# Processing 1D matrix (will raise error):
# Calling process_matrix with args: (array([1, 2, 3]),), kwargs: {}
# Caught expected error: Function 'process_matrix' expected input with 2 dimensions, got 1
When process_matrix(matrix_2d) is called:

1. log_args's wrapper executes first. It prints the arguments.
2. It then calls the function it wraps: the wrapper produced by validate_input_shape(2).
3. validate_input_shape's wrapper executes. It checks the dimensions (matrix_2d.ndim is 2, which matches expected_dim).
4. Validation passes, so it finally calls the original process_matrix function.

Understanding this execution order is important when decorators have side effects or depend on each other.
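The @ syntax is just shorthand for function application, so the stacked decorators above are equivalent to composing them manually from the inside out. A sketch using the definitions above (process_matrix_plain is a hypothetical undecorated copy of the function):

def process_matrix_plain(matrix):
    """Processes a 2D numpy array."""
    print(f"  Processing matrix of shape: {matrix.shape}")
    return matrix.sum()

# The bottom decorator is applied first, then the one above it:
process_matrix_manual = log_args(validate_input_shape(2)(process_matrix_plain))
process_matrix_manual(matrix_2d)  # Behaves like the decorated version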
functools.lru_cache
Python's standard library provides useful decorators. One particularly relevant for ML is functools.lru_cache. It implements memoization: caching the results of function calls and returning the cached result when the same inputs occur again. This is highly effective for expensive, pure functions (functions that always return the same output for the same input and have no side effects), which are common in feature extraction and data lookup tasks.
import functools
import time

@functools.lru_cache(maxsize=128)  # Cache up to 128 unique calls
def get_external_data(resource_id):
    """
    Simulates fetching data from an external source (e.g., API, database).
    This operation is assumed to be slow.
    """
    print(f"Fetching data for resource_id: {resource_id}...")
    # Simulate network latency or expensive computation
    time.sleep(1.0)
    # In reality, you might use the 'requests' library here, e.g.:
    # requests.get(f"https://api.example.com/data/{resource_id}")
    return {"id": resource_id, "value": resource_id * 10}

# First call - will be slow and print "Fetching..."
start = time.time()
data1 = get_external_data(101)
print(f"First call duration: {time.time() - start:.4f}s, Data: {data1}")

# Second call with the same argument - near-instantaneous (cached)
start = time.time()
data2 = get_external_data(101)
print(f"Second call duration: {time.time() - start:.4f}s, Data: {data2}")

# Call with a different argument - will be slow again
start = time.time()
data3 = get_external_data(202)
print(f"Third call duration: {time.time() - start:.4f}s, Data: {data3}")

# Output:
# Fetching data for resource_id: 101...
# First call duration: 1.0012s, Data: {'id': 101, 'value': 1010}
# Second call duration: 0.0000s, Data: {'id': 101, 'value': 1010} <- Cached!
# Fetching data for resource_id: 202...
# Third call duration: 1.0008s, Data: {'id': 202, 'value': 2020}
lru_cache (Least Recently Used cache) automatically stores results keyed by the function arguments (which must be hashable) and evicts the least recently used entries when the maxsize limit is reached.
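Functions wrapped by lru_cache also gain introspection and management helpers from the standard library, which are useful for verifying cache behavior in a pipeline:

# Standard attributes added by functools.lru_cache:
print(get_external_data.cache_info())  # e.g., CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
get_external_data.cache_clear()        # Flush the cache, e.g., after the underlying data changes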
Less common than function decorators, class decorators modify or replace a class definition. They work similarly: a function receives the class object itself and returns a (potentially modified) class object.
Use cases include registering classes in a central catalog (as shown below), automatically adding methods or attributes, and enforcing interface or naming conventions across a family of classes.
Here's a conceptual example of using a class decorator for registration:
MODEL_REGISTRY = {}

def register_model(cls):
    """Class decorator to register model classes."""
    model_name = cls.__name__
    if model_name in MODEL_REGISTRY:
        print(f"Warning: Overwriting existing model in registry: {model_name}")
    MODEL_REGISTRY[model_name] = cls
    print(f"Registered model: {model_name}")
    return cls  # Return the original class unmodified

@register_model
class LogisticRegressionModel:
    def __init__(self, learning_rate=0.01):
        self.lr = learning_rate

    def fit(self, X, y):
        print(f"Fitting LogisticRegressionModel (lr={self.lr})...")
        # Actual fitting logic would go here

    def predict(self, X):
        print("Predicting with LogisticRegressionModel...")
        # Actual prediction logic
        return [0] * len(X)  # Placeholder

@register_model
class SupportVectorMachineModel:
    def __init__(self, kernel='rbf'):
        self.kernel = kernel

    def fit(self, X, y):
        print(f"Fitting SupportVectorMachineModel (kernel={self.kernel})...")

    def predict(self, X):
        print("Predicting with SupportVectorMachineModel...")
        return [1] * len(X)  # Placeholder

print("\nAvailable models in registry:", list(MODEL_REGISTRY.keys()))

# Instantiate a model from the registry
model_class = MODEL_REGISTRY['LogisticRegressionModel']
model_instance = model_class(learning_rate=0.05)
model_instance.fit(None, None)  # Pass dummy data for example

# Output:
# Registered model: LogisticRegressionModel
# Registered model: SupportVectorMachineModel
#
# Available models in registry: ['LogisticRegressionModel', 'SupportVectorMachineModel']
# Fitting LogisticRegressionModel (lr=0.05)...
This pattern allows you to create extensible systems where new components (like models or data processors) can be added simply by defining them and applying the decorator.
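A small factory function makes this concrete. The create_model helper below is hypothetical (not part of the registry code above) and shows how a configuration value can select and construct a model:

def create_model(name, **hyperparams):
    """Hypothetical factory: instantiate a registered model by name."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model '{name}'. Available: {list(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**hyperparams)

# 'SupportVectorMachineModel' could come from a YAML/JSON config file:
model = create_model("SupportVectorMachineModel", kernel="linear")
model.fit(None, None)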
Advanced decorator applications provide powerful tools for enhancing Python code in ML contexts. They allow for clean implementation of cross-cutting concerns like logging, validation, timing, caching, and registration, leading to more modular, reusable, and maintainable machine learning systems. As you progress through this chapter, you'll see how these techniques complement other metaprogramming features like descriptors and metaclasses to build even more sophisticated frameworks.