You've likely encountered basic Python decorators: functions that wrap other functions to add behavior without modifying the wrapped function's code directly. They provide a clean syntax for common tasks like logging or access control. Now, let's explore more sophisticated applications of decorators, particularly how they can help you build flexible and maintainable machine learning systems. These advanced patterns support configuration, state management, and seamless integration into larger frameworks.
Often, you need a decorator that can be configured. For example, you might want a logging decorator where you can specify the log level, or a timing decorator where you set a threshold for triggering a warning. To achieve this, you need to create a decorator factory: a function that takes arguments and returns the actual decorator function.
Consider a scenario where you want to time the execution of critical functions in your ML pipeline (like feature computation or model prediction) and log a warning only if the execution time exceeds a certain limit.
import time
import functools
import logging

# Configure basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def timing_threshold(threshold_seconds):
    """
    Decorator factory: returns a decorator that logs a warning
    if the wrapped function takes longer than 'threshold_seconds' to execute.
    """
    def decorator(func):
        @functools.wraps(func)  # Preserve original function metadata
        def wrapper(*args, **kwargs):
            start_time = time.perf_counter()
            result = func(*args, **kwargs)
            end_time = time.perf_counter()
            duration = end_time - start_time
            if duration > threshold_seconds:
                logging.warning(
                    f"Function '{func.__name__}' took {duration:.4f} seconds, "
                    f"exceeding the threshold of {threshold_seconds} seconds."
                )
            else:
                logging.info(
                    f"Function '{func.__name__}' executed in {duration:.4f} seconds."
                )
            return result
        return wrapper
    return decorator
# Example Usage
@timing_threshold(0.5)  # Apply the decorator with a 0.5 second threshold
def complex_feature_engineering(data):
    """Simulates a potentially time-consuming operation."""
    # Simulate work
    time.sleep(0.7)
    # In a real scenario, this would perform complex calculations
    processed_data = data * 2  # Placeholder operation
    return processed_data

# Call the decorated function
data_input = list(range(5))
processed = complex_feature_engineering(data_input)
# Output (will show a warning because 0.7 > 0.5):
# 2023-10-27 10:30:00,123 - WARNING - Function 'complex_feature_engineering' took 0.7005 seconds, exceeding the threshold of 0.5 seconds.

@timing_threshold(1.0)  # Apply with a different threshold
def quick_data_loading(filepath):
    """Simulates a faster operation."""
    time.sleep(0.2)
    logging.info(f"Data loaded from {filepath}")
    return {"data": [1, 2, 3]}

loaded = quick_data_loading("path/to/data.csv")
# Output (will show info because 0.2 < 1.0):
# 2023-10-27 10:30:01,324 - INFO - Data loaded from path/to/data.csv
# 2023-10-27 10:30:01,325 - INFO - Function 'quick_data_loading' executed in 0.2003 seconds.
In this pattern, timing_threshold(0.5) is called first. It returns the actual decorator function, which is then applied to complex_feature_engineering. The wrapper function inside decorator contains the timing logic and uses the threshold_seconds value captured from the outer scope (a closure). Notice the use of functools.wraps(func): it preserves the original function's name (__name__), docstring (__doc__), and other metadata, which is essential for debugging and introspection tools.
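You can verify what functools.wraps preserves by inspecting the decorated function directly. A quick check, assuming the definitions above are in scope:

print(complex_feature_engineering.__name__)  # 'complex_feature_engineering'
print(complex_feature_engineering.__doc__)   # 'Simulates a potentially time-consuming operation.'
# Without @functools.wraps, __name__ would report 'wrapper' and __doc__ would be None.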
Sometimes decorators need to maintain state between calls. Imagine needing to count how many times a specific prediction endpoint is hit or implementing a simple cache. While you could use global variables (generally discouraged), a much cleaner approach is to implement the decorator as a class.
When a class is used as a decorator, its __init__ method receives the function being decorated (similar to how a simple decorator function receives it). To make the decorated function callable, the class must implement the __call__ method, which executes each time the decorated function is invoked.
Here's an example of a stateful decorator that counts function calls:
import functools

class CallCounter:
    """
    A stateful decorator implemented as a class to count function calls.
    """
    def __init__(self, func):
        functools.update_wrapper(self, func)  # Preserve metadata
        self.func = func
        self.call_count = 0

    def __call__(self, *args, **kwargs):
        self.call_count += 1
        print(f"Call {self.call_count} to function '{self.func.__name__}'")
        return self.func(*args, **kwargs)

@CallCounter
def predict_sentiment(text):
    """Analyzes text and returns a sentiment score."""
    # Simulate prediction
    score = len(text) / 100.0  # Placeholder logic
    print(f"  Predicting sentiment for: '{text[:20]}...' -> Score: {score:.2f}")
    return score

# Example Usage
predict_sentiment("This is a wonderful example!")
predict_sentiment("Another call to the same function.")
predict_sentiment("Metaprogramming is powerful.")

# Output:
# Call 1 to function 'predict_sentiment'
#   Predicting sentiment for: 'This is a wonderful ...' -> Score: 0.28
# Call 2 to function 'predict_sentiment'
#   Predicting sentiment for: 'Another call to the ...' -> Score: 0.34
# Call 3 to function 'predict_sentiment'
#   Predicting sentiment for: 'Metaprogramming is p...' -> Score: 0.28
Here, each instance of CallCounter maintains its own call_count. functools.update_wrapper is used similarly to functools.wraps, but it is the appropriate choice for class decorators because it copies metadata from func onto the decorator instance self.
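Because the name predict_sentiment now refers to the CallCounter instance rather than the original function, the accumulated state is available as an ordinary attribute:

# Inspect the decorator's state after the three calls above:
print(predict_sentiment.call_count)  # 3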
You can apply multiple decorators to a single function. The order matters: decorators are applied from bottom to top (closest to the function definition first).
import functools

def log_args(func):
    """Decorator to log function arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args: {args}, kwargs: {kwargs}")
        return func(*args, **kwargs)
    return wrapper

def validate_input_shape(expected_dim):
    """Decorator factory for validating input array dimensions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data_array, *args, **kwargs):
            if hasattr(data_array, 'ndim') and data_array.ndim == expected_dim:
                print(f"Input shape validation passed for {func.__name__}.")
                return func(data_array, *args, **kwargs)
            else:
                raise ValueError(
                    f"Function '{func.__name__}' expected input with "
                    f"{expected_dim} dimensions, got {getattr(data_array, 'ndim', 'N/A')}"
                )
        return wrapper
    return decorator

# Example Usage
import numpy as np

@log_args                 # Applied second (outer)
@validate_input_shape(2)  # Applied first (inner)
def process_matrix(matrix):
    """Processes a 2D numpy array."""
    print(f"  Processing matrix of shape: {matrix.shape}")
    # Simulate processing
    return matrix.sum()

matrix_2d = np.array([[1, 2], [3, 4]])
matrix_1d = np.array([1, 2, 3])

print("Processing 2D matrix:")
process_matrix(matrix_2d)

print("\nProcessing 1D matrix (will raise error):")
try:
    process_matrix(matrix_1d)
except ValueError as e:
    print(f"Caught expected error: {e}")

# Output:
# Processing 2D matrix:
# Calling process_matrix with args: (array([[1, 2], [3, 4]]),), kwargs: {}
# Input shape validation passed for process_matrix.
#   Processing matrix of shape: (2, 2)
#
# Processing 1D matrix (will raise error):
# Calling process_matrix with args: (array([1, 2, 3]),), kwargs: {}
# Caught expected error: Function 'process_matrix' expected input with 2 dimensions, got 1
When process_matrix(matrix_2d) is called:

1. log_args's wrapper executes first. It prints the arguments.
2. It then calls the function it wraps: the wrapper produced by validate_input_shape(2).
3. validate_input_shape's wrapper executes. It checks the dimensions (matrix_2d.ndim is 2, which matches expected_dim).
4. Validation passes, so it finally calls the original process_matrix function.

Understanding this execution order is important when decorators have side effects or depend on each other.
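The @ syntax is just shorthand for function application, so the stacked decorators above are equivalent to composing them manually from the inside out. A sketch using the definitions above (process_matrix_plain is a hypothetical undecorated copy of the function):

def process_matrix_plain(matrix):
    """Processes a 2D numpy array."""
    print(f"  Processing matrix of shape: {matrix.shape}")
    return matrix.sum()

# The bottom decorator is applied first, then the one above it:
process_matrix_manual = log_args(validate_input_shape(2)(process_matrix_plain))
process_matrix_manual(matrix_2d)  # Behaves like the decorated version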
functools.lru_cache
Python's standard library provides useful decorators. One particularly relevant for ML is functools.lru_cache. It implements memoization: caching the results of function calls and returning the cached result when the same inputs occur again. This is highly effective for expensive, pure functions (functions that always return the same output for the same input and have no side effects), which are common in feature extraction and data lookup tasks.
import functools
import time

@functools.lru_cache(maxsize=128)  # Cache up to 128 unique calls
def get_external_data(resource_id):
    """
    Simulates fetching data from an external source (e.g., API, database).
    This operation is assumed to be slow.
    """
    print(f"Fetching data for resource_id: {resource_id}...")
    # Simulate network latency or expensive computation
    time.sleep(1.0)
    # In reality, you might use the 'requests' library here, e.g.:
    # requests.get(f"https://api.example.com/data/{resource_id}")
    return {"id": resource_id, "value": resource_id * 10}

# First call - will be slow and print "Fetching..."
start = time.time()
data1 = get_external_data(101)
print(f"First call duration: {time.time() - start:.4f}s, Data: {data1}")

# Second call with the same argument - near-instantaneous (cached)
start = time.time()
data2 = get_external_data(101)
print(f"Second call duration: {time.time() - start:.4f}s, Data: {data2}")

# Call with a different argument - will be slow again
start = time.time()
data3 = get_external_data(202)
print(f"Third call duration: {time.time() - start:.4f}s, Data: {data3}")

# Output:
# Fetching data for resource_id: 101...
# First call duration: 1.0012s, Data: {'id': 101, 'value': 1010}
# Second call duration: 0.0000s, Data: {'id': 101, 'value': 1010} <- Cached!
# Fetching data for resource_id: 202...
# Third call duration: 1.0008s, Data: {'id': 202, 'value': 2020}
lru_cache (Least Recently Used cache) automatically stores results keyed by the function arguments (which must be hashable) and evicts the least recently used entries when the maxsize limit is reached.
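Functions wrapped by lru_cache also gain introspection and management helpers from the standard library, which are useful for verifying cache behavior in a pipeline:

# Standard attributes added by functools.lru_cache:
print(get_external_data.cache_info())  # e.g., CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
get_external_data.cache_clear()        # Flush the cache, e.g., after the underlying data changes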
Less common than function decorators, class decorators modify or replace a class definition. They work similarly: a function receives the class object itself and returns a (potentially modified) class object.
Use cases include registering classes in a central catalog (as shown below), automatically adding methods or attributes, and enforcing interface or naming conventions across a family of classes.
Here's a conceptual example of using a class decorator for registration:
MODEL_REGISTRY = {}

def register_model(cls):
    """Class decorator to register model classes."""
    model_name = cls.__name__
    if model_name in MODEL_REGISTRY:
        print(f"Warning: Overwriting existing model in registry: {model_name}")
    MODEL_REGISTRY[model_name] = cls
    print(f"Registered model: {model_name}")
    return cls  # Return the original class unmodified

@register_model
class LogisticRegressionModel:
    def __init__(self, learning_rate=0.01):
        self.lr = learning_rate

    def fit(self, X, y):
        print(f"Fitting LogisticRegressionModel (lr={self.lr})...")
        # Actual fitting logic would go here

    def predict(self, X):
        print("Predicting with LogisticRegressionModel...")
        # Actual prediction logic
        return [0] * len(X)  # Placeholder

@register_model
class SupportVectorMachineModel:
    def __init__(self, kernel='rbf'):
        self.kernel = kernel

    def fit(self, X, y):
        print(f"Fitting SupportVectorMachineModel (kernel={self.kernel})...")

    def predict(self, X):
        print("Predicting with SupportVectorMachineModel...")
        return [1] * len(X)  # Placeholder

print("\nAvailable models in registry:", list(MODEL_REGISTRY.keys()))

# Instantiate a model from the registry
model_class = MODEL_REGISTRY['LogisticRegressionModel']
model_instance = model_class(learning_rate=0.05)
model_instance.fit(None, None)  # Pass dummy data for example

# Output:
# Registered model: LogisticRegressionModel
# Registered model: SupportVectorMachineModel
#
# Available models in registry: ['LogisticRegressionModel', 'SupportVectorMachineModel']
# Fitting LogisticRegressionModel (lr=0.05)...
This pattern allows you to create extensible systems where new components (like models or data processors) can be added simply by defining them and applying the decorator.
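A small factory function makes this concrete. The create_model helper below is hypothetical (not part of the registry code above) and shows how a configuration value can select and construct a model:

def create_model(name, **hyperparams):
    """Hypothetical factory: instantiate a registered model by name."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model '{name}'. Available: {list(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**hyperparams)

# 'SupportVectorMachineModel' could come from a YAML/JSON config file:
model = create_model("SupportVectorMachineModel", kernel="linear")
model.fit(None, None)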
Advanced decorator applications provide powerful tools for enhancing Python code in ML contexts. They allow for clean implementation of cross-cutting concerns like logging, validation, timing, caching, and registration, leading to more modular, reusable, and maintainable machine learning systems. As you progress through this chapter, you'll see how these techniques complement other metaprogramming features like descriptors and metaclasses to build even more sophisticated frameworks.