While descriptors offer fine-grained control over specific attributes, Python provides even more fundamental hooks into the attribute access mechanism itself: the special methods __getattr__ and __getattribute__. These allow you to define custom behavior for how any attribute lookup is handled on an instance, offering a powerful, albeit potentially complex, tool for building dynamic and responsive machine learning components.
Understanding how Python normally finds attributes is important first. When you access obj.x, Python typically checks, in order, whether:

1. x is a data descriptor (like a property) on the class of obj or its parent classes.
2. x exists in obj.__dict__ (the instance's own attributes).
3. x is a non-data descriptor or other attribute found in the class of obj or its parent classes (following the Method Resolution Order, or MRO).

If this standard lookup fails, Python makes one last attempt by calling the __getattr__ method, if defined. If the lookup succeeds at any point before this final step, __getattr__ is not called.
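A short sketch can make this precedence concrete. The class and attribute names below (Demo, prop, plain) are illustrative, not from any library:

```python
class Demo:
    class_attr = "from class"

    @property
    def prop(self):
        # Data descriptor: always wins over the instance __dict__
        return "from property"

    def __getattr__(self, name):
        # Called only after the normal lookup has failed
        return f"fallback for {name!r}"

d = Demo()
d.__dict__['prop'] = "from instance"   # Shadow attempt; the property still wins
d.__dict__['plain'] = "from instance"

print(d.prop)        # from property  (data descriptor beats the instance dict)
print(d.plain)       # from instance  (instance dict beats the class attribute)
print(d.class_attr)  # from class
print(d.missing)     # fallback for 'missing'  (__getattr__ as the last resort)
```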
__getattr__
The __getattr__(self, name) method is invoked only when an attribute lookup fails through the usual channels. Its purpose is to provide a fallback mechanism, allowing you to compute or retrieve an attribute dynamically when it's not found directly on the instance or its class hierarchy.
Syntax:
def __getattr__(self, name):
    # 'name' is the string name of the attribute being accessed.
    # Logic to compute or retrieve the value goes here.
    # Must return the value or raise AttributeError.
    if name == 'some_dynamic_attribute':
        # Calculate or fetch the value
        value = ...
        return value
    # Important: raise AttributeError for unhandled names
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
Use Cases in Machine Learning:
Dynamic Feature Generation: Imagine a data object where you want to access transformed versions of features on the fly without pre-computing them all. __getattr__ can intercept requests for specific transformations.
import pandas as pd
import numpy as np

class DynamicFeatures:
    def __init__(self, data_frame):
        # Use object.__setattr__ to avoid triggering our own __setattr__ if defined
        object.__setattr__(self, '_data', data_frame.copy())

    def __getattr__(self, name):
        if name.startswith('log_'):
            original_feature = name[4:]  # Strip 'log_' prefix
            if original_feature in self._data.columns:
                print(f"Dynamically computing log of {original_feature}")
                # Compute and return the log transform dynamically.
                # Ensure positive values for log; handle errors as needed.
                series = self._data[original_feature]
                # Use numpy log for potentially better performance and handling
                log_transformed = np.log(series.astype(float).clip(lower=1e-9))  # Clip to avoid log(0)
                return log_transformed
            else:
                raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}' (original feature '{original_feature}' not found)")
        elif name.startswith('squared_'):
            original_feature = name[8:]
            if original_feature in self._data.columns:
                print(f"Dynamically computing square of {original_feature}")
                return self._data[original_feature] ** 2
            else:
                raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}' (original feature '{original_feature}' not found)")
        # Important: if not handled, raise AttributeError
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

# To allow access to the internal '_data' without triggering __getattr__ infinitely,
# we relied on object.__setattr__ in __init__. Accessing self._data inside
# __getattr__ is safe because '_data' exists.

# Example usage
df = pd.DataFrame({'feature_a': [1, 10, 100], 'feature_b': [-2, 20, 200]})
dynamic_df = DynamicFeatures(df)
print("Accessing log_feature_a:")
print(dynamic_df.log_feature_a)
print("\nAccessing squared_feature_b:")
print(dynamic_df.squared_feature_b)
try:
    print("\nAccessing non_existent_feature:")
    print(dynamic_df.non_existent_feature)
except AttributeError as e:
    print(e)
Accessing log_feature_a:
Dynamically computing log of feature_a
0 0.000000
1 2.302585
2 4.605170
Name: feature_a, dtype: float64
Accessing squared_feature_b:
Dynamically computing square of feature_b
0 4
1 400
2 40000
Name: feature_b, dtype: int64
Accessing non_existent_feature:
'DynamicFeatures' object has no attribute 'non_existent_feature'
Lazy Loading of Resources: In ML workflows, you might deal with large models or datasets that are expensive to load into memory. __getattr__ allows you to defer loading until the resource is actually needed.
import time
# Assume joblib exists for a more realistic example stub
# import joblib

class LazyModelLoader:
    def __init__(self, model_path_dict):
        # Store paths safely using object.__setattr__ to avoid conflicts
        object.__setattr__(self, '_model_paths', model_path_dict)
        object.__setattr__(self, '_loaded_models', {})  # Cache for loaded models

    def __getattr__(self, name):
        # Check if it's a known model name we can load
        if name in self._model_paths:
            # Check whether it's already loaded (in our cache)
            if name not in self._loaded_models:
                print(f"Lazy loading model '{name}' from {self._model_paths[name]}...")
                # Actual model loading logic would go here,
                # e.g., self._loaded_models[name] = joblib.load(self._model_paths[name])
                time.sleep(0.5)  # Simulate loading time
                self._loaded_models[name] = f"Loaded Model: {name.upper()}"  # Placeholder
            # Return the loaded model from the cache
            return self._loaded_models[name]
        # If the name is not a loadable model path, raise AttributeError
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    # Direct access to internal state is handled by the default __getattribute__,
    # or could be explicitly handled here if __getattribute__ were overridden.

# Example usage
loader = LazyModelLoader({
    'classifier': '/path/to/classifier.pkl',
    'regressor': '/path/to/regressor.pkl'
})

# Model not loaded yet
print("Accessing classifier...")
model_c = loader.classifier  # Triggers __getattr__, loads the model
print(model_c)
print("\nAccessing classifier again...")
model_c_again = loader.classifier  # Cached; loading message not shown
print(model_c_again)
print("\nAccessing regressor...")
model_r = loader.regressor  # Triggers __getattr__, loads the model
print(model_r)
Accessing classifier...
Lazy loading model 'classifier' from /path/to/classifier.pkl...
Loaded Model: CLASSIFIER
Accessing classifier again...
Loaded Model: CLASSIFIER
Accessing regressor...
Lazy loading model 'regressor' from /path/to/regressor.pkl...
Loaded Model: REGRESSOR
A critical point when implementing __getattr__ is to avoid causing infinite recursion. If your __getattr__ implementation tries to access an attribute on self that doesn't exist (using standard self.attribute_name syntax), it will trigger __getattr__ again, leading to a loop. Always ensure your __getattr__ either computes the value directly, accesses known existing attributes carefully (like self._data or self._loaded_models in the examples, which are set during initialization), or raises AttributeError.
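A minimal sketch of the pitfall and its fix (the class names here are illustrative):

```python
class BrokenFallback:
    def __getattr__(self, name):
        # BUG: if '_data' is missing, self._data re-enters __getattr__ forever
        return self._data[name]

class SafeFallback:
    def __init__(self):
        object.__setattr__(self, '_data', {'a': 1})

    def __getattr__(self, name):
        # Safe: fetch '_data' without re-entering __getattr__
        data = object.__getattribute__(self, '_data')
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name) from None

try:
    BrokenFallback().anything   # '_data' was never set...
except RecursionError:
    print("BrokenFallback recursed")

print(SafeFallback().a)  # 1
```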
__getattribute__
Unlike __getattr__, the __getattribute__(self, name) method is far more invasive. It's called for every attribute lookup on the instance, whether the attribute exists or not. This provides a powerful mechanism to intercept and potentially alter any attribute access.
Syntax:
def __getattribute__(self, name):
    # 'name' is the string name of the attribute being accessed.
    # !!! Must be implemented very carefully to avoid infinite recursion !!!
    # Safely access the original attribute value:
    #   Option 1: use super() (preferred in cooperative inheritance)
    #       value = super().__getattribute__(name)
    #   Option 2: use object's __getattribute__ directly
    #       value = object.__getattribute__(self, name)
    print(f"Intercepting access to: {name}")
    try:
        value = object.__getattribute__(self, name)  # Use object's method to prevent recursion
        # Perform actions before returning (logging, modification, etc.)
        return value
    except AttributeError:
        # Handle cases where the attribute genuinely doesn't exist
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
Because __getattribute__ intercepts all access, including access to __dict__ or other methods needed for the lookup itself, it's extremely easy to create infinite recursion if you're not careful. The most common way to safely fetch the actual attribute value from within __getattribute__ is to use super().__getattribute__(name) or object.__getattribute__(self, name). Never use self.any_attribute inside __getattribute__ to fetch any_attribute, as this will call __getattribute__ again recursively.
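As a small illustration of the safe pattern, here is a sketch (AccessCounter and MonitoredModel are hypothetical names, not library classes) that counts attribute reads while delegating every real lookup to super().__getattribute__:

```python
class AccessCounter:
    """Counts reads of every attribute except the counter itself."""
    def __init__(self):
        super().__setattr__('_counts', {})

    def __getattribute__(self, name):
        if name != '_counts':
            # Fetch the counter via super() to avoid re-entering this method
            counts = super().__getattribute__('_counts')
            counts[name] = counts.get(name, 0) + 1
        return super().__getattribute__(name)

class MonitoredModel(AccessCounter):
    def __init__(self):
        super().__init__()
        self.weights = [0.1, 0.2]  # Plain assignment; __setattr__ is not overridden

m = MonitoredModel()
_ = m.weights
_ = m.weights
print(m._counts['weights'])  # 2
```

Using super() rather than object directly keeps the class well behaved under cooperative multiple inheritance.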
Use Cases in Machine Learning:
Access Logging and Monitoring: You could track access to sensitive configuration parameters or model weights, perhaps for auditing or debugging complex interactions.
import datetime

class MonitoredConfig:
    def __init__(self, params):
        # Use object.__setattr__ so initialization stays safe even if
        # __setattr__ is ever overridden alongside __getattribute__
        object.__setattr__(self, '_params', params)
        object.__setattr__(self, '_access_log', [])

    def __getattribute__(self, name):
        # Need to bypass interception for our internal attributes!
        if name in ('_params', '_access_log', 'get_log'):  # Also allow method access
            return object.__getattribute__(self, name)
        timestamp = datetime.datetime.now().isoformat()
        print(f"LOG: Accessing '{name}' at {timestamp}")
        # Safely retrieve the log and append
        log = object.__getattribute__(self, '_access_log')
        log.append((name, timestamp))
        # Safely get the actual value from the internal dict
        params = object.__getattribute__(self, '_params')
        if name in params:
            return params[name]
        else:
            # Not a known param and not one of the bypassed names handled
            # above (like 'get_log'), so treat it as missing
            raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    def get_log(self):
        # Access the log directly via the internal-name bypass
        return self._access_log

config = MonitoredConfig({'learning_rate': 0.01, 'optimizer': 'Adam', 'epochs': 100})
lr = config.learning_rate   # Triggers __getattribute__
opt = config.optimizer      # Triggers __getattribute__
print(f"Configured LR: {lr}, Optimizer: {opt}")
print("\nAccess Log:")
for entry in config.get_log():  # Accessing get_log bypasses logging
    print(entry)
LOG: Accessing 'learning_rate' at 2023-10-27T10:30:00.123456
LOG: Accessing 'optimizer' at 2023-10-27T10:30:00.123500
Configured LR: 0.01, Optimizer: Adam
Access Log:
('learning_rate', '2023-10-27T10:30:00.123456')
('optimizer', '2023-10-27T10:30:00.123500')
Creating Transparent Proxies: __getattribute__ is essential for creating proxy objects that forward requests to another object, potentially adding behavior like validation, caching, or unit conversion without the user knowing they are interacting with a proxy. This can be useful for wrapping complex model objects or data sources.
class DataValidationProxy:
    def __init__(self, target_object):
        object.__setattr__(self, '_target', target_object)

    def __getattribute__(self, name):
        # Bypass for the internal '_target'
        if name == '_target':
            return object.__getattribute__(self, name)
        print(f"Proxy: Intercepting access to '{name}'")
        # Safely get the target object
        target = object.__getattribute__(self, '_target')
        # Get the attribute value from the target
        value = getattr(target, name)  # Use standard getattr on the target
        # Example validation: check that numeric data is within a range
        if name == 'sensor_reading' and isinstance(value, (int, float)):
            if not (0 <= value <= 100):
                print(f"Proxy: WARNING - sensor_reading {value} out of expected range [0, 100]")
        return value

    def __setattr__(self, name, value):
        # Also intercept setting, for validation
        if name == '_target':
            object.__setattr__(self, name, value)
            return
        print(f"Proxy: Intercepting setting '{name}' to {value}")
        target = object.__getattribute__(self, '_target')
        # Example validation: ensure the label is within a known set
        if name == 'predicted_label' and value not in ['cat', 'dog', 'other']:
            raise ValueError(f"Invalid label '{value}'. Must be 'cat', 'dog', or 'other'.")
        setattr(target, name, value)  # Set on the actual target

class RawModelOutput:
    def __init__(self):
        self.sensor_reading = 105.5  # Out-of-range example
        self.predicted_label = None
        self.confidence = 0.95

raw_output = RawModelOutput()
validated_output = DataValidationProxy(raw_output)

# Accessing via the proxy triggers the validation check
reading = validated_output.sensor_reading
print(f"Reading obtained: {reading}")

# Setting via the proxy triggers the validation check
try:
    validated_output.predicted_label = 'cat'  # Valid
    print(f"Label set to: {validated_output.predicted_label}")
    validated_output.predicted_label = 'bird'  # Invalid
except ValueError as e:
    print(e)

# Check the original object's state
print(f"Original object label: {raw_output.predicted_label}")
Proxy: Intercepting access to 'sensor_reading'
Proxy: WARNING - sensor_reading 105.5 out of expected range [0, 100]
Reading obtained: 105.5
Proxy: Intercepting setting 'predicted_label' to cat
Proxy: Intercepting access to 'predicted_label'
Label set to: cat
Proxy: Intercepting setting 'predicted_label' to bird
Invalid label 'bird'. Must be 'cat', 'dog', or 'other'.
Original object label: cat
Choosing Between __getattr__ and __getattribute__
The choice depends entirely on your goal:
Use __getattr__ when you need to:
- Provide a fallback for attributes not found through the normal lookup, such as dynamically computed features or default values.
- Defer expensive work, loading resources lazily only when an attribute is first requested.
- Leave ordinary attribute access untouched, since __getattr__ only runs when the standard lookup fails.
Use __getattribute__ (with extreme caution) when you need to:
- Intercept every attribute access, for example for logging, auditing, or building transparent proxies that forward to another object.
Any __getattribute__ override must use super().__getattribute__(name) or object.__getattribute__(self, name) within its implementation to fetch attribute values safely.
Complementing __getattr__ and __getattribute__ (which primarily handle attribute reading) are __setattr__(self, name, value) and __delattr__(self, name).
__setattr__(self, name, value): Called whenever attribute assignment is attempted (e.g., obj.x = 10). Like __getattribute__, it intercepts all assignments. You must use object.__setattr__(self, name, value) or super().__setattr__(name, value) within its implementation to actually store the value, preventing infinite recursion. This is commonly used for input validation (as seen in the proxy example), type checking, or triggering side effects when an attribute changes (e.g., marking a configuration object as 'dirty').
__delattr__(self, name): Called when attribute deletion is attempted (e.g., del obj.x). It similarly requires careful implementation using object.__delattr__(self, name) or super().__delattr__(name) to avoid recursion. It allows you to intercept deletion, perhaps preventing deletion of critical attributes or performing cleanup actions.
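To make the pair concrete, here is a brief sketch (ProtectedParams is a hypothetical name) combining both hooks to guard a read-only attribute:

```python
class ProtectedParams:
    _protected = {'model_version'}

    def __init__(self, model_version):
        # object.__setattr__ bypasses our own __setattr__ during initialization
        object.__setattr__(self, 'model_version', model_version)

    def __setattr__(self, name, value):
        if name in self._protected:
            raise AttributeError(f"'{name}' is read-only")
        object.__setattr__(self, name, value)  # Actually store the value

    def __delattr__(self, name):
        if name in self._protected:
            raise AttributeError(f"'{name}' cannot be deleted")
        object.__delattr__(self, name)  # Actually delete the attribute

p = ProtectedParams('v1.2')
p.note = 'tuned'   # Allowed: stored normally
del p.note         # Allowed: deleted normally
try:
    p.model_version = 'v2'
except AttributeError as e:
    print(e)  # 'model_version' is read-only
try:
    del p.model_version
except AttributeError as e:
    print(e)  # 'model_version' cannot be deleted
```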
These four methods (__getattr__, __getattribute__, __setattr__, __delattr__) form the core of Python's customizable attribute access control.
Let's refine the HyperparameterConfig idea to combine __setattr__ for validation and __getattr__ for accessing derived or default values, while using __getattribute__ to manage access clearly.
import math

class HyperparameterConfig:
    # Define allowed hyperparameters and their validation rules
    _allowed_params = {
        'learning_rate': lambda x: isinstance(x, (float, int)) and 0 < x < 1,
        'epochs': lambda x: isinstance(x, int) and x > 0,
        'batch_size': lambda x: isinstance(x, int) and x > 0,
        'optimizer': lambda x: x in ['Adam', 'SGD', 'RMSprop']
    }
    # Define default values
    _defaults = {
        'learning_rate': 0.001,
        'epochs': 10,
        'batch_size': 32,
        'optimizer': 'Adam'
    }

    def __init__(self, **kwargs):
        # Use object.__setattr__ for internal state initialization
        object.__setattr__(self, '_params', {})
        object.__setattr__(self, '_dataset_size', None)  # Example of another managed attribute
        # Apply defaults
        for key, value in self._defaults.items():
            self._params[key] = value
        # Override defaults with provided kwargs, using our validation logic
        for key, value in kwargs.items():
            self.__setattr__(key, value)  # Calls our custom __setattr__

    def __setattr__(self, name, value):
        # Allow setting internal attributes directly
        if name in ('_params', '_dataset_size'):
            object.__setattr__(self, name, value)
            return
        if name in self._allowed_params:
            validator = self._allowed_params[name]
            if not validator(value):
                raise ValueError(f"Invalid value '{value}' for hyperparameter '{name}'")
            # Validation passed, store in the internal dict
            self._params[name] = value
        else:
            # Handle attempts to set unknown hyperparameters
            raise AttributeError(f"'{name}' is not a recognized hyperparameter. Allowed: {list(self._allowed_params.keys())}")

    def __getattr__(self, name):
        # Example: calculate total iterations if the dataset size is known
        if name == 'total_iterations':
            if self._dataset_size is not None and self._params.get('batch_size', 0) > 0:
                iters_per_epoch = math.ceil(self._dataset_size / self._params['batch_size'])
                return self._params.get('epochs', 0) * iters_per_epoch
            else:
                raise AttributeError("Cannot compute 'total_iterations'. Set '_dataset_size' first.")
        # If the attribute wasn't found by __getattribute__ and isn't dynamically
        # generated here, it truly doesn't exist
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    def __getattribute__(self, name):
        # Prioritize internal attributes
        if name in ('_params', '_dataset_size', '_allowed_params', '_defaults'):
            return object.__getattribute__(self, name)
        # Check if it's a known hyperparameter stored in _params
        params = object.__getattribute__(self, '_params')
        if name in params:
            return params[name]
        # Try standard attribute lookup (for methods like __init__, __setattr__, etc.)
        try:
            return object.__getattribute__(self, name)
        except AttributeError:
            # Fall back to __getattr__ for dynamically generated attributes.
            # (Letting the AttributeError propagate would also make Python call
            # __getattr__ automatically; calling it explicitly keeps the fallback visible.)
            return self.__getattr__(name)
# Example usage
config = HyperparameterConfig(learning_rate=0.05, optimizer='SGD')
print(f"LR: {config.learning_rate}, Optimizer: {config.optimizer}, Default Epochs: {config.epochs}")

config.epochs = 50  # Valid set via __setattr__
print(f"Set Epochs: {config.epochs}")

try:
    config.batch_size = -10  # Invalid set via __setattr__
except ValueError as e:
    print(e)

try:
    config.dropout_rate = 0.5  # Unknown hyperparameter
except AttributeError as e:
    print(e)

# Access a dynamic attribute
config._dataset_size = 50000  # Internal attribute, allowed directly by our __setattr__
print(f"Total Iterations: {config.total_iterations}")  # Resolved via the __getattr__ fallback
LR: 0.05, Optimizer: SGD, Default Epochs: 10
Set Epochs: 50
Invalid value '-10' for hyperparameter 'batch_size'
'dropout_rate' is not a recognized hyperparameter. Allowed: ['learning_rate', 'epochs', 'batch_size', 'optimizer']
Total Iterations: 78150
Customizing attribute access with __getattr__, __getattribute__, __setattr__, and __delattr__ provides a deep level of control over object behavior in Python. While powerful, especially for framework development, dynamic configuration management, validation layers, and lazy resource handling in machine learning contexts, these methods require careful, considered implementation. They demand a clear understanding of the standard attribute lookup process and meticulous handling of internal state access to avoid infinite recursion, particularly when using the highly interceptive __getattribute__ and __setattr__. Mastering these techniques enables the creation of highly flexible, robust, and sophisticated components essential for advanced ML systems.
© 2025 ApX Machine Learning