While descriptors offer fine-grained control over specific attributes, Python provides even more fundamental hooks into the attribute access mechanism itself: the special methods __getattr__ and __getattribute__. These allow you to define custom behavior for how any attribute lookup is handled on an instance, offering a powerful, albeit potentially complex, tool for building dynamic and responsive machine learning components.
Understanding how Python normally finds attributes is important first. When you access obj.x, Python typically checks, in order, whether:

1. x is a data descriptor (like a property) on the class of obj or its parent classes.
2. x exists in obj.__dict__ (the instance's own attributes).
3. x is a non-data descriptor or other attribute found in the class of obj or its parent classes (following the Method Resolution Order, or MRO).

If this standard lookup fails, Python makes one last attempt by calling the __getattr__ method, if defined. If the lookup succeeds at any point before this final step, __getattr__ is not called.
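A short sketch can make this precedence concrete. The class and attribute names below (Demo, prop, plain) are illustrative, not from any library:

```python
class Demo:
    class_attr = "from class"

    @property
    def prop(self):
        # Data descriptor: always wins over the instance __dict__
        return "from property"

    def __getattr__(self, name):
        # Called only after the normal lookup has failed
        return f"fallback for {name!r}"

d = Demo()
d.__dict__['prop'] = "from instance"   # Shadow attempt; the property still wins
d.__dict__['plain'] = "from instance"

print(d.prop)        # from property  (data descriptor beats the instance dict)
print(d.plain)       # from instance  (instance dict beats the class attribute)
print(d.class_attr)  # from class
print(d.missing)     # fallback for 'missing'  (__getattr__ as the last resort)
```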
__getattr__
The __getattr__(self, name) method is invoked only when an attribute lookup fails through the usual channels. Its purpose is to provide a fallback mechanism, allowing you to compute or retrieve an attribute dynamically when it's not found directly on the instance or its class hierarchy.
Syntax:
def __getattr__(self, name):
    # 'name' is the string name of the attribute being accessed.
    # Logic to compute or retrieve the value goes here.
    # Must return the value or raise AttributeError.
    if name == 'some_dynamic_attribute':
        # Calculate or fetch the value
        value = ...
        return value
    # Important: raise AttributeError for unhandled names
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
Use Cases in Machine Learning:
Dynamic Feature Generation: Imagine a data object where you want to access transformed versions of features on the fly without pre-computing them all. __getattr__ can intercept requests for specific transformations.
import pandas as pd
import numpy as np

class DynamicFeatures:
    def __init__(self, data_frame):
        # Use object.__setattr__ to avoid triggering our own __setattr__ if defined
        object.__setattr__(self, '_data', data_frame.copy())

    def __getattr__(self, name):
        if name.startswith('log_'):
            original_feature = name[4:]  # Strip 'log_' prefix
            if original_feature in self._data.columns:
                print(f"Dynamically computing log of {original_feature}")
                # Compute and return the log transform dynamically.
                # Ensure positive values for log; handle errors as needed.
                series = self._data[original_feature]
                # Use numpy log for potentially better performance and handling
                log_transformed = np.log(series.astype(float).clip(lower=1e-9))  # Clip to avoid log(0)
                return log_transformed
            else:
                raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}' (original feature '{original_feature}' not found)")
        elif name.startswith('squared_'):
            original_feature = name[8:]
            if original_feature in self._data.columns:
                print(f"Dynamically computing square of {original_feature}")
                return self._data[original_feature] ** 2
            else:
                raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}' (original feature '{original_feature}' not found)")
        # Important: if not handled, raise AttributeError
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

# To allow access to the internal '_data' without triggering __getattr__ infinitely,
# we relied on object.__setattr__ in __init__. Accessing self._data inside
# __getattr__ is safe because '_data' exists.

# Example usage
df = pd.DataFrame({'feature_a': [1, 10, 100], 'feature_b': [-2, 20, 200]})
dynamic_df = DynamicFeatures(df)
print("Accessing log_feature_a:")
print(dynamic_df.log_feature_a)
print("\nAccessing squared_feature_b:")
print(dynamic_df.squared_feature_b)
try:
    print("\nAccessing non_existent_feature:")
    print(dynamic_df.non_existent_feature)
except AttributeError as e:
    print(e)
Accessing log_feature_a:
Dynamically computing log of feature_a
0 0.000000
1 2.302585
2 4.605170
Name: feature_a, dtype: float64
Accessing squared_feature_b:
Dynamically computing square of feature_b
0 4
1 400
2 40000
Name: feature_b, dtype: int64
Accessing non_existent_feature:
'DynamicFeatures' object has no attribute 'non_existent_feature'
Lazy Loading of Resources: In ML workflows, you might deal with large models or datasets that are expensive to load into memory. __getattr__ allows you to defer loading until the resource is actually needed.
import time
# Assume joblib exists for a more realistic example stub
# import joblib

class LazyModelLoader:
    def __init__(self, model_path_dict):
        # Store paths safely using object.__setattr__ to avoid conflicts
        object.__setattr__(self, '_model_paths', model_path_dict)
        object.__setattr__(self, '_loaded_models', {})  # Cache for loaded models

    def __getattr__(self, name):
        # Check if it's a known model name we can load
        if name in self._model_paths:
            # Check whether it's already loaded (in our cache)
            if name not in self._loaded_models:
                print(f"Lazy loading model '{name}' from {self._model_paths[name]}...")
                # Actual model loading logic would go here,
                # e.g., self._loaded_models[name] = joblib.load(self._model_paths[name])
                time.sleep(0.5)  # Simulate loading time
                self._loaded_models[name] = f"Loaded Model: {name.upper()}"  # Placeholder
            # Return the loaded model from the cache
            return self._loaded_models[name]
        # If the name is not a loadable model path, raise AttributeError
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    # Direct access to internal state is handled by the default __getattribute__,
    # or could be explicitly handled here if __getattribute__ were overridden.

# Example usage
loader = LazyModelLoader({
    'classifier': '/path/to/classifier.pkl',
    'regressor': '/path/to/regressor.pkl'
})

# Model not loaded yet
print("Accessing classifier...")
model_c = loader.classifier  # Triggers __getattr__, loads the model
print(model_c)
print("\nAccessing classifier again...")
model_c_again = loader.classifier  # Cached; loading message not shown
print(model_c_again)
print("\nAccessing regressor...")
model_r = loader.regressor  # Triggers __getattr__, loads the model
print(model_r)
Accessing classifier...
Lazy loading model 'classifier' from /path/to/classifier.pkl...
Loaded Model: CLASSIFIER
Accessing classifier again...
Loaded Model: CLASSIFIER
Accessing regressor...
Lazy loading model 'regressor' from /path/to/regressor.pkl...
Loaded Model: REGRESSOR
A critical point when implementing __getattr__ is to avoid causing infinite recursion. If your __getattr__ implementation tries to access an attribute on self that doesn't exist (using standard self.attribute_name syntax), it will trigger __getattr__ again, leading to a loop. Always ensure your __getattr__ either computes the value directly, accesses known existing attributes carefully (like self._data or self._loaded_models in the examples, which are set during initialization), or raises AttributeError.
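A minimal sketch of the pitfall and its fix (the class names here are illustrative):

```python
class BrokenFallback:
    def __getattr__(self, name):
        # BUG: if '_data' is missing, self._data re-enters __getattr__ forever
        return self._data[name]

class SafeFallback:
    def __init__(self):
        object.__setattr__(self, '_data', {'a': 1})

    def __getattr__(self, name):
        # Safe: fetch '_data' without re-entering __getattr__
        data = object.__getattribute__(self, '_data')
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name) from None

try:
    BrokenFallback().anything   # '_data' was never set...
except RecursionError:
    print("BrokenFallback recursed")

print(SafeFallback().a)  # 1
```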
__getattribute__
Unlike __getattr__, the __getattribute__(self, name) method is far more invasive. It's called for every attribute lookup on the instance, whether the attribute exists or not. This provides a powerful mechanism to intercept and potentially alter any attribute access.
Syntax:
def __getattribute__(self, name):
    # 'name' is the string name of the attribute being accessed.
    # !!! Must be implemented very carefully to avoid infinite recursion !!!
    # Safely access the original attribute value:
    #   Option 1: use super() (preferred in cooperative inheritance)
    #       value = super().__getattribute__(name)
    #   Option 2: use object's __getattribute__ directly
    #       value = object.__getattribute__(self, name)
    print(f"Intercepting access to: {name}")
    try:
        value = object.__getattribute__(self, name)  # Use object's method to prevent recursion
        # Perform actions before returning (logging, modification, etc.)
        return value
    except AttributeError:
        # Handle cases where the attribute genuinely doesn't exist
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
Because __getattribute__ intercepts all access, including access to __dict__ or other methods needed for the lookup itself, it's extremely easy to create infinite recursion if you're not careful. The most common way to safely fetch the actual attribute value from within __getattribute__ is to use super().__getattribute__(name) or object.__getattribute__(self, name). Never use self.any_attribute inside __getattribute__ to fetch any_attribute, as this will call __getattribute__ again recursively.
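As a small illustration of the safe pattern, here is a sketch (AccessCounter and MonitoredModel are hypothetical names, not library classes) that counts attribute reads while delegating every real lookup to super().__getattribute__:

```python
class AccessCounter:
    """Counts reads of every attribute except the counter itself."""
    def __init__(self):
        super().__setattr__('_counts', {})

    def __getattribute__(self, name):
        if name != '_counts':
            # Fetch the counter via super() to avoid re-entering this method
            counts = super().__getattribute__('_counts')
            counts[name] = counts.get(name, 0) + 1
        return super().__getattribute__(name)

class MonitoredModel(AccessCounter):
    def __init__(self):
        super().__init__()
        self.weights = [0.1, 0.2]  # Plain assignment; __setattr__ is not overridden

m = MonitoredModel()
_ = m.weights
_ = m.weights
print(m._counts['weights'])  # 2
```

Using super() rather than object directly keeps the class well behaved under cooperative multiple inheritance.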
Use Cases in Machine Learning:
Access Logging and Monitoring: You could track access to sensitive configuration parameters or model weights, perhaps for auditing or debugging complex interactions.
import datetime

class MonitoredConfig:
    def __init__(self, params):
        # Use object.__setattr__ so initialization stays safe even if
        # __setattr__ is ever overridden alongside __getattribute__
        object.__setattr__(self, '_params', params)
        object.__setattr__(self, '_access_log', [])

    def __getattribute__(self, name):
        # Need to bypass interception for our internal attributes!
        if name in ('_params', '_access_log', 'get_log'):  # Also allow method access
            return object.__getattribute__(self, name)
        timestamp = datetime.datetime.now().isoformat()
        print(f"LOG: Accessing '{name}' at {timestamp}")
        # Safely retrieve the log and append
        log = object.__getattribute__(self, '_access_log')
        log.append((name, timestamp))
        # Safely get the actual value from the internal dict
        params = object.__getattribute__(self, '_params')
        if name in params:
            return params[name]
        else:
            # Not a known param and not one of the bypassed names handled
            # above (like 'get_log'), so treat it as missing
            raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    def get_log(self):
        # Access the log directly via the internal-name bypass
        return self._access_log

config = MonitoredConfig({'learning_rate': 0.01, 'optimizer': 'Adam', 'epochs': 100})
lr = config.learning_rate   # Triggers __getattribute__
opt = config.optimizer      # Triggers __getattribute__
print(f"Configured LR: {lr}, Optimizer: {opt}")
print("\nAccess Log:")
for entry in config.get_log():  # Accessing get_log bypasses logging
    print(entry)
LOG: Accessing 'learning_rate' at 2023-10-27T10:30:00.123456
LOG: Accessing 'optimizer' at 2023-10-27T10:30:00.123500
Configured LR: 0.01, Optimizer: Adam
Access Log:
('learning_rate', '2023-10-27T10:30:00.123456')
('optimizer', '2023-10-27T10:30:00.123500')
Creating Transparent Proxies: __getattribute__ is essential for creating proxy objects that forward requests to another object, potentially adding behavior like validation, caching, or unit conversion without the user knowing they are interacting with a proxy. This can be useful for wrapping complex model objects or data sources.
class DataValidationProxy:
    def __init__(self, target_object):
        object.__setattr__(self, '_target', target_object)

    def __getattribute__(self, name):
        # Bypass for the internal '_target'
        if name == '_target':
            return object.__getattribute__(self, name)
        print(f"Proxy: Intercepting access to '{name}'")
        # Safely get the target object
        target = object.__getattribute__(self, '_target')
        # Get the attribute value from the target
        value = getattr(target, name)  # Use standard getattr on the target
        # Example validation: check that numeric data is within a range
        if name == 'sensor_reading' and isinstance(value, (int, float)):
            if not (0 <= value <= 100):
                print(f"Proxy: WARNING - sensor_reading {value} out of expected range [0, 100]")
        return value

    def __setattr__(self, name, value):
        # Also intercept setting, for validation
        if name == '_target':
            object.__setattr__(self, name, value)
            return
        print(f"Proxy: Intercepting setting '{name}' to {value}")
        target = object.__getattribute__(self, '_target')
        # Example validation: ensure the label is within a known set
        if name == 'predicted_label' and value not in ['cat', 'dog', 'other']:
            raise ValueError(f"Invalid label '{value}'. Must be 'cat', 'dog', or 'other'.")
        setattr(target, name, value)  # Set on the actual target

class RawModelOutput:
    def __init__(self):
        self.sensor_reading = 105.5  # Out-of-range example
        self.predicted_label = None
        self.confidence = 0.95

raw_output = RawModelOutput()
validated_output = DataValidationProxy(raw_output)

# Accessing via the proxy triggers the validation check
reading = validated_output.sensor_reading
print(f"Reading obtained: {reading}")

# Setting via the proxy triggers the validation check
try:
    validated_output.predicted_label = 'cat'  # Valid
    print(f"Label set to: {validated_output.predicted_label}")
    validated_output.predicted_label = 'bird'  # Invalid
except ValueError as e:
    print(e)

# Check the original object's state
print(f"Original object label: {raw_output.predicted_label}")
Proxy: Intercepting access to 'sensor_reading'
Proxy: WARNING - sensor_reading 105.5 out of expected range [0, 100]
Reading obtained: 105.5
Proxy: Intercepting setting 'predicted_label' to cat
Proxy: Intercepting access to 'predicted_label'
Label set to: cat
Proxy: Intercepting setting 'predicted_label' to bird
Invalid label 'bird'. Must be 'cat', 'dog', or 'other'.
Original object label: cat
Choosing Between __getattr__ and __getattribute__
The choice depends entirely on your goal:
Use __getattr__ when you need to:
- Provide a fallback for attributes not found through the normal lookup, such as dynamically computed features or default values.
- Defer expensive work, loading resources lazily only when an attribute is first requested.
- Leave ordinary attribute access untouched, since __getattr__ only runs when the standard lookup fails.
Use __getattribute__ (with extreme caution) when you need to:
- Intercept every attribute access, for example for logging, auditing, or building transparent proxies that forward to another object.
Any __getattribute__ override must use super().__getattribute__(name) or object.__getattribute__(self, name) within its implementation to fetch attribute values safely.
Complementing __getattr__ and __getattribute__ (which primarily handle attribute reading) are __setattr__(self, name, value) and __delattr__(self, name).
__setattr__(self, name, value): Called whenever attribute assignment is attempted (e.g., obj.x = 10). Like __getattribute__, it intercepts all assignments. You must use object.__setattr__(self, name, value) or super().__setattr__(name, value) within its implementation to actually store the value, preventing infinite recursion. This is commonly used for input validation (as seen in the proxy example), type checking, or triggering side effects when an attribute changes (e.g., marking a configuration object as 'dirty').
__delattr__(self, name): Called when attribute deletion is attempted (e.g., del obj.x). It similarly requires careful implementation using object.__delattr__(self, name) or super().__delattr__(name) to avoid recursion. It allows you to intercept deletion, perhaps preventing deletion of critical attributes or performing cleanup actions.
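To make the pair concrete, here is a brief sketch (ProtectedParams is a hypothetical name) combining both hooks to guard a read-only attribute:

```python
class ProtectedParams:
    _protected = {'model_version'}

    def __init__(self, model_version):
        # object.__setattr__ bypasses our own __setattr__ during initialization
        object.__setattr__(self, 'model_version', model_version)

    def __setattr__(self, name, value):
        if name in self._protected:
            raise AttributeError(f"'{name}' is read-only")
        object.__setattr__(self, name, value)  # Actually store the value

    def __delattr__(self, name):
        if name in self._protected:
            raise AttributeError(f"'{name}' cannot be deleted")
        object.__delattr__(self, name)  # Actually delete the attribute

p = ProtectedParams('v1.2')
p.note = 'tuned'   # Allowed: stored normally
del p.note         # Allowed: deleted normally
try:
    p.model_version = 'v2'
except AttributeError as e:
    print(e)  # 'model_version' is read-only
try:
    del p.model_version
except AttributeError as e:
    print(e)  # 'model_version' cannot be deleted
```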
These four methods (__getattr__, __getattribute__, __setattr__, __delattr__) form the core of Python's customizable attribute access control.
Let's refine the HyperparameterConfig idea to combine __setattr__ for validation and __getattr__ for accessing derived or default values, while using __getattribute__ to manage access clearly.
import math

class HyperparameterConfig:
    # Define allowed hyperparameters and their validation rules
    _allowed_params = {
        'learning_rate': lambda x: isinstance(x, (float, int)) and 0 < x < 1,
        'epochs': lambda x: isinstance(x, int) and x > 0,
        'batch_size': lambda x: isinstance(x, int) and x > 0,
        'optimizer': lambda x: x in ['Adam', 'SGD', 'RMSprop']
    }
    # Define default values
    _defaults = {
        'learning_rate': 0.001,
        'epochs': 10,
        'batch_size': 32,
        'optimizer': 'Adam'
    }

    def __init__(self, **kwargs):
        # Use object.__setattr__ for internal state initialization
        object.__setattr__(self, '_params', {})
        object.__setattr__(self, '_dataset_size', None)  # Example of another managed attribute
        # Apply defaults
        for key, value in self._defaults.items():
            self._params[key] = value
        # Override defaults with provided kwargs, using our validation logic
        for key, value in kwargs.items():
            self.__setattr__(key, value)  # Calls our custom __setattr__

    def __setattr__(self, name, value):
        # Allow setting internal attributes directly
        if name in ('_params', '_dataset_size'):
            object.__setattr__(self, name, value)
            return
        if name in self._allowed_params:
            validator = self._allowed_params[name]
            if not validator(value):
                raise ValueError(f"Invalid value '{value}' for hyperparameter '{name}'")
            # Validation passed, store in the internal dict
            self._params[name] = value
        else:
            # Handle attempts to set unknown hyperparameters
            raise AttributeError(f"'{name}' is not a recognized hyperparameter. Allowed: {list(self._allowed_params.keys())}")

    def __getattr__(self, name):
        # Example: calculate total iterations if the dataset size is known
        if name == 'total_iterations':
            if self._dataset_size is not None and self._params.get('batch_size', 0) > 0:
                iters_per_epoch = math.ceil(self._dataset_size / self._params['batch_size'])
                return self._params.get('epochs', 0) * iters_per_epoch
            else:
                raise AttributeError("Cannot compute 'total_iterations'. Set '_dataset_size' first.")
        # If the attribute wasn't found by __getattribute__ and isn't dynamically
        # generated here, it truly doesn't exist
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    def __getattribute__(self, name):
        # Prioritize internal attributes
        if name in ('_params', '_dataset_size', '_allowed_params', '_defaults'):
            return object.__getattribute__(self, name)
        # Check if it's a known hyperparameter stored in _params
        params = object.__getattribute__(self, '_params')
        if name in params:
            return params[name]
        # Try standard attribute lookup (for methods like __init__, __setattr__, etc.)
        try:
            return object.__getattribute__(self, name)
        except AttributeError:
            # Fall back to __getattr__ for dynamically generated attributes.
            # (Letting the AttributeError propagate would also make Python call
            # __getattr__ automatically; calling it explicitly keeps the fallback visible.)
            return self.__getattr__(name)
# Example usage
config = HyperparameterConfig(learning_rate=0.05, optimizer='SGD')
print(f"LR: {config.learning_rate}, Optimizer: {config.optimizer}, Default Epochs: {config.epochs}")

config.epochs = 50  # Valid set via __setattr__
print(f"Set Epochs: {config.epochs}")

try:
    config.batch_size = -10  # Invalid set via __setattr__
except ValueError as e:
    print(e)

try:
    config.dropout_rate = 0.5  # Unknown hyperparameter
except AttributeError as e:
    print(e)

# Access a dynamic attribute
config._dataset_size = 50000  # Internal attribute, allowed directly by our __setattr__
print(f"Total Iterations: {config.total_iterations}")  # Resolved via the __getattr__ fallback
LR: 0.05, Optimizer: SGD, Default Epochs: 10
Set Epochs: 50
Invalid value '-10' for hyperparameter 'batch_size'
'dropout_rate' is not a recognized hyperparameter. Allowed: ['learning_rate', 'epochs', 'batch_size', 'optimizer']
Total Iterations: 78150
Customizing attribute access with __getattr__, __getattribute__, __setattr__, and __delattr__ provides a deep level of control over object behavior in Python. While powerful, especially for framework development, dynamic configuration management, validation layers, and lazy resource handling in machine learning contexts, these methods require careful, considered implementation. They demand a clear understanding of the standard attribute lookup process and meticulous handling of internal state access to avoid infinite recursion, particularly when using the highly interceptive __getattribute__ and __setattr__. Mastering these techniques enables the creation of highly flexible, robust, and sophisticated components essential for advanced ML systems.
© 2025 ApX Machine Learning