As mentioned in the chapter introduction, metaprogramming grants us powerful tools to control program structure and behavior dynamically. Among these, descriptors offer a mechanism for managing attribute access in a controlled, reusable way. Instead of relying solely on standard attribute lookup, Python allows us to define objects that implement specific methods to intercept and customize what happens when an attribute is accessed, set, or deleted.
At its core, a descriptor is any object that defines one or more of the methods `__get__`, `__set__`, or `__delete__`. When a class (the owner class) has a class attribute that is an instance of a descriptor class, Python's attribute access mechanism automatically invokes these special methods whenever that attribute is accessed on the class or its instances. This allows you to embed logic directly into attribute access itself, moving beyond simple value storage.
The behavior of descriptors is governed by three special methods:

- `__get__(self, instance, owner)`: Called when the attribute is read. It should return the attribute's value or raise `AttributeError`.
    - `self`: The descriptor instance itself.
    - `instance`: The instance through which the attribute was accessed. If accessed via the class itself (e.g., `MyClass.attribute`), `instance` will be `None`.
    - `owner`: The owner class (e.g., `MyClass`).
- `__set__(self, instance, value)`: Called when the attribute is assigned.
    - `self`: The descriptor instance.
    - `instance`: The instance on which the attribute is being set.
    - `value`: The value being assigned to the attribute.
- `__delete__(self, instance)`: Called when the attribute is deleted using `del`.
    - `self`: The descriptor instance.
    - `instance`: The instance from which the attribute is being deleted.

It's important to understand that descriptor instances are typically created at the class level. They are attributes of the owner class, not directly of the instances of that class, although they manage instance-specific data.
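To make the three signatures concrete, here is a minimal sketch of a descriptor that records every protocol call it receives. The names `Traced` and `Demo` are hypothetical and exist only for this illustration; the value itself is kept in the instance's `__dict__` under a separate storage name.

```python
class Traced:
    """Hypothetical descriptor that logs each protocol call it receives."""

    def __init__(self, storage_name):
        self.storage_name = storage_name  # key used in the instance's __dict__
        self.calls = []

    def __get__(self, instance, owner):
        self.calls.append(("get", instance, owner))
        if instance is None:
            return self  # accessed via the class
        try:
            return instance.__dict__[self.storage_name]
        except KeyError:
            raise AttributeError(self.storage_name) from None

    def __set__(self, instance, value):
        self.calls.append(("set", instance, value))
        instance.__dict__[self.storage_name] = value

    def __delete__(self, instance):
        self.calls.append(("delete", instance))
        del instance.__dict__[self.storage_name]


class Demo:
    x = Traced("_x")  # one descriptor object, shared by all Demo instances


d = Demo()
d.x = 1               # calls Traced.__set__(descriptor, d, 1)
value = d.x           # calls Traced.__get__(descriptor, d, Demo)
del d.x               # calls Traced.__delete__(descriptor, d)
class_level = Demo.x  # calls Traced.__get__(descriptor, None, Demo)

# Reading the class __dict__ directly bypasses __get__ entirely.
descriptor = Demo.__dict__["x"]
print([kind for kind, *_ in descriptor.calls])  # ['set', 'get', 'delete', 'get']
print(descriptor.calls[-1][1] is None)          # True: class access passes instance=None
```

Note that the last recorded call has `instance` set to `None`, which is why `__get__` implementations conventionally return the descriptor itself in that case.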
Let's illustrate this with a simple example: a descriptor that ensures a model hyperparameter is always a positive float.
```python
import logging

logging.basicConfig(level=logging.INFO)


class PositiveFloat:
    """Descriptor ensuring an attribute is a positive float."""

    def __init__(self, name):
        # Store the internal name used in the instance's __dict__
        self.private_name = '_' + name

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed via the class, return the descriptor itself
            return self
        # Retrieve the value from the instance's dictionary
        return getattr(instance, self.private_name, None)

    def __set__(self, instance, value):
        try:
            # Attempt conversion and validation
            val_float = float(value)
            if val_float <= 0:
                raise ValueError(f"Value must be positive, got {val_float}")
            # Store the validated value in the instance's dictionary
            setattr(instance, self.private_name, val_float)
            logging.info(f"Set {self.private_name.lstrip('_')} to {val_float} on {instance}")
        except (ValueError, TypeError) as e:
            # Handle conversion errors or validation failure
            raise TypeError(f"Value must be a positive float. {e}") from e


# Example Usage in an ML Model Configuration Class
class ModelConfig:
    learning_rate = PositiveFloat('learning_rate')
    regularization_strength = PositiveFloat('regularization_strength')

    def __init__(self, lr, reg_strength):
        # The __set__ method of the descriptor is invoked here
        self.learning_rate = lr
        self.regularization_strength = reg_strength

    def __repr__(self):
        return (f"ModelConfig(lr={self.learning_rate}, "
                f"reg_strength={self.regularization_strength})")


# --- Try it out ---
config = ModelConfig(lr=0.01, reg_strength=0.005)
print(config)  # Output: ModelConfig(lr=0.01, reg_strength=0.005)

# Accessing the attribute invokes __get__
print(f"Current Learning Rate: {config.learning_rate}")  # Output: Current Learning Rate: 0.01

# Setting the attribute invokes __set__
config.learning_rate = 0.008
print(config)  # Output: ModelConfig(lr=0.008, reg_strength=0.005)

# Attempting to set an invalid value raises an error via __set__
try:
    config.regularization_strength = -0.1
except TypeError as e:
    print(f"Error setting regularization: {e}")
    # Output: Error setting regularization: Value must be a positive float. Value must be positive, got -0.1

try:
    config.learning_rate = "invalid"
except TypeError as e:
    print(f"Error setting learning rate: {e}")
    # Output: Error setting learning rate: Value must be a positive float. could not convert string to float: 'invalid'

# Accessing via the class returns the descriptor instance
print(ModelConfig.learning_rate)  # Output: <__main__.PositiveFloat object at 0x...>
```
In this example, `PositiveFloat` is the descriptor class. When we assign `learning_rate = PositiveFloat('learning_rate')` in `ModelConfig`, we make `learning_rate` a descriptor instance associated with the `ModelConfig` class. Assignments like `config.learning_rate = 0.01` trigger `PositiveFloat.__set__`, and accesses like `config.learning_rate` trigger `PositiveFloat.__get__`. Notice how the descriptor uses the instance's `__dict__` (via `getattr` and `setattr` with a private name like `_learning_rate`) to store the actual data for each `ModelConfig` instance.
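Passing the attribute name to the constructor repeats information that the class body already contains. One alternative, sketched below, relies on the `__set_name__` hook (available since Python 3.6), which Python calls on each descriptor when the owner class is created. The names `AutoNamedPositiveFloat` and `OptimizerConfig` are hypothetical, and unlike the example above this variant lets `ValueError` propagate instead of converting it to `TypeError`.

```python
class AutoNamedPositiveFloat:
    """Sketch of a positive-float descriptor that learns its own name."""

    def __set_name__(self, owner, name):
        # Called once per attribute when the owner class body is executed.
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name, None)

    def __set__(self, instance, value):
        val_float = float(value)  # may raise TypeError or ValueError
        if val_float <= 0:
            raise ValueError(f"{self.public_name} must be positive, got {val_float}")
        setattr(instance, self.private_name, val_float)


class OptimizerConfig:
    # No name argument needed: __set_name__ supplies it.
    learning_rate = AutoNamedPositiveFloat()
    momentum = AutoNamedPositiveFloat()

    def __init__(self, learning_rate, momentum):
        self.learning_rate = learning_rate
        self.momentum = momentum


cfg = OptimizerConfig(0.01, "0.9")
print(cfg.learning_rate, cfg.momentum)  # 0.01 0.9
print(vars(cfg))  # {'_learning_rate': 0.01, '_momentum': 0.9}
```

The second `print` shows that each value still lives in the instance's own `__dict__` under the private name, exactly as in the constructor-based version.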
Whether a descriptor defines `__set__` (or `__delete__`) fundamentally changes how it interacts with instance attributes:

- Data descriptors: Define `__set__` or `__delete__`, usually alongside `__get__`. Data descriptors have higher precedence in the attribute lookup order. If the class defines a data descriptor and the instance's `__dict__` also contains an entry with the same name, the data descriptor takes priority. This is why our `PositiveFloat` descriptor reliably intercepts assignments.
- Non-data descriptors: Define only `__get__`. Non-data descriptors have lower precedence than instance `__dict__` entries. If an instance dictionary has an entry with the same name, that entry will shadow the descriptor. Methods are a common example of non-data descriptors (functions implement `__get__` to bind `self`).

Understanding this distinction is important when designing how your attributes should behave, especially regarding whether instance-level assignments should override the descriptor's logic.
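The precedence rule can be checked directly. The following sketch uses two hypothetical descriptor classes, `NonData` and `Data`, and writes conflicting entries straight into the instance dictionary to see which side wins on lookup.

```python
class NonData:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner=None):
        return "from non-data descriptor"


class Data:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner=None):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only attribute")


class Example:
    plain = NonData()
    guarded = Data()


obj = Example()

# Write into the instance dictionary directly, bypassing normal assignment.
obj.__dict__["plain"] = "from instance __dict__"
obj.__dict__["guarded"] = "from instance __dict__"

print(obj.plain)    # from instance __dict__  (instance entry shadows the non-data descriptor)
print(obj.guarded)  # from data descriptor    (data descriptor takes priority)
```

A fresh `Example()` with an empty `__dict__` would still return `"from non-data descriptor"` for `plain`, since there is nothing to shadow it.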
Descriptors are more than syntactic sugar; they enable sophisticated patterns useful in ML contexts:
Hyperparameter Validation: As shown above, descriptors are ideal for enforcing constraints on hyperparameters (types, ranges, allowed values) directly at the point of assignment, making configurations more reliable. You could create descriptors for learning rates, layer sizes, activation function names, etc.
Lazy Loading of Resources: ML often involves large objects like datasets, embeddings, or pre-trained model weights. Loading these eagerly can consume significant memory and time. A descriptor can defer loading until the attribute is first accessed.
```python
import time


class LazyLoader:
    """Descriptor to load a resource only when first accessed."""

    def __init__(self, name, load_func):
        self.private_name = '_' + name
        self.load_func = load_func

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Track loading per instance. The descriptor object is shared by every
        # instance of the owner class, so a "loaded" flag stored on the
        # descriptor would make later instances skip loading and get None.
        if self.private_name not in vars(instance):
            print(f"Loading resource for '{self.private_name.lstrip('_')}'...")
            start_time = time.perf_counter()
            value = self.load_func()  # Execute the loading function
            setattr(instance, self.private_name, value)  # Cache on the instance
            elapsed = time.perf_counter() - start_time
            print(f"...loaded in {elapsed:.2f} seconds.")
        return getattr(instance, self.private_name)


# Example: Simulate loading large embeddings
def load_word_embeddings():
    # Simulate a time-consuming load operation
    time.sleep(2)
    return {"word1": [0.1, 0.2], "word2": [0.3, 0.4]}


class ModelPipeline:
    embeddings = LazyLoader('embeddings', load_word_embeddings)

    def process(self, text):
        # Accessing self.embeddings triggers __get__ and loading (if not already loaded)
        print(f"Accessing embeddings to process: {text}")
        emb_vector = self.embeddings.get(text, [0.0, 0.0])
        print(f"Processed '{text}' using vector {emb_vector}")
        # Further processing...


pipeline = ModelPipeline()
print("Pipeline initialized.")
# Embeddings are not loaded yet.

pipeline.process("word1")
# Output (the reported time is approximate):
# Pipeline initialized.
# Accessing embeddings to process: word1
# Loading resource for 'embeddings'...
# ...loaded in 2.00 seconds.
# Processed 'word1' using vector [0.1, 0.2]

pipeline.process("word2")
# Output:
# Accessing embeddings to process: word2
# Processed 'word2' using vector [0.3, 0.4]
# (No loading message this time)
```
Managed Attributes & Side Effects: Descriptors can trigger actions when an attribute is set or accessed. For instance, changing a model parameter could automatically invalidate a cached prediction or log the modification event.
Interfacing with External Systems: A descriptor could manage communication with a feature store, automatically fetching the latest feature values when an attribute is accessed or pushing updates when set.
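As an illustration of the managed-attribute pattern listed above, the sketch below shows a descriptor whose `__set__` clears a prediction cache whenever a model parameter changes. Everything here is hypothetical: `InvalidatesCache`, `LinearModel`, and the `_prediction_cache` key are names invented for the example, and the "model" is a toy linear function.

```python
class InvalidatesCache:
    """Hypothetical descriptor: assigning a value clears the owner's cache."""

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        setattr(instance, self.private_name, value)
        # Side effect: cached predictions are stale once a parameter changes.
        instance.__dict__.pop("_prediction_cache", None)


class LinearModel:
    weight = InvalidatesCache()
    bias = InvalidatesCache()

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias
        self.compute_count = 0  # counts real (non-cached) computations

    def predict(self, x):
        cache = self.__dict__.setdefault("_prediction_cache", {})
        if x not in cache:
            self.compute_count += 1
            cache[x] = self.weight * x + self.bias
        return cache[x]


model = LinearModel(weight=2.0, bias=1.0)
first = model.predict(3.0)    # computed: 2.0 * 3.0 + 1.0
second = model.predict(3.0)   # served from the cache
model.weight = 3.0            # __set__ drops the cache
third = model.predict(3.0)    # recomputed with the new weight

print(first, second, third)   # 7.0 7.0 10.0
print(model.compute_count)    # 2
```

Keeping the invalidation inside the descriptor means every assignment path (constructor, training loop, manual tweak) goes through the same logic without the caller having to remember it.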
property()
You might recognize some of this behavior from Python's built-in `property()` function. Indeed, `property` is a high-level, convenient way to create descriptors, primarily for managing getters, setters, and deleters for a single attribute within a class definition.
```python
class SimpleConfig:
    def __init__(self, initial_value):
        self._value = initial_value

    @property
    def value(self):
        """Getter for value."""
        print("Getting value")
        return self._value

    @value.setter
    def value(self, new_value):
        """Setter for value with validation."""
        print(f"Setting value to {new_value}")
        if not isinstance(new_value, (int, float)):
            raise TypeError("Value must be numeric")
        self._value = new_value

    @value.deleter
    def value(self):
        """Deleter for value."""
        print("Deleting value")
        del self._value


# Usage:
conf = SimpleConfig(10)
print(conf.value)  # Calls the getter
conf.value = 20    # Calls the setter
try:
    conf.value = "bad"  # Calls setter, raises TypeError
except TypeError as e:
    print(e)
del conf.value     # Calls the deleter
```
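Because a `property` object is itself a descriptor, its protocol methods can be called by hand. The short sketch below, using a hypothetical `SchedulerConfig` class, retrieves the raw `property` object from the class dictionary and shows that invoking `__get__` and `__set__` explicitly has the same effect as ordinary attribute syntax.

```python
class SchedulerConfig:
    """Hypothetical class with a single managed attribute."""

    def __init__(self, decay):
        self._decay = decay

    @property
    def decay(self):
        return self._decay

    @decay.setter
    def decay(self, value):
        self._decay = float(value)


# Reading the class __dict__ returns the raw property object.
prop = SchedulerConfig.__dict__["decay"]
print(type(prop).__name__)                                 # property
print(hasattr(prop, "__get__"), hasattr(prop, "__set__"))  # True True

sched = SchedulerConfig(0.5)
print(prop.__get__(sched, SchedulerConfig))  # 0.5, same as sched.decay
prop.__set__(sched, "0.25")                  # same as sched.decay = "0.25"
print(sched.decay)                           # 0.25
```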
Internally, `property(fget, fset, fdel)` creates a data descriptor instance. While `property` is excellent for simple attribute management within a single class, creating a full descriptor class offers more power and reusability: the logic can be defined once in a descriptor class (such as `PositiveFloat`) and reused for multiple attributes across different classes.

Descriptors provide a powerful abstraction for controlling attribute access. By intercepting get, set, and delete operations, they allow for the implementation of validation, lazy loading, logging, and other cross-cutting concerns in a clean, reusable manner. This capability is particularly advantageous when building complex ML components, configurations, or frameworks where controlled access and behavior associated with attributes are necessary.
© 2025 ApX Machine Learning