Now that we've reviewed and explored several powerful Python features beyond the basics, it's time to put them into practice. The following exercises are designed to solidify your understanding of list comprehensions, generators, advanced function arguments, decorators, context managers, basic object-oriented programming, and error handling in contexts relevant to data preparation and analysis. Working through these examples will help you integrate these techniques into your own data science workflows.
You have a list of dictionaries, where each dictionary represents a sensor reading. Each reading has a `'sensor_id'`, `'timestamp'`, and `'value'`. Your task is to create a new list containing only the readings from sensor `'S1'` where the `'value'` is greater than 50. Use a list comprehension for this.
Data:

```python
sensor_data = [
    {'sensor_id': 'S1', 'timestamp': 1678886400, 'value': 45.6},
    {'sensor_id': 'S2', 'timestamp': 1678886401, 'value': 60.1},
    {'sensor_id': 'S1', 'timestamp': 1678886402, 'value': 55.9},
    {'sensor_id': 'S3', 'timestamp': 1678886403, 'value': 32.0},
    {'sensor_id': 'S1', 'timestamp': 1678886404, 'value': 62.3},
    {'sensor_id': 'S2', 'timestamp': 1678886405, 'value': 58.7},
    {'sensor_id': 'S1', 'timestamp': 1678886406, 'value': 49.8},
]
```
Solution:

```python
# Use a list comprehension to filter the data
filtered_readings = [
    reading for reading in sensor_data
    if reading['sensor_id'] == 'S1' and reading['value'] > 50
]

# Print the result to verify
print(filtered_readings)

# Expected Output:
# [{'sensor_id': 'S1', 'timestamp': 1678886402, 'value': 55.9},
#  {'sensor_id': 'S1', 'timestamp': 1678886404, 'value': 62.3}]
```
This solution filters the list in a single, readable expression. List comprehensions are often more performant and Pythonic for such tasks than a traditional `for` loop with an `if` condition and `append`, as the loop version below shows.
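For comparison, here is the same filter written as a traditional loop. Both produce identical results; the comprehension simply expresses the intent more directly:

```python
# Equivalent imperative version: a loop with a conditional append
filtered_readings = []
for reading in sensor_data:
    if reading['sensor_id'] == 'S1' and reading['value'] > 50:
        filtered_readings.append(reading)
```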
Imagine you have a very large log file where each line represents an event. Processing the entire file at once might consume too much memory. Write a generator function `get_error_lines(filepath)` that takes a file path, reads the file line by line, and yields only the lines containing the word "ERROR".

Hint: Use a `yield` statement inside the loop that iterates through the file lines.
Solution:

```python
import os  # Used for cleaning up the dummy file

# Create a dummy log file for demonstration
log_content = """
INFO: Process started
DEBUG: Connection established
ERROR: Failed to read record 102
INFO: Processing record 103
WARN: Disk space low
ERROR: Timeout connecting to service X
INFO: Process finished
"""
dummy_filepath = 'sample.log'
with open(dummy_filepath, 'w') as f:
    f.write(log_content)

# Generator function
def get_error_lines(filepath):
    """
    Reads a file line by line and yields lines containing 'ERROR'.
    """
    try:
        with open(filepath, 'r') as f:
            for line in f:
                if "ERROR" in line:
                    yield line.strip()  # strip() removes leading/trailing whitespace
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
        # Optionally, yield nothing or raise an exception

# Using the generator
error_lines_generator = get_error_lines(dummy_filepath)
print("Error lines found:")
for error_line in error_lines_generator:
    print(error_line)

# Clean up the dummy file
os.remove(dummy_filepath)

# Expected Output:
# Error lines found:
# ERROR: Failed to read record 102
# ERROR: Timeout connecting to service X
```
Generators are ideal here because they process the file lazily: only one line is held in memory at a time during iteration, making this approach suitable for massive files. The `try`/`except` block also demonstrates basic error handling for the case where the file doesn't exist.
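Because generators are lazy, they also compose cheaply. As a small sketch (assuming a log file like the `sample.log` created above still exists on disk), you can chain a generator expression onto `get_error_lines` without ever materializing the full list of matches:

```python
# Chain a generator expression onto the generator function; no line is
# read from disk until the for loop below requests values.
error_lines = get_error_lines('sample.log')
timeout_errors = (line for line in error_lines if 'Timeout' in line)

for line in timeout_errors:
    print(line)  # Only one line is held in memory at a time
```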
Write a function `aggregate_data(agg_func, *args)` that takes an aggregation function (`agg_func`, e.g., `sum`, `max`, `min`) and a variable number of numerical arguments (`*args`). The function should apply the aggregation function to the arguments and return the result. Handle the case where no numerical arguments are provided.
Solution:
```python
def aggregate_data(agg_func, *args):
    """
    Applies an aggregation function to a variable number of arguments.

    Args:
        agg_func: The function to apply (e.g., sum, max, min).
        *args: A variable number of numerical arguments.

    Returns:
        The result of the aggregation, or None if no arguments are provided.
    """
    if not args:
        print("Warning: No data provided for aggregation.")
        return None
    # Check that all arguments are numbers (int or float)
    if not all(isinstance(arg, (int, float)) for arg in args):
        print("Error: All arguments must be numerical.")
        return None
    try:
        # args is a tuple, so it can be passed directly to sum, max, min, etc.
        return agg_func(args)
    except Exception as e:
        print(f"Error during aggregation: {e}")
        return None

# Example Usage
numbers = [10, 5, 25, 15, 8]
total = aggregate_data(sum, *numbers)
maximum = aggregate_data(max, *numbers)
minimum = aggregate_data(min, 1, 2, 3, 0.5)  # Passing arguments directly
no_data = aggregate_data(sum)
mixed_data = aggregate_data(sum, 10, 'a', 30)

print(f"Sum: {total}")
print(f"Max: {maximum}")
print(f"Min: {minimum}")
print(f"No Data Result: {no_data}")
print(f"Mixed Data Result: {mixed_data}")

# Expected Output (the warning and error print when the calls are made,
# before the summary lines):
# Warning: No data provided for aggregation.
# Error: All arguments must be numerical.
# Sum: 63
# Max: 25
# Min: 0.5
# No Data Result: None
# Mixed Data Result: None
```
This function uses `*args` to accept any number of positional arguments, making it flexible. It includes checks for empty input and non-numerical types, demonstrating basic validation and error feedback.
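If you also want to forward keyword options to the aggregation function, `**kwargs` pairs naturally with `*args`. Here is a minimal sketch (the name `aggregate_data_kw` is just illustrative; the `key` keyword below is the standard one accepted by `max` and `min`):

```python
def aggregate_data_kw(agg_func, *args, **kwargs):
    """Like aggregate_data, but forwards keyword arguments to agg_func."""
    if not args:
        return None
    # args is a tuple; pass it as the iterable, forwarding any keywords
    return agg_func(args, **kwargs)

# max() accepts a 'key' function, here selecting the largest magnitude
print(aggregate_data_kw(max, 3, -7, 5, key=abs))  # Prints: -7
```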
Write a decorator `time_it` that measures the execution time of any function it wraps and prints the duration. Apply this decorator to a function that simulates a data processing task (e.g., creating a large list).
Solution:

```python
import time

def time_it(func):
    """
    A decorator that prints the execution time of the wrapped function.
    """
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # More precise than time.time()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        duration = end_time - start_time
        print(f"Function '{func.__name__}' executed in {duration:.4f} seconds")
        return result
    return wrapper

@time_it
def simulate_data_processing(n_records):
    """
    Simulates processing data by creating a list of squares.
    """
    print(f"Processing {n_records} records...")
    processed_data = [i * i for i in range(n_records)]
    # Simulate some more work
    time.sleep(0.1)
    print("Processing complete.")
    return len(processed_data)  # Return the count of processed items

# Example Usage
num_records = 1_000_000
count = simulate_data_processing(num_records)
print(f"Processed {count} items.")

# Example Output (exact time will vary):
# Processing 1000000 records...
# Processing complete.
# Function 'simulate_data_processing' executed in 0.1578 seconds
# Processed 1000000 items.
```
Decorators provide a clean way to add cross-cutting concerns like logging, timing, or access control to functions without modifying their core logic. The `time_it` decorator intercepts the function call, records the time before and after, prints the difference, and then returns the original function's result.
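One refinement worth knowing: as written, `wrapper` replaces the wrapped function's metadata, so `simulate_data_processing.__name__` reports `'wrapper'` and its docstring is lost. The standard fix is `functools.wraps`, sketched here:

```python
import functools
import time

def time_it(func):
    """A decorator that prints the execution time of the wrapped function."""
    @functools.wraps(func)  # Preserves func's __name__, __doc__, and other metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        print(f"Function '{func.__name__}' executed in {duration:.4f} seconds")
        return result
    return wrapper
```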
Create a simple context manager class `TempFileHandler` that creates a temporary file upon entering the `with` block and automatically deletes it upon exiting, even if errors occur within the block.

Hint: Implement the `__enter__` and `__exit__` special methods. `__enter__` should create and return the file object (or path), and `__exit__` should handle cleanup.
Solution:

```python
import os
import tempfile

class TempFileHandler:
    """
    A context manager that creates a temporary file and automatically
    deletes it on exit.
    """
    def __init__(self, mode='w+t', suffix='.tmp', prefix='my_temp_'):
        self.mode = mode
        self.suffix = suffix
        self.prefix = prefix
        self.temp_file = None
        self.temp_filepath = ""

    def __enter__(self):
        # Create a named temporary file
        self.temp_file = tempfile.NamedTemporaryFile(
            mode=self.mode,
            suffix=self.suffix,
            prefix=self.prefix,
            delete=False  # We handle deletion in __exit__
        )
        self.temp_filepath = self.temp_file.name
        print(f"Entering context: Created temporary file '{self.temp_filepath}'")
        return self.temp_file  # Returned object is bound by 'as' in the 'with' statement

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Called on exiting the 'with' block, whether or not an exception occurred
        print(f"Exiting context for '{self.temp_filepath}'...")
        if self.temp_file:
            self.temp_file.close()
            try:
                os.remove(self.temp_filepath)
                print(f"Successfully deleted temporary file '{self.temp_filepath}'")
            except OSError as e:
                print(f"Error deleting temporary file '{self.temp_filepath}': {e}")
        # If an exception occurred inside the 'with' block, exc_type, exc_val,
        # and exc_tb describe it.
        if exc_type:
            print(f"An exception of type {exc_type.__name__} occurred: {exc_val}")
            # Return False (or None) to propagate the exception,
            # True to suppress it. We let it propagate.
            return False
        return True  # No exception occurred

# Example Usage 1: Successful operation
print("--- Example 1: Successful file operation ---")
try:
    with TempFileHandler(suffix='.csv') as tmp_f:
        print(f"File object inside 'with': {tmp_f}")
        tmp_f.write("header1,header2\n")
        tmp_f.write("value1,value2\n")
        # File is still open and exists here
        print(f"File exists during 'with' block: {os.path.exists(tmp_f.name)}")
    # Outside the 'with' block:
    print("After 'with' block.")
    # Check whether the file still exists (it shouldn't)
    print(f"File exists after 'with' block: {os.path.exists(tmp_f.name)}")
except Exception as e:
    print(f"Caught unexpected error: {e}")

print("\n--- Example 2: Operation with an error ---")
try:
    with TempFileHandler(suffix='.log') as tmp_f:
        filepath = tmp_f.name  # Store the path before the error occurs
        print(f"File object inside 'with': {tmp_f}")
        tmp_f.write("Log entry 1\n")
        # Simulate an error
        result = 10 / 0
        tmp_f.write("This won't be written\n")
except ZeroDivisionError:
    print("Caught expected ZeroDivisionError.")
    # Check whether the file exists (it should have been deleted by __exit__)
    print(f"File exists after error in 'with' block: {os.path.exists(filepath)}")
except Exception as e:
    print(f"Caught unexpected error: {e}")

# Expected Output:
# --- Example 1: Successful file operation ---
# Entering context: Created temporary file '.../my_temp_....csv'
# File object inside 'with': <_io.TextIOWrapper name='.../my_temp_....csv' mode='w+t' encoding='...'>
# File exists during 'with' block: True
# Exiting context for '.../my_temp_....csv'...
# Successfully deleted temporary file '.../my_temp_....csv'
# After 'with' block.
# File exists after 'with' block: False
#
# --- Example 2: Operation with an error ---
# Entering context: Created temporary file '.../my_temp_....log'
# File object inside 'with': <_io.TextIOWrapper name='.../my_temp_....log' mode='w+t' encoding='...'>
# Exiting context for '.../my_temp_....log'...
# Successfully deleted temporary file '.../my_temp_....log'
# An exception of type ZeroDivisionError occurred: division by zero
# Caught expected ZeroDivisionError.
# File exists after error in 'with' block: False
```
Context managers guarantee that cleanup code (like closing files, releasing locks, or deleting temporary resources) runs reliably, making your code more robust. The `__exit__` method receives details about any exception that occurred, allowing for conditional cleanup or error logging.
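For simpler cases, the standard library offers a shortcut: `contextlib.contextmanager` turns a generator function into a context manager, avoiding the class boilerplate. A minimal sketch of an equivalent temp-file manager (the name `temp_file_handler` is illustrative):

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def temp_file_handler(mode='w+t', suffix='.tmp', prefix='my_temp_'):
    """Generator-based equivalent of the TempFileHandler class."""
    tmp = tempfile.NamedTemporaryFile(mode=mode, suffix=suffix,
                                      prefix=prefix, delete=False)
    try:
        yield tmp  # The body of the 'with' block runs here
    finally:
        tmp.close()
        os.remove(tmp.name)  # Cleanup runs even if the block raised

# Usage mirrors the class-based version
with temp_file_handler(suffix='.csv') as f:
    f.write("header1,header2\n")
```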
These exercises provide practical scenarios where the advanced Python constructs discussed in this chapter are beneficial. As you encounter data loading, cleaning, and transformation tasks, consider how these techniques can make your code more efficient, readable, and maintainable.