As introduced earlier in the chapter, while high-level libraries provide powerful abstractions, building truly efficient and novel machine learning solutions often requires delving into the underlying algorithmic mechanics. Standard library functions might not always offer the optimal approach for a specific problem constraint, or you might need to implement a custom algorithm altogether. This is where understanding fundamental algorithm design paradigms like greedy algorithms and dynamic programming becomes highly valuable. These techniques provide structured ways to approach complex optimization problems frequently encountered in machine learning, potentially reducing computational complexity significantly, for instance, from exponential O(2n) time to polynomial time.
A greedy algorithm builds up a solution piece by piece, always choosing the option that offers the most obvious and immediate benefit at the current step. It makes a locally optimal choice hoping that this sequence of local optima leads to a globally optimal solution. This approach is conceptually simple and often computationally efficient.
For a greedy strategy to guarantee a global optimum, the problem must exhibit two properties:
Applications in ML:
Example: Simplified Sequential Forward Selection
Imagine selecting the best two features from a set {A,B,C,D} based on some model performance metric (e.g., accuracy).
This is greedy because at each step, we added the single best feature without backtracking or considering pairs like {A,C} initially.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression # Example classifier
from sklearn.model_selection import cross_val_score
# Assume X (n_samples, n_features), y (n_samples,) are available
# features = ['A', 'B', 'C', 'D'] # Corresponding names for columns in X
def evaluate_features(X_subset, y, model):
"""Evaluates feature subset using cross-validation."""
if X_subset.shape[1] == 0:
return 0.0 # No features, zero score
# Using cross_val_score for robustness
scores = cross_val_score(model, X_subset, y, cv=3, scoring='accuracy')
return np.mean(scores)
def sequential_forward_selection(X, y, k, model, feature_names):
"""Simple SFS implementation."""
num_features = X.shape[1]
selected_indices = []
remaining_indices = list(range(num_features))
current_score = 0.0
print(f"Target number of features: {k}\n")
for i in range(k):
best_new_score = -1.0
best_feature_index = -1
print(f"--- Selecting Feature {i+1} ---")
for idx in remaining_indices:
trial_indices = selected_indices + [idx]
X_trial = X[:, trial_indices]
score = evaluate_features(X_trial, y, model)
print(f" Trying feature '{feature_names[idx]}' (Indices: {trial_indices}): Score = {score:.4f}")
if score > best_new_score:
best_new_score = score
best_feature_index = idx
if best_feature_index != -1:
print(f"\nSelected feature '{feature_names[best_feature_index]}' with score {best_new_score:.4f}\n")
selected_indices.append(best_feature_index)
remaining_indices.remove(best_feature_index)
current_score = best_new_score
else:
print("No improvement found, stopping.")
break # No improvement possible
selected_feature_names = [feature_names[i] for i in selected_indices]
print(f"Final selected features ({len(selected_indices)}): {selected_feature_names}")
print(f"Final Score: {current_score:.4f}")
return selected_indices, selected_feature_names
# --- Example Usage (requires actual data X, y) ---
# feature_names = ['Feature1', 'Feature2', 'Feature3', 'Feature4', 'Feature5']
# X = np.random.rand(100, 5) # Placeholder data
# y = (X[:, 0] + X[:, 1] * 0.5 + np.random.randn(100) * 0.1 > 0.6).astype(int) # Placeholder labels
# model = LogisticRegression()
# k_features_to_select = 3
# selected_indices, selected_names = sequential_forward_selection(X, y, k_features_to_select, model, feature_names)
# print("\nSelected Indices:", selected_indices)
Limitations:
The main drawback of greedy algorithms is that they don't always yield the globally optimal solution. A locally optimal choice might lead down a path where the overall best solution becomes unreachable. For instance, in feature selection, adding the single best feature now might prevent the selection of a pair of features later that work exceptionally well together but are individually mediocre.
Dynamic Programming (DP) is a powerful technique for solving optimization problems by breaking them down into simpler, overlapping subproblems. It solves each subproblem only once and stores its solution, typically in a table (tabulation) or using memoization. When the same subproblem is encountered again, DP simply looks up the stored result instead of recomputing it.
DP is applicable when the problem exhibits:
This contrasts with divide-and-conquer algorithms (like Merge Sort), where subproblems are typically independent and don't overlap significantly.
Applications in ML:
Example: Levenshtein Distance (Edit Distance)
Let's find the edit distance between "PYTHON" and "PYTHONS". We can define D(i,j) as the edit distance between the first i characters of string 1 and the first j characters of string 2. The recurrence relation is:
D(i,j)=min⎩⎨⎧D(i−1,j)+1D(i,j−1)+1D(i−1,j−1)+cost(str1[i],str2[j])(deletion from str1)(insertion into str1)(match/substitution)where cost(c1,c2) is 0 if c1==c2 and 1 otherwise. The base cases are D(i,0)=i and D(0,j)=j.
We can compute this using tabulation (a bottom-up approach building a table).
import numpy as np
def levenshtein_distance(s1, s2):
"""Calculates Levenshtein distance using DP (Tabulation)."""
m, n = len(s1), len(s2)
# Create a DP table (m+1 x n+1)
dp_table = np.zeros((m + 1, n + 1), dtype=int)
# Initialize base cases
for i in range(m + 1):
dp_table[i, 0] = i # Cost of deleting i chars from s1 to get empty string
for j in range(n + 1):
dp_table[0, j] = j # Cost of inserting j chars into empty string to get s2[:j]
# Fill the table using the recurrence relation
for i in range(1, m + 1):
for j in range(1, n + 1):
cost = 0 if s1[i - 1] == s2[j - 1] else 1 # Cost is 0 for match, 1 for substitution
deletion_cost = dp_table[i - 1, j] + 1
insertion_cost = dp_table[i, j - 1] + 1
substitution_cost = dp_table[i - 1, j - 1] + cost
dp_table[i, j] = min(deletion_cost, insertion_cost, substitution_cost)
# The final distance is in the bottom-right cell
print("DP Table:")
print(dp_table)
return dp_table[m, n]
# Example
string1 = "PYTHON"
string2 = "PYTHONS"
distance = levenshtein_distance(string1, string2)
print(f"\nLevenshtein distance between '{string1}' and '{string2}': {distance}")
# Expected output: 1 (one insertion: 'S')
string3 = "KITTEN"
string4 = "SITTING"
distance2 = levenshtein_distance(string3, string4)
print(f"\nLevenshtein distance between '{string3}' and '{string4}': {distance2}")
# Expected output: 3 (substitute K->S, substitute E->I, insert G)
The DP table shows the minimum edit distance between prefixes of the two strings.
dp_table[i, j]
holds the distance betweens1[:i]
ands2[:j]
.
Alternatively, DP problems can be solved using memoization (a top-down recursive approach with caching). A dictionary or array stores results of subproblems as they are computed. If a function call encounters a subproblem already solved, it returns the cached result.
def levenshtein_memoization(s1, s2):
"""Calculates Levenshtein distance using DP (Memoization)."""
memo = {}
def compute_dist(i, j):
if (i, j) in memo:
return memo[(i, j)]
# Base cases
if i == 0:
return j # Cost of inserting j chars
if j == 0:
return i # Cost of deleting i chars
# Recursive step
cost = 0 if s1[i - 1] == s2[j - 1] else 1
deletion = compute_dist(i - 1, j) + 1
insertion = compute_dist(i, j - 1) + 1
substitution = compute_dist(i - 1, j - 1) + cost
result = min(deletion, insertion, substitution)
memo[(i, j)] = result
return result
return compute_dist(len(s1), len(s2))
# Example using memoization
distance_memo = levenshtein_memoization(string1, string2)
print(f"\nMemoization result for '{string1}' and '{string2}': {distance_memo}")
distance2_memo = levenshtein_memoization(string3, string4)
print(f"Memoization result for '{string3}' and '{string4}': {distance2_memo}")
Memoization can sometimes be more intuitive to implement if you think recursively, while tabulation avoids recursion depth limits and can be slightly more efficient due to lower overhead in some Python implementations.
Understanding these paradigms is not just about implementing textbook algorithms. It's about recognizing the structure of optimization problems you encounter in ML. Can the problem be broken down? Do subproblems overlap? Can local choices lead to a global solution? Asking these questions helps you design more efficient custom components, optimize existing processes (like feature engineering or hyperparameter tuning strategies), and better understand the trade-offs involved in different algorithmic approaches within machine learning libraries. They provide a powerful mental toolkit for tackling computational challenges beyond relying solely on off-the-shelf functions.
© 2025 ApX Machine Learning