Let's put the concepts of graph representation and traversal into practice. We'll use a small, illustrative graph and implement its structure and traversal algorithms in Python. This hands-on experience reinforces the trade-offs between different representations and the mechanics of BFS and DFS.
Imagine a small social network where users are connected if they are friends. We'll represent this network as an undirected graph.
Nodes: Alice, Bob, Charlie, David, Eve Edges:
Here's a diagram of our simple social network graph:
A simple undirected graph representing friendships between five individuals.
We'll implement this graph using both an adjacency list and an adjacency matrix.
The adjacency list is often preferred for sparse graphs (graphs with relatively few edges compared to the possible number of edges) because it's space-efficient. We can represent it using a Python dictionary where keys are nodes and values are lists of their neighbors.
# Adjacency List Representation
graph_adj_list = {
'Alice': ['Bob', 'Charlie'],
'Bob': ['Alice', 'David'],
'Charlie': ['Alice', 'David', 'Eve'],
'David': ['Bob', 'Charlie', 'Eve'],
'Eve': ['Charlie', 'David']
}
# Example Usage: Get neighbors of Charlie
print(f"Neighbors of Charlie: {graph_adj_list['Charlie']}")
# Example Usage: Check if Alice and David are connected
is_connected = 'David' in graph_adj_list['Alice'] # Requires checking neighbors
print(f"Are Alice and David directly connected? {is_connected}")
This representation makes it easy to find all neighbors of a given node (lookup time proportional to the number of neighbors). However, checking if an edge exists between two specific nodes might require searching the neighbor list of one node. Adding or removing edges is generally efficient.
The adjacency matrix uses a 2D array where the value at row i
, column j
is 1 (or the edge weight) if there's an edge between node i
and node j
, and 0 otherwise. This requires mapping node names to integer indices.
import numpy as np
nodes = ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
node_to_index = {node: i for i, node in enumerate(nodes)}
num_nodes = len(nodes)
# Initialize an empty matrix
adj_matrix = np.zeros((num_nodes, num_nodes), dtype=int)
# Populate the matrix based on edges (symmetric for undirected graph)
edges = [
('Alice', 'Bob'), ('Alice', 'Charlie'), ('Bob', 'David'),
('Charlie', 'David'), ('Charlie', 'Eve'), ('David', 'Eve')
]
for u, v in edges:
u_idx, v_idx = node_to_index[u], node_to_index[v]
adj_matrix[u_idx, v_idx] = 1
adj_matrix[v_idx, u_idx] = 1 # Undirected graph
print("Adjacency Matrix:")
print(adj_matrix)
print(f"\nNode mapping: {node_to_index}")
# Example Usage: Check if Alice and David are connected
alice_idx = node_to_index['Alice']
david_idx = node_to_index['David']
is_connected_matrix = adj_matrix[alice_idx, david_idx] == 1
print(f"\nAre Alice and David directly connected (Matrix)? {is_connected_matrix}")
# Example Usage: Check if Alice and Bob are connected
bob_idx = node_to_index['Bob']
is_connected_matrix_ab = adj_matrix[alice_idx, bob_idx] == 1
print(f"Are Alice and Bob directly connected (Matrix)? {is_connected_matrix_ab}")
The adjacency matrix provides very fast O(1) edge checking between any two nodes. However, it requires O(V2) space, where V is the number of vertices, which can be inefficient for sparse graphs. Iterating over all neighbors of a node requires checking an entire row, taking O(V) time. Adding or removing a node is costly as it requires resizing the matrix.
Now, let's implement BFS and DFS using our adjacency list representation.
BFS explores the graph layer by layer. It uses a queue to keep track of nodes to visit.
from collections import deque
def bfs(graph, start_node):
"""Performs BFS on a graph represented by an adjacency list."""
if start_node not in graph:
print(f"Start node '{start_node}' not found in graph.")
return []
visited = set() # Keep track of visited nodes
queue = deque([start_node]) # Initialize queue with start node
visited.add(start_node)
traversal_order = [] # Store the order of visited nodes
while queue:
current_node = queue.popleft() # Dequeue a node
traversal_order.append(current_node)
# print(f"Visiting: {current_node}") # Optional: print visiting step
# Enqueue neighbors that haven't been visited
for neighbor in graph.get(current_node, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return traversal_order
# Perform BFS starting from 'Alice'
bfs_path = bfs(graph_adj_list, 'Alice')
print(f"\nBFS Traversal starting from Alice: {bfs_path}")
# Perform BFS starting from 'David'
bfs_path_david = bfs(graph_adj_list, 'David')
print(f"BFS Traversal starting from David: {bfs_path_david}")
BFS is useful for finding the shortest path in terms of the number of edges in an unweighted graph. Notice how it explores nodes closer to the start node first ('Bob' and 'Charlie') before moving further away ('David', 'Eve').
DFS explores as far as possible along each branch before backtracking. It can be implemented recursively or iteratively using a stack. Here's a recursive implementation:
def dfs_recursive(graph, start_node, visited=None, traversal_order=None):
"""Performs DFS recursively on a graph represented by an adjacency list."""
if visited is None:
visited = set()
if traversal_order is None:
traversal_order = []
if start_node not in graph:
print(f"Start node '{start_node}' not found in graph.")
return []
visited.add(start_node)
traversal_order.append(start_node)
# print(f"Visiting: {start_node}") # Optional: print visiting step
for neighbor in graph.get(start_node, []):
if neighbor not in visited:
dfs_recursive(graph, neighbor, visited, traversal_order)
return traversal_order
# Perform DFS starting from 'Alice'
visited_set_dfs = set() # Need to manage visited set outside for subsequent calls if needed
dfs_path = dfs_recursive(graph_adj_list, 'Alice', visited_set_dfs)
print(f"\nDFS Traversal starting from Alice: {dfs_path}")
# Perform DFS starting from 'David'
visited_set_dfs_david = set()
dfs_path_david = dfs_recursive(graph_adj_list, 'David', visited_set_dfs_david)
print(f"DFS Traversal starting from David: {dfs_path_david}")
The DFS path can vary depending on the order neighbors are processed. In our example starting from 'Alice', it might go deep via 'Bob' to 'David' before exploring 'Charlie' and its branch towards 'Eve'. DFS is useful for tasks like cycle detection, topological sorting (for Directed Acyclic Graphs), and exploring connectivity.
NetworkX
provide pre-built graph structures and algorithms, saving implementation time and offering optimized performance. However, understanding the underlying implementations, as practiced here, is important for selecting and using these tools effectively.This practical exercise demonstrates how to represent graphs and implement fundamental traversal algorithms. These techniques form the basis for many graph-based machine learning applications, such as analyzing network structures, building recommendation engines, or performing community detection.
© 2025 ApX Machine Learning