Once we have represented our relational data as a graph, a common task is to systematically explore it. We might want to find all entities reachable from a starting point, discover the shortest connection between two entities, or simply understand the structure of a component within the graph. Graph traversal algorithms provide the mechanisms for visiting nodes in an orderly fashion. Breadth-First Search (BFS) is one of the fundamental traversal algorithms.
Imagine dropping a pebble into a calm pond. Ripples spread out uniformly in concentric circles. BFS explores a graph in a similar way: it starts at a chosen source node and explores the neighbor nodes first, before moving to the next level neighbors. It explores the graph layer by layer.
BFS operates by maintaining a queue of nodes to visit and a set of nodes that have already been visited to avoid cycles and redundant work.
Here’s how it typically works:
s
to start the traversal.s
.s
as visited.u
from the front of the queue.u
(e.g., print its value, check if it's the target node).v
of u
:
v
has not been visited:
v
as visited.v
.This process ensures that nodes are visited in increasing order of their distance (number of edges) from the source node. This property makes BFS suitable for finding the shortest path in an unweighted graph.
Let's trace BFS on a simple graph, starting from node 'A'. We'll use an adjacency list representation: A: [B, D], B: [A, C], C: [B, E], D: [A, E], E: [C, D].
A simple undirected graph. Node 'F' is included to show it won't be visited if starting from 'A'.
Steps (Starting at A):
queue = [A]
, visited = {A}
.queue = [B]
, visited = {A, B}
.queue = [B, D]
, visited = {A, B, D}
.queue = [D, C]
, visited = {A, B, D, C}
.queue = [C, E]
, visited = {A, B, D, C, E}
.queue = [E]
, visited = {A, B, D, C, E}
.queue = []
, visited = {A, B, D, C, E}
.The order of nodes visited (processed) is A, B, D, C, E. Notice how all nodes at distance 1 from A (B, D) were visited before nodes at distance 2 (C, E). Node F was never visited because it's not reachable from A.
Here's a more formal pseudocode representation:
BFS(graph, start_node):
let Q be a queue
let visited be a set
add start_node to Q
add start_node to visited
while Q is not empty:
current_node = Q.dequeue()
// Process current_node (e.g., print, check condition)
process(current_node)
for neighbor in graph.get_neighbors(current_node):
if neighbor is not in visited:
add neighbor to visited
add neighbor to Q
Using Python's collections.deque
for an efficient queue implementation and a dictionary for the adjacency list is common:
from collections import deque
def bfs(graph, start_node):
"""
Performs Breadth-First Search on a graph.
Args:
graph (dict): Adjacency list representation of the graph
(e.g., {'A': ['B', 'D'], ...}).
start_node: The node to start the traversal from.
Returns:
list: A list of nodes in the order they were visited.
Returns an empty list if start_node is not in graph.
"""
if start_node not in graph:
return [] # Start node must be in the graph
visited = set()
queue = deque([start_node])
visited.add(start_node)
visited_order = []
while queue:
current_node = queue.popleft() # Dequeue from the left
visited_order.append(current_node)
# Process node here if needed (e.g., print(current_node))
# Add neighbors to the queue
# Use graph.get(current_node, []) to handle nodes with no outgoing edges
for neighbor in graph.get(current_node, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor) # Enqueue to the right
return visited_order
# Example usage with the graph from the visualization:
graph_example = {
'A': ['B', 'D'],
'B': ['A', 'C'],
'C': ['B', 'E'],
'D': ['A', 'E'],
'E': ['C', 'D'],
'F': [] # Node F is not connected to the main component
}
start = 'A'
traversal_path = bfs(graph_example, start)
print(f"BFS traversal starting from {start}: {traversal_path}")
# Expected Output: BFS traversal starting from A: ['A', 'B', 'D', 'C', 'E']
start_unconnected = 'F'
traversal_unconnected = bfs(graph_example, start_unconnected)
print(f"BFS traversal starting from {start_unconnected}: {traversal_unconnected}")
# Expected Output: BFS traversal starting from F: ['F']
Understanding the performance of BFS is important for choosing the right algorithm.
Time Complexity: BFS visits each node and each edge exactly once in the worst case (assuming an adjacency list representation).
u
, we iterate through its neighbors. Over the entire execution, the total number of neighbor checks corresponds to the sum of the degrees of all visited nodes. For a directed graph, this is the number of outgoing edges from visited nodes. For an undirected graph, each edge (u,v) is considered twice (once from u and once from v). In either case, the total work related to edges is proportional to the number of edges E in the connected component being explored.Space Complexity: The space required is dominated by the storage for the visited
set and the queue
.
visited
set can store up to V nodes.queue
might need to hold all nodes at a particular level. For a very dense graph or a star graph, this could be close to O(V) nodes.While not always directly part of a model's core training loop like gradient descent, BFS is a valuable tool in the ML practitioner's toolkit for graph-related tasks:
BFS provides a systematic, level-by-level exploration of graph structures. Its ability to find shortest paths in unweighted graphs and its predictable O(V+E) performance make it a cornerstone algorithm for analyzing relational data represented as graphs. We will next explore Depth-First Search (DFS), which uses a different strategy to traverse graphs.
© 2025 ApX Machine Learning