While understanding the properties and operations of heaps is important, implementing them efficiently from scratch can be complex. Fortunately, Python's standard library provides the heapq
module, offering a highly optimized implementation of the heap queue algorithm, also known as the priority queue algorithm.
The heapq
module works directly with standard Python lists, treating them as min-heaps. This means the smallest element is always at the root (index 0). Let's explore its primary functions.
To use the heapq
functions, you first import the module:
import heapq
You then operate on a standard Python list.
heapq.heappush
To add an element to the heap while maintaining the heap property, use heapq.heappush(heap, item)
. The function modifies the list in-place.
# Start with an empty list (our heap)
min_heap = []
# Push items onto the heap
heapq.heappush(min_heap, 50)
heapq.heappush(min_heap, 20)
heapq.heappush(min_heap, 80)
heapq.heappush(min_heap, 10)
heapq.heappush(min_heap, 30)
print(f"Heap after pushes: {min_heap}")
# Expected output (order might vary slightly depending on implementation details,
# but 10 will be the first element):
# Heap after pushes: [10, 20, 80, 50, 30]
Internally, heappush
adds the element to the end of the list and then "sifts it up" to its correct position to maintain the min-heap property. This operation typically takes O(logn) time, where n is the number of elements in the heap.
A diagram representing the min-heap
[10, 20, 80, 50, 30]
. The root (10) is the smallest element.
heapq.heappop
To remove and return the smallest element (the root) from the heap, use heapq.heappop(heap)
. This also modifies the list in-place and maintains the heap property.
# Continuing from the previous example
smallest_item = heapq.heappop(min_heap)
print(f"Popped item: {smallest_item}")
print(f"Heap after pop: {min_heap}")
# Expected output:
# Popped item: 10
# Heap after pop: [20, 30, 80, 50]
heappop
replaces the root with the last element in the list and then "sifts it down" to restore the heap property. Like heappush
, this operation typically takes O(logn) time.
The heap structure after popping the smallest element (10). 20 is now the root.
Since heapq
uses a list where the first element heap[0]
is always the smallest, you can peek at the minimum value without removing it simply by accessing the element at index 0. Remember to check if the heap is empty first.
if min_heap:
print(f"Smallest item currently in heap: {min_heap[0]}")
else:
print("Heap is empty.")
# Expected output:
# Smallest item currently in heap: 20
heapq.heapify
If you already have a list of items and want to turn it into a valid heap, you could repeatedly call heappush
. However, a more efficient method is heapq.heapify(list)
. This function rearranges the list elements in-place to satisfy the heap property.
data = [50, 20, 80, 10, 30, 5, 90, 45]
print(f"Original list: {data}")
heapq.heapify(data) # Transform list into a heap in-place
print(f"Heapified list: {data}")
print(f"Smallest item: {data[0]}")
# Expected output (heapified list order might vary, smallest is first):
# Original list: [50, 20, 80, 10, 30, 5, 90, 45]
# Heapified list: [5, 10, 80, 20, 30, 50, 90, 45] # Example output
# Smallest item: 5
The advantage of heapify
is its time complexity. It runs in linear time, O(n), which is generally faster than inserting n elements one by one using heappush
, which would take O(nlogn) time in total.
heapq
also provides functions that combine push and pop operations efficiently.
heapq.heappushpop(heap, item)
: Pushes item
onto the heap
and then pops and returns the smallest item. It's more efficient than calling heappush
followed by heappop
separately. This is useful for maintaining a fixed-size heap (e.g., always keeping the 'k' largest elements).heapq.heapreplace(heap, item)
: Pops and returns the smallest item from the heap
, and then pushes the new item
. It differs from heappushpop
as it pops first. This can be slightly more efficient but raises an IndexError
if the heap is initially empty.# Example using heappushpop (maintaining top 3 elements)
top_k_heap = [10, 20, 30] # Assume this is already a heap
heapq.heapify(top_k_heap) # Ensure it's a heap
# Process new items, keeping only the top 3 largest (using min-heap for smallest)
# If we model this as keeping the 'k' largest using a min-heap of size 'k',
# we push a new item and pop the smallest if the heap size exceeds 'k'.
# heappushpop is useful if the heap is already at size 'k'.
new_item = 5
if len(top_k_heap) < 3:
heapq.heappush(top_k_heap, new_item)
elif new_item > top_k_heap[0]: # Only consider if larger than the smallest in heap
smallest = heapq.heappushpop(top_k_heap, new_item)
# smallest variable now holds the element that was pushed out (10 in this case)
print(f"Heap after considering {new_item}: {top_k_heap}") # Output: [20, 30, 5] -> needs heapify logic
# Example using heapreplace
data_heap = [10, 20, 30]
heapq.heapify(data_heap)
smallest = heapq.heapreplace(data_heap, 5) # Replaces 10 with 5, returns 10
print(f"Replaced item: {smallest}")
print(f"Heap after replace: {data_heap}")
# Expected output:
# Replaced item: 10
# Heap after replace: [5, 20, 30] # 5 is pushed, 10 popped. Heap property maintained.
Since heapq
directly implements only a min-heap, what if you need a max-heap (where the largest element is always at the root)? The standard techniques involve modifying the values stored in the heap:
# Method 1: Negating values for a max-heap
max_heap_negated = []
data = [10, 50, 20, 80, 30]
for item in data:
heapq.heappush(max_heap_negated, -item) # Push negative value
print(f"Internal min-heap (negated): {max_heap_negated}")
largest_item = -heapq.heappop(max_heap_negated) # Pop smallest negated, negate back
print(f"Largest item popped: {largest_item}") # Should be 80
print(f"Internal heap after pop: {max_heap_negated}")
# Method 2: Using tuples (priority, item) for a max-heap
max_heap_tuples = []
data_with_ids = [('task1', 10), ('task2', 50), ('task3', 20)]
for task_id, priority in data_with_ids:
heapq.heappush(max_heap_tuples, (-priority, task_id)) # Push (-priority, item)
print(f"Internal min-heap (tuples): {max_heap_tuples}")
neg_priority, highest_priority_task = heapq.heappop(max_heap_tuples)
print(f"Highest priority task popped: {highest_priority_task} (Priority: {-neg_priority})") # task2, 50
print(f"Internal heap after pop: {max_heap_tuples}")
heapq
also provides convenient functions nlargest(k, iterable)
and nsmallest(k, iterable)
, which efficiently find the k largest or smallest items from any iterable. These functions often use heaps internally, especially when k is small relative to the total number of items n.
scores = [34, 78, 12, 45, 99, 67, 23, 88, 50]
top_3 = heapq.nlargest(3, scores)
bottom_3 = heapq.nsmallest(3, scores)
print(f"Scores: {scores}")
print(f"Top 3 scores: {top_3}") # Output: [99, 88, 78]
print(f"Bottom 3 scores: {bottom_3}") # Output: [12, 23, 34]
Using heapq
allows you to leverage highly efficient, built-in implementations for priority queue and heap-related tasks within your machine learning workflows, such as managing candidates in beam search, finding nearest neighbors (when combined with other structures), or implementing selection algorithms. Its performance (O(logn) for push/pop, O(n) for heapify) makes it suitable for handling large datasets where efficiency is paramount.
© 2025 ApX Machine Learning