When dealing with CPU-bound machine learning tasks in standard CPython, the Global Interpreter Lock (GIL) prevents multiple threads from executing Python bytecode simultaneously on different CPU cores. This significantly limits the performance gains achievable through threading for computationally intensive work, such as the numerical calculations common in ML. The `multiprocessing` module provides a powerful alternative by creating separate processes, each with its own Python interpreter and memory space, thus bypassing the GIL and enabling true parallel execution on multi-core systems.

This approach is particularly well suited to tasks where computation dominates communication overhead, such as complex data transformations, parallel model training during hyperparameter searches, or evaluating multiple model variations concurrently.
The Process Class
The fundamental building block in `multiprocessing` is the `Process` class. You create a new process by instantiating `Process`, specifying the function to be executed in the new process via the `target` argument, and passing any necessary arguments using the `args` (tuple) or `kwargs` (dictionary) parameters.
Once a `Process` object is created, you initiate its execution by calling the `start()` method. This spawns a new child process that begins running the specified target function. To ensure the main program waits for the child process to complete its work before proceeding, call the `join()` method on the `Process` object. This is important for synchronizing results or ensuring tasks are finished before moving to subsequent steps.
Consider a simplified scenario where we apply a computationally intensive feature engineering step to different segments of data:
```python
import multiprocessing
import time
import os

def compute_intensive_feature(data_chunk):
    """Simulates a CPU-bound calculation on a data chunk."""
    pid = os.getpid()
    print(f"Process {pid}: Starting computation on chunk of size {len(data_chunk)}")
    # Simulate work
    result = sum(x * x for x in data_chunk)
    time.sleep(1)
    print(f"Process {pid}: Finished computation.")
    return result

if __name__ == "__main__":  # Important guard for multiprocessing
    # Sample data split into chunks
    data = list(range(100000))
    chunk1 = data[:50000]
    chunk2 = data[50000:]

    print("Main Process: Starting child processes...")
    start_time = time.time()

    # Create Process objects
    p1 = multiprocessing.Process(target=compute_intensive_feature, args=(chunk1,))
    p2 = multiprocessing.Process(target=compute_intensive_feature, args=(chunk2,))

    # Start processes
    p1.start()
    p2.start()

    # Wait for processes to complete
    p1.join()
    p2.join()

    end_time = time.time()
    print(f"Main Process: All child processes finished in {end_time - start_time:.2f} seconds.")
    # Note: Retrieving results directly like this requires IPC (covered later).
    # This example focuses on parallel execution flow.
```
Important: On platforms that use the `spawn` or `forkserver` start methods (Windows, and macOS since Python 3.8), you must protect the part of your script that creates processes inside an `if __name__ == "__main__":` block. This prevents infinite recursion, where child processes re-import the module and re-execute the process creation code.
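If you want consistent behavior across platforms, you can also select the start method explicitly at the top of your entry point. A minimal sketch (the `worker` function here is just an illustration):

```python
import multiprocessing

def worker():
    print(f"Child process running, pid={multiprocessing.current_process().pid}")

if __name__ == "__main__":
    # Force the spawn start method so behavior matches the Windows/macOS default.
    # set_start_method() may only be called once per program.
    multiprocessing.set_start_method("spawn")
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
```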
The Pool Class
While creating individual `Process` objects gives fine-grained control, managing a large number of processes manually can become cumbersome. The `multiprocessing.Pool` class offers a convenient way to manage a pool of worker processes, handling the distribution of tasks to available workers automatically.
A common pattern is to use the `Pool.map()` method. It applies a given function to each item in an iterable (such as a list), distributing the work across the worker processes in the pool. It collects the results and returns them in a list, preserving the order of the input iterable.

Let's adapt the previous example to use a `Pool`:
```python
import multiprocessing
import time
import math

def compute_intensive_feature_pool(x):
    """Simulates a CPU-bound calculation for a single item."""
    result = math.sqrt(x) * math.sin(x) * math.cos(x)
    # Busy-wait loop to represent additional computation per item
    for _ in range(10000):
        pass
    return result

if __name__ == "__main__":
    data = list(range(200000))  # Larger dataset for Pool demonstration

    print("Main Process: Starting Pool processing...")

    # Serial execution for comparison (uncomment to run; it is slow):
    # start_time_serial = time.time()
    # results_serial = [compute_intensive_feature_pool(item) for item in data]
    # print(f"Serial execution finished in {time.time() - start_time_serial:.2f} seconds.")

    start_time_parallel = time.time()

    # Determine number of processes (often based on CPU cores)
    num_processes = multiprocessing.cpu_count()
    print(f"Main Process: Using a pool of {num_processes} processes.")

    # Create a Pool as a context manager (automatically handles close/join)
    with multiprocessing.Pool(processes=num_processes) as pool:
        # Apply the function to every item in parallel
        results_parallel = pool.map(compute_intensive_feature_pool, data)

    end_time_parallel = time.time()
    print(f"Main Process: Parallel execution finished in {end_time_parallel - start_time_parallel:.2f} seconds.")
    # print(f"Result length: {len(results_parallel)}")  # Verify output length
```
The `Pool` automatically divides the `data` iterable into chunks and assigns each chunk to a worker process. `map()` blocks until all results are computed and returned.
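You can also control how the iterable is split by passing the optional `chunksize` argument to `map()`. A minimal sketch, using a trivial `square` function as a stand-in for real work:

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Each worker receives batches of 1000 items rather than one item
        # per task, reducing per-task scheduling and pickling overhead.
        results = pool.map(square, range(100000), chunksize=1000)
    print(len(results))  # 100000
```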
Other useful `Pool` methods include:

- `apply_async()`: Executes a function asynchronously for a single set of arguments. It returns an `AsyncResult` object immediately, allowing the main program to perform other tasks while the function runs in a worker process. You can retrieve the result later using `get()`.
- `imap()` / `imap_unordered()`: Lazy versions of `map()`. They return iterators that yield results as they become available, which can be memory-efficient for very large datasets. `imap()` preserves input order, while `imap_unordered()` returns results in the order they complete. (Both methods are sketched below.)

After submitting all tasks to a `Pool`, you should typically call `pool.close()` to indicate that no more tasks will be submitted, followed by `pool.join()` to wait for all submitted tasks to complete. Using the `Pool` as a context manager (as shown in the example with `with multiprocessing.Pool(...) as pool:`) handles `close()` and `join()` automatically, which is the recommended practice.
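Here is a brief sketch of `apply_async()` and `imap_unordered()` in action, again using a trivial `square` function as a stand-in for real work:

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # apply_async returns an AsyncResult immediately; the main
        # process is free to do other work before calling get().
        async_result = pool.apply_async(square, (10,))
        print(async_result.get())  # Blocks until ready; prints 100

        # imap_unordered yields results as workers finish them,
        # avoiding materializing the whole result list at once.
        for value in pool.imap_unordered(square, range(8)):
            print(value)  # Completion order, not necessarily input order
```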
Since processes operate in separate memory spaces, any data passed to a child process (as arguments) or returned from it needs to be transferred. `multiprocessing` handles this primarily through serialization, using the `pickle` module by default. The arguments passed to `target` functions or submitted via `Pool` methods are pickled in the main process and unpickled in the worker process. Return values are pickled in the worker and unpickled back in the main process (or in the process calling `get()` on an `AsyncResult`).
Pickling introduces overhead, especially for large objects. Transferring large datasets such as NumPy arrays or Pandas DataFrames between processes can become a bottleneck if not managed carefully. Furthermore, not all Python objects are pickleable (e.g., generators, lambda functions, nested functions, and some complex custom objects). This limitation matters when designing functions intended for parallel execution. More advanced techniques for Inter-Process Communication (IPC), such as `Queue` or shared memory, will be discussed later for scenarios requiring more sophisticated data exchange.
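A small sketch of this serialization boundary: top-level functions pickle by reference, while lambdas do not (the `top_level_transform` name is just an illustration):

```python
import pickle

def top_level_transform(x):
    return x * x

# Top-level functions pickle fine: pickle records the module and
# qualified name, which the worker resolves when it imports the module.
restored = pickle.loads(pickle.dumps(top_level_transform))
print(restored(4))  # 16

# Lambdas (and nested functions) cannot be looked up by name, so they fail.
try:
    pickle.dumps(lambda x: x * x)
except Exception as exc:  # typically pickle.PicklingError
    print(f"Cannot pickle lambda: {exc}")
```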
multiprocessing in ML Workflows

The `multiprocessing` module finds numerous applications in accelerating ML workflows, including parallel data preprocessing and feature engineering, training independent models concurrently during hyperparameter searches, and evaluating multiple model variations in parallel.
While `multiprocessing` enables true parallelism, keep these points in mind:

- Creating processes has startup overhead; reusing workers via a `Pool` helps amortize this cost over many tasks.
- `multiprocessing.cpu_count()` is a common starting point for sizing a pool, but optimal performance might require tuning based on the specific task and hardware.

The `multiprocessing` module is an essential tool for leveraging multi-core processors to speed up CPU-bound computations in Python ML pipelines. By understanding how to create processes using `Process` and manage them efficiently with `Pool`, you can significantly reduce the execution time of demanding tasks, leading to faster experimentation and model development cycles. Remember to consider the trade-offs around process creation and data serialization when designing your parallel solutions.