Once a machine learning model is captured into an Intermediate Representation (IR), the logical structure of the computation is established. However, the initial graph often mirrors the high-level Python code defined by the user, prioritizing API usability over execution efficiency. Running this graph directly usually results in suboptimal performance due to excessive memory access and redundant calculations.
This chapter examines graph-level optimizations, which are architectural transformations applied to the computation graph before code generation begins. These passes rewrite the graph structure to reduce the computational footprint and memory bandwidth usage while ensuring the mathematical result remains unchanged.
You will learn how compilers analyze the dataflow to perform operator fusion, a process that combines multiple operations into a single kernel. For instance, computing an element-wise addition followed by an activation function like y=ReLU(x+b) typically requires writing the intermediate sum to main memory and reading it back. Fusion allows the hardware to perform the activation on the data while it is still in registers or cache.
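To make the memory-traffic argument concrete, here is a minimal sketch in plain Python. The function names and the list-based "buffers" are illustrative, not a real compiler API; the intermediate list stands in for a tensor written to main memory, while the fused version keeps each partial sum in a local variable (the analogue of a register).

```python
# Unfused: compute x + b into an intermediate buffer, then apply ReLU in a
# second pass. The intermediate list `tmp` models a tensor round-trip to memory.
def add_then_relu(x, b):
    tmp = [xi + bi for xi, bi in zip(x, b)]   # intermediate written out
    return [max(0.0, t) for t in tmp]         # read back for the activation

# Fused: one pass over the data. Each sum stays in a local (register-like)
# value, so the intermediate tensor is never materialized.
def fused_add_relu(x, b):
    return [max(0.0, xi + bi) for xi, bi in zip(x, b)]
```

Both versions produce identical results; the fused form simply halves the number of passes over the data, which is exactly the property a fusion pass must preserve.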
Beyond fusion, we will cover constant folding and propagation, dead code elimination, memory layout transformation, and common subexpression elimination, each of which removes work from the graph before any kernel code is generated.
By the end of this section, you will have the skills to inspect a computation graph and implement a custom pass to modify its structure programmatically.
3.1 Operator Fusion Strategies
3.2 Constant Folding and Propagation
3.3 Dead Code Elimination
3.4 Memory Layout Transformation
3.5 Common Subexpression Elimination
3.6 Implementing a Graph Pass
© 2026 ApX Machine Learning