Defining the structure of a computation graph provides the skeleton of a machine learning model. To flesh out this skeleton, the compiler needs precise information about the data flowing through the edges. Deep learning compilers require this data to be strictly typed and shaped. Unlike standard Python code, where variables can change type dynamically, an Intermediate Representation (IR) treats tensor shapes and data types (dtypes) as rigid contracts. These contracts allow the compiler to calculate memory requirements and select the correct hardware instructions long before the code actually runs.

## The Significance of Data Types

The data type of a tensor determines how numerical values are stored in memory and how the hardware processing units manipulate them. While a data scientist might think in terms of generic "floats," a compiler must be specific. It needs to know whether a value is a 32-bit floating point (FP32), a 16-bit half-precision float (FP16), or an 8-bit integer (INT8).

This specificity is critical for two main reasons:

- **Memory Bandwidth:** Moving data is often more expensive than computing on it. An FP16 tensor occupies half the memory bandwidth of an FP32 tensor. The compiler uses dtype information to optimize memory access patterns.
- **Instruction Selection:** Hardware accelerators like GPUs or TPUs often have specialized units for specific types. For instance, NVIDIA Tensor Cores operate primarily on FP16 or BF16 inputs. If the IR specifies FP32, the compiler might inject cast operations or fall back to slower, general-purpose CUDA cores.

In the IR, a node representing an addition operation is not generic: it is explicitly an `add.f32` or `add.i8`. If a framework generates a graph where an FP32 tensor is added to an INT32 tensor, the compiler's first pass, type inference, must detect the mismatch. It will either raise an error or insert an explicit cast node to align the types, ensuring valid machine code generation.

## Tensor Shapes and Rank

Alongside data types, the shape of a tensor is its most defining attribute. The shape defines the dimensionality (rank) and the extent of each dimension.

- **Rank:** The number of dimensions. A scalar has rank 0, a vector has rank 1, and a batch of images typically has rank 4 (Batch, Channel, Height, Width).
- **Dimensions:** The specific size of each axis.

The compiler relies on shape information to perform static memory allocation. If the compiler knows that a tensor has a shape of $(32, 128)$ and uses `float32` (4 bytes per element), it can pre-calculate that the buffer requires exactly $32 \times 128 \times 4 = 16,384$ bytes. This allows the generated binary to allocate memory efficiently on the heap or stack without the overhead of dynamic memory management calls like `malloc` during execution.

## Shape Inference

One of the primary responsibilities of the compiler frontend is shape inference: the process of propagating shape information from the input nodes through the entire graph to determine the shape of every intermediate and output tensor.

Consider a convolution operation. The output shape is not arbitrary; it is a deterministic function of the input shape, weight shape, padding, stride, and dilation. The compiler implements these formulas as part of its analysis passes. The standard formula for calculating the output spatial dimension of a convolution is:

$$ O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1 $$

where:

- $O$ is the output size
- $I$ is the input size
- $K$ is the kernel (filter) size
- $P$ is the padding
- $S$ is the stride

By applying this logic to every node in the DAG, the compiler builds a complete map of tensor sizes.
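To make this analysis pass concrete, here is a minimal Python sketch of convolution shape inference. The function name `infer_conv2d_shape`, the NCHW argument convention, and the symmetric padding are illustrative assumptions, not any particular compiler's API:

```python
def infer_conv2d_shape(input_shape, weight_shape, stride, padding, dilation=1):
    """Infer the NCHW output shape of a Conv2D node (illustrative sketch).

    Real compiler passes also validate dtypes, groups, and asymmetric
    padding; this sketch checks only the channel contract.
    """
    n, c_in, h, w = input_shape
    c_out, c_in_w, k_h, k_w = weight_shape
    if c_in != c_in_w:
        raise ValueError(f"channel mismatch: input has {c_in}, weights expect {c_in_w}")

    def out_dim(i, k):
        # General form with dilation; for dilation=1 this reduces to the
        # formula above: floor((I - K + 2P) / S) + 1
        return (i + 2 * padding - dilation * (k - 1) - 1) // stride + 1

    return (n, c_out, out_dim(h, k_h), out_dim(w, k_w))

# Reproduces the figure below: [1, 64, 128, 128] convolved with
# [128, 64, 3, 3] at stride 1, padding 1 yields [1, 128, 128, 128].
print(infer_conv2d_shape((1, 64, 128, 128), (128, 64, 3, 3), stride=1, padding=1))
```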
```dot
digraph ShapeInference {
  rankdir=LR;
  bgcolor="transparent";
  node [shape=record, style=filled, fontname="Helvetica", fontsize=11, penwidth=0];
  edge [color="#adb5bd", arrowsize=0.8];

  input   [label="{Input Tensor|Shape: [1, 64, 128, 128]|Type: float32}", fillcolor="#e7f5ff", fontcolor="#1864ab"];
  weights [label="{Weights|Shape: [128, 64, 3, 3]|Type: float32}", fillcolor="#eebefa", fontcolor="#862e9c"];
  op      [label="{Conv2D|Stride: 1\nPadding: 1}", shape=box, fillcolor="#f1f3f5", fontcolor="#495057", style="rounded,filled"];
  output  [label="{Inferred Output|Shape: [1, 128, 128, 128]|Type: float32}", fillcolor="#d3f9d8", fontcolor="#2b8a3e"];

  input -> op;
  weights -> op;
  op -> output;
}
```

*Visualization of shape inference propagation. The compiler calculates the output geometry based on input dimensions and operator attributes.*

If the inferred shapes are incompatible, for example when attempting a matrix multiplication between tensors of shape $(M, K)$ and $(J, N)$ where $K \neq J$, the compiler identifies this as a structural error. This verification protects the runtime from crashing due to invalid memory accesses.

## Handling Broadcasting

Frameworks like NumPy and PyTorch allow implicit broadcasting, where a smaller tensor is automatically expanded to match the shape of a larger tensor during element-wise operations, for example when adding a scalar bias to a vector.

While convenient for the user, implicit behavior is problematic for low-level code generation. The compiler therefore often performs a pass to make these broadcasts explicit, inserting dedicated `broadcast` or `expand` nodes into the IR.

Explicit broadcasting ensures that the backend code generator knows exactly how to handle memory strides. Instead of physically copying data to expand the tensor (which wastes memory), the compiler uses zero-stride memory access patterns to read the same value repeatedly while traversing the larger tensor's dimensions (see the NumPy sketch at the end of this section).

## Layout Sensitivity

The shape of a tensor in the IR is mathematically defined, but its physical layout in memory can vary. A 4D tensor for images is usually represented as NCHW (Batch, Channels, Height, Width) or NHWC (Batch, Height, Width, Channels).

- **NCHW:** Often preferred by GPU implementations using cuDNN.
- **NHWC:** Often preferred by CPU backends and TPU processing units for better vectorization.

The definition of "shape" in the IR implies a logical ordering of dimensions. However, optimizing compilers often rewrite these layouts transparently. We will discuss the mechanics of this transformation in the "Memory Layout Transformation" section of Chapter 3. For now, it is sufficient to understand that the shape metadata in the IR acts as the source of truth for the logical organization of the data, regardless of how the bytes are physically arranged in RAM.
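The stride mechanics behind both explicit broadcasting and layout choices can be observed directly in NumPy. The following sketch uses illustrative shapes and demonstrates the general technique rather than any specific compiler's runtime:

```python
import numpy as np

# Broadcasting without copying: the bias is logically expanded to the
# larger shape, but no new memory is allocated. The view walks the
# broadcast axis with a stride of 0 bytes, re-reading the same values,
# which is the access pattern an explicit broadcast node describes.
bias = np.arange(64, dtype=np.float32)        # shape (64,), 256 bytes
expanded = np.broadcast_to(bias, (32, 64))    # logical shape (32, 64)

print(expanded.strides)                  # (0, 4): zero stride on the new axis
print(np.shares_memory(expanded, bias))  # True: no data was copied

# Layout lives in the strides, not in the logical shape. The same
# logical [N, C, H, W] tensor can be stored as NCHW or NHWC.
nchw = np.zeros((1, 64, 128, 128), dtype=np.float32)
nhwc = np.ascontiguousarray(nchw.transpose(0, 2, 3, 1))

print(nchw.strides)  # (4194304, 65536, 512, 4): channels are an outer axis
print(nhwc.strides)  # (4194304, 32768, 256, 4): channels are the innermost axis
```

A backend that understands these stride patterns can vectorize along whichever axis is contiguous, which is why a layout rewrite can change performance without changing the logical shape recorded in the IR.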