Implementing a custom dialect shifts your role from simply using MLIR's existing functionality to actively building new compiler components. While predefined dialects like `linalg` and `affine` cover standard tensor operations, bespoke hardware accelerators or domain-specific frameworks often require operations that do not map cleanly to existing IR structures. By defining a custom dialect, you establish a contract between the high-level logic of your machine learning model and the optimization passes that follow.

This practical session focuses on the Operation Definition Specification (ODS) framework built on TableGen. We will construct a minimal dialect named `tmath` (Tensor Math) designed to handle simplified matrix operations. We will then implement a lowering pass that transforms these high-level operations into the `affine` dialect, enabling the polyhedral optimizations discussed in previous chapters.

## The TableGen Workflow

Writing C++ boilerplate for every compiler operation is error-prone and difficult to maintain. MLIR solves this by using TableGen, a record-based domain-specific language, to define the structure, constraints, and verification logic of operations. The build system then generates the corresponding C++ classes (declarations and definitions) automatically.

The workflow involves three primary stages: defining the dialect, defining the operations, and linking the generated artifacts into the main compiler binary.

```dot
digraph G {
  rankdir=TB;
  node [shape=box];

  td_file  [label="Dialect Definition\n(.td file)"];
  tblgen   [label="mlir-tblgen\n(TableGen Tool)"];
  inc_h    [label="Declarations\n(.h.inc)"];
  inc_cpp  [label="Definitions\n(.cpp.inc)"];
  compiler [label="MLIR Compiler\nBinary"];

  td_file -> tblgen   [label="input"];
  tblgen  -> inc_h    [label="generates"];
  tblgen  -> inc_cpp  [label="generates"];
  inc_h   -> compiler [label="#include"];
  inc_cpp -> compiler [label="#include"];
}
```

*Figure: The build process, in which TableGen translates declarative specifications into C++ source code that is subsequently compiled into the final binary.*

## Defining the Dialect Schema

The dialect definition serves as the namespace and registration point for your operations. In a file named `TMathDialect.td`, we inherit from the `Dialect` class provided by ODS.

```tablegen
// TMathDialect.td
include "mlir/IR/OpBase.td"

def TMath_Dialect : Dialect {
  let name = "tmath";
  let summary = "A minimal tensor math dialect for demonstration";
  let description = [{
    The tmath dialect provides high-level matrix operations
    designed to be lowered to affine loops.
  }];
  let cppNamespace = "::mlir::tmath";
}
```

This definition generates a C++ class `TMathDialect` in the specified namespace. The `name` field determines the prefix used in the IR, so operations will appear as `tmath.op_name` in the text format.
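TableGen emits the class declaration, but a small amount of hand-written glue is still required to compile the dialect. The following is a minimal sketch, assuming the generated `.inc` files follow the `TMathDialect`/`TMathOps` naming used above (the exact file names depend on your build configuration):

```cpp
// TMathDialect.cpp -- hand-written glue for the generated dialect class.
// File and include names here are assumptions based on the .td files above.
#include "TMathDialect.h"
#include "TMathOps.h"

// Pull in the TableGen-generated dialect method definitions.
#include "TMathDialect.cpp.inc"

void mlir::tmath::TMathDialect::initialize() {
  // Register every operation declared in TMathOps.td with this dialect.
  addOperations<
#define GET_OP_LIST
#include "TMathOps.cpp.inc"
      >();
}
```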
## Defining Structured Operations

With the dialect established, we define the operations. An operation in MLIR is characterized by its name, arguments (operands and attributes), results, and traits. Traits allow the compiler to reason about the behavior of the operation, such as whether it has side effects or whether the result type depends on the input types.

We will define a `MatMulOp`. Unlike the generic `linalg.matmul`, our operation will enforce strictly ranked 2D tensors to simplify the lowering logic.

```tablegen
// TMathOps.td
include "TMathDialect.td"
include "mlir/Interfaces/SideEffectInterfaces.td"

def MatMulOp : Op<TMath_Dialect, "matmul", [Pure]> {
  let summary = "Performs matrix multiplication";
  let description = [{
    Computes the product of two 2D tensors. Returns a new tensor
    with dimensions [M, N] given inputs [M, K] and [K, N].
  }];

  let arguments = (ins
    F32Tensor:$lhs,
    F32Tensor:$rhs
  );

  let results = (outs F32Tensor:$result);

  let assemblyFormat =
    "$lhs `,` $rhs attr-dict `:` type($lhs) `*` type($rhs) `->` type($result)";

  let hasVerifier = 1;
}
```

In this specification:

- **`[Pure]` trait**: Indicates that the operation has no side effects (it neither accesses memory nor mutates global state), which allows dead code elimination (DCE) to remove it if its result is unused.
- **Arguments**: We constrain the inputs to `F32Tensor`. ODS automatically generates type-checking code to ensure only 32-bit floating-point tensors are accepted.
- **Assembly format**: Defines the textual representation of the operation. This declarative format saves us from writing a custom C++ parser and printer.
- **Verifier**: Setting `hasVerifier = 1` tells TableGen that we will provide a C++ implementation to validate structural constraints, such as ensuring the inner dimensions of the matrices match ($K$ in $[M, K] \times [K, N]$).

## Implementing the Verifier

TableGen generates the declaration, but we must implement the logic in the corresponding `.cpp` file. This verification step is important for catching shape mismatches early in the compilation pipeline.

```cpp
// TMathOps.cpp
llvm::LogicalResult MatMulOp::verify() {
  // F32Tensor also admits unranked tensors, so use dyn_cast rather than
  // cast and reject anything that is not a ranked 2D tensor.
  auto lhsType = llvm::dyn_cast<RankedTensorType>(getLhs().getType());
  auto rhsType = llvm::dyn_cast<RankedTensorType>(getRhs().getType());
  if (!lhsType || !rhsType ||
      lhsType.getRank() != 2 || rhsType.getRank() != 2)
    return emitOpError("operands must be 2D tensors");

  if (lhsType.getDimSize(1) != rhsType.getDimSize(0)) {
    return emitOpError("dimension mismatch: lhs column size ")
           << lhsType.getDimSize(1) << " must match rhs row size "
           << rhsType.getDimSize(0);
  }
  return success();
}
```
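Alongside the verifier hook, ODS generates typed accessors (the `getLhs()`/`getRhs()` calls above) and builder overloads. The following is a rough sketch of constructing the operation from C++; the helper `createMatMul` is our own illustration, not part of the generated API, and it assumes both inputs are already ranked 2D tensors:

```cpp
#include "TMathOps.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinTypes.h"

// Builds a tmath.matmul at the builder's current insertion point.
// `createMatMul` is an illustrative helper, not generated code.
mlir::Value createMatMul(mlir::OpBuilder &builder, mlir::Location loc,
                         mlir::Value lhs, mlir::Value rhs) {
  auto lhsType = mlir::cast<mlir::RankedTensorType>(lhs.getType());
  auto rhsType = mlir::cast<mlir::RankedTensorType>(rhs.getType());

  // Result shape is [M, N] for inputs [M, K] and [K, N].
  auto resultType = mlir::RankedTensorType::get(
      {lhsType.getDimSize(0), rhsType.getDimSize(1)},
      builder.getF32Type());

  // ODS derived this builder from the (ins)/(outs) specification:
  // one result type followed by the two operands.
  auto op = builder.create<mlir::tmath::MatMulOp>(loc, resultType, lhs, rhs);
  return op.getResult();
}
```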
## Lowering to Affine Loops

A dialect is only useful if it can be translated into executable code. We will implement a rewrite pattern that lowers `tmath.matmul` into the `affine` dialect. This transformation replaces the high-level matrix multiplication with three nested loops, explicit loads, and explicit stores.

The pattern-rewriting infrastructure in MLIR revolves around the `OpRewritePattern` class; we override its `matchAndRewrite` method.

```cpp
struct MatMulLowering : public OpRewritePattern<tmath::MatMulOp> {
  using OpRewritePattern<tmath::MatMulOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(tmath::MatMulOp op,
                                PatternRewriter &rewriter) const override {
    auto loc = op.getLoc();
    Value lhs = op.getLhs();
    Value rhs = op.getRhs();

    // Get the static shapes: [M, K] x [K, N] -> [M, N].
    auto lhsType = llvm::cast<RankedTensorType>(lhs.getType());
    auto rhsType = llvm::cast<RankedTensorType>(rhs.getType());
    int64_t M = lhsType.getDimSize(0);
    int64_t K = lhsType.getDimSize(1);
    int64_t N = rhsType.getDimSize(1);

    // Allocate a buffer for the result using memref.
    auto resultMemRefType = MemRefType::get({M, N}, rewriter.getF32Type());
    Value resultAlloc =
        rewriter.create<memref::AllocOp>(loc, resultMemRefType);

    // memref.alloc yields uninitialized memory, so zero the accumulator
    // before the main loop nest: C[i, j] = 0.
    Value zero = rewriter.create<arith::ConstantOp>(
        loc, rewriter.getF32FloatAttr(0.0f));
    affine::buildAffineLoopNest(
        rewriter, loc, /*lbs=*/{0, 0}, /*ubs=*/{M, N}, /*steps=*/{1, 1},
        [&](OpBuilder &b, Location loc, ValueRange ivs) {
          b.create<affine::AffineStoreOp>(loc, zero, resultAlloc, ivs);
        });

    // Build the loop nest: i in [0, M), j in [0, N), k in [0, K).
    affine::buildAffineLoopNest(
        rewriter, loc, /*lbs=*/{0, 0, 0}, /*ubs=*/{M, N, K},
        /*steps=*/{1, 1, 1},
        [&](OpBuilder &b, Location loc, ValueRange ivs) {
          Value i = ivs[0], j = ivs[1], k = ivs[2];

          // Load A[i, k] and B[k, j].
          Value aVal =
              b.create<affine::AffineLoadOp>(loc, lhs, ValueRange{i, k});
          Value bVal =
              b.create<affine::AffineLoadOp>(loc, rhs, ValueRange{k, j});

          // C[i, j] += A[i, k] * B[k, j]
          Value product = b.create<arith::MulFOp>(loc, aVal, bVal);
          Value cVal = b.create<affine::AffineLoadOp>(loc, resultAlloc,
                                                      ValueRange{i, j});
          Value sum = b.create<arith::AddFOp>(loc, cVal, product);
          b.create<affine::AffineStoreOp>(loc, sum, resultAlloc,
                                          ValueRange{i, j});
        });

    // affine.load/affine.store expect memref operands, but tmath.matmul is
    // defined on tensors, so a bufferization pass is normally required
    // before this pattern runs. For this snippet we assume the inputs were
    // already bufferized.
    rewriter.replaceOp(op, resultAlloc);
    return success();
  }
};
```

## Registering the Dialect and Passes

To use the dialect in a tool like `mlir-opt`, it must be registered with the `MLIRContext`. The lowering pass is then added to a `PassManager` (a sketch of such a pass driver appears at the end of this section).

```cpp
int main(int argc, char **argv) {
  mlir::DialectRegistry registry;

  // Register our custom dialect.
  registry.insert<mlir::tmath::TMathDialect>();

  // Register the standard dialects we lower to. The arith dialect is
  // needed as well, since the lowering creates arith.mulf/arith.addf.
  registry.insert<mlir::affine::AffineDialect,
                  mlir::arith::ArithDialect,
                  mlir::memref::MemRefDialect>();

  mlir::MLIRContext context(registry);

  // Load a module, run the pass manager...
  // ...
}
```

By completing this workflow, you have successfully extended the compiler's intermediate representation. The `tmath` dialect can now serve as a target for a frontend (such as a Python parser) and can be optimized or lowered into standard dialects that eventually map to LLVM IR for CPU execution or SPIR-V for GPU execution. This extensibility is the mechanism that allows MLIR to support diverse hardware architectures without rewriting the entire compilation stack.
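For completeness, here is the pass driver referenced in the registration section: a minimal sketch that wraps `MatMulLowering` and applies it greedily within each `func.func`. The pass name `TMathToAffinePass` and the greedy-driver choice are our assumptions (any mechanism that runs the pattern works), and this variant also requires the `func` dialect to be registered:

```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

namespace {
// Sketch of a pass wrapping MatMulLowering; the name is illustrative.
struct TMathToAffinePass
    : public mlir::PassWrapper<TMathToAffinePass,
                               mlir::OperationPass<mlir::func::FuncOp>> {
  llvm::StringRef getArgument() const override { return "tmath-to-affine"; }

  void runOnOperation() override {
    mlir::RewritePatternSet patterns(&getContext());
    patterns.add<MatMulLowering>(&getContext());
    // Apply the pattern to a fixpoint; fail the pass if rewriting fails.
    if (mlir::failed(mlir::applyPatternsAndFoldGreedily(
            getOperation(), std::move(patterns))))
      signalPassFailure();
  }
};
} // namespace

// Inside main(), after parsing the module:
//   mlir::PassManager pm(&context);
//   pm.addNestedPass<mlir::func::FuncOp>(
//       std::make_unique<TMathToAffinePass>());
//   if (mlir::failed(pm.run(*module)))
//     return 1;
```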