Setting up a compiler development environment requires ensuring that several software layers communicate correctly. Unlike installing a standard Python data science library, an ML compiler stack must interface directly with low-level hardware drivers and C++ build tools. We use Apache TVM, an open-source machine learning compiler framework, as the primary study platform because it exposes the internal representations and optimization passes we want to inspect.

We will also briefly touch upon LLVM, a collection of compiler and toolchain libraries used to construct, optimize, and produce intermediate and/or binary machine code. Many ML compilers, including TVM, rely on LLVM to handle the final generation of machine code for CPUs.

The following diagram illustrates how the tools you are about to install fit into the compilation pipeline.

```dot
digraph G {
    rankdir=TB;
    node [shape=box, style=filled, fontname="Helvetica", fontsize=10, color="#dee2e6"];
    edge [fontname="Helvetica", fontsize=9, color="#868e96"];

    subgraph cluster_frontend {
        label = "High-Level";
        style = filled;
        color = "#f8f9fa";
        Python [label="Python Script", fillcolor="#a5d8ff", width=1.5];
        Framework [label="PyTorch / TensorFlow", fillcolor="#b197fc", width=1.5];
    }

    subgraph cluster_compiler {
        label = "Compiler Stack";
        style = filled;
        color = "#f8f9fa";
        TVM [label="Apache TVM\n(Optimization)", fillcolor="#63e6be", width=1.5];
        LLVM [label="LLVM\n(Code Gen)", fillcolor="#ced4da", width=1.5];
    }

    Hardware [label="Hardware\n(CPU/GPU)", fillcolor="#ffc9c9", width=1.5];

    Python -> Framework;
    Framework -> TVM [label="Computation Graph"];
    TVM -> LLVM [label="Low-Level IR"];
    LLVM -> Hardware [label="Machine Code"];
}
```

Flow of data from high-level frameworks through the compiler stack to hardware execution.

Environment Prerequisites

Before installing the compiler stack, ensure your environment meets the basic requirements. We recommend a Linux-based environment (Ubuntu 20.04 or later) or macOS.
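You can check these requirements with a short script before installing anything. The sketch below uses only the standard library. It reports:

- your operating system;
- whether the Python version meets the 3.8 minimum;
- whether a virtual environment is active;
- whether a C++ compiler (g++ or clang++) is on PATH.

The helper name check_prerequisites is hypothetical; it is not part of TVM. WSL2 reports its operating system as "Linux". Run the script again after activating the virtual environment described in this section; in_virtualenv should then read True.

```python
import platform
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Collect basic facts about the current environment.

    Hypothetical helper for this guide; not a TVM API.
    """
    report = {
        # Operating system family: "Linux" (including WSL2), "Darwin" (macOS), or "Windows"
        "os": platform.system(),
        "python_version": platform.python_version(),
        # Compare only (major, minor) against the required minimum
        "python_ok": sys.version_info[:2] >= min_python,
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix
        # still points at the base interpreter installation
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Absolute path of each C++ compiler found on PATH, or None if missing
        "cxx_compilers": {name: shutil.which(name) for name in ("g++", "clang++")},
    }
    report["os_recommended"] = report["os"] in ("Linux", "Darwin")
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```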
Windows users are encouraged to use the Windows Subsystem for Linux (WSL2) to avoid path-handling and build-tool inconsistencies.

You will need Python 3.8 or higher. While it is possible to install packages globally, using a virtual environment prevents version conflicts with other projects.

```bash
# Create a virtual environment named 'ml-compiler'
python3 -m venv ml-compiler

# Activate the environment
source ml-compiler/bin/activate     # On Linux/macOS (including WSL2)
# ml-compiler\Scripts\activate      # On native Windows
```

Installing Apache TVM

For production deployment, engineers often build TVM from source to enable specific CUDA backends or experimental features. For learning the internal mechanics of graph transformations and IR, however, the pre-built Python package provides a stable and accessible starting point.

Install the package via pip:

```bash
pip install apache-tvm numpy decorator attrs
```

If you intend to use PyTorch as your frontend framework (recommended for this course), install it in the same environment:

```bash
pip install torch torchvision
```

Verifying the Installation

Once the packages are installed, verify that the compiler can define a computation, generate code, and execute it. We will write the compiler equivalent of "Hello World": a vector addition kernel.

In standard Python programming, logic executes immediately. Using a compiler instead involves three distinct phases:

1. Definition: describe the inputs and the mathematical operation.
2. Schedule: define how the computation's loops are organized.
3. Build: generate the executable machine code function.

Create a file named verify_install.py and add the following code:

```python
import numpy as np
import tvm
from tvm import te


def verify_vector_add():
    # 1. Definition: declare tensor shapes and the computation
    n = te.var("n")
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    # Describe the mathematical intent: C[i] = A[i] + B[i]
    C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

    # 2. Schedule: create a default execution schedule
    s = te.create_schedule(C.op)

    # 3. Build: compile the function for the host CPU.
    # 'llvm' tells TVM to use LLVM to generate CPU machine code.
    tgt = tvm.target.Target(target="llvm", host="llvm")
    fadd = tvm.build(s, [A, B, C], target=tgt, name="myadd")

    # 4. Execute: run the compiled function on the CPU device
    dev = tvm.cpu(0)
    n_val = 1024
    a_data = tvm.nd.array(np.random.uniform(size=n_val).astype(A.dtype), dev)
    b_data = tvm.nd.array(np.random.uniform(size=n_val).astype(B.dtype), dev)
    c_data = tvm.nd.array(np.zeros(n_val, dtype=C.dtype), dev)
    fadd(a_data, b_data, c_data)

    # Validation: compare the compiled result against NumPy
    np.testing.assert_allclose(c_data.numpy(), a_data.numpy() + b_data.numpy())
    print("Success: Vector addition compiled and executed correctly.")


if __name__ == "__main__":
    verify_vector_add()
```

Run the script in your terminal:

```bash
python verify_install.py
```

If you see the success message, your environment is correctly configured to perform code generation with the LLVM backend.

Understanding the Build Output

When you run the verification script, tvm.build does the heavy lifting. It takes the high-level description of vector addition and lowers it through several intermediate representations before LLVM emits machine code.

To see what the compiler actually produced, inspect the generated code. The build call itself does not need to change: the module returned by tvm.build can print its own source, either the intermediate representation (IR) or the final assembly. Add the following lines to the script after the build call, just before execution:

```python
# Print the generated LLVM IR (Intermediate Representation)
print(fadd.get_source())
# print(fadd.get_source("asm"))  # Target assembly instead of LLVM IR
```

The output is LLVM IR in its textual form, which resembles assembly code.
While we will study this syntax in detail in the chapter on "Code Generation Backends", observing it now confirms that your Python script is indeed generating low-level instructions.

Troubleshooting Common Issues

- Missing LLVM support. If you receive the error RuntimeError: target attribute llvm is not enabled, the pre-built TVM package cannot find the LLVM library on your system, or it was built without LLVM.
  Solution: Ensure you installed apache-tvm and not a stripped-down version. On Linux, you may need to install the LLVM system libraries (e.g., sudo apt-get install llvm-dev).
- Clang/GCC requirement. Some compilation steps may require a system C++ compiler to link object files.
  Solution: Verify that g++ or clang is installed and available on your system PATH.
- Python version mismatch. TVM's Python bindings are sensitive to the Python version.
  Solution: Run the script with the same Python executable you used to install the pip packages.

With your environment verified, you are ready to move past simple vector addition. In the next chapter, we will examine how these tools handle complex neural network operators and manage the flow of data through intermediate representations.
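The checks in the troubleshooting list above can also be run together. The sketch below reports which Python interpreter is running and whether the tvm package is visible to that interpreter, along with its location. It also shows which C/C++ compilers and LLVM tools are on PATH.

The helper name diagnose is hypothetical and not part of TVM. The script uses importlib.util.find_spec, which locates a package without importing it, so it still works when import tvm itself fails.

```python
import importlib.util
import shutil
import sys


def diagnose(package="tvm", tools=("g++", "gcc", "clang", "clang++", "llvm-config")):
    """Report facts relevant to common setup problems (hypothetical helper)."""
    # find_spec returns None when the package is not visible to this interpreter
    spec = importlib.util.find_spec(package)
    return {
        # The interpreter running this script; `python -m pip --version`
        # shows which interpreter pip installs into, for comparison
        "python_executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
        # Absolute path of each tool on PATH, or None when it is missing
        "tools": {tool: shutil.which(tool) for tool in tools},
    }


if __name__ == "__main__":
    report = diagnose()
    for key, value in report.items():
        print(f"{key}: {value}")
    if not report["package_found"]:
        print("tvm is not importable from this interpreter; "
              "try: python -m pip install apache-tvm")
```

Running pip as python -m pip guarantees that packages are installed into the same interpreter that will run the scripts. This directly addresses the Python version mismatch issue.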