As machine learning models grow in size and complexity, a single accelerator (such as a GPU or TPU) is often no longer sufficient. Training large models or processing massive datasets frequently requires distributing the work across multiple devices. This chapter focuses on how JAX supports this kind of distributed computing.
We will start with the fundamental concepts of parallelism relevant to machine learning workloads. You'll learn how JAX manages different compute devices and how to use its core primitive for multi-device execution: pmap (parallel map). We will cover the Single-Program Multiple-Data (SPMD) paradigm that pmap employs and demonstrate how to implement data parallelism, a common technique for accelerating training.
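As a preview of the pattern this chapter builds toward, here is a minimal sketch of data parallelism with pmap: the same function is compiled once and run on every available device, each receiving one slice of the input's leading axis. The toy function and array shapes are illustrative choices, not part of any particular library.

```python
import jax
import jax.numpy as jnp

# List the accelerators JAX can see; pmap maps over the leading axis
# of its inputs, one slice per device.
devices = jax.devices()
n_devices = len(devices)

# A toy per-shard computation; any jittable function works here.
def squared_sum(x):
    return jnp.sum(x ** 2)

# pmap follows the SPMD model: the same program runs on every device,
# each operating on its own shard of the data.
parallel_fn = jax.pmap(squared_sum)

# Shape the input so its leading axis matches the device count:
# each device receives one (4,)-shaped slice.
batch = jnp.arange(n_devices * 4, dtype=jnp.float32).reshape(n_devices, 4)
print(parallel_fn(batch))  # one result per device
```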
Furthermore, you'll explore essential collective communication operations (like psum and pmean) needed to aggregate information, such as gradients, across devices within a pmap'd function. We will also discuss the use of axis names for more explicit control over these collectives, and touch upon advanced partitioning strategies and the concepts behind multi-host distribution. By the end of this chapter, you will understand how to use pmap to scale your JAX computations effectively across multiple accelerators.
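To give a flavor of the collectives and axis names covered later in the chapter, the sketch below averages per-device gradients with pmean inside a pmap'd update step. The axis name 'devices', the toy loss, and the learning rate are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

# Toy "model": a single scalar parameter fit to per-device data.
def loss_fn(w, x, y):
    pred = w * x
    return jnp.mean((pred - y) ** 2)

# Each device computes its local gradient, then jax.lax.pmean averages
# the gradients across the named axis so every device applies the same update.
def update_step(w, x, y):
    grad = jax.grad(loss_fn)(w, x, y)
    grad = jax.lax.pmean(grad, axis_name='devices')  # collective across devices
    return w - 0.1 * grad

n = jax.local_device_count()

# Replicate the parameter and shard the data along the leading (device) axis.
w = jnp.ones((n,))
x = jnp.arange(n * 8, dtype=jnp.float32).reshape(n, 8)
y = 2.0 * x

p_update = jax.pmap(update_step, axis_name='devices')
w = p_update(w, x, y)  # identical updated parameter on every device
```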
3.1 Introduction to Parallelism Concepts
3.2 Device Management in JAX
3.3 Single-Program Multiple-Data (SPMD) with pmap
3.4 Implementing Data Parallelism using pmap
3.5 Collective Communication Primitives (psum, pmean, etc.)
3.6 Handling Axis Names in pmap
3.7 Nested pmap and Advanced Partitioning
3.8 Introduction to Multi-Host Programming (Conceptual)
3.9 Practice: Distributed Data-Parallel Training