Mixture of Experts: Core Concepts and Hands-on Implementation
Hash-based Routing for Deterministic Selection
© 2025 ApX Machine Learning
Hash-based Routing in MoE Models