Advanced PyTorch
Hands-on Practical: Implementing Mixed-Precision Training
© 2025 ApX Machine Learning
Practice: Implement Mixed-Precision Training in PyTorch