While building on-premises infrastructure provides maximum control, cloud platforms offer an alternative path defined by flexibility and on-demand scalability. This chapter shifts our focus from owning hardware to renting the extensive compute resources offered by the major cloud providers. The central trade-off is exchanging direct hardware management for access to specialized accelerators, managed services, and pay-as-you-go pricing.
Here, you will learn to navigate the offerings of providers such as AWS, GCP, and Azure. We will weigh renting raw virtual machines under the Infrastructure-as-a-Service (IaaS) model against using higher-level managed AI platforms such as SageMaker or Vertex AI. You will gain the skills to select appropriate GPU and CPU instances for both training and inference workloads, configure object storage for large datasets, and set up secure networking with Virtual Private Clouds (VPCs). The chapter concludes with a hands-on lab in which you provision and connect to a GPU-enabled cloud instance, putting these principles into practice.
3.1 Overview of Major Cloud Providers for AI
3.2 Comparing Managed AI Services vs IaaS
3.3 Selecting Virtual Machine Instances for Training
3.4 Choosing Instances for Inference and Serving
3.5 Object Storage Services for Datasets
3.6 Understanding Cloud Networking and VPCs
3.7 Security Considerations in the Cloud
3.8 Hands-on Practical: Launching a GPU Cloud Instance
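To give a flavor of the hands-on practical (3.8), the sketch below assembles an AWS CLI launch command for a GPU instance. It is a minimal sketch, not a ready-to-run recipe: it assumes the AWS CLI is installed and configured with credentials, and the AMI ID, key pair name, and instance type are placeholders you would replace with values valid in your own account and region.

```shell
# Placeholders -- substitute your own values before running anything.
INSTANCE_TYPE="g4dn.xlarge"   # an entry-level NVIDIA T4 GPU instance type on AWS
AMI_ID="ami-PLACEHOLDER"      # e.g. a Deep Learning AMI ID for your region
KEY_NAME="my-keypair"         # an EC2 key pair you have already created

# Build the launch command as a string so it can be reviewed first.
LAUNCH_CMD="aws ec2 run-instances \
  --image-id $AMI_ID \
  --instance-type $INSTANCE_TYPE \
  --key-name $KEY_NAME \
  --count 1"

# Inspect the command; execute it only once the placeholders are real:
#   eval "$LAUNCH_CMD"
echo "$LAUNCH_CMD"
```

After the instance is running, you would typically look up its public DNS name (for example with `aws ec2 describe-instances`) and connect over SSH using the key pair named above. The later sections on networking (3.6) and security (3.7) cover why the instance should sit in a properly configured VPC with a restrictive security group before you expose it to SSH at all.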