Working through cloud pricing models on a common machine learning scenario is one of the best ways to learn how to make cost-effective decisions. A concrete example shows how dramatically the choice of pricing model can affect the final cost of a training job.
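Imagine you are tasked with training a computer vision model for an image classification task. The details that matter for the calculations below are a 100-hour training run on a single-GPU instance and a final model artifact of 400 MB.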
Your goal is to calculate the projected cost of this training job using three different cloud pricing models. We will use the following pricing for a virtual machine instance equipped with one A10G GPU.
| Item | Price / Value | Notes |
|---|---|---|
| On-Demand Instance | $1.20 / hour | Billed per second, no commitment. |
| 1-Year Reserved Instance | $0.72 / hour (effective rate) | Requires a 1-year upfront commitment. |
| Spot Instance | $0.36 / hour (average price) | Price fluctuates; 70% average discount. |
| Spot Interruption Rate | 1 interruption per 24 hours | An assumption for this workload. |
| Interruption Overhead | 15 minutes | Time lost to restart the job from a checkpoint. |
| Data Egress | $0.09 / GB | Cost to transfer data out of the cloud. |
Let's calculate the cost for each option.
The On-Demand calculation is the most straightforward and serves as our baseline. The cost is simply the hourly rate multiplied by the total duration of the job.
$$
\begin{aligned}
\text{Cost}_{\text{On-Demand}} &= \text{Hourly Rate} \times \text{Duration} \\
&= \$1.20/\text{hour} \times 100\ \text{hours} = \$120.00
\end{aligned}
$$

On-Demand pricing provides maximum flexibility with no commitment, but it is the most expensive option.
Using a Reserved Instance (RI) provides a significant discount in exchange for a long-term commitment. For this single job, we calculate the cost based on the effective hourly rate.
$$
\begin{aligned}
\text{Cost}_{\text{Reserved}} &= \text{Effective Hourly Rate} \times \text{Duration} \\
&= \$0.72/\text{hour} \times 100\ \text{hours} = \$72.00
\end{aligned}
$$

This is a 40% saving compared to the On-Demand price. However, remember that the organization is committed to paying for this instance for an entire year. This option is only truly cost-effective if you have a continuous stream of workloads to keep the instance utilized for the majority of its commitment term.
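To make "the majority of its commitment term" more concrete, here is a minimal sketch of a break-even calculation. It assumes the effective rate of $0.72/hour is paid for every hour of a 365-day year, whether or not the instance is busy; the function name and the `HOURS_PER_YEAR` constant are illustrative and not part of any cloud provider's API.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def reserved_break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the year the instance must be busy before the
    reservation costs no more than paying On-Demand for the same hours."""
    # Assumption: the reserved rate is charged for every hour of the year.
    annual_reserved_cost = reserved_rate * HOURS_PER_YEAR
    # Hours of On-Demand usage that the same money would buy.
    break_even_hours = annual_reserved_cost / on_demand_rate
    return break_even_hours / HOURS_PER_YEAR


utilization = reserved_break_even_utilization(on_demand_rate=1.20, reserved_rate=0.72)
print(f"Break-even utilization: {utilization:.0%}")
print(f"Break-even hours per year: {utilization * HOURS_PER_YEAR:,.0f}")
```

With the rates from the pricing table, break-even comes out at 60% utilization, or about 5,256 hours per year. A single 100-hour job falls far short of that, which is why the reservation only pays off when other workloads fill the remaining time.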
Spot Instances offer the deepest discounts but come with the risk of interruption. Our calculation must account for this risk by adding the cost of the time lost due to interruptions.
First, let's determine the number of expected interruptions over the 100-hour job run.
$$
\text{Interruptions} = \frac{\text{Total Duration}}{\text{Hours per Interruption}} = \frac{100\ \text{hours}}{24\ \text{hours/interruption}} \approx 4.17
$$

We'll round this up to 5 interruptions to be conservative.
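The interruption estimate can be sketched in a few lines of Python. The helper name `estimate_interruptions` is hypothetical, and the rate of one interruption per 24 hours is the workload assumption from the pricing table rather than a guarantee from any provider.

```python
import math


def estimate_interruptions(duration_hours: float, hours_per_interruption: float) -> tuple[float, int]:
    """Return the expected interruption count and a conservative,
    rounded-up count to plan for."""
    expected = duration_hours / hours_per_interruption
    return expected, math.ceil(expected)


expected, conservative = estimate_interruptions(duration_hours=100, hours_per_interruption=24)
print(f"Expected: {expected:.2f}, planned for: {conservative}")
# Expected: 4.17, planned for: 5
```

Rounding up with `math.ceil` slightly overestimates the overhead, which is the safer direction for a budget.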
Next, calculate the total overhead time. This is the time spent restarting the job after each interruption.
$$
\begin{aligned}
\text{Total Overhead} &= \text{Number of Interruptions} \times \text{Interruption Overhead} \\
&= 5\ \text{interruptions} \times 15\ \text{minutes/interruption} = 75\ \text{minutes} = 1.25\ \text{hours}
\end{aligned}
$$

Now, we calculate the total billable time, which includes the original duration plus the overhead.
$$
\text{Total Billable Time} = \text{Original Duration} + \text{Total Overhead} = 100\ \text{hours} + 1.25\ \text{hours} = 101.25\ \text{hours}
$$

Finally, we find the total compute cost for using a Spot Instance.
$$
\begin{aligned}
\text{Cost}_{\text{Spot Compute}} &= \text{Total Billable Time} \times \text{Average Spot Price} \\
&= 101.25\ \text{hours} \times \$0.36/\text{hour} = \$36.45
\end{aligned}
$$

While small for this job, data egress fees are an important part of total cloud cost. Let's calculate the cost to download the final 400 MB model artifact.
First, convert megabytes (MB) to gigabytes (GB).
$$
400\ \text{MB} = 0.4\ \text{GB}
$$

Now, calculate the egress cost.
$$
\text{Cost}_{\text{Egress}} = 0.4\ \text{GB} \times \$0.09/\text{GB} = \$0.036 \approx \$0.04
$$

This cost is negligible for a single model, but for continuous deployment systems that move terabytes of data, these fees can become significant. For our comparison, we'll add this to each total.
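A small sketch shows how the same formula scales. It assumes a flat $0.09/GB rate with no free tier or volume discounts, and uses the decimal convention (1,000 MB per GB) implied by the conversion above. The 1 TB transfer is a hypothetical figure chosen only for illustration.

```python
MB_PER_GB = 1000  # decimal convention, matching 400 MB = 0.4 GB


def egress_cost(size_mb: float, price_per_gb: float = 0.09) -> float:
    """Cost of transferring `size_mb` megabytes out of the cloud,
    assuming a flat per-GB price."""
    return (size_mb / MB_PER_GB) * price_per_gb


single_model = egress_cost(400)        # the 400 MB model artifact
one_terabyte = egress_cost(1_000_000)  # hypothetical 1 TB transfer

print(f"400 MB artifact: ${single_model:.3f}")
print(f"1 TB transfer:   ${one_terabyte:.2f}")
```

At this flat rate, a single terabyte already costs about $90, comparable to the compute cost of the whole training job on some pricing models.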
Let's summarize our findings in a final table.
| Pricing Model | Compute Cost | Total Cost (incl. Egress) | Savings vs. On-Demand |
|---|---|---|---|
| On-Demand | $120.00 | $120.04 | 0% |
| Reserved Instance | $72.00 | $72.04 | ~40% |
| Spot Instance | $36.45 | $36.49 | ~70% |
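The whole comparison can be reproduced with a short script, which makes it easy to rerun the projection with different rates or durations. This is a sketch under the assumptions in the pricing table; the function and variable names are illustrative.

```python
import math

JOB_HOURS = 100
EGRESS_COST = 0.4 * 0.09  # 0.4 GB at $0.09/GB


def spot_billable_hours(job_hours: float,
                        hours_per_interruption: float = 24,
                        overhead_minutes: float = 15) -> float:
    """Job duration plus restart overhead, with the interruption
    count rounded up to stay conservative."""
    interruptions = math.ceil(job_hours / hours_per_interruption)
    return job_hours + interruptions * overhead_minutes / 60


# Compute cost per pricing model, using the hourly rates from the table.
compute_costs = {
    "On-Demand": 1.20 * JOB_HOURS,
    "Reserved Instance": 0.72 * JOB_HOURS,
    "Spot Instance": 0.36 * spot_billable_hours(JOB_HOURS),
}

on_demand_total = compute_costs["On-Demand"] + EGRESS_COST
for model, compute in compute_costs.items():
    total = compute + EGRESS_COST
    savings = 1 - total / on_demand_total
    print(f"{model:<18} compute=${compute:7.2f}  total=${total:7.2f}  savings={savings:.0%}")
```

The output should match the table above after rounding. Changing `JOB_HOURS` or the interruption parameters shows how sensitive the Spot estimate is to those assumptions.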
This analysis makes the financial trade-offs clear.
The total estimated cost for the same 100-hour training job varies significantly across different cloud pricing models.
This practical exercise demonstrates a fundamental principle of AI infrastructure cost management. For workloads that are fault-tolerant and not time-critical, Spot Instances offer substantial savings. For predictable, long-running needs, Reserved Instances provide a good balance of cost and reliability. On-Demand instances serve as a valuable, albeit expensive, option for short-term, urgent tasks or for initial development and benchmarking before committing to a long-term plan. As an infrastructure engineer, running these kinds of cost projections before launching major workloads is an indispensable practice.
© 2026 ApX Machine Learning