After a model has been trained and validated, it holds potential value. However, that potential is only realized when the model's predictions are made available to users or other software systems. This transition from a trained artifact to a working service is known as model deployment. It's the stage where your model graduates from the lab and starts performing its job in a production environment.

Choosing the right deployment strategy is a critical decision that depends entirely on the requirements of your application. The primary question to ask is: How and when does your application need predictions? The answer will guide you toward one of the common deployment patterns.

## Batch Prediction: Processing Data in Bulk

The simplest deployment strategy is batch prediction, also known as offline scoring. In this approach, the model processes a large collection, or "batch," of observations at once. This process typically runs on a fixed schedule, for example once a day, and the predictions are stored in a database or file system for later use.

You should consider a batch prediction strategy when:

- **Real-time predictions are not necessary.** For example, calculating weekly sales forecasts or segmenting customers for a monthly marketing campaign.
- **You are processing large volumes of data.** It is often more efficient to process terabytes of data in a single run than to handle each data point individually.
- **Latency is not a primary concern.** The time it takes to generate a prediction for a single observation can be long, as the job is optimized for throughput, not speed.

A typical batch prediction workflow involves a scheduled job that reads input data from a source like a data warehouse, feeds it to the model, and writes the resulting predictions to a destination table (a code sketch of such a job follows the online pattern below).

```dot
digraph G {
  rankdir=TB;
  splines=ortho;
  node [shape=box, style="rounded,filled", fontname="Arial", fillcolor="#e9ecef"];
  edge [color="#495057"];
  "Data Warehouse" [fillcolor="#a5d8ff"];
  "Prediction Store" [fillcolor="#b2f2bb"];
  "Batch Job" [shape=cylinder, fillcolor="#ffc9c9"];
  "Data Warehouse" -> "Batch Job" [label="Input Data"];
  "Batch Job" -> "Prediction Store" [label="Predictions"];
  {rank=same; "Data Warehouse"; "Prediction Store"};
}
```

*A high-level view of a batch prediction system. A scheduled job processes data in bulk and stores the output.*

## Online Prediction: Serving Predictions on Demand

In contrast to the batch approach, online prediction, or real-time serving, generates predictions as they are requested. The model is wrapped in an API and deployed as a persistent service, often behind a load balancer, ready to respond to incoming requests with very low latency.

This strategy is essential for applications that require immediate feedback. Common use cases include:

- Fraud detection for financial transactions.
- Product recommendations on an e-commerce website as a user browses.
- Spam filtering for incoming emails.

Online serving architectures are generally more complex than batch systems. They require scalable infrastructure to handle request traffic and must be monitored closely to ensure high availability and low latency. The model is typically exposed via a REST API endpoint, allowing other services to request predictions by sending a simple HTTP request with the input data (see the serving sketch after the figure below).

```dot
digraph G {
  rankdir=TB;
  splines=ortho;
  node [shape=box, style="rounded,filled", fontname="Arial", fillcolor="#e9ecef"];
  edge [color="#495057"];
  "User Application" [fillcolor="#a5d8ff"];
  "Model API" [fillcolor="#fcc2d7"];
  "User Application" -> "Model API" [label="Request (Data)"];
  "Model API" -> "User Application" [label="Response (Prediction)"];
}
```

*An online prediction system responds to individual requests in real time.*
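To make the batch pattern concrete, here is a minimal sketch of such a scheduled job in Python. Every specific name in it is an assumption for illustration: the warehouse connection string, the `customer_features` and `customer_predictions` tables, and the `model.joblib` artifact are all hypothetical, with pandas, SQLAlchemy, and joblib standing in for whatever data access and serialization tools your stack actually uses.

```python
import joblib
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical warehouse connection string; replace with your own.
engine = create_engine("postgresql://user:password@warehouse:5432/analytics")


def run_batch_job() -> None:
    """Score an entire table in one pass and persist the predictions."""
    # 1. Read the full input batch from the warehouse.
    features = pd.read_sql("SELECT * FROM customer_features", engine)

    # 2. Load the trained model artifact (assumed here to be a
    #    scikit-learn-style model saved with joblib; any serialization
    #    format works the same way).
    model = joblib.load("model.joblib")

    # 3. Predict for the whole batch at once; this bulk call is where
    #    batch jobs gain their throughput advantage.
    inputs = features.drop(columns=["customer_id"])  # keep the ID out of the inputs
    features["prediction"] = model.predict(inputs)

    # 4. Write the results to a destination table for downstream consumers.
    features[["customer_id", "prediction"]].to_sql(
        "customer_predictions", engine, if_exists="replace", index=False
    )


if __name__ == "__main__":
    # The script holds no scheduling logic; a scheduler such as cron or an
    # orchestrator like Airflow triggers it on the chosen cadence.
    run_batch_job()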
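```

Keeping the schedule outside the script means the job can be tested and re-run by hand exactly as it runs in production.

The online pattern can be sketched just as briefly: the same hypothetical `model.joblib` wrapped in a REST endpoint. This is illustrative rather than prescriptive; FastAPI is used only as an example framework, and the single `features` list stands in for a real input schema.

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup so every request reuses the same artifact.
model = joblib.load("model.joblib")  # hypothetical artifact path


class PredictionRequest(BaseModel):
    # Stand-in input schema; replace with your model's actual features.
    features: list[float]


@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    """Return a prediction for a single observation."""
    # scikit-learn-style models expect 2-D input: one row per observation.
    prediction = model.predict([request.features])[0]
    return {"prediction": float(prediction)}
```

Run locally (for example with `uvicorn app:app`, assuming the file is saved as `app.py`), the service answers plain HTTP calls such as `curl -X POST localhost:8000/predict -H 'Content-Type: application/json' -d '{"features": [0.3, 1.2, 5.0]}'`, which is exactly the request/response loop shown in the figure above.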
## Choosing the Right Path

The table below summarizes the main differences between these two primary strategies.

| Feature | Batch Prediction | Online Prediction |
| --- | --- | --- |
| Data | Large, static datasets | Single or few data points |
| Trigger | Scheduled (e.g., hourly, daily) | On-demand (API call) |
| Latency | High (minutes to hours) | Low (milliseconds) |
| Throughput | High | Low to high (scalable) |
| Use Case | Non-interactive reporting | Interactive applications |
| Infrastructure | Simpler (e.g., cron job) | More complex (e.g., API server) |

In some cases, a hybrid approach might be used. For instance, an e-commerce site could use a batch job to pre-calculate recommendations for all users overnight, while an online model provides real-time adjustments based on a user's current session.

Understanding these patterns is fundamental to designing an effective MLOps workflow. The chosen strategy directly impacts how you will version, test, deploy, and monitor your model, which are topics we will cover throughout the rest of this course.