Operating machine learning systems for standard models is established practice. However, large language models (LLMs), often containing billions or trillions of parameters (e.g., a model with P parameters where P ≫ 10⁹), introduce distinct operational requirements that go beyond traditional MLOps. Their sheer size, computational demands for training and inference, and unique failure modes necessitate specialized approaches.
This chapter establishes the foundational concepts of LLMOps. We will examine how established MLOps principles adapt, and where they fall short, when applied to LLMs. You will learn about:
1.1 Transitioning from MLOps to LLMOps
1.2 Unique Challenges of LLMs in Production
1.3 Infrastructure Requirements for Large Models
1.4 The LLMOps Lifecycle Stages
1.5 Tooling Considerations for LLMOps

By the end of this chapter, you will understand the unique context and requirements for managing large models operationally, preparing you for the specific techniques covered in subsequent chapters.
© 2025 ApX Machine Learning