To make an application faster or more cost-effective, you must first understand where it spends its time and resources. Traditional software often experiences bottlenecks in database queries, complex calculations, or I/O operations. LLM applications share some of these, but they introduce two sources of latency and cost that often overshadow all others: generation calls and embedding calls to external model APIs.
A typical application, especially one using Retrieval-Augmented Generation (RAG), follows a multi-stage process. Some stages run locally and are generally fast, while others involve network requests to third-party services, which introduce significant delays and costs.
A typical RAG application workflow. The most significant bottlenecks usually occur during the embedding and generation stages, which rely on external API calls.
Let's break down where performance issues commonly arise.
The most apparent bottleneck is the final generation call to the LLM. When your application sends a prompt and waits for a response, several factors contribute to latency, including the network round trip, the time the model needs to process the prompt, and the number of output tokens it must generate.
Cost is directly tied to usage. As mentioned in the introduction, the cost of each call is a function of both the input (prompt) and output (completion) tokens: total cost = (input tokens × price per input token) + (output tokens × price per output token).
For an application that receives many identical or similar queries, these costs add up. For example, a customer support bot that repeatedly answers "What are your business hours?" makes a new, costly API call every single time.
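To make this concrete, the rough estimate below multiplies token counts by per-token prices and scales by call volume. The prices and call counts are placeholder assumptions for illustration, not real provider rates.

# Rough cost estimate for a repeated query.
# The prices below are placeholder assumptions, not real provider rates.
PRICE_PER_INPUT_TOKEN = 0.50 / 1_000_000   # assumed $0.50 per 1M input tokens
PRICE_PER_OUTPUT_TOKEN = 1.50 / 1_000_000  # assumed $1.50 per 1M output tokens

def estimate_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single API call: input and output tokens are priced separately."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# A support bot answering "What are your business hours?" 10,000 times,
# assuming roughly 200 prompt tokens and 80 completion tokens per call.
cost_per_call = estimate_call_cost(input_tokens=200, output_tokens=80)
print(f"Cost per call:         ${cost_per_call:.6f}")
print(f"Cost for 10,000 calls: ${cost_per_call * 10_000:.2f}")

Even at these small per-call amounts, identical questions answered thousands of times accumulate real cost that caching could eliminate entirely.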
The second major bottleneck is embedding generation. In a RAG system, every document chunk must be converted into a vector embedding before it can be indexed in a vector database. While this is often a one-time "ingestion" cost, it can be substantial. If you have 10,000 document chunks, you must make thousands of API calls to an embedding service. This process can be both time-consuming and expensive.
Furthermore, if your application processes new documents frequently or generates embeddings for user queries in real time, these calls contribute to ongoing operational costs and latency. Repetitive calls to embed the same text, such as common search terms or document headers, are an inefficient use of resources.
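One quick way to gauge this redundancy is to count how many texts in an embedding workload are exact duplicates before any API call is made. The snippet below is a minimal sketch; texts_to_embed is a hypothetical workload, not data from a real system.

# Minimal sketch: measure how many embedding calls would be redundant.
# `texts_to_embed` is a hypothetical mix of document chunks and user queries.
texts_to_embed = [
    "What are your business hours?",
    "Chapter 1: Introduction",
    "What are your business hours?",   # exact duplicate
    "Chapter 1: Introduction",         # exact duplicate
    "How do I reset my password?",
]

unique_texts = set(texts_to_embed)
redundant_calls = len(texts_to_embed) - len(unique_texts)

print(f"Total embedding requests: {len(texts_to_embed)}")
print(f"Unique texts:             {len(unique_texts)}")
print(f"Redundant API calls:      {redundant_calls}")

Every redundant call found this way is latency and spend that a cache can avoid.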
The first step in optimization is measurement. A simple way to identify bottlenecks is to time each major stage of your application's workflow.
Consider a simplified function that simulates a RAG query process. By wrapping each step with timing logic, you can pinpoint where the most time is spent.
import time

def mock_llm_api_call(prompt):
    """Simulate a slow LLM API call."""
    time.sleep(2.5)  # Simulate 2.5 seconds of latency
    return f"This is a generated response to: {prompt[:50]}..."

def mock_embedding_api_call(text):
    """Simulate a faster but still significant embedding API call."""
    time.sleep(0.1)  # Simulate 100ms of latency
    return [0.1] * 384  # Return a dummy vector

def run_rag_query(query: str):
    """Simulate a full RAG query and time each step."""
    print(f"\nProcessing query: '{query}'")

    # Step 1: Generate query embedding
    start_time = time.time()
    query_embedding = mock_embedding_api_call(query)
    embed_duration = time.time() - start_time
    print(f"  1. Embedding generation: {embed_duration:.4f}s")

    # Step 2: Retrieve documents (simulated)
    start_time = time.time()
    time.sleep(0.05)  # Simulate local vector search
    retrieved_context = "Some relevant context is retrieved here."
    retrieve_duration = time.time() - start_time
    print(f"  2. Document retrieval: {retrieve_duration:.4f}s")

    # Step 3: Call LLM for final generation
    start_time = time.time()
    prompt = f"Context: {retrieved_context}\n\nQuestion: {query}"
    final_response = mock_llm_api_call(prompt)
    generate_duration = time.time() - start_time
    print(f"  3. LLM generation: {generate_duration:.4f}s")

    total_duration = embed_duration + retrieve_duration + generate_duration
    print(f"  -------------------------------------")
    print(f"  Total time: {total_duration:.4f}s")

# Run the simulation
run_rag_query("What is the Kerb toolkit?")
Running this code produces output similar to this:
Processing query: 'What is the Kerb toolkit?'
1. Embedding generation: 0.1002s
2. Document retrieval: 0.0501s
3. LLM generation: 2.5003s
-------------------------------------
Total time: 2.6506s
The results are clear: the LLM generation call accounted for over 94% of the total request time. The embedding call, while much faster, still took twice as long as the local retrieval step. This simple analysis immediately shows that optimizing the API calls will yield the largest performance gains.
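In a real application, rather than repeating the time.time() bookkeeping around every stage, you can factor the measurement into a small reusable helper. The context manager below is a minimal sketch of that approach, not part of any particular library; the usage lines assume the mock functions defined earlier are in scope.

import time
from contextlib import contextmanager

@contextmanager
def timed(stage_name: str):
    """Print how long the wrapped block of code takes to run."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        print(f"  {stage_name}: {duration:.4f}s")

# Usage: wrap each stage of the pipeline with the same helper.
with timed("Embedding generation"):
    query_embedding = mock_embedding_api_call("What is the Kerb toolkit?")

with timed("LLM generation"):
    response = mock_llm_api_call("What is the Kerb toolkit?")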
With these bottlenecks identified, we can now explore solutions. The following sections will show you how to implement caching strategies with the cache module to dramatically reduce latency and cost by avoiding redundant API calls for both LLM responses and embeddings.
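As a preview of the idea, the sketch below shows the simplest possible form of response caching: a plain in-memory dictionary keyed by the prompt. It only illustrates the concept and reuses mock_llm_api_call from the timing example; the following sections cover the cache module's more complete approach.

# Minimal sketch of response caching with a plain dictionary.
response_cache = {}

def cached_llm_call(prompt: str) -> str:
    """Return a cached response if available, otherwise call the (mock) API."""
    if prompt in response_cache:
        return response_cache[prompt]       # cache hit: no API latency or cost
    response = mock_llm_api_call(prompt)    # cache miss: pay the full 2.5s
    response_cache[prompt] = response
    return response

cached_llm_call("What are your business hours?")  # slow: first call hits the API
cached_llm_call("What are your business hours?")  # fast: served from the cache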