Real-Time Data Pipelines with Apache Kafka and Flink

Watermark Generation Strategies

References

Working with Event Time, The Apache Flink Community, 2018 (Apache Software Foundation) - Official documentation for Apache Flink's event time processing, watermarks, watermark strategies, and idleness handling.
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing, Tyler Akidau, Slava Chernyak, Reuven Lax, 2018 (O'Reilly Media) - A comprehensive guide covering the fundamental concepts of distributed stream processing, including event time, watermarks, windowing, and correctness guarantees.
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, and Sam Whittle, 2015 Proceedings of the VLDB Endowment, Vol. 8 (VLDB Endowment) DOI: 10.14778/2824032.2824076 - Foundational paper introducing the Dataflow Model, which influenced modern stream processing systems like Flink, detailing event time, processing time, and watermark semantics.

© 2026 ApX Machine Learning