Distributed systems inherently face a difficult trade-off between consistency and availability. Apache Kafka's default configuration allows a producer to retry sending messages when network jitter or temporary broker failures occur. If the original message was successfully persisted but the acknowledgement was lost on the wire, the retry results in a duplicate record in the log. This behavior satisfies "at-least-once" semantics. However, for financial ledgers, inventory management, or any stateful aggregation where correctness is critical, duplicates corrupt state.

To bridge the gap between reliable delivery and data correctness, Kafka introduces two distinct but related capabilities: the Idempotent Producer and the Transactional API. These features elevate the system from simple message passing to "exactly-once" processing semantics (EOS).

## The Idempotent Producer

Idempotence ensures that a specific message is written to a specific partition exactly once, regardless of how many times the producer sends it. This is a local guarantee, scoped to a single producer session and a single target partition. It protects against network-induced duplicates but does not handle cross-partition atomicity or application crashes.

### PID and Sequence Numbers

When `enable.idempotence=true` is configured, the producer undergoes an initialization protocol. Upon startup, it requests a Producer ID (PID) from the broker. The PID is not exposed to the user and is reset when the application restarts.

Along with the PID, the producer maintains a monotonically increasing sequence number ($SeqNum$) for each partition it writes to. Every batch of messages sent to the broker includes the PID and the base sequence number. The broker maintains a map of the last committed sequence number ($LastSeqNum$) for every PID-partition pair, both in memory and in the `.snapshot` files.

The broker validates each incoming request with sequence number $S_{new}$ against this state:

- If $S_{new} = LastSeqNum + 1$, the broker accepts the write and updates $LastSeqNum$.
- If $S_{new} \le LastSeqNum$, the broker identifies the request as a duplicate caused by a retry. It returns an acknowledgement immediately without writing the data again.
- If $S_{new} > LastSeqNum + 1$, there is a gap in the sequence, implying data loss in the stream. The broker rejects the request with an `OutOfOrderSequenceException`.

This mechanism deduplicates messages at the broker level with negligible performance overhead. It does not require a two-phase commit or coordination with other services.
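Enabling this guarantee is purely a client-side configuration change. The snippet below is a minimal sketch using the standard Java `kafka-clients` producer; the bootstrap address, topic, and record contents are placeholders, and the accompanying settings (`acks=all`, bounded in-flight requests, retries enabled) reflect the requirements idempotence places on the producer.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The producer obtains a PID and attaches per-partition sequence numbers to every batch.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        // Idempotence requires acks=all, retries > 0, and at most 5 in-flight requests per connection.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A retried batch carries the same PID and sequence number, so the broker
            // acknowledges it without appending a second copy.
            producer.send(new ProducerRecord<>("payments", "order-42", "debit:10.00"));
            producer.flush();
        }
    }
}
```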
```dot
digraph G {
  rankdir=TB;
  node [fontname="Helvetica", shape=box, style=filled, color="#adb5bd"];
  edge [fontname="Helvetica", color="#868e96"];

  subgraph cluster_producer {
    label = "Producer Client";
    style=filled;
    color="#e9ecef";
    Msg1 [label="Msg A (Seq: 10)", fillcolor="#a5d8ff"];
    Msg2 [label="Msg A (Seq: 10) [Retry]", fillcolor="#a5d8ff"];
  }

  subgraph cluster_broker {
    label = "Broker Leader";
    style=filled;
    color="#f8f9fa";
    State [label="Current State\nPID: 501\nLastSeq: 9", fillcolor="#b2f2bb"];
    Check1 [label="Check Seq 10\n(9 + 1 = 10)", fillcolor="#e9ecef"];
    Check2 [label="Check Seq 10\n(10 <= 10)", fillcolor="#e9ecef"];
    Log [label="Partition Log\n...Msg(9), Msg(10)", fillcolor="#ced4da"];
  }

  Msg1 -> Check1 [label="First Attempt"];
  State -> Check1;
  Check1 -> Log [label="Commit", color="#20c997"];
  Log -> State [label="Update LastSeq=10", style=dashed];
  Msg2 -> Check2 [label="Retry (Network Delay)"];
  State -> Check2;
  Check2 -> Msg2 [label="Ack (No Write)", style=dotted];
}
```

*The sequence number validation logic prevents duplicate persistence when network retries occur.*

## Atomic Multi-Partition Writes

While idempotence solves duplication within a single stream, complex pipelines often require atomicity across multiple components. A common pattern is "consume-process-produce," where an application consumes from a source Topic A, applies logic, and writes results to a sink Topic B.

If the application crashes after writing to Topic B but before committing the offset for Topic A, the system enters an inconsistent state. Upon restart, the consumer reads the message from Topic A again (since the offset was not committed) and produces a duplicate to Topic B. Idempotence cannot solve this because the new producer instance receives a new PID.

Kafka Transactions allow an application to write to multiple partitions (and commit consumer offsets) atomically. Either all writes succeed and become visible, or all are discarded.

### The Transaction Coordinator and Log

Transactions introduce a server-side module called the Transaction Coordinator, which manages the state of every transaction. That state is persisted in an internal topic named `__transaction_state`. This internal topic functions similarly to `__consumer_offsets` but uses a specialized binary format for transaction metadata.

To use transactions, the producer must configure a static `transactional.id`. Unlike the ephemeral PID, the `transactional.id` persists across application restarts, which enables the coordinator to identify resurrected instances of the same application.

### The Transactional Protocol

The write path involves a sophisticated coordination protocol:

1. **Find Coordinator:** The producer queries the cluster to locate the Transaction Coordinator responsible for its `transactional.id` (determined by hashing the ID).
2. **Initialize Transaction:** The producer registers with the coordinator. If a previous instance with the same `transactional.id` is still active, the coordinator fences it off by incrementing the producer epoch.
3. **Begin Transaction:** The producer starts a local transaction.
4. **Consume and Produce:** The producer writes data to the relevant partitions. Crucially, the coordinator writes an "ongoing" status to the transaction log and records which partitions are involved in the transaction.
5. **Commit or Abort:** The client initiates a commit, which triggers a two-phase commit (2PC):
   - **Prepare:** The coordinator writes a `PREPARE_COMMIT` message to the `__transaction_state` log. Once this message is persisted, the transaction is guaranteed to complete.
   - **Commit:** The coordinator writes specialized Control Messages (markers) to every partition involved in the transaction. These markers are invisible to standard consumers but dictate whether the preceding data messages should be treated as committed or aborted.
   - **Complete:** The coordinator writes `COMMITTED` to the transaction log.
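From the application's side, this protocol is driven by a handful of producer calls; the coordinator lookup, epoch bump, and marker writes all happen underneath them. Below is a minimal consume-process-produce sketch against the standard Java clients; the topic names, group and transactional ids are placeholders, and the pass-through copy from `topic-a` to `topic-b` stands in for real processing.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalPipeline {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "pipeline-1"); // static, survives restarts

        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "pipeline-group");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets commit inside the transaction

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {

            consumer.subscribe(List.of("topic-a"));
            producer.initTransactions(); // find coordinator, register transactional.id, bump epoch

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> rec : records) {
                        producer.send(new ProducerRecord<>("topic-b", rec.key(), rec.value()));
                        offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                    new OffsetAndMetadata(rec.offset() + 1));
                    }
                    // Source offsets are committed atomically with the output records.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (ProducerFencedException fenced) {
                    // A newer instance registered the same transactional.id (step 2): this
                    // producer is a zombie and must shut down rather than abort.
                    return;
                } catch (KafkaException e) {
                    producer.abortTransaction(); // discard the batch; offsets remain uncommitted
                }
            }
        }
    }
}
```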
```dot
digraph G {
  rankdir=TB;
  node [fontname="Helvetica", shape=box, style=filled];
  edge [fontname="Helvetica", color="#868e96"];

  subgraph cluster_0 {
    label = "Broker / Infrastructure";
    style=filled;
    color="#f1f3f5";
    Coord [label="Transaction\nCoordinator", fillcolor="#d0bfff"];
    TxLog [label="__transaction_state", shape=cylinder, fillcolor="#ced4da"];
    TopicA [label="Topic A\n(Data Partition)", shape=cylinder, fillcolor="#91a7ff"];
    TopicB [label="Topic B\n(Data Partition)", shape=cylinder, fillcolor="#91a7ff"];
  }

  Prod [label="Producer\n(transactional.id)", fillcolor="#ffec99"];

  Prod -> Coord [label="1. Init & Begin"];
  Prod -> TopicA [label="2. Write Data"];
  Prod -> TopicB [label="2. Write Data"];
  Coord -> TxLog [label="3. Log State"];
  Prod -> Coord [label="4. Commit"];
  Coord -> TopicA [label="5. Write Commit Marker", color="#fa5252", style=dashed];
  Coord -> TopicB [label="5. Write Commit Marker", color="#fa5252", style=dashed];
}
```

*The Transaction Coordinator manages the lifecycle, writing commit markers to data partitions only after the transaction state is persisted.*

## Isolation Levels and Read Semantics

The implementation of transactions in Kafka relies on the fact that producers write data to the log immediately, regardless of the transaction status. An uncommitted message resides on the partition disk just like a committed one. The distinction is enforced at the consumer level via the `isolation.level` configuration.

### read_uncommitted (Default)

In this mode, consumers read all messages in offset order. They will see messages from open transactions and even messages from aborted transactions. This mode offers lower latency but no consistency guarantees for transactional workflows.

### read_committed

When a consumer is configured with `isolation.level=read_committed`, it processes messages only up to the Last Stable Offset (LSO). The LSO is defined as the first offset of the earliest open transaction:

$$ \text{LSO} = \min(\text{FirstOffsetOfOpenTx}_1, \text{FirstOffsetOfOpenTx}_2, \dots) $$

Messages appearing after the LSO are buffered in the consumer until the transaction is decided:

- If a `COMMIT` marker is encountered, the buffered messages are released to the application.
- If an `ABORT` marker is encountered, the buffered messages are discarded, and the consumer advances past the marker.

This buffering mechanism implies that `read_committed` consumers may experience higher latency if concurrent transactions are long-running. A stalled transaction holds back the LSO, preventing consumers from reading subsequent committed messages from other producers on the same partition.
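On the consuming side, opting into these semantics is a single setting. The snippet below is a minimal sketch with the standard Java consumer; the bootstrap address, group id, and topic name are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ledger-readers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Read only up to the Last Stable Offset: data from open transactions is held back,
        // and data from aborted transactions is filtered out before delivery.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            // Only records from committed transactions (plus non-transactional writes) appear here.
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}
```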
## Zombie Fencing and Epochs

In distributed environments, a common failure mode is the "Zombie Producer." This occurs when a producer process hangs (e.g., during a long garbage collection pause) or loses network connectivity. The cluster assumes the producer is dead, and a new instance starts. If the old producer wakes up and attempts to write, it could corrupt the data.

Kafka solves this using Epoch Fencing. When the new producer instance initializes with the same `transactional.id`, the coordinator assigns it a higher epoch number. The coordinator (and all brokers involved) will subsequently reject any write request carrying the old epoch. This fencing is handled automatically by the protocol, ensuring that only one valid writer exists for a transactional ID at any given moment.

## Operational Notes

Implementing transactions incurs a performance cost. The overhead comes from:

- **Producer:** interaction with the coordinator and the addition of control batches.
- **Consumer:** buffering overhead in `read_committed` mode.
- **Broker:** management of the `__transaction_state` log and marker replication.

To optimize, producers should aim for larger transaction batches. Starting and committing a transaction for every single message severely degrades throughput due to the constant round trips to the coordinator. Conversely, extremely large transactions increase the risk of blocking consumers due to LSO lag. A balanced approach tunes the transaction duration and size against the latency requirements of the downstream consumers.
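One concrete knob in that trade-off is the producer's transaction timeout, which caps how long an open (possibly stalled) transaction can hold back the LSO before the coordinator aborts it proactively. The sketch below is illustrative only: the helper name and the 60-second value are arbitrary choices, and the configured timeout must not exceed the broker's `transaction.max.timeout.ms`.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TransactionTuning {

    /** Producer overrides that bound how long a stalled transaction can block read_committed consumers. */
    static Properties transactionalOverrides() {
        Properties props = new Properties();
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "pipeline-1"); // same static id as the pipeline sketch
        // If the producer stalls mid-transaction (GC pause, network partition), the coordinator
        // aborts the transaction after this timeout, releasing the LSO for read_committed consumers.
        props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 60_000); // milliseconds
        return props;
    }
}
```

Shorter timeouts limit how long a stuck transaction can stall `read_committed` consumers; longer timeouts leave headroom for larger batches, mirroring the batch-size trade-off described above.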