It might seem strange to start an article series about how to use Kafka in Docker by talking about consensus algorithms. That said, I think it’s a good place to start because of the sheer number of concepts that are thrown at you when trying to set up Kafka. Even something as seemingly simple as which image to use can become complex quickly.
This isn’t meant to be a full treatise on every consensus algorithm ever created, just a primer to get us to the goal of Kafka in Docker. Although not used by Kafka itself, any discussion of consensus algorithms will likely mention one of the most well known out there: Paxos.
Note: In this article I use the words “algorithm” and “protocol” nearly interchangeably, but I’d be remiss not to mention that they aren’t really interchangeable. Put simply, an algorithm is a well-defined set of steps or rules to solve a problem. A protocol is a set of rules for communication between systems.
Paxos
Paxos was created by Leslie Lamport, a Turing Award-winning computer scientist, and was defined in the seminal paper The Part-Time Parliament. The algorithm is designed to help distributed systems agree on a single value, even if some nodes fail or messages are lost. In short, Paxos is how multiple machines can agree on “yes” or “no” and be sure that they are on the same page.
Note that Paxos today is largely considered to be a family of algorithms, but the original was a single algorithm.
How Paxos Works at a High Level
This is an extremely high-level overview of Paxos. Paxos is famously hard to understand; so much so that Lamport himself later wrote Paxos Made Simple. Even then, many say that Paxos is still not simple or easy to understand.
Paxos has three main roles:
- Proposers: Suggest values
- Acceptors: Vote on values (a quorum must agree)
- Learners: Learn the chosen value
The algorithm happens in two phases:
- Prepare Phase
- A proposer sends a prepare request with a proposal number.
- Acceptors promise not to accept lower-numbered proposals and reply with any previously accepted proposal.
- Accept Phase
- A proposer sends an accept request with a value.
- If a majority of acceptors accept it, the value is chosen and can be learned.
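To make those two phases a little more concrete, here is a minimal Python sketch of a single acceptor’s bookkeeping. The class and method names are my own, purely for illustration; a real Paxos implementation also has to handle networking, durable storage, and duplicate or reordered messages.

```python
# Minimal sketch of a single Paxos acceptor's bookkeeping (illustrative only;
# the names here are made up and a real acceptor must persist this state).
class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest proposal number we have promised to honor
        self.accepted_n = -1        # proposal number of the last value we accepted
        self.accepted_value = None  # the last value we accepted, if any

    def on_prepare(self, n):
        """Prepare phase: promise to ignore proposals numbered lower than n."""
        if n > self.promised_n:
            self.promised_n = n
            # Reply with any previously accepted proposal so the proposer can adopt it.
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", self.promised_n, None)

    def on_accept(self, n, value):
        """Accept phase: accept the value unless we promised a higher-numbered proposal."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return "accepted"
        return "rejected"
```

A proposer runs the prepare phase against the acceptors, adopts any previously accepted value it hears back, and treats its value as chosen once a majority of acceptors answer “accepted” in the accept phase.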
Paxos Disadvantages
- It’s hard to understand and implement.
- It’s difficult to extend for real-world issues like cluster membership or log replication.
This leads us to our next consensus algorithm/protocol: Zab.
Zab
Zab was originally implemented as part of ZooKeeper at Yahoo!. Its key contributors are Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. Zab is mentioned in the paper ZooKeeper: Wait-free coordination for Internet-scale Systems, but it was formalized and presented in the 2011 paper titled Zab: High-performance broadcast for primary-backup systems. That paper formally defines Zab as a protocol, describing its theoretical model and proofs of safety and liveness, as well as leader election and crash recovery.
Zab has three main roles:
- Leader: Proposes and broadcasts new state updates
- Followers: Replicate updates from the leader and acknowledge them
- Observers (optional): Receive updates but don’t participate in consensus
Zab has two major phases:
- Recovery Phase
- A new leader is elected and initiates a new epoch.
- The leader gathers state history from a quorum of followers.
- The leader selects the most up-to-date history and syncs that to all followers.
- Once a quorum acknowledges that it has the same history, the system transitions to broadcasting mode.
- Broadcast Phase
- A leader sends a proposal (a state change) to all followers.
- Followers persist the proposal to disk and send back an acknowledgment.
- When the leader receives acknowledgments from a quorum, it sends a commit message to all nodes.
- Followers apply the committed proposal in the exact order received.
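To see how the quorum acknowledgment works, here is a rough Python sketch of the broadcast phase. The `Leader` and `Follower` classes and their method names are invented for illustration; real Zab also tracks epochs, fsyncs proposals to disk, and runs the recovery phase described above before any of this happens.

```python
# Illustrative sketch of Zab's broadcast phase (not the actual ZooKeeper code).
class Follower:
    def __init__(self):
        self.log = []      # proposals persisted in order (real Zab writes these to disk)
        self.applied = []  # proposals applied after a commit arrives

    def persist(self, zxid, update):
        self.log.append((zxid, update))
        return "ack"

    def commit(self, zxid):
        # Apply the committed proposal; commits arrive in the order proposals were sent.
        for entry in self.log:
            if entry[0] == zxid:
                self.applied.append(entry)


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.zxid = 0  # transaction id; in real Zab this also encodes the leader's epoch

    def broadcast(self, update):
        self.zxid += 1
        acks = sum(1 for f in self.followers if f.persist(self.zxid, update) == "ack")
        # Commit only once a quorum (a majority, counting the leader itself) has acknowledged.
        if acks + 1 > (len(self.followers) + 1) // 2:
            for f in self.followers:
                f.commit(self.zxid)
            return "committed"
        return "pending"


# Example: a leader with two followers commits as soon as one follower acks (2 of 3 nodes).
leader = Leader([Follower(), Follower()])
print(leader.broadcast({"path": "/config", "value": "42"}))  # -> committed
```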
Zab Disadvantages
- It has a bottleneck because of reliance on a single Leader.
- It has a static set of participants.
- It has a complex recovery process.
- Its support for partial replication can be considered weak.
Finally we are ready to take a look at Raft.
Raft
Raft was created by Diego Ongaro and John Ousterhout and was introduced in the 2014 paper In Search of an Understandable Consensus Algorithm. The core motivation behind Raft was to make consensus understandable and practical to implement.
Raft has three main roles:
- Leader: Handles all client requests, manages log replication, and sends heartbeats to maintain authority.
- Followers: Respond to requests from the leader and apply committed log entries.
- Candidates: Temporarily emerge during leader election.
Raft has two major phases:
- Leader Election Phase
- A follower becomes a candidate if it doesn’t hear from a leader within a timeout period.
- The candidate then starts a new term and sends a RequestVote RPC to all other nodes.
- Other nodes vote for the candidate if they have not already voted in that term.
- If the candidate receives votes from a majority, it becomes the leader.
- Log Replication Phase
- The leader accepts new log entries from clients.
- It appends each entry to its own log and sends an AppendEntries RPC to followers.
- Followers append the entry to their logs and reply with success.
- Once a majority of followers acknowledge an entry, the leader marks it as committed and notifies the followers in future AppendEntries calls.
- All nodes then apply the committed entries to their state machines in order.
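Below is a minimal Python sketch of the leader election phase, assuming a simplified in-memory node. Real Raft also checks that a candidate’s log is at least as up-to-date as the voter’s before granting a vote, and the timers and RPCs here are stand-ins rather than actual network calls.

```python
import random

# Simplified sketch of Raft leader election; it ignores the log-freshness check
# that real Raft applies before granting a vote, and all calls are in-process.
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = []          # the other Node objects in the cluster
        self.state = "follower"
        self.current_term = 0
        self.voted_for = None
        self.election_timeout = random.uniform(0.150, 0.300)  # randomized to avoid split votes

    def on_election_timeout(self):
        """No heartbeat arrived in time: become a candidate and ask for votes."""
        self.state = "candidate"
        self.current_term += 1
        self.voted_for = self.node_id
        votes = 1  # we always vote for ourselves
        for peer in self.peers:
            if peer.request_vote(self.current_term, self.node_id):
                votes += 1
        if votes > (len(self.peers) + 1) // 2:
            self.state = "leader"  # a real leader now starts sending AppendEntries heartbeats

    def request_vote(self, term, candidate_id):
        """Grant at most one vote per term (log comparison omitted for brevity)."""
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


# Example: in a three-node cluster, the first node to time out wins the election.
nodes = [Node(i) for i in range(3)]
for node in nodes:
    node.peers = [n for n in nodes if n is not node]
nodes[0].on_election_timeout()
print(nodes[0].state)  # -> leader
```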
Application to Kafka in Docker
You might be wondering why you needed to know all of that just to run Kafka in Docker. That’s a great question, and to illustrate it I’ll show snippets of a possible configuration that could appear in a Docker Compose file (one using ZooKeeper and one using KRaft, Kafka’s Raft-based mode). Note that for brevity’s sake I have removed zookeeper3 and kafka3 from the following examples.
ZooKeeper Example
services:
  zookeeper1:
    image: confluentinc/cp-zookeeper:7.6.0
    hostname: zookeeper1
    container_name: zookeeper1
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper1:2888:3888;zookeeper2:2888:3888
    networks:
      - kafka-net
  zookeeper2:
    image: confluentinc/cp-zookeeper:7.6.0
    hostname: zookeeper2
    container_name: zookeeper2
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper1:2888:3888;zookeeper2:2888:3888
    networks:
      - kafka-net
  kafka1:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka1
    container_name: kafka1
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper1:2181,zookeeper2:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 2
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 3
    networks:
      - kafka-net
networks:
  kafka-net:
Raft Example
services:
  kafka1:
    image: bitnami/kafka:3.7
    container_name: kafka1
    ports:
      - "9092:9092"
    environment:
      KAFKA_ENABLE_KRAFT: "yes"
      KAFKA_CFG_NODE_ID: 1
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka1:9093,2@kafka2:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_KRAFT_CLUSTER_ID: kraft-cluster-1
      KAFKA_CFG_LOG_DIRS: /bitnami/kafka/data
    networks:
      - kafka-net
  kafka2:
    image: bitnami/kafka:3.7
    container_name: kafka2
    environment:
      KAFKA_ENABLE_KRAFT: "yes"
      KAFKA_CFG_NODE_ID: 2
      KAFKA_CFG_PROCESS_ROLES: controller,broker
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka1:9093,2@kafka2:9093
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9092
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_KRAFT_CLUSTER_ID: kraft-cluster-1
      KAFKA_CFG_LOG_DIRS: /bitnami/kafka/data
    networks:
      - kafka-net
networks:
  kafka-net:
Without some fundamentals in consensus algorithms, terms such as “zookeeper”, “kraft” (which uses Raft), “controller”, “quorum”, and “voters” can be confusing and might seem extraordinarily complex. This is especially true when compared with a seemingly trivial install of PostgreSQL or MySQL.
In the next article, I plan to cover the differences in images that you might encounter as well as ZooKeeper vs KRaft.