Kafka In Docker - Installation Time

Published: at 09:12 PM

In this third article we finally get to installing Kafka in Docker. This installation is focused on a development environment, but many of the concepts carry over to deployment-ready scenarios.

The application that I’m using in this example is a simple Ruby on Rails application. I mention this only for completeness’ sake, since the focus of this article is Kafka and Docker.

Which Image Should I Use?

That depends entirely on your needs but the following is a list of possible images to use.

| Image | KRaft | ZooKeeper | Open Source? | Best For |
| --- | --- | --- | --- | --- |
| bitnami/kafka | ✅ | ✅ | ✅ Apache 2.0 | Most users, easy switch between modes |
| confluentinc/cp-kafka | ✅ | ✅ (< 8.0) | 🚫 Mostly OSS, but requires Confluent license for full features | Full Confluent Platform users |
| apache/kafka | ✅ | 🚫 | ✅ Apache 2.0 | Minimalist, official |
| wurstmeister/kafka | 🚫 | ✅ | ✅ Apache 2.0 | Legacy tutorials only |
| landoop/fast-data-dev | 🚫 | ✅ | 🚫 Mixed | Quick sandboxing, not prod-ready |

Confluentinc/cp-kafka Image

services:
  app:
    build:
      context: .
    stdin_open: true
    tty: true

  zookeeper:
    image: zookeeper:3.8
    ports:
      - "2181:2181"
    volumes:
      - kafka_zookeeper_data:/data

  kafka-1:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_1_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka-2:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9093:9093"
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_2_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka-3:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9094:9094"
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9094
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-3:9094
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_3_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka-4:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9095:9095"
    environment:
      KAFKA_BROKER_ID: 4
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9095
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-4:9095
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_4_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

volumes:
  cp_kafka_1_data:
  cp_kafka_2_data:
  cp_kafka_3_data:
  cp_kafka_4_data:
  kafka_zookeeper_data:

Let’s break down this config a little bit. Here we are installing a very basic ZooKeeper instance and using the cp-kafka image at version 7.9.2. The version is important because, as of version 8.0, the image goes full KRaft. We are also installing four Kafka nodes. In a development environment you could get away with fewer nodes; I’m using four here to better illustrate how replication and leader elections might look.
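Before digging into the brokers, it can help to confirm ZooKeeper itself is healthy. As a hypothetical sanity check (using the zkServer.sh script the official zookeeper image puts on the PATH):

```shell
# Ask ZooKeeper for its status; in this single-node setup it should
# report "Mode: standalone".
docker compose exec zookeeper zkServer.sh status
```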

A quick note about the ports section here. The declaration "9092:9092" means that port 9092 is made available to the host operating system and any traffic destined for that port is forwarded to the matching container. I usually don’t have a mapping like this for these kinds of services, but I leave it here in case someone wants to write an application locally that communicates with the containers.
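One caveat if you do connect from the host: the brokers advertise the kafka-N hostnames, so a local client must be able to resolve them. A hypothetical smoke test with kcat (the hosts entry and tool choice are illustrative, not part of the compose file):

```shell
# Map the advertised hostname to localhost so a host-side client can
# follow the broker's advertised address through the published port.
echo "127.0.0.1 kafka-1" | sudo tee -a /etc/hosts

# -L asks the broker for cluster metadata (brokers, topics, partitions)
kcat -b kafka-1:9092 -L
```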

The environment variables associated with listeners both contain PLAINTEXT. This works in development but should not be used in a deployed application. PLAINTEXT means there is no client authentication when connecting to Kafka, no encryption of the data transmitted to Kafka, and no guarantee of data integrity (traffic could be intercepted and modified in transit). Kafka supports more secure protocols such as SSL, SASL_PLAINTEXT, and SASL_SSL.
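For illustration only, here is a rough sketch of what moving one broker’s listeners from PLAINTEXT to SASL_SSL might look like. The variable names follow the cp-kafka KAFKA_* convention, but the certificate wiring is intentionally elided; treat this as a shape to aim for, not a tested config:

```yaml
# Sketch only: authenticated, encrypted listeners instead of PLAINTEXT.
environment:
  KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: SASL_SSL:SASL_SSL
  KAFKA_ADVERTISED_LISTENERS: SASL_SSL://kafka-1:9092
  KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
  # ...plus keystore/truststore and JAAS settings for your certificates
```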

Bitnami/Kafka Image

services:
  app:
    build:
      context: .
    stdin_open: true
    tty: true

  kafka-1:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=1
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-1:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_NUM_PARTITIONS=1
      - KAFKA_CFG_DEFAULT_REPLICATION_FACTOR=2
      - KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=2
      - KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR=1
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_1_data:/bitnami/kafka

  kafka-2:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=2
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-2:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_2_data:/bitnami/kafka

  kafka-3:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=3
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-3:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_3_data:/bitnami/kafka

  kafka-4:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=4
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-4:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_4_data:/bitnami/kafka

volumes:
  bitnami_kafka_1_data:
  bitnami_kafka_2_data:
  bitnami_kafka_3_data:
  bitnami_kafka_4_data:

This config uses the bitnami/kafka image along with KRaft instead of ZooKeeper, and it is quite a bit different from the ZooKeeper config described above. So let’s break this down a bit.

The first thing that we have to do here is enable KRaft using the KAFKA_ENABLE_KRAFT declaration. (Kafka 4.0 removed ZooKeeper mode entirely, so on the 4.x images this flag mostly documents intent.)

A note about the kafka-1 node

This node is configured differently from the others. It has KAFKA_CFG_NUM_PARTITIONS, KAFKA_CFG_DEFAULT_REPLICATION_FACTOR, KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR, and KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR.

KAFKA_CFG_NUM_PARTITIONS

This is the default number of partitions for new topics. This is important for scaling purposes and is the subject of a forthcoming blog post.

KAFKA_CFG_DEFAULT_REPLICATION_FACTOR

This is the number of copies of each partition that exist. Setting this to two makes each partition have one leader and one follower. There is more to cover here, and it will be the subject of a forthcoming blog post.

KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR

This sets how many replicas Kafka will maintain for the transaction state log. This is an internal topic used for exactly-once semantics and idempotent producer guarantees.

KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR

This sets the minimum number of in-sync replicas that must acknowledge a write to the transaction state log before it is considered committed.
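The interplay between replication factor and min ISR can be illustrated with a toy calculation (plain shell arithmetic, not a Kafka API): with acks=all, a partition keeps accepting writes only while the number of surviving replicas is at least the min ISR. Using the replication factor of 2 and min ISR of 1 from the config above:

```shell
rf=2       # KAFKA_CFG_DEFAULT_REPLICATION_FACTOR
min_isr=1  # KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR

for failed in 0 1 2; do
  surviving=$((rf - failed))
  if [ "$surviving" -ge "$min_isr" ]; then
    echo "brokers failed: $failed -> still writable"
  else
    echo "brokers failed: $failed -> writes blocked"
  fi
done
```

With these values a partition survives one broker failure but not two; bumping the min ISR to 2 would trade that availability for a stronger durability guarantee.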

Only the first node has these values defined because default configuration values only need to be defined once, usually by the first node that initializes the cluster. That said, it is common (and arguably safer) to set broker defaults like these identically on every node.

Running the containers and more

This is just like every other Docker project that uses Docker Compose: simply run docker compose up and your containers should start. There is much more to cover when it comes to Kafka, but those are subjects for future blog articles.
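For the cp-kafka stack above, bringing things up and creating a first topic might look like this (the topic name is just an example; the Confluent images ship kafka-topics on the PATH, while the Bitnami images name it kafka-topics.sh):

```shell
# Start everything in the background
docker compose up -d

# Create a topic spread across three of the four brokers
docker compose exec kafka-1 kafka-topics \
  --bootstrap-server kafka-1:9092 \
  --create --topic demo --partitions 3 --replication-factor 3

# See which broker leads each partition
docker compose exec kafka-1 kafka-topics \
  --bootstrap-server kafka-1:9092 \
  --describe --topic demo
```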

