In this third article we finally get to installing Kafka in Docker. This installation focuses on a development environment, but many of the concepts carry over to deployment-ready scenarios.
The application that I'm using in this example is a simple Ruby on Rails application. I mention this only for completeness' sake, since the focus here is Kafka and Docker.
Which Image Should I Use?
That depends entirely on your needs, but the following is a list of possible images to use.
Popular Kafka Docker Images
| Image | KRaft | ZooKeeper | Open Source? | Best For |
|---|---|---|---|---|
| `bitnami/kafka` | ✅ | ✅ | ✅ Apache 2.0 | Most users, easy switch between modes |
| `confluentinc/cp-kafka` | ✅ | ✅ (< 8.0) | 🚫 Mostly OSS, but requires Confluent license for full features | Full Confluent Platform users |
| `apache/kafka` | ✅ | 🚫 | ✅ Apache 2.0 | Minimalist, official |
| `wurstmeister/kafka` | ❌ | ✅ | ✅ Apache 2.0 | Legacy tutorials only |
| `landoop/fast-data-dev` | ❌ | ✅ | 🚫 Mixed | Quick sandboxing, not prod-ready |
Confluent/cp-kafka Image
```yaml
services:
  app:
    build:
      context: .
    stdin_open: true
    tty: true

  zookeeper:
    image: zookeeper:3.8
    ports:
      - "2181:2181"
    volumes:
      - kafka_zookeeper_data:/data

  kafka-1:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_1_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka-2:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9093:9093"
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_2_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka-3:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9094:9094"
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-3:9094
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_3_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka-4:
    image: confluentinc/cp-kafka:7.9.2
    ports:
      - "9095:9095"
    environment:
      KAFKA_BROKER_ID: 4
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-4:9095
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - cp_kafka_4_data:/var/lib/kafka/data
    depends_on:
      - zookeeper

volumes:
  cp_kafka_1_data:
  cp_kafka_2_data:
  cp_kafka_3_data:
  cp_kafka_4_data:
  kafka_zookeeper_data:
```
Let's break this config down a little. Here we are installing a very basic ZooKeeper instance and using the `cp-kafka` image at version 7.9.2. The version matters: as of version 8.0 the Confluent images go full KRaft and drop ZooKeeper. We are also installing four Kafka nodes. In a development environment you could get away with fewer nodes; I'm using four here to better illustrate how this might look in terms of replication and leader elections.
A quick note about the `ports` section. The declaration `"9092:9092"` means that port 9092 is made available to the host operating system, and any traffic destined for that port is forwarded to the appropriate container. I usually don't have a mapping like this for these kinds of services, but I leave it here in case someone wants to write an application locally that communicates with the containers.
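To make the host-versus-container distinction concrete, here is an illustrative Ruby sketch (not part of the compose file — the helper name and structure are my own) of how a client would pick its bootstrap servers. Inside the compose network, containers resolve each other by service name; from the host, only the published `localhost` ports are reachable.

```ruby
# Hypothetical helper: build a Kafka bootstrap-servers string depending on
# where the client process runs. Service names and ports mirror the compose
# file above.
BROKERS = {
  "kafka-1" => 9092,
  "kafka-2" => 9093,
  "kafka-3" => 9094,
  "kafka-4" => 9095,
}.freeze

def bootstrap_servers(inside_compose_network:)
  if inside_compose_network
    # Containers on the compose network reach brokers by service name.
    BROKERS.map { |host, port| "#{host}:#{port}" }.join(",")
  else
    # Host processes can only use the ports published in the `ports:` section.
    BROKERS.values.map { |port| "localhost:#{port}" }.join(",")
  end
end

puts bootstrap_servers(inside_compose_network: true)
# kafka-1:9092,kafka-2:9093,kafka-3:9094,kafka-4:9095
```

A Rails app running as the `app` service would use the in-network form; a script run directly on your machine would use the `localhost` form.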
The environment variables associated with listeners both contain `PLAINTEXT`. This works in development but should not be used in a deployed application. `PLAINTEXT` means there is no client authentication when connecting to Kafka, no encryption of the data transmitted to Kafka, and no guarantee of data integrity (traffic could be intercepted and modified in transit). There are a number of alternatives to `PLAINTEXT`, shown below.
- `SSL` - encrypts traffic using TLS
- `SASL_PLAINTEXT` - adds authentication but still no encryption
- `SASL_SSL` - both encryption and authentication
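As a sketch of what these choices look like from the client side, here is a hypothetical Ruby helper that builds a client configuration hash per protocol. The keys follow librdkafka naming (as used by the rdkafka gem); the mechanism, credentials, and file path below are placeholders, not working values.

```ruby
# Sketch: client configuration per security protocol. Keys follow librdkafka
# naming; the username, password, and CA path are placeholders.
def kafka_client_config(protocol)
  base = { "bootstrap.servers" => "kafka-1:9092" }
  case protocol
  when :plaintext
    # Dev only: no authentication, no encryption.
    base.merge("security.protocol" => "PLAINTEXT")
  when :ssl
    # TLS encryption; the CA path is a placeholder.
    base.merge(
      "security.protocol" => "SSL",
      "ssl.ca.location"   => "/etc/kafka/secrets/ca.crt"
    )
  when :sasl_ssl
    # Authentication and encryption; credentials are placeholders.
    base.merge(
      "security.protocol" => "SASL_SSL",
      "sasl.mechanism"    => "SCRAM-SHA-512",
      "sasl.username"     => "app",
      "sasl.password"     => "changeme"
    )
  else
    raise ArgumentError, "unknown protocol: #{protocol}"
  end
end
```

Switching protocols is then a one-symbol change in the client rather than a rewrite, which makes it easier to run `PLAINTEXT` in development and `SASL_SSL` in production.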
Bitnami/Kafka Image
```yaml
services:
  app:
    build:
      context: .
    stdin_open: true
    tty: true

  kafka-1:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=1
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-1:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_NUM_PARTITIONS=1
      - KAFKA_CFG_DEFAULT_REPLICATION_FACTOR=2
      - KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=2
      - KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR=1
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      # Bitnami images persist Kafka data under /bitnami/kafka
      - bitnami_kafka_1_data:/bitnami/kafka

  kafka-2:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=2
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-2:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_2_data:/bitnami/kafka

  kafka-3:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=3
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-3:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_3_data:/bitnami/kafka

  kafka-4:
    image: bitnami/kafka:4.0
    environment:
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_KRAFT_CLUSTER_ID=kraft-cluster
      - KAFKA_NODE_ID=4
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093,4@kafka-4:9093
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-4:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - bitnami_kafka_4_data:/bitnami/kafka

volumes:
  bitnami_kafka_1_data:
  bitnami_kafka_2_data:
  bitnami_kafka_3_data:
  bitnami_kafka_4_data:
```
This config uses the `bitnami/kafka` image and KRaft instead of ZooKeeper. It is quite a bit different from the ZooKeeper config described above, so let's break it down a bit.
The first thing we have to do here is enable KRaft using the `KAFKA_ENABLE_KRAFT` declaration.
A note about the `kafka-1` node
This node is configured differently from the others. It additionally sets `KAFKA_CFG_NUM_PARTITIONS`, `KAFKA_CFG_DEFAULT_REPLICATION_FACTOR`, `KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR`, and `KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR`.
`KAFKA_CFG_NUM_PARTITIONS`
This is the default number of partitions for new topics. This is important for scaling purposes and is the subject of a forthcoming blog post.
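To build intuition for why the partition count matters, here is a toy Ruby sketch of keyed routing: a message with a key lands on partition `hash(key) % num_partitions`. This is deliberately simplified — Kafka's default partitioner uses murmur2, not the byte-sum used here.

```ruby
# Toy partitioner: route a keyed message to one of num_partitions partitions.
# Uses a byte-sum as a stand-in hash; Kafka's real default is murmur2.
def partition_for(key, num_partitions)
  key.sum % num_partitions
end

# All messages for the same key always land on the same partition,
# which is what preserves per-key ordering.
partition_for("user-42", 4) == partition_for("user-42", 4) # always true
```

More partitions mean more parallelism for consumers, but changing the count later reshuffles which keys map where — hence the default set here matters.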
`KAFKA_CFG_DEFAULT_REPLICATION_FACTOR`
This is the number of copies of each partition that exist. Setting this to two gives each partition one leader and one follower. There is more to cover here, and it will be the subject of a forthcoming blog post.
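A toy Ruby sketch of what a replication factor of 2 means across our four brokers — each partition gets a leader plus one follower on the next broker. Kafka's real assignment also staggers leaders and can be rack-aware; this round-robin version is only for intuition.

```ruby
# Toy replica assignment: round-robin replicas across brokers.
# replication_factor copies per partition; first replica acts as leader.
def assign_replicas(num_partitions:, brokers:, replication_factor:)
  (0...num_partitions).map do |p|
    replicas = (0...replication_factor).map { |r| brokers[(p + r) % brokers.size] }
    { partition: p, leader: replicas.first, followers: replicas.drop(1) }
  end
end

assign_replicas(num_partitions: 4,
                brokers: %w[kafka-1 kafka-2 kafka-3 kafka-4],
                replication_factor: 2)
# partition 0 -> leader kafka-1, follower kafka-2; partition 3 wraps
# around to leader kafka-4, follower kafka-1.
```

If a leader's broker dies, one of its followers is elected leader, which is exactly the scenario the four-node cluster above lets you experiment with.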
`KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR`
This sets how many replicas Kafka will maintain for the transaction state log, an internal topic used for exactly-once semantics and idempotent producer guarantees.
`KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR`
This sets the number of in-sync replicas required to commit to the transaction state log.
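The min-ISR gate can be sketched in a few lines of Ruby. This is a hypothetical helper, not a real client API: a write is only committed when at least the minimum number of replicas are in sync.

```ruby
# Sketch of the min.insync.replicas rule: a write is committed only when
# enough replicas are caught up. Hypothetical helper for illustration.
def write_accepted?(in_sync_replicas, min_isr)
  in_sync_replicas.size >= min_isr
end

write_accepted?(%w[kafka-1 kafka-2], 1) # => true
write_accepted?([], 1)                  # => false: no replica in sync
```

With a min ISR of 1 as configured here, the cluster keeps accepting writes to the transaction log as long as any single replica is in sync — a convenience for development that trades away durability you would want in production.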
Only the first node defines these values because cluster-wide defaults only need to be set once, usually by the first node that initializes the cluster.
Running the containers and more
This is just like every other Docker project that uses Docker Compose: run `docker compose up` and your containers should start. There is much more to cover when it comes to Kafka, but those are subjects for future articles.