Kafka Guide
Key Concepts:
- topic - category or feed
- producer - sends to a topic
- consumer - receives from a topic
- broker - a single Kafka server; a cluster is made up of one or more brokers
Kafka Use Cases:
- Messaging
- Website Activity Tracking
- Metrics
- Log Aggregation
- Stream Processing
- Event Sourcing
- Commit Log
Kafka Install and Single Instance Test Setup
Download Kafka HERE. Make sure you grab the latest version. For example, at the time of this writing we used kafka_2.13-2.8.0.tgz. Make sure you swap in the correct/newer version number throughout the rest of this guide. Note that for the version above, the first part of the version number refers to the version of Scala that it was built for (2.13) and the second part is the version of Kafka itself (2.8.0).
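As a quick sanity check, the two parts of the version string can be pulled apart with plain shell parameter expansion (using the example filename above):

```shell
# split kafka_2.13-2.8.0.tgz into its Scala and Kafka version parts
f=kafka_2.13-2.8.0.tgz
ver=${f#kafka_}      # drop the "kafka_" prefix -> 2.13-2.8.0.tgz
ver=${ver%.tgz}      # drop the ".tgz" suffix   -> 2.13-2.8.0
scala=${ver%%-*}     # everything before the first "-" -> 2.13
kafka=${ver#*-}      # everything after the first "-"  -> 2.8.0
echo "Scala: $scala  Kafka: $kafka"
```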
Optionally download the KEYS and .asc files, then verify the integrity of the downloaded file:
gpg --import KEYS
gpg --verify kafka_2.13-2.8.0.tgz.asc kafka_2.13-2.8.0.tgz
Unpack the tgz file:
tar -xzf kafka_2.13-2.8.0.tgz
Add the Kafka bin directory to your path:
export PATH=$PATH:/home/user1/kafka_2.13-2.8.0/bin
Also add it to your .bashrc file so that it will persist when you open a new terminal:
vi ~/.bashrc
export PATH=$PATH:/home/user1/kafka_2.13-2.8.0/bin
Launch a version of Zookeeper that is distributed with Kafka. This will continue running in the foreground.
cd kafka_2.13-2.8.0
zookeeper-server-start.sh config/zookeeper.properties
Open another terminal and start Kafka:
cd kafka_2.13-2.8.0
kafka-server-start.sh config/server.properties
Open yet another terminal and create a topic:
cd kafka_2.13-2.8.0
kafka-topics.sh --create --topic test --bootstrap-server localhost:9092
Show some details about the topic:
kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092
Send events to the topic:
kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
Open another terminal and consume events from the topic:
cd kafka_2.13-2.8.0
kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
Use [ctrl]-C to shut down each client and service in this order:
- producer
- consumer
- Kafka broker
- Zookeeper
Data is kept here by default: /tmp/kafka-logs/
You can clear / clean up your data after shutting everything down with the following:
rm -rf /tmp/kafka-logs /tmp/zookeeper
This will wipe out your events and topics.
Simple Kafka Cluster Setup
cd kafka_2.13-2.8.0
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
The broker ids must always be unique. The port and log directory need to change because these instances are running on a shared host.
Edit the config for the second cluster node. Make sure these values are either added or updated. They won't necessarily appear in this order.
config/server-1.properties:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
Do the same for the third cluster instance.
config/server-2.properties:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
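Note that `port` and `log.dir` are the older property names; newer Kafka releases use `listeners` and `log.dirs` instead. An equivalent override for the second node (broker.id=1) would look roughly like this:

```properties
# config/server-1.properties
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
```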
Start the second node in a new terminal:
cd kafka_2.13-2.8.0
bin/kafka-server-start.sh config/server-1.properties
Start the third node in a new terminal:
cd kafka_2.13-2.8.0
bin/kafka-server-start.sh config/server-2.properties
You can specify the replication factor on a new topic like this:
cd kafka_2.13-2.8.0
kafka-topics.sh --create --topic test3 --replication-factor 3 --partitions 1 --bootstrap-server localhost:9092
You can verify the details of the new topic like this:
kafka-topics.sh --describe --topic test3 --bootstrap-server localhost:9092
Beyond this, publishing and consuming should work much like the single-instance testing above. You will be able to connect to any of the three nodes in the cluster when publishing and consuming, though.
Concepts
- leader - the node responsible for all reads and writes for a given partition
- replicas - the nodes that the partition is replicated onto
- isr - the set of "in-sync" replicas
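For context, the `--describe` output for a replicated topic shows these fields per partition; it looks roughly like this (broker ids will vary):

```
Topic: test3    PartitionCount: 1    ReplicationFactor: 3    Configs: ...
    Topic: test3    Partition: 0    Leader: 1    Replicas: 1,2,0    Isr: 1,2,0
```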
NOTE
- killing 2 out of 3 cluster nodes breaks the cluster
- starting 1 back up (so 2 out of 3 are running) gets it working again
- messages sent while the cluster was down will be lost
More Concepts and Features
- each topic has a partitioned log
  - an ordered, immutable sequence of messages that is continually appended to (a commit log)
  - each message gets a sequential id number called the offset
- data retention is configurable and is not based on consumption
- each partition:
  - has a leader and followers
  - is distributed and replicated
  - new leaders can be elected
- each server can be a leader for some partitions and a follower for others
- producers select the topic and partition (round-robin, etc.)
- consumer groups
  - like a hybrid of queue/topic
  - a message goes to one consumer instance within a consumer group
  - like a queue if: all consumer instances have the same consumer group (balanced)
  - like a topic if: each instance has a different group
- data is delivered to consumers in order for each partition
- a consumer group can't usefully have more consumers than partitions (extras will sit idle)
- for total ordering: 1 partition and 1 consumer
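The round-robin partition selection mentioned above can be illustrated with a trivial shell loop (a toy model, not actual Kafka code):

```shell
# toy round-robin partitioner: spread 6 messages across 3 partitions
partitions=3
for i in 0 1 2 3 4 5; do
  echo "message $i -> partition $((i % partitions))"
done
```

With 3 partitions, messages 0-5 land on partitions 0, 1, 2, 0, 1, 2 - which is also why ordering is only guaranteed per partition, not across the topic.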
FAQ
- how to start kafka
- how to stop kafka
- how to create a kafka topic
- how to delete a kafka topic
- how to check kafka version
- how to install kafka ( on linux )
- how to install kafka on windows
- how to install kafka on mac
- how to install kafka on Docker
- how to rollback message from kafka
- how to test kafka
- kafka vs jms
- JMS with Kafka
- Kafka Kerberos - authentication: https://docs.confluent.io/2.0.0/kafka/sasl.html
- kafka cluster sizing