Kafka Guide
- topic - category or feed of messages
- producer - publishes messages to topics
- consumer - subscribes to topics and processes the messages
- broker - a server in the Kafka cluster
each topic has a partitioned log
- an ordered, immutable sequence of messages that is continually appended to ( a commit log )
- each message in a partition gets a sequential id number called the offset
data retention is configurable and is not based on consumption
- each partition has a leader and followers
- partitions are distributed and replicated across the brokers
- a new leader is elected if the current leader fails
- each server can be a leader for some partitions and a follower for others
producers choose the topic and the partition each message goes to ( round-robin, by key, etc. )
- consumer groups ( see the sketch after this list )
- like a hybrid of queue/topic
- each message goes to exactly one consumer instance within a consumer group
- like a queue if: all consumer instances share the same consumer group ( load is balanced across them )
- like a topic if: each instance has its own group ( every instance gets every message )
- data is delivered to consumers in order within each partition
- within a group you can’t usefully have more consumer instances than partitions ( the extras sit idle )
- for total ordering: 1 partition and 1 consumer
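A minimal sketch of the queue vs topic behavior using the same console tools as the quickstart below. The topic group-demo, the group names, and the properties file paths are made up for illustration, and the exact console-consumer flags vary by Kafka version ( newer releases take --bootstrap-server and --group instead of --zookeeper and --consumer.config ).
# a topic with 2 partitions so two consumers in one group can split the work
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic group-demo
# two consumers in the SAME group -> each message is printed by only one of them ( queue behavior )
echo "group.id=my-group" > /tmp/my-group.properties
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic group-demo --consumer.config /tmp/my-group.properties &
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic group-demo --consumer.config /tmp/my-group.properties &
# a consumer in a DIFFERENT group gets its own copy of every message ( topic / pub-sub behavior )
echo "group.id=other-group" > /tmp/other-group.properties
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic group-demo --consumer.config /tmp/other-group.properties &
# type messages; ordering is only guaranteed within a partition, so the two same-group consumers may interleave
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic group-demo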
use cases:
- Messaging
- Website Activity Tracking
- Metrics
- Log Aggregation
- Stream Processing
- Event Sourcing
- Commit Log
tar -xzf kafka_2.10-0.8.2.0.tgz
cd kafka_2.10-0.8.2.0
bin/zookeeper-server-start.sh config/zookeeper.properties # quick and dirty zookeeper packaged with kafka
bin/kafka-server-start.sh config/server.properties # start kafka
# create topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
# list topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
# run producer, send messages
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# run consumer, dump messages
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Simple Kafka Cluster Setup
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
# config/server-1.properties - broker ids must be unique; port and log dir change because both brokers share one host
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
# config/server-2.properties
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
bin/kafka-server-start.sh config/server-1.properties & # start more brokers in the cluster
bin/kafka-server-start.sh config/server-2.properties &
# new replicated topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic # check new topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test # check old topic
in the --describe output:
- leader - the broker responsible for all reads and writes for that partition
- replicas - the brokers that replicate the partition, whether or not they are alive or caught up
- isr - the set of “in-sync” replicas, i.e. replicas that are alive and caught up to the leader
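As an aside, --describe can also filter for partitions whose isr has shrunk; --under-replicated-partitions is a standard kafka-topics.sh option, but treat this as a sketch since output varies by version.
# list only partitions whose isr is smaller than the full replica set
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --under-replicated-partitions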
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic # produce
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic # consume
kill -9 7564 # kill leader
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic # check
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic # can still consume
NOTE
- killing 2 of the 3 brokers breaks the cluster
- starting 1 of them back up ( so 2 of 3 are running ) gets it working again
- messages sent while the cluster was down are lost
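To watch the recovery, restart whichever broker was killed ( server-1 below is just an example; use the config file of the broker whose process you killed ) and re-check the topic:
bin/kafka-server-start.sh config/server-1.properties & # bring the killed broker back
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic # it should rejoin the isr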
FAQ:
- how to start kafka
- how to stop kafka
- how to create a kafka topic
- how to delete a kafka topic
- how to check kafka version
- how to install kafka ( on linux )
- how to install kafka on windows
- how to install kafka on mac
- how to install kafka on Docker
- how to rollback message from kafka
- how to test kafka
- kafka vs jms
- JMS with Kafka
- Kafka Kerberos - authentication: https://docs.confluent.io/2.0.0/kafka/sasl.html
- kafka cluster sizing
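Hedged answers to a few of the how-tos above, using the scripts shipped with Kafka ( on older brokers, topic deletion also requires delete.topic.enable=true in server.properties ):
# stop kafka first, then zookeeper ( reverse of the startup order )
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
# delete a topic
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
# check the kafka version from the jars shipped in libs/
ls libs/ | grep kafka_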