In this post I want to highlight my fascination with Kafka and its usage.
Kafka is a message broker, much like RabbitMQ or a JMS-based broker. So what’s the difference?
The differences are:
- It is distributed
- It is fault tolerant, because messages are replicated across the cluster
- It does one thing and one thing only, i.e. transferring your messages, and does it really well
- Highly scalable due to its distributed nature
- Tunable consistency
- Parallel processing of messages: a topic is split into partitions that can be consumed in parallel, rather than through a single sequential queue
- Ordering guarantee per partition
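To make the last two points a bit more concrete, here is a minimal sketch in plain Python (no Kafka involved) of how keyed messages could be spread over partitions. The partition count, the key names and the use of CRC32 are my own assumptions for illustration; Kafka's clients use their own hash function.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in for Kafka's own hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(messages, num_partitions: int = NUM_PARTITIONS):
    """Append (key, value) pairs to per-partition logs, in arrival order."""
    logs = defaultdict(list)
    for key, value in messages:
        logs[partition_for(key, num_partitions)].append((key, value))
    return dict(logs)


if __name__ == "__main__":
    events = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
    for partition, log in sorted(publish(events).items()):
        print(partition, log)
```

Messages with the same key always land in the same partition, so their relative order is kept, while different partitions can be read independently and in parallel.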
How do you set it up?
Kafka is inherently distributed, which means you are going to have multiple machines forming a Kafka cluster.
Kafka uses ZooKeeper for leader election, among other things, so you need a ZooKeeper cluster already running somewhere. If you don’t have one, see the installation guide at
https://www.tutorialspoint.com/zookeeper/zookeeper_installation.htm
Install Kafka on every machine that will participate in the Kafka cluster and open the port Kafka listens on. Then give each broker a unique id and point it at the same ZooKeeper cluster. The brokers register themselves in ZooKeeper and discover one another from there, e.g. if Kafka is running on machines K1, K2 and K3, then K1 learns about K2 and K3 through ZooKeeper, and so on.
Yes, it’s that simple.
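As a rough sketch of what the per-broker configuration could look like in a ZooKeeper-based cluster, the small Python helper below renders a minimal server.properties for each machine. The ZooKeeper hostnames (zk1, zk2, zk3) and the data directory are hypothetical, and I am assuming the default ports 9092 and 2181. Note that `listeners` applies to Kafka 0.9 and later; 0.8 releases used `host.name` and `port` instead.

```python
def broker_properties(broker_id: int, host: str, zk_connect: str) -> str:
    """Render a minimal server.properties for one broker."""
    lines = [
        f"broker.id={broker_id}",              # must be unique per broker
        f"listeners=PLAINTEXT://{host}:9092",  # the port to open on this machine
        "log.dirs=/var/lib/kafka/data",        # hypothetical data directory
        f"zookeeper.connect={zk_connect}",     # same value on every broker
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    zk = "zk1:2181,zk2:2181,zk3:2181"  # hypothetical ZooKeeper ensemble
    for broker_id, host in enumerate(["K1", "K2", "K3"], start=1):
        print(f"# server.properties for {host}")
        print(broker_properties(broker_id, host, zk))
```

The only things that differ between machines are the broker id and the host name; the shared ZooKeeper connection string is how the brokers find each other.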
How does it work?
The way Kafka works is that you create a topic, send messages to it, and read those messages at the other end.
So if there are multiple machines, how do you send a message to Kafka? You keep a list of the broker machines in your code (the client uses it to discover the rest of the cluster) and send messages through the high-level Kafka producer, which is a helper class in the Kafka client library. A high-level consumer class is available for reading messages.
Before you send a message, create the topic with a “replication factor”, which tells Kafka how many brokers will hold a copy of the data.
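To see what the replication factor buys you, here is a simplified simulation in plain Python. The round-robin placement is my own simplification and not Kafka's actual assignment algorithm, and the broker names are just the K1, K2, K3 machines from the setup section.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Place each partition on `replication_factor` distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)] for i in range(replication_factor)
        ]
    return assignment


def surviving_partitions(assignment, failed_brokers):
    """Return partitions that still have at least one replica on a live broker."""
    failed = set(failed_brokers)
    return [p for p, replicas in assignment.items() if set(replicas) - failed]


if __name__ == "__main__":
    layout = assign_replicas(["K1", "K2", "K3"], num_partitions=3, replication_factor=2)
    print(layout)
    print(surviving_partitions(layout, failed_brokers=["K1"]))
```

With a replication factor of 2, every partition survives the loss of any single broker in this sketch; losing two brokers at once can take a partition offline.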
Some important terms related to Kafka are:
Topic – Where you publish messages. You should create it beforehand
Partition – A slice of a topic. The number of partitions is the number of consumers in a group that can read the topic in parallel. Default is 1 but you can create hundreds
Ordering of Messages – Guaranteed only within a single partition
TTL – Time to live (retention) for messages on the disk – default 7 days
Group – Kafka delivers each message to only one consumer within a group, so if you want a message to be processed by just one consumer, put all your consumers in the same group.
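Here is a small plain-Python sketch of how partitions and groups interact. The group names, consumer names and messages are made up, and the round-robin assignment is a simplification of what Kafka's own assignment strategies do.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


def deliver(partition_logs, groups):
    """Deliver every message once per group, to the consumer owning its partition."""
    received = {}
    for group, consumers in groups.items():
        owners = assign_partitions(sorted(partition_logs), consumers)
        for consumer, owned in owners.items():
            received[(group, consumer)] = [m for p in owned for m in partition_logs[p]]
    return received


if __name__ == "__main__":
    logs = {0: ["m1", "m2"], 1: ["m3"], 2: ["m4"]}  # hypothetical topic contents
    groups = {"billing": ["billing-1", "billing-2"], "audit": ["audit-1"]}
    for member, messages in deliver(logs, groups).items():
        print(member, messages)
```

Within the billing group the messages are split between the two consumers, while the audit group gets its own full copy. A group with more consumers than partitions leaves the extra consumers idle, which is why the partition count caps parallelism.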
If you want to go deeper, here are some useful links:
https://kafka.apache.org/08/design.html
http://www.tutorialspoint.com/apache_kafka/apache_kafka_consumer_group_example.htm