Apache Kafka – Simple Tutorial

 

In this post I want to highlight my fascination with Kafka and its usage.

Kafka is a broker just like “RabbitMQ” or “JMS”. So what’s the difference?

Difference are:

  • It is distributed
  • it is fault tolerant – because of messages being replicated across the cluster
  • It does one thing and one thing only i.e. Transferring your messages and does it really well
  • Highly scalable due to its distributed nature
  • Tunable consistency
  • Parallel processing of messages unlike others which do sequential
  • Ordering guarantee per partition

How do you set it up?

Kafka is inherently distributed. So that means you are going to have multiple machine creating a Kafka cluster.

Kafka uses zookeeper for leader election among other things so you need to have zookeeper cluster already running somewhere. otherwise you can go to

https://www.tutorialspoint.com/zookeeper/zookeeper_installation.htm

You install Kafka on all the machines which will participate in Kafka Cluster and then open the ports where Kafka is running. Then provide configuration of all other machines in the cluster in each machine. e.g. if Kafka is running on machines K1,K2,K3 then K1 will have information of K2 and K3 and so son.

Yes its that simple

How does it work?

The way Kafka works is you create a topic, send a message and read message at the other end.

So if there are multiple machines how do you send message to Kafka? Well you keep a list of all the machines inside your code and then send message by high level Kafka Producer (which is a helper class in Kafka Driver). Kafka high level consumer class is available for reading messages.

Before you send a message create a topic first with a “replication factor”” which tells kafka hos many brokers will have the copy of this data

Some important terminologies related to Kafka are:

Topic – Where you publish message. You need to create beforehand

Partition – Number of consumers that can listen to a topic in parallel. Default is 1 but you can create hundreds

Ordering of Messages – Guaranteed for single partition

TTL – Time to live for messages on the disk – default 7 days

Group – Kafka guarantees that a message is only ever read by a single consumer in the group. so if you want that a message be delivered only once then just go and put all consumers in same group.

If you want to go deep here are some useful links

https://kafka.apache.org/08/design.html

http://www.tutorialspoint.com/apache_kafka/apache_kafka_consumer_group_example.htm

Published by Marut Singh

Welcome to my blog. I am software architect, mentor, corporate trainer. Developing software for last 15 years in various domains..I work in different stacks and software architecture is my area of speciality..worked in c++, java, c#, scala, play vert.x, spring, nosql blah blah blah. And offcourse cloud technologies. Software Engineer cant imagine life without cloud :-) Always exploring new languages and tools to make sure that I do not loose touch, to ensure that I delivery high quality software in reasonable cost at all times. Love TDD, BDD, Agile and more than anything simplicity.. Normally I am very helpful so if you need some help do get in touch.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: