Why Kafka?

Many companies in the startup world today are completely dependent on AWS infrastructure. That is a reasonable strategy, since you do not have to manage your own infrastructure and it saves you from a lot of headaches.

Today we will discuss the message brokers available in AWS. AWS mainly offers two kinds of broker:

a. SQS (Simple Queue Service) – comparable to ActiveMQ and RabbitMQ

b. Kinesis (distributed, fault-tolerant, highly scalable message broker) – fewer features, but optimized for ingesting and delivering a massive number of events at extremely low latency.

The design of Kinesis is inspired by Kafka, which LinkedIn open-sourced and donated to Apache. LinkedIn processes billions of events per day using Kafka, and it is now an Apache top-level project used in many highly scalable architectures.

In this post I want to focus on some of the key differences between Kinesis and Kafka. As stated in the beginning, building on AWS infrastructure is a good thing, but over-reliance on it has some major problems:

a. You are vendor locked-in, so if tomorrow you want to shift to DigitalOcean, or to your own infrastructure, you will not be able to do so easily.

b. You are limited by the restrictions AWS puts in place, such as how many transactions you can perform per unit of time.

So, in light of the above two points, I will try to explain where Kafka should be used instead of RabbitMQ and in place of Kinesis.

RabbitMQ Pros:

  • Simple to install and manage
  • Excellent rule-based routing capabilities
  • Decent performance
  • Hosted cloud offering also available (CloudAMQP)

Cons:

  • Not distributed
  • Unable to scale to high loads (due to its non-distributed nature)

Kafka Pros:

  • Amazingly fast reads and writes (it reads and writes sequentially only)
  • Does one thing and one thing only: transfers messages reliably
  • Load-balances consumers via topic partitions, so you get real parallel processing, no ifs and buts (see the consumer sketch after the comparison table below)
  • No restrictions on transaction rates, unlike Kinesis

Cons:

  • Setting up a cluster is more complicated than with RabbitMQ
  • Dependency on ZooKeeper
  • No routing

So, the bottom line:

  • Use RabbitMQ for any simple use case
  • Use Kafka if you want insane scalability and you are ready to put in the effort to learn Kafka topics and partitions
  • Use Kinesis if setting up Kafka is not your cup of tea
Kafka vs. Kinesis vs. RabbitMQ:

  • Routing: Kafka – basic (topic based); Kinesis – basic (topic based); RabbitMQ – advanced (exchange based)
  • Throughput: Kafka – extremely high; Kinesis – extremely high; RabbitMQ – lower (suited to low/medium loads)
  • Latency: Kafka – very low; Kinesis – depends on region (not available in some regions, hence a cross-region HTTP call); RabbitMQ – high compared to the other two
  • Ease of implementation: Kafka – moderate, but setting up a cluster requires effort; Kinesis – moderate (identifying the right number of shards can be tough); RabbitMQ – easy
  • Restrictions on transactions: Kafka – none; Kinesis – 5 reads/sec per shard and 1,000 writes/sec per shard; RabbitMQ – none
  • Types of applications: Kafka – high throughput; Kinesis – high throughput; RabbitMQ – low to medium throughput
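
To make the point about partitions and consumer load balancing concrete, here is a minimal sketch using Kafka's Java consumer API (the newer org.apache.kafka.clients consumer is assumed; the broker address, group id and topic name are illustrative). Every consumer started with the same group.id is assigned its own share of the topic's partitions, which is how Kafka parallelizes consumption:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderEventsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("group.id", "order-processors");        // members of this group share the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("order-events")); // illustrative topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}

Start a second copy of this program with the same group.id and Kafka rebalances the topic's partitions across the two instances automatically; that is the "load balancing of consumers" mentioned above.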

As always, drop me an email if you are still unsure about your use case.

Happy Coding !!


Why Vert.x?

In this post I will try to throw some light on the capabilities of, and the need for, this exciting framework: Vert.x.

I have been following this framework for the last few years, and it is great to see that it is now being adopted by many large companies, which means it has stabilized.

Promoted by Red Hat, Vert.x is a very lightweight library for distributed computing.

Today almost all applications are n-tier. This is not an invention anymore but the need of the hour. You cannot scale a monolithic application, and the days are long gone when you could just create a scaffolding app in Ruby on Rails and keep modifying the same app to run your business.

There are many reasons why an application should be divided into different components, a.k.a. services (or microservices). To me, the ability to iterate and roll out new features quickly is the most important reason to build a distributed system.

So today the business layer is divided into n different components. This is not something new; many enterprise applications have been built that way.

What is changing now is the tooling that lets you create this distributed architecture without actually going into the nitty-gritty of distributed software design.

Let’s consider a scenario. You have a web application (say, a Ruby on Rails app).

You start getting a lot of traffic, which your single server cannot handle anymore.

So what do you do?

You hide behind a load balancer and spawn a new instance. When new requests come in, they get redirected to one of the servers using some load-balancing strategy (round-robin?).

If traffic spikes again, you repeat the process.

There are two problems here:

a. It works only when you have a monolithic application.

b. It does not scale.

If you are looking to build a high-traffic website, you cannot just keep adding servers to the load balancer and assume everything will work fine.

So many large applications divide the business layer, the layer which does most of the useful work, into multiple components and expose APIs which are consumed by the web layer. Each of these business components does some specific task and can be scaled independently.

Great…that seems like a scalable solution. But now we have created another problem:

How do we manage these different components or services? How do we discover them in the system? We can go to our good old load balancer, assign each service a DNS name, and let the load balancer do the job.

This seems unmanageable. How many DNS names would we want, and how many times will we have to reconfigure the load balancer? And why in the world would I want a DNS name assigned to each of these services? I certainly do not want to expose them to the outside world; that job belongs to my web application. So I should have something better. CORBA is out of the question. Java RMI, god save me from it. Thrift is a possibility, but it does not tick all the boxes outlined below.

Is there any other way?

What if we could handle this at the software layer rather than the hardware layer? What if we could do it dynamically? What if these services could interact without the overhead of HTTP? It turns out all of this is possible now with frameworks like Vert.x.

This is what we want to achieve with Vert.x:

a. Distributed architecture

b. Fault tolerant applications

c. Highly scalable application layer

d. Dynamic discovery of services (micro?)

e. Remove the overhead of HTTP.

f. Interaction between services without installing separate software like RabbitMQ (see the small sketch below).
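
To give a taste of point (f), here is a minimal sketch of two verticles talking over the Vert.x event bus, with no HTTP and no external broker in between. The class names and the address string are purely illustrative, and the Vert.x 3 API is assumed:

import io.vertx.core.AbstractVerticle;

// Answers requests arriving on an event-bus address (the address name is illustrative)
class GreetingVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().consumer("service.greeting", message ->
                message.reply("Hello, " + message.body()));
    }
}

// Calls the service above over the event bus instead of making an HTTP request
class CallerVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().<String>send("service.greeting", "Vert.x", reply -> {
            if (reply.succeeded()) {
                System.out.println(reply.result().body());
            }
        });
    }
}

Deployed in a clustered Vert.x instance, these two verticles can live in different JVMs on different machines and the call still works unchanged.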

Now that we have laid the foundation and defined our objectives, we will start writing some proper Vert.x code next.

Happy coding !!


Kafka Cluster Setup

From my own experience, setting up a Kafka cluster on AWS raises a few recurring issues, so I just want to highlight them here.

a. First set up the ZooKeeper cluster, say a 3-node ensemble. Modify each node's zoo.cfg so that every server is listed by its internal (private) AWS IP address:

server.1=<internal aws ip1>:2888:3888

server.2=<internal aws ip2>:2888:3888

server.3=<internal aws ip3>:2888:3888
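
For reference, a zoo.cfg for such a 3-node ensemble might look roughly like the sketch below (the paths are placeholders); note that each node also needs a myid file in its dataDir containing only its own id (1, 2 or 3):

# zoo.cfg (same on every node; IPs and paths are placeholders)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=<internal aws ip1>:2888:3888
server.2=<internal aws ip2>:2888:3888
server.3=<internal aws ip3>:2888:3888

# and, on node 1 only:
# echo "1" > /var/lib/zookeeper/myid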

b. Go to Kafka's server.properties on each node and change broker.id:

node1 — broker.id=0

node2 — broker.id=1

node3 — broker.id=2

Change the bind address (the host.name property) to the node's <internal ip> so that the broker is not accessible from outside.

Change advertised.host.name to the same <internal ip>.

List all ZooKeeper nodes of your cluster under the zookeeper.connect setting.
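
Putting the broker settings together, node1's server.properties might look roughly like this (these are the older 0.8/0.9-style property names that the steps above refer to; the IPs are placeholders):

# server.properties on node1 (IPs are placeholders)
broker.id=0
# bind only to the internal interface so the broker is not reachable from outside
host.name=<internal ip of node1>
# the address published to clients and other brokers via ZooKeeper
advertised.host.name=<internal ip of node1>
port=9092
# all ZooKeeper nodes of the ensemble
zookeeper.connect=<internal aws ip1>:2181,<internal aws ip2>:2181,<internal aws ip3>:2181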

Start all Kafka nodes and they should form a cluster.

Create a topic, publish a message using kafka-console-producer, and check that there are no errors.
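
A quick smoke test with the stock command-line tools could look like this (the topic name and addresses are placeholders; the older --zookeeper/--broker-list style flags of this Kafka generation are assumed):

# create a replicated topic
bin/kafka-topics.sh --create --zookeeper <internal aws ip1>:2181 --replication-factor 3 --partitions 3 --topic test

# publish a few messages
bin/kafka-console-producer.sh --broker-list <internal ip of node1>:9092 --topic test

# read them back
bin/kafka-console-consumer.sh --zookeeper <internal aws ip1>:2181 --topic test --from-beginning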


Vert.x Cluster

Vert.x is an extremely simple event-based, non-blocking library for distributed computing that can be easily embedded in any Java framework of your choice.

For some time now I have been exploring Vert.x. I was looking for a Vert.x cluster sample but did not find any decent example, so I decided to share one with the community.

I modified one of the samples available in the Vert.x examples and I will explain its core parts here.

First the cluster has to be configured.

Config hazelcastConfig = new Config();
// Use TCP discovery on localhost instead of multicast (easier for local testing)
hazelcastConfig.getNetworkConfig().getJoin().getTcpIpConfig().addMember("127.0.0.1").setEnabled(true);
hazelcastConfig.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);

ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions().setClusterManager(mgr);
Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        vertx = res.result();
        // deploy(...) is a helper in the sample that deploys the verticles
        deploy(vertx, context, mode);
    }
});

This code can be part of your main function. You can also create a cluster config file called "cluster.xml" instead of building the configuration programmatically; it has to be on the classpath of your application, so it can be put inside src/main/resources. I am going to test this application on my local machine, so I am using TCP discovery rather than multicast. Under the hood Vert.x uses Hazelcast (by default) for all its clustering capabilities. Hazelcast also ships with a special configuration for AWS, so if you are going to deploy your application on AWS, that is the configuration you should use.

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.2.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <properties>
       .....
        <property name="hazelcast.wait.seconds.before.join">0</property>
    </properties>

    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
    <management-center enabled="false">http://localhost:8080/mancenter</management-center>
    <network>
        <port auto-increment="true" port-count="10000">5701</port>
        <outbound-ports>
            <!--
            Allowed port range when connecting to other nodes.
            0 or * means use system provided port.
            -->
            <ports>0</ports>
        </outbound-ports>
        <join>
            <!--<multicast enabled="false">-->
            <!--<multicast-group>224.2.2.3</multicast-group>-->
            <!--<multicast-port>54327</multicast-port>-->
            <!--</multicast>-->
            <tcp-ip enabled="true">
                <interface>192.168.1.8</interface>
            </tcp-ip>
            <aws enabled="false">
                .....
            </aws>
        </join>
        <interfaces enabled="false">
            <interface>10.10.1.*</interface>
        </interfaces>        
    </network>
    <partition-group enabled="false"/>
    <executor-service name="default">
        <pool-size>16</pool-size>
        <!--Queue capacity. 0 means Integer.MAX_VALUE.-->
        <queue-capacity>0</queue-capacity>
    </executor-service>
    <map name="__vertx.subs">

        <!--
            Number of backups. If 1 is set as the backup-count for example,
            then all entries of the map will be copied to another JVM for
            fail-safety. 0 means no backup.
        -->
        <backup-count>1</backup-count>
      
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <!--
            Valid values are:
            NONE (no eviction),
            LRU (Least Recently Used),
            LFU (Least Frequently Used).
            NONE is the default.
        -->
        <eviction-policy>NONE</eviction-policy>
        <!--
            Maximum size of the map. When max size is reached,
            map is evicted based on the policy defined.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size policy="PER_NODE">0</max-size>
        <!--
            When max. size is reached, specified percentage of
            the map will be evicted. Any integer between 0 and 100.
            If 25 is set for example, 25% of the entries will
            get evicted.
        -->
        <eviction-percentage>25</eviction-percentage>
        <merge-policy>
      com.hazelcast.map.merge.LatestUpdateMapMergePolicy
</merge-policy>
    </map>

    <!-- Used internally in Vert.x to implement async locks -->
    <semaphore name="__vertx.*">
        <initial-permits>1</initial-permits>
    </semaphore>

</hazelcast>

I have included only the important parts of the configuration and left out the rest. Once we are done with the cluster configuration, all we have to do is create a verticle and deploy it.

public class ServerVerticle extends AbstractVerticle {

    int port;
    public ServerVerticle(int port){
        super();
        this.port = port;
    }
    @Override
    public void start() throws Exception {
        super.start();
        HttpServer server = vertx.createHttpServer();
        server.requestHandler(req -> {
            if (req.method() == HttpMethod.GET) {
                req.response().setChunked(true);

                if (req.path().equals("/products")) {
                    // ALL_PRODUCTS_ADDRESS is an event-bus address constant defined in another verticle of the sample
                    vertx.eventBus().<String>send(SpringDemoVerticle.ALL_PRODUCTS_ADDRESS, "", result -> {
                        if (result.succeeded()) {
                            req.response().setStatusCode(200).write(result.result().body()).end();
                        } else {
                            req.response().setStatusCode(500).write(result.cause().toString()).end();
                        }
                    });
                } else {
                    req.response().setStatusCode(200).write("Hello from vert.x").end();
                }

            } else {
                // We only support GET for now
                req.response().setStatusCode(405).end();
            }
        });

        server.listen(port);
    }
}

Great…let’s deploy this verticle.

Once the configuration has been done, you need to deploy verticles in the cluster:

Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        Vertx vertx = res.result();
        // Deploy verticles only after the cluster has been initialized
        vertx.deployVerticle(new ServerVerticle(Integer.parseInt(args[0])));
    } else {
        // Clustering failed; surface the cause instead of failing silently
        res.cause().printStackTrace();
    }
});
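
Assuming the sample is packaged as a runnable fat jar (the jar name below is purely illustrative), starting two members of the cluster locally looks like this:

# first instance, HTTP server on port 9000
java -jar target/vertx-cluster-fat.jar 9000

# second instance on a different port; Hazelcast TCP discovery finds the first one
java -jar target/vertx-cluster-fat.jar 9005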


With two instances running, say one on port 9000 and another on port 9005, voilà, you can see them start communicating. The complete source code can be found at https://github.com/singhmarut/vertx-cluster.git Happy Coding !!


Why Java is my default choice?

Recently in my company this never-ending debate about languages has come up.
Needless to say, everybody loves the language/stack they have experience with. So I cannot really debate why not other languages, but I can tell you why Java is my default language.

Before I start, let me tell you which languages I have a good command of today, as far as the server side is concerned:

C++, Java, C#/.NET, Scala, Groovy, JavaScript (Node.js) – and I am always ready to explore more.

I started my career 12 long years ago as a rookie, working on a project called Rule Engine developed in VC++.

I liked the other languages as well, but C++ remained my first love. I was of the view that, compared to other programmers, I should learn the language deeply enough to be a master, because I was immensely impressed with the likes of Scott Meyers, Herb Sutter, Andrei Alexandrescu (creator of D), Don Box ("the COM guy") and many others. They were all C++ programmers.

Fast forward: I joined a "web start-up" which was creating a web app, and I was asked to lead. We re-architected the whole server side in Java, but somebody else took responsibility for the front end, because that was "web" according to the C++ programmer inside me.

Java's performance is only a tad slower than C++, which is negligible for a web application where other aspects of the architecture come into play, and there is no way Java is slower than PHP, Python, Ruby, etc.

So you should use Java because:

  • It is high performance
  • It has a huge ecosystem, with many web frameworks to choose from
  • It is enterprise grade
  • It runs on billions of devices
  • It is stable and will remain a popular language for the foreseeable future
  • Its ecosystem is well maintained (much of it by Apache)
  • There is no dearth of programmers
  • The JVM opens new opportunities every day
  • Monitoring tools are available aplenty
  • It keeps evolving (functional programming in Java 8)

It is very surprising to see that people working in languages like PHP or Python rarely do memory profiling, maybe because they simply do not have the tools.

To me Java is a language you can do anything with. You want web programming? You got it.

You want to write a high-performance server? Bring it on. If projects like Storm and Hadoop can rely on Java and the JVM, then so can you.

Whatever you can achieve in Python or Ruby you can do with Java. Maybe you won't like the syntax, but the vast ecosystem around Java makes sure you are never at a loss when selecting a library or a tool.

And if you know Java then you know the JVM, and if you know the JVM you can choose any language you like, such as Scala. Don't like Scala? Try Kotlin. Don't like that either? Try Groovy, Ceylon and so on.

So yes, I guess you are in a safe spot with Java: it is extremely well understood, has near-infinite documentation, and has remained one of the top choices of programmers for the last two decades; I do not see its glory fading away anytime soon.

Working with Java is like working with a world-class MNC: a bit old school, but still better than many start-ups, and it delivers what it promises. Very few surprises here.

Happy coding !!


Not so many reasons Why C++ sucks..

I coded in C++ for 5 years, so I apologize in advance. But this is just my opinion. Do I want to start a language war? You bet.

– Pointers suck
– Memory management sucks
– Function pointers? Yeah, they do
– goto statements suck
– Friend functions suck
– Static initialization sucks

– Macros suck
– Header files suck? If you don’t know them, don’t bother; you can live without knowing them
– Multiple inheritance sucks
– Diamonds suck
– Interfaces? Sorry, C++ doesn’t have them, so they can’t…
– reinterpret_cast? I think it does suck
– Compilation sucks
– Makefiles suck


– Threading? Sorry, what? Oh yeah, use pthreads or Boost. What is Boost? Some other day, man; long story

– Templates? So complex they deserve 10 posts like this one; they don’t just suck, they suck big time. You think you know them? Solve a simple puzzle and see if you knew the answer already:

C++ Inheritance Puzzle

Did I mention it was a simple one? People interested in knowing templates in detail should read Alexandrescu's Modern C++ Design.

Correction – By the way, C++ now has great support for threads, which simplifies the job so much:
http://en.cppreference.com/w/cpp/thread  Boy, I love this language.


Why foo?

Somebody asked in a Neal Ford presentation at ThoughtWorks: why do we name everything Foo? Class Foo, function Foo.

He could not answer, so here is my take on what other names could have been adopted:

Aoo – Can’t even pronounce
Boo- Pretty romantic doesn’t look like a geeky word
Coo – Don’t know how to pronounce
Doo – Looks like a children word Doo Doo
Eoo – Can’t even pronounce

Goo – sorry doesn’t work for Indians
Hoo – Spreads negativity looks like hooting
Ioo – Can’t even pronounce
Joo – Maybe let’s keep it for now
Koo – Doesn’t sound good Koo – coo
Loo – Don’t want to go there during a presentation
Moo – Looks like straight from nursery rhyme
Noo – Maybe let’s keep it
Ooo – What is this? Doesn’t work
Poo – Hmm..who wants to utter it while coding?
Qoo – Can’t even pronounce
Roo – Maybe let’s keep it for now
Soo – Doesn’t work, for Indians at least, and that’s all I am bothered about
Too – looks like counting
Uoo – Can’t even pronounce
Voo – Seems romantic again…need a geeky word
Woo – Same as above
Xoo – Can’t even pronounce
Zoo – Sorry will go there later let me code for now

So we are left with

Foo – Joo – Noo

Out of these, Foo comes first and it definitely sounds better than the other two. So Foo wins. Yayyy…
