Microservice guidelines


When to expose a new microservice?

A microservice has to expose well-defined functionality backed by a significant amount of code. It is difficult to define what "significant" means, so how do you decide whether a new microservice is warranted? Consider one when:

  • An existing service is turning into a monolith, with various kinds of functionality being added to it frequently
  • Performance is becoming a major bottleneck because some APIs are hit much harder than others, and this cannot be handled within the existing microservice
  • One part of a service changes very frequently and risks breaking existing functionality
  • HA requirements are drastically different
  • From a business point of view it makes sense to split modules into different services
  • A specific technology needs to be used to solve a specific problem
  • Investment is justified and available

A microservice comes with a lot of overhead, so it is mandatory to brainstorm thoroughly before adding a new service to the system.

Provisioning of Microservice

Before a service goes to production, the manager has to open an infrastructure provisioning ticket 2 days in advance, following the guidelines below for each resource type.

Performance Measurement Guidelines

Sizing of machine/docker containers

Before you deploy your application you must understand the performance characteristics of your service. Every service serves a different purpose, so performance requirements differ from service to service.

  • Some services deal with a lot of data, so their memory requirements will be different
  • Some services, like a log collector, read a lot of data and transport it to a separate endpoint
  • Some services get a lot of requests, so responding in time is critical
  • Some services depend heavily on a database, so their performance characteristics change with the type of database

There are some parameters to think about

The first step is to identify the most critical APIs.

Depending on Cars24 load characteristics, make sure you have run a load test with these numbers:

1. 20 requests / second for the most frequently used APIs

2. Per-API response time should not exceed the SLA of 100ms

CPU – Is your service CPU intensive?

Memory – How much data will be held in memory at any given point in time?

Networking – Does your service require special networking capabilities? Usually such applications deal with extensive data ingestion, like 100 MBps.

IOPS – Most applications are IO bound, meaning that database operations rather than CPU consume most of the time. Observe your application's behavior in production to understand where the bottleneck is. Often an application's CPU spikes because of the underlying database, and the real problem is not in the application code base. In this case caching should be considered as an alternative, using either Redis or Elasticsearch.

These are the kinds of questions you should ask from first principles while provisioning an EC2 machine

EC2 (Fargate Cluster)

  • How many requests per second are you expecting?
  • Does it require High Availability?
  • Does it expose an HTTP API?
  • Can it be shut down during the night?
  • Can it be scaled down during the night? If yes, to what type? If no, why not?
  • Is a hard disk required?
  • Is it temporary? If yes, please specify the date on which it will be stopped


  • Just like EC2 machines, Docker containers have to be sized when deploying applications
  • RAM, CPU, number of tasks and storage are the main factors
  • Each Fargate container is deployed as an ECS “Service”. Service deployment helps with service discovery using Route53 and is completely automated
  • CPU and RAM are defined in multiples of 1024
  • If you require 2 cores then define 2048 as CPU, and RAM will be a minimum of 4*1024 = 4096 (4GB).
  • Fargate capacity is available only in slabs. For 1 CPU (1024) memory starts at 2048, so be careful to ask for the right CPU size
  • Nature of Service (HTTP/TCP)
  • Health Check has to be defined
  • Health Check interval has to be defined
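
The CPU and memory slabs above can be written down as a small lookup so a sizing request can be checked before a ticket is raised. This is a minimal sketch: the class and method names are hypothetical, and the table values are assumed from AWS's published Fargate task sizes (the 1024 and 2048 rows match the figures quoted above), so verify them against the current AWS documentation.

```java
/** Hypothetical helper: smallest memory (MiB) Fargate offers for a given CPU size. */
final class FargateSizing {

    /** cpuUnits uses the ECS convention where 1024 units = 1 vCPU. */
    static int minMemoryMiB(int cpuUnits) {
        switch (cpuUnits) {
            // Assumed values; check the AWS Fargate task size table before relying on them.
            case 256:  return 512;
            case 512:  return 1024;
            case 1024: return 2048;
            case 2048: return 4096;
            case 4096: return 8192;
            default:
                throw new IllegalArgumentException("Unsupported CPU size: " + cpuUnits);
        }
    }

    /** True when the requested memory is at least the minimum for that CPU size. */
    static boolean meetsMinimum(int cpuUnits, int memoryMiB) {
        return memoryMiB >= minMemoryMiB(cpuUnits);
    }
}
```

For example, `FargateSizing.meetsMinimum(1024, 1024)` is false because 1 vCPU starts at 2048 MiB. The sketch only checks the lower bound; each CPU size also has an upper memory limit that is not modelled here.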


  • Is your application in the POC phase? If yes, can it share a database server with other applications?
  • What is the default connection pool size of your application? If it's a POC, can it be set to 2? Connections eat up resources at the application end, and the database has to use a lot of CPU/memory to maintain them
  • Limit the connection pool size to 10, assuming 2 instances are running. If a request completes within 100ms:
    • 1 connection = 10 requests per second
    • 10 connections = 10×10 = 100 requests per second per server
    • 2 servers = 100×2 = 200 requests per second
    • How many do we need? 20 requests per second?
  • Will the database be deleted after a few days? If yes, define a date
  • How many IOPS will be required?
  • How much disk space will be required? Provide the estimation logic
  • By default we use MySQL for storing all microservices data. Version of MySQL has to be 5.7
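
The connection pool arithmetic above can be turned into a small calculation. This is a minimal sketch with hypothetical names, assuming each request holds exactly one connection for its whole duration and that nothing else (CPU, database, network) limits throughput.

```java
/** Hypothetical helper reproducing the pool sizing arithmetic from the list above. */
final class PoolSizing {

    /** Upper bound on requests/second for the given pool size, instance count and request time. */
    static long maxRequestsPerSecond(int connectionsPerInstance, int instances, long avgRequestMillis) {
        if (connectionsPerInstance <= 0 || instances <= 0 || avgRequestMillis <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        double perConnection = 1000.0 / avgRequestMillis; // e.g. 100ms -> 10 requests/second
        return (long) Math.floor(perConnection * connectionsPerInstance * instances);
    }

    /** Smallest per-instance pool size that can sustain the target load. */
    static int connectionsNeeded(long targetRequestsPerSecond, int instances, long avgRequestMillis) {
        if (targetRequestsPerSecond <= 0 || instances <= 0 || avgRequestMillis <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        double perConnection = 1000.0 / avgRequestMillis;
        return (int) Math.ceil(targetRequestsPerSecond / (perConnection * instances));
    }
}
```

With the numbers from the list, `maxRequestsPerSecond(10, 2, 100)` gives 200, while `connectionsNeeded(20, 2, 100)` gives 1, which is why a pool of 10 per instance is already generous for a 20 requests per second target.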


  • Redis is an extremely fast and efficient caching store and provides a lot of functionality
  • You must state what kind of operations you are going to run on Redis, because the operations determine whether the Redis instance requires more CPU
  • Redis throughput is measured in thousands of requests per second, so a small Redis instance should be able to handle heavy load from the application

Logging Guidelines

  • Logs put tremendous stress on systems. They consume a lot of IOPS and degrade performance when used extensively. Log only where it is useful
  • Use proper context in logs by capturing useful information like userid/lead-id but not personal information like phone number
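
One way to follow the context guideline is to build log lines from a helper that only accepts identifiers, so personal data never reaches the logger by accident. This is a minimal sketch using `java.util.logging`; the class name, field names and event name are hypothetical.

```java
import java.util.logging.Logger;

/** Hypothetical helper: log lines carry identifiers only, never personal data. */
final class LeadLogging {
    private static final Logger LOG = Logger.getLogger(LeadLogging.class.getName());

    /** Builds the context string; a phone number is deliberately not a parameter. */
    static String context(String userId, String leadId, String event) {
        return "userId=" + userId + " leadId=" + leadId + " event=" + event;
    }

    /** Logs one line per business event instead of logging inside every helper. */
    static void leadUpdated(String userId, String leadId) {
        LOG.info(context(userId, leadId, "lead-updated"));
    }
}
```

A consistent `key=value` layout also keeps the lines easy to search in whatever log store is used.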

Microservice Architecture and its Challenges

Microservices architecture is becoming increasingly popular for building large-scale applications because it provides a number of benefits

  • Separate lifecycle of each service so services can be deployed independently which means services can evolve separately.
  • Each service can be fine tuned for different SLAs and scalability
  • Each Service can be developed using different stack
  • Each service can be monitored separately

However, microservice architecture is not a free lunch. It throws up many challenges that must be discussed and dealt with before venturing into this (uncharted) territory.

  • Authorization – How do you make sure that a particular service and its APIs can be called only when the user is authorised to do so?
  • Data Consistency – ACID is no longer available, deal with eventual consistency.
  • Service Discovery – How to find and interact with new services. If there are lots of services then how to make sure that they are discoverable from other services
  • Deployment – What if there is a hierarchy in microservice dependency
    • A -> B -> C
  • SLA – Multiple hops in a request will add to latency and affect your SLA
  • Fault Tolerance – How to handle Cascading Failure
  • Monitoring – (How to monitor a system which has 100s or even 1000s of services)
  • Tracing  – Logging & Request Tracing (A message will travel across many boundaries so how to nail down where message got lost/delayed)
  • Tech Stack – Selecting a stack (to go with single stack or multiple stacks?)
  • Packaging and Deploying – (How to come up with a uniform way of packaging and delivering services )
  • Debugging – During development phase how to make sure that developers are able to debug code as effectively as they do in monolithic system
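
For the fault tolerance point, a common defence against cascading failure is a circuit breaker: after a few consecutive failures, callers stop hitting the failing dependency for a while and get a fallback instead. The following is a minimal single-class sketch with hypothetical names and thresholds, not a replacement for a production library.

```java
import java.util.function.Supplier;

/** Hypothetical, deliberately simple circuit breaker. */
final class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the call unless the circuit is open; returns the fallback on failure or when open. */
    // synchronized keeps the sketch short, but it also serializes calls through this breaker.
    synchronized <T> T call(Supplier<T> remoteCall, T fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback; // open: fail fast without touching the dependency
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0; // a success closes the circuit again
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // trip (or re-trip) the breaker
            }
            return fallback;
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

In a chain such as A -> B -> C, service B would wrap its calls to C in a breaker so that a slow or failing C does not tie up B's threads and drag A down with it.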



Rust Concurrency

For a long time I have been thinking about writing a sample program in Rust, “the” new systems language. I coded in C++ for the first 5 years of my career before moving completely to Java, and recently in one of my products a requirement came up for a low-latency, high-performance component.

Java was the default choice, as it is my first choice anyway. However, I realized that this component could not afford the non-deterministic nature of the garbage collector.

So the need was to write a program where I could control exactly when memory is deallocated, without worrying about “stop the world” garbage collection. The natural choice was C++, but programming is all about having fun and I wanted to try out something new, and C++ threading support and syntax are not all that great even in C++11.

So I decided to try out Go. But again, Go is garbage collected, and the same fear of non-determinism crept in.

So time to try out Rust.

Program is simple but can be extended to lot of other scenarios.

One thread keeps spitting out data at some regular intervals. A vector keeps track of generated data.

Other thread keeps ticking at regular intervals (100ms or so) and whenever there are items which have elapsed time greater than a threshold those items are expired. Same as cache TTL.

use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Define struct
struct Item {
    created_at: Instant,
    id: i64,
    pub description: String,
}

// Implement Item
impl Item {
    pub fn new(id: i64, description: String) -> Item {
        Item {
            created_at: Instant::now(),
            id,
            description,
        }
    }

    fn created_at(&self) -> Instant {
        self.created_at
    }

    fn id(&self) -> i64 {
        self.id
    }
}

fn main() {
    // Create a multiple-producer, single-consumer channel
    let (sender, receiver) = mpsc::channel();
    let sender_pop = sender.clone(); // clone the sender for the "Pop" thread

    // Create a thread that sends a "Pop" tick every 2 seconds
    thread::spawn(move || {
        // Infinite loop; stops only if the receiver has gone away
        loop {
            thread::sleep(Duration::from_secs(2));
            if sender_pop.send(Item::new(0, String::from("Pop"))).is_err() {
                break;
            }
        }
    });

    // Create a thread that keeps sending data every second
    thread::spawn(move || {
        let mut val = 1;
        loop {
            val += 1;
            // Any description other than "Pop" is treated as data
            if sender.send(Item::new(val, String::from("Data"))).is_err() {
                break;
            }
            thread::sleep(Duration::from_secs(1));
            // Break out of the loop if you want to:
            // if val == 10 {
            //     println!("OK, that's enough");
            //     break;
            // }
        }
    });

    // Create a mutable vector that only the main thread touches
    let mut vals: Vec<Item> = Vec::new();
    let ttl = 5; // TTL in seconds
    // Receive items; the iterator blocks until the next message arrives
    for received in receiver {
        match received.description.as_str() {
            "Pop" => {
                // Expire everything older than the TTL
                vals.retain(|x| x.created_at().elapsed().as_secs() < ttl);
                println!("Items alive after expiry: {}", vals.len());
            }
            _ => {
                println!("Received item {}", received.id());
                vals.push(received);
            }
        }
    }
}

That’s it. You have done synchronisation between threads without any race condition. That’s how cool Rust is.

In the next blog we will try to send notification whenever items are expired.

Happy Coding !!

Master Worker Architecture using Vert.x


Today I am going to explain how Vert.x can be used to build a distributed master-worker setup. In large-scale systems this paradigm applies to a wide variety of problems.

First – Just to refresh our memories about what Vert.x is

Vert.x, as we know, is a lightweight framework for creating distributed microservices. It can scale up and scale out depending on your needs. It also takes away the pain of dealing with the complexity of heavily multithreaded environments, race conditions and so on.

The primary unit of work in Vert.x is a verticle. Verticles are thread safe and can run locally or remotely. One verticle interacts with another using events, which carry data with them.

Now – let’s take a day to day scenario.

We are getting lot of requests. Each request is independent of each other but we are unable to process all these requests on the commodity hardware that we have. How to serve all these requests coming to our cool high traffic website?

Well, one answer is to serve each request in a new “thread”, keep increasing the CPU cores (scale up) and hope it works. This is what your web server does. The problem is that you can only increase the number of cores up to a limit (how high can you go?).

Once you reach that limit you add more such machines and leave it to the load balancer to divide all these requests equally between the machines. Sounds familiar?

Well, relying on the load balancer becomes a problem when every service in the system faces the same issue. Every time, you have to scale these services and keep reconfiguring the load balancer. What if this were possible dynamically, in the application layer? What if we could scale up and out without the pain of a load balancer? The good news is that this can be achieved using Vert.x, with load balancing happening inside your application layer. This approach can have a lot of benefits, which I will discuss some other time; for now let's focus on how it can be achieved using Vert.x.

So this problem has 2 major challenges:

a. How to divide the work between different machines so that we can keep up with this load.

b. How to combine the result from all this processing so that we can return this result to client (master) who needs answers from all workers before proceeding further (Can this be achieved by load balancer?).

So Master is like Management whose only job is to distribute all the work to developers (like you and me) and when work is done…combine all the statuses, create a report and notify the boss and hopefully get a fat pay hike (sounds familiar?)

In terms of Vert.x, we have a master verticle which gets a lot of work to do. But the master does not want to do any work. Why? Because it's a “Master”. So the master wants to assign all this work to workers. Workers are also verticles in Vert.x. But then a problem arises: the master needs to know when all the work is completed so that it can make the right decision about what to do next. Right?
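
Before looking at the Vert.x version, the two challenges (dividing the work and combining the results) can be illustrated with plain JDK classes. This is only a sketch of the scatter-gather idea using `CompletableFuture` and a thread pool on a single machine; it is not Vert.x code, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

/** Hypothetical scatter-gather sketch: a master fans jobs out and combines the results. */
final class ScatterGather {

    /** Master: hands each job to a worker thread, waits for all of them, returns results in job order. */
    static List<String> process(List<String> jobs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Challenge (a): divide the work, one future per job
            List<CompletableFuture<String>> futures = jobs.stream()
                    .map(job -> CompletableFuture.supplyAsync(() -> work(job), pool))
                    .collect(Collectors.toList());

            // Challenge (b): continue only once every worker has answered
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    /** Worker: stands in for the real processing of one job. */
    static String work(String job) {
        return job + " completed";
    }
}
```

Vert.x applies the same shape across machines: the event bus replaces the thread pool, and composite futures replace `allOf`.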

So here is the high-level architecture we are going to follow

Vert.x Master Worker

OK, so in order to simulate this, first let's create a lot of work

import io.vertx.core.Vertx;

/**
 * Created by marutsingh on 2/20/17.
 */
public class Application {

    public static void main(String[] args) {

        final Vertx vertx = Vertx.vertx();
        //vertx.deployVerticle(new HttpVerticle());
        vertx.deployVerticle(new MasterWorker());

        //DeploymentOptions options = new DeploymentOptions().setInstances(10);
        // Deploy five workers that will compete for jobs
        for (int i = 0; i < 5; i++) {
            vertx.deployVerticle(new WorkVerticle());
        }

        // Deployment is asynchronous; a real application should wait for it before sending
        vertx.eventBus().send("vrp", "Job1,Job2,Job3,Job4,Job5,Job6,Job7,Job8,Job9,Job10");

        System.out.println("Deployment done");
    }
}

Great. We deployed our own master. Let's see what it looks like

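import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;
import io.vertx.core.eventbus.Message;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class MasterWorker extends AbstractVerticle {
    @Override
    public void start(Future<Void> fut) {
        vertx.eventBus().localConsumer("vrp", objectMessage -> {
            String data = objectMessage.body().toString();
            String[] work = data.split(",");
            String jobId = UUID.randomUUID().toString();
            List<Future> futureList = new ArrayList<>();

            for (String w : work) {
                // One future per job; it completes when a worker replies
                Future<Message<Object>> f1 = Future.future();
                futureList.add(f1);
                vertx.eventBus().send("work", w + ":" + jobId, f1.completer());
            }
        });
        fut.complete();
    }
}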

Great, so our master is doing its job: sending work over the event bus and hoping some worker will pick it up.

Let's see what our worker is doing

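import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;

public class WorkVerticle extends AbstractVerticle {
    @Override
    public void start(Future<Void> fut) {
        final String verticleId = super.deploymentID();

        vertx.eventBus().consumer("work", objectMessage -> {
            // Message format is "<work>:<jobId>"
            String[] data = objectMessage.body().toString().split(":");
            String work = data[0];
            String jobId = data[1];
            String result = work + "Completed***";
            // Reply so that the master's future for this job completes
            objectMessage.reply(result);
        });
        fut.complete();
    }
}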

So the worker does the work and replies on the event bus with the result.

Now the master needs to combine all these results. This is a very cool feature introduced in Vert.x 3: composable futures. They make this so easy

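StringBuilder resultSet = new StringBuilder();
CompositeFuture.all(futureList).setHandler(ar -> {
    if (ar.succeeded()) {
        // All succeeded: every entry is the reply Message from one worker
        ar.result().list().forEach((result) ->
                resultSet.append(((Message<?>) result).body().toString()));
        System.out.println(resultSet);
    } else {
        // At least one failed
        System.out.println("Job failed: " + ar.cause());
    }
});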

That's all !! I hope this will be useful in some of your scenarios.

Source code is available at


Happy coding !!

Code Quality Guidelines

Coding guidelines are an extremely important part of a professional developer’s day-to-day practice.

Following these guidelines differentiates an experienced developer from a rookie.

It surprises me that so many companies still ignore them and produce poor-quality code that is very expensive to maintain over time and so fragile that bugs creep in every time you add a new feature.

I am sharing some of these guidelines, which are far from exhaustive but are the most important ones for me. People might not agree with some of them, but these are my experiences, and many of them are borrowed from classic texts.

Coding Standards

General Coding guidelines

These are general predefined standards for developing code.

  1. Naming Conventions should be descriptive (Variable as well as functions).
  2. Your application must have separate static and dynamic parts.
  3. No Hard Coding. Find an appropriate place where you can define constants or enums.
  4. Prefer simplicity over complexity. If your code is turning out to be very complex, most likely you are doing something wrong. As the saying goes, it's “hard to build simple things”
  5. Avoid premature optimization. Define what premature optimization means for your own use case. That sounds awkward, and trust me, it is. Only experience can tell you what it really means
  6. Always look for possibility of following a standard Design Pattern. Tweak it for your own use case
  7. Strictly prohibit repetitive code. If code is repeating it’s a candidate for refactoring.
  8. Always align your code properly before committing code

Class Design

  1. Class should not be more than 600 lines.
  2. Constructor should not have any complex logic and has to be exception safe.
  3. Prefer composition over inheritance
  4. Follow the single responsibility rule everywhere
  5. Design for extensibility
  6. If in Object Oriented language always define an interface
  7. Avoid circular dependency. If working with a web framework consider using

Comments and Error Messages

  1. Write comments at all critical places in your code, including variable names and their usage, and function signatures (input/output/parameters).
  2. Work with an error message framework. Using bare error codes for displaying error messages is confusing, as it's hard to figure out which error code comes from which place. To avoid this chaos, it is recommended to use an error message framework.

If/else Statements

  1. Do not write deep nested if else statements.
  2. If nesting is getting deeper break your code into multiple functions
  3. Operator precedence in your language can introduce nasty bugs which are extremely hard to debug. Follow a policy of using parentheses when writing long if/else conditions.
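
The three rules above can be seen in a small example. This is a minimal sketch with hypothetical names and made-up business rules: the nested condition is split into small functions, and the compound condition uses explicit parentheses.

```java
/** Hypothetical pricing rules, used only to illustrate the if/else guidelines. */
final class DiscountRules {

    /** Top-level rule stays flat because each decision lives in its own function. */
    static int discountPercent(boolean loggedIn, boolean premium, int orderTotal) {
        int discount = 0;
        if (isEligible(loggedIn, orderTotal)) {
            discount = premium ? premiumDiscount(orderTotal) : 5;
        }
        return discount;
    }

    static boolean isEligible(boolean loggedIn, int orderTotal) {
        return loggedIn && (orderTotal > 0);
    }

    static int premiumDiscount(int orderTotal) {
        return (orderTotal >= 1000) ? 15 : 10;
    }

    /**
     * Parentheses make the grouping explicit. Without them, && binds tighter than ||,
     * so "premium || hasCoupon && orderTotal >= 500" would be true for every premium user.
     */
    static boolean freeShipping(boolean premium, boolean hasCoupon, int orderTotal) {
        return (premium || hasCoupon) && (orderTotal >= 500);
    }
}
```

Here `freeShipping(true, false, 100)` is false, whereas the unparenthesised version of the same condition would return true.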

Implement OOPS

  1. It is recommended to implement OOPS in your code as much as possible.
  2. Program to an interface (contract), not a class. Change the interface as little as possible
  3. Try to make an abstract class for a business service (in the case of Python/C++; an interface in the case of Java).
  4. Follow the DRY principle (Don’t Repeat Yourself). Use design patterns to promote code reusability

Java Best Practices

  1. Use the interface type when declaring a collection variable, like Map<String,Object> map = new HashMap<String,Object>();
  2. Avoid using Object as much as possible. Strive for type safety
  3. Use StringBuilder for performance and safety
  4. Use the java.time package when dealing with dates https://www.programcreek.com/java-api-examples/index.php?api=org.joda.time.format.DateTimeFormatterBuilder
  5. Use the same timezone everywhere in the application
  6. createNativeQuery also takes the entity class as a parameter, so try using this overload: https://vladmihalcea.com/the-jpa-entitymanager-createnativequery-is-a-magic-wand/
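
Items 4 and 5 fit together: keep timestamps as `Instant` values and convert them with one application-wide zone. This is a minimal sketch using only `java.time`; the class name is hypothetical and the choice of UTC is an assumption, not something the guidelines prescribe.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: one place that decides the application's time zone. */
final class Timestamps {
    // Assumption: the whole application standardises on UTC.
    static final ZoneOffset APP_ZONE = ZoneOffset.UTC;

    /** Formats an instant as ISO-8601 in the application zone. */
    static String format(Instant instant) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(instant.atOffset(APP_ZONE));
    }

    /** Parses a string produced by format() back into an instant. */
    static Instant parse(String text) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.parse(text, Instant::from);
    }
}
```

Because every conversion goes through `APP_ZONE`, the server's default time zone no longer affects what gets stored or logged.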


  1. Function should not be more than 25 lines.
  2. Always check for valid parameters inside public functions. Throw an exception to report an error in params
  3. To group the statements logically, try to divide different sections of a function into other smaller functions. E.g. Separate function for initializing values for every possible activity.
  4. Use functional programming capabilities if your stack supports it. I.e. pass around functions to write reusable code.
  5. Follow Single Responsibility Rule as closely as possible.
  6. Functions have to be testable (I should be able to write unit test case for this function). In other words promote loose coupling via Dependency Injection or otherwise.
  7. To continue with loose coupling follow the rule “Prefer composition over inheritance”.
  8. If you are working with Java 8, never return null. Consider returning Optional
  9. Try to avoid multiple return statements. This can put nasty bugs inside programs so it’s best to avoid them as much as possible.
  10. Check Big O Complexity of algorithm you are writing. Especially for the case, where you are writing a lot of lines of code or for functions which are on critical path.
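
Rules 2 and 8 from the list above are easy to show together. This is a minimal sketch with hypothetical names: the public method validates its parameters and throws on bad input, and the lookup returns `Optional` instead of null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory directory, used only to illustrate the function guidelines. */
final class UserDirectory {
    private final Map<String, String> emailsByUserId = new HashMap<>();

    /** Validates parameters up front and reports problems with an exception. */
    void register(String userId, String email) {
        if (userId == null || userId.isEmpty()) {
            throw new IllegalArgumentException("userId must not be empty");
        }
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("email looks invalid: " + email);
        }
        emailsByUserId.put(userId, email);
    }

    /** Never returns null: an unknown user yields Optional.empty(). */
    Optional<String> findEmail(String userId) {
        return Optional.ofNullable(emailsByUserId.get(userId));
    }
}
```

A caller can then write `directory.findEmail("u-1").orElse("unknown")` and never needs a null check.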

Function overloading should follow convention

  1. foo(int), foo(int,double), foo(int, double, object) i.e. least needed parameter at the last.



Follow layered architecture in true spirit. Upper Layer should call into lower layers and each layer has to be designed for specific purpose. E.g. while following MVC, Logic in views has to be related to view and all heavy lifting shall be done by service layer.

Package Structure and naming conventions

  1. All Java Packages should start with com.broctagon. Check for specific naming convention in your stack but topmost package has to be com.broctagon.
  2. Define functions in proper packages instead of utility classes. It’s a common malpractice to put every seemingly useful function inside a utility class, and it then becomes difficult to find anything in them while writing code. If it’s a business utility function, try to find a proper package for it rather than putting it inside a utility class. Utility classes should generally contain functions for common tasks like string reversal, math functions or an email format check.



  1. It is recommended to use logging wherever it helps. The purpose of logging is to diagnose potential issues in production. Logging is useful, but it incurs significant overhead on the application, so it must be used wisely and only the required information should be logged.
  2. Logging should not be cluttered; it must follow the same consistent pattern across the application. Identify a logging pattern for your specific use case
  3. Logging libraries are incredibly useful. Use their package-level capabilities to switch selective logging on/off at different levels.


Exception Handling

  1. Do not suppress exceptions
  2. If an exception is explicitly raised in a function then it should not be handled in that same function. Create a separate function to handle exception and process.
  3. Do not suppress original exception even if you have to create a new exception
  4. Try to use already available functions in logging libraries.
  5. Comment on any bypassed exception, i.e. if a function deliberately lets an exception pass through or ignores it, mention in a comment why.
  6. Try following naming convention for exceptions as per your language e.g. Exception suffix in Java
  7. Do not write complex code in handler. Lot of times this code block throws an exception and hides original exception
  8. Read about exception handling best practices for your respective language and follow same.
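
Rules 3 and 6 from the list above come down to wrapping with a cause. This is a minimal sketch with hypothetical class names: the new exception follows the `Exception` suffix convention and keeps the original exception attached, so its stack trace is not lost.

```java
/** Hypothetical domain exception; the suffix follows the Java naming convention. */
final class OrderLoadException extends Exception {
    OrderLoadException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical parser, used only to illustrate exception chaining. */
final class OrderLoader {

    static int parseQuantity(String raw) throws OrderLoadException {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Wrap in a domain exception, but pass the original along as the cause
            throw new OrderLoadException("Invalid quantity: '" + raw + "'", e);
        }
    }
}
```

The catch block contains nothing that could itself throw, which also respects the rule about keeping handlers simple.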


  1. Always follow MVC pattern
  2. Do not bloat your controllers
  3. Make your URLs simple and easily understandable by end user
    1. “/admin/orderhistory” should be changed to  “/admin/order/history”
  4. Make your services code “testable” which means loose coupling

Spring Best Practices

  • Always Follow Builder pattern for response entity
    • ResponseEntity.ok("success");

Releasing the software

  1. All Java services should follow microservices architecture and be packaged as Fat Jar.
  2. Docker is used to deploy the software

MicroService Architecture

Here is a microservice architecture deployment diagram. All Services are docker containers which are registered to Consul Server via Registrator. Client (External – Mobile, UI) makes a call to Bouncer which is our API Proxy. Bouncer has all permissions configured on API URLs. It makes a call to Auth Server which authenticates the request and if successful it passes the Service URL to HAProxy. HAProxy then has rules configured which redirect the URL to exact service.

Services always follow a naming convention, so when a service is registered in Consul, consul-template refreshes the HAProxy configuration in the background.


Bouncer – API Proxy gateway…all calls come to bouncer to make sure that only authenticated requests are passed to actual services

Auth Server – Single point of authentication and authorization. All applications create permissions and save in Auth Server

External ELB – All public API calls go to the External ELB, which in turn passes them to the HA Proxy cluster

Internal ELB – All internal APIs are routed through Internal ELBs. There will be URLs which will only be exposed to Internal Services

HA Proxy (Cluster) – The Load balancer cum service router

Consul Server (Cluster) – Centralized Service Registry

Registrator – SideCar application running with each service which updates Consul Cluster about service health

Consul Template – Background application which updates HAProxy whenever there is a change in service configurations

ECS Cluster – AWS ECS where all docker containers are registered. Takes care of dynamically launching new docker containers based on parameters. Autoscaling is handled automatically

There you have major parts in deployment..Please share your comments..Happy Coding !!

A friend posted this question on my FB account


Question 1.

I am depicting from diagram that all microservice API will proxy through API gateway. Is it true for internal cluster communication also? If yes, then wouldn’t it be too much load on gateway server for too many micro-services in infrastructure? Or will it be as lightweight as load balancer? Can I assume this gateway as a simple reverse proxy server which is just passing through the request/response then why not use Apache or Nginx?

Answer : First, internal cluster communication may or may not go through Bouncer, depending on your security requirements. If you want all API calls to be authenticated then yes, it will go through Bouncer. The point to note is that Bouncer is a very lightweight Node.js application, so it has extremely high throughput, and because it sits behind HA Proxy you can always add new nodes.

API Proxy is a reverse proxy, but with some logic. Most of the time, when you want to expose an API which interacts with multiple microservices, you will have to aggregate data, and that logic will reside in the API Proxy. It's a common pattern in large-scale architecture

Question 2.
As per your description, Auth Server is also responsible for authorisation here? Then how are you ensuring isolation of a microservice if its authorisation logic is shared with the auth server? And how are you ensuring data integrity, which is a security measure that protects against the disclosure of information to parties other than the intended recipient?


Answer : Auth Server is like the “Google auth server”. All resource permissions reside in Auth Server, and it holds the permissions for each application. These permissions can be added either via API by the app or through an Admin UI, so each app can have different permissions. A single user is given different permissions for different apps, so isolation is guaranteed. E.g. I may have the “user-create” permission in UserService but not the “account-create” permission in AccountService

Who creates permissions, who gives them to users.. and when depends on your design.
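
The per-application isolation described in the answer can be pictured as permissions keyed first by application and then by user. This is a minimal in-memory sketch with hypothetical names; it is not the actual Auth Server implementation, and the user id is made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical permission store: application -> user -> granted permissions. */
final class PermissionStore {
    private final Map<String, Map<String, Set<String>>> grants = new HashMap<>();

    /** Called by an app through an API, or by an admin through a UI. */
    void grant(String app, String userId, String permission) {
        grants.computeIfAbsent(app, a -> new HashMap<>())
              .computeIfAbsent(userId, u -> new HashSet<>())
              .add(permission);
    }

    /** A permission granted in one application says nothing about any other application. */
    boolean isAllowed(String app, String userId, String permission) {
        return grants.getOrDefault(app, Collections.emptyMap())
                     .getOrDefault(userId, Collections.emptySet())
                     .contains(permission);
    }
}
```

After `grant("UserService", "user-1", "user-create")`, the same user is still refused `account-create` in AccountService, which is the isolation property the answer relies on.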


Serverless Architecture – AWS Lambda


In this post I want to share my views on serverless architecture (specifically AWS Lambda), which cloud service providers like AWS are promoting as the “holy grail” for solving all problems. This post is aimed at developers who understand that every technology has limitations and that it is wise to make an informed decision.

Before I start, I want to state some facts so that we can have a fruitful discussion.

a. All companies including cloud service providers have to make money

b. All companies are afraid of competition and do not want to lose their customers

c. There is no branch of engineering where a “one size fits all” approach works.

d. No matter what tools an engineer chooses, when you can't find a solution, “go back to basics” is the best approach.

e. Lambda architecture in the context of “AWS” is different from lambda architecture in general, as many problems with this architecture are AWS specific.

If you want to understand some issues with “Lambda architecture”


Coming to the point: many attempts have been made in the past to find one “holy grail” solution to teething problems. Let’s look at some of these

Problem 1 – Let's take operating systems. The question is, why do we have multiple operating systems? Why has no one been able to solve “all the problems under the sun”? Why so many?

Problem 2 – Writing multiple programs for different operating systems even when the program does the same thing. Java solved this problem: don’t worry about garbage collection, performance or underlying platforms. After 20 years of research and billions of dollars we only have more languages. If Java had solved all the problems it targeted, we would never have had to learn “node.js”

Problem 3 – Learning multiple languages for front-end and back-end development. GWT from the house of Google did solve it. Great !! Where is it now? Why did Google decide to stick with JavaScript for front-end development and create Angular?

Problem 4 – Integration. With a vast variety of protocols and hundreds of disparate systems in a sizeable organization, integration is a major problem. Hence the birth of the ESB. Where is it now? How many start-ups use it? I haven’t heard of or found any

Problem 5 – Modeling business processes. Rule engines tried to solve all problems under the sun and become de facto expert systems. No doubt it is a formidable technology, but does it work in most scenarios? The answer is a resounding NO.

I can go on and on, but you know what I am implying. One size does not fit all. It never has and never will. Perhaps one day there will be enough research into AI that a program can determine the technology stack and architecture itself, but I do not see that happening anytime soon.

There are many ways to architect your applications. Serverless is one of those architectures. In this architecture you do not manage your own resources; someone else does. When they run and how many there are is decided by AWS for you. You do not get to choose your operating system, runtime versions and so on.

Serverless architecture also tries to create an “illusion” that all my problems related to understanding process behavior and distributed systems will go away, and that it will scale on its own without “paying any cost”.

Before you find this approach, or so-called “architecture”, fascinating, call a friend working at Amazon and ask how many Amazon folks are actually using it. So far I have found none. Or, even if they tried it, they do not understand its internals. Can you imagine that a lot of people at Amazon use “Oracle”? So much for their AWS offerings

Some people have written lengthy articles about Lambdas and its problems. Try to see if you have found answers to questions raised.


arguments in favor of lambda:


There is definitely logic in the arguments posted in the above blog, but I see a serious fundamental issue in the writer’s approach of “why do I need to know this”. What kind of approach is this? One that I do not agree with. Knowing things in detail makes you better than the competition and more adept at solving problems. Knowing things helps you solve critical technical issues and innovate. Yes, it takes effort, so what? AWS did not come out of thin air. It adopted many architectural solutions, such as Dynamo, RDS and Kinesis, that are all based on cutting-edge research papers.

To be more straightforward, there are some points you need to be careful about when using AWS Lambda.

  • Your application will be difficult to debug.
  • You will encounter bugs that are extremely hard to debug, simply because your production environment is very different from your local machine and you can never replicate it.
  • Poor error reporting. I have already shared a blog on this.
  • Lambda warm-up time. For batch processing Lambda seems absolutely appropriate, but for a scalable real-time application? Well, good luck if it works for you.
  • Timeouts. A Lambda function always has a timeout, which means that if your code takes longer than the configured timeout, the invocation is cut off and fails with a timeout error every time. Keep this carefully in mind, especially when making external HTTP calls. An even more serious problem is that Lambda cools down when there is not much to process. The cool-down logic is hidden somewhere without much detail.
  • Multiple environments – Create multiple Lambda functions. To hell with code reuse.
  • Application state – Well, forget this, as Lambda has to be stateless. You have absolutely no control over when a process runs and stops.
  • JVM optimizations – Keep in mind that in Java, techniques like JIT compilation might be of no use in Lambda functions, since the process may not live long enough to benefit from them. So all JVM optimizations go out of the window.
  • Throttling – This is something we faced recently. Throttling limits are absolutely ridiculous for high-traffic apps. Yes, it is a soft limit and you can request an increase, but isn’t that contradictory to the auto-scaling promise? I thought this was the problem Lambda was solving in the first place.
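
The timeout point is the one you can defend against in code. Here is a minimal sketch, assuming a Python handler: the Lambda context object exposes `get_remaining_time_in_millis()`, and the handler can use it to cap the timeout of an outbound call so that it returns a clean error instead of being killed mid-request. `FakeContext`, `call_timeout_seconds`, the injected `fetch` function, the 500 ms safety margin and the 10 second cap are all hypothetical names and values picked for illustration.

```python
import time


class FakeContext:
    """Local stand-in for the Lambda context object (hypothetical, for testing only)."""

    def __init__(self, timeout_ms):
        self._deadline = time.monotonic() + timeout_ms / 1000.0

    def get_remaining_time_in_millis(self):
        return max(0, int((self._deadline - time.monotonic()) * 1000))


def call_timeout_seconds(context, safety_margin_ms=500, cap_seconds=10.0):
    """Return a timeout for an outbound call that fits the remaining time budget.

    Raises TimeoutError when there is not enough time left to even try.
    """
    budget_ms = context.get_remaining_time_in_millis() - safety_margin_ms
    if budget_ms <= 0:
        raise TimeoutError("not enough time left for an external call")
    return min(cap_seconds, budget_ms / 1000.0)


def handler(event, context, fetch):
    """fetch(url, timeout) is injected here; real code might wrap an HTTP client."""
    try:
        timeout = call_timeout_seconds(context)
        return {"status": "ok", "body": fetch(event["url"], timeout)}
    except TimeoutError as exc:
        # Fail in a controlled way instead of letting Lambda kill the invocation.
        return {"status": "timeout", "error": str(exc)}
```

The design choice is simply to leave a margin: the function decides it has run out of time slightly before Lambda does, so the caller gets a meaningful response.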

Bottom line is pretty simple. AWS Lambda is a great tool provided you use it wisely, just like any other technology. It is a reasonable choice if:

  • You are fine with the limited language support for Lambda.
  • You know how to deal with Lambda limitations: timeouts, file paths, limited memory, sudden restarts, multiple copies of the same code, etc.
  • You want to avoid, and for good reasons, the complexity involved in developing distributed systems.
  • Analyzing your process performance is beyond your skills, e.g. you have no idea of JVM tuning.
  • You are OK with OpenJDK.
  • You can be reasonably sure that your process will never hang, crash or leak memory, because you won’t be able to log in to the machine and analyze a dump using some advanced tool.

Apache Kafka – Simple Tutorial


In this post I want to highlight my fascination with Kafka and its usage.

Kafka is a message broker, just like “RabbitMQ” or a “JMS” broker. So what’s the difference?

The differences are:

  • It is distributed
  • It is fault tolerant, because messages are replicated across the cluster
  • It does one thing and one thing only, i.e. transferring your messages, and does it really well
  • It is highly scalable due to its distributed nature
  • Tunable consistency
  • Parallel processing of messages, unlike others which process sequentially
  • Ordering guarantee per partition

How do you set it up?

Kafka is inherently distributed. That means you are going to have multiple machines forming a Kafka cluster.

Kafka uses ZooKeeper for leader election, among other things, so you need a ZooKeeper cluster already running somewhere. Otherwise you can go to


You install Kafka on all the machines that will participate in the Kafka cluster and then open the ports Kafka is listening on. Then, on each machine, provide the configuration of all the other machines in the cluster. E.g. if Kafka is running on machines K1, K2 and K3, then K1 will have information about K2 and K3, and so on.

Yes, it’s that simple.

How does it work?

The way Kafka works is that you create a topic, send a message to it and read the message at the other end.

So if there are multiple machines, how do you send a message to Kafka? Well, you keep a list of all the machines inside your code and then send the message using the high-level Kafka producer (a helper class in the Kafka driver). A high-level Kafka consumer class is available for reading messages.

Before you send a message, first create a topic with a “replication factor”, which tells Kafka how many brokers will hold a copy of the data.

Some important terminologies related to Kafka are:

Topic – Where you publish messages. You need to create it beforehand.

Partition – A topic is split into partitions, and the number of partitions is the number of consumers that can read the topic in parallel. The default is 1 but you can create hundreds.

Ordering of Messages – Guaranteed within a single partition.

TTL – Time to live for messages on disk. The default is 7 days.

Group – Kafka guarantees that a message is only ever read by a single consumer in a group. So if you want a message to be processed only once, just put all the consumers in the same group.
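
To make partitions, ordering and groups a bit more concrete, here is a toy in-memory model. It is not the Kafka client API, just a sketch under simple assumptions: messages with the same key go to the same partition, each partition is an append-only list, and every partition has exactly one owner inside a consumer group. The function names are hypothetical, and CRC32 stands in for the hash (Kafka's default partitioner uses murmur2 for keyed messages).

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick a partition from the message key (CRC32 here, purely for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(log, key, value, num_partitions):
    """Append a message to the partition chosen by its key."""
    log[partition_for(key, num_partitions)].append((key, value))


def assign_partitions(num_partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Each partition has exactly one owner in the group, so a message is read
    by only one consumer of that group.
    """
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


num_partitions = 3
log = defaultdict(list)
for i in range(4):
    publish(log, "order-1", f"event-{i}", num_partitions)
    publish(log, "order-2", f"event-{i}", num_partitions)

# Two consumers in the same group share the three partitions between them.
assignment = assign_partitions(num_partitions, ["c1", "c2"])
```

All events for "order-1" land in one partition in the order they were sent, which is the per-partition ordering guarantee. If you add more consumers than partitions, the extra ones get nothing to read, which is why the partition count caps parallelism.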

If you want to go deeper, here are some useful links



Why Kafka?

Today many companies in the startup world are completely dependent on AWS infrastructure. It’s a good strategy, since you do not have to manage your own infrastructure, and it saves you a lot of headaches.

Today we will discuss a bit about the brokers available in AWS infrastructure. AWS has mainly 2 types of broker offerings:

a. SQS (Simple Queue Service) – more like ActiveMQ, RabbitMQ

b. Kinesis (distributed, fault-tolerant, highly scalable message broker) – fewer features, but optimized for ingesting and delivering a massive number of events at extremely low latency.

The design of Kinesis is inspired by Kafka, which LinkedIn donated to Apache. LinkedIn processes billions of events per day using Kafka, and it is an Apache top-level project used in many highly scalable architectures.

In this post I want to focus on some of the key differences between Kinesis and Kafka. As stated at the beginning, working with AWS infrastructure is a good thing, but over-reliance on it has some major problems.

a. You are locked in to a vendor, so if tomorrow you want to shift to DigitalOcean or even your own infrastructure, you will not be able to do so.

b. You are limited by the restrictions AWS imposes, such as how many transactions you can do per unit of time.

So, in light of the above 2 points, I will try to explain where Kafka should be used instead of RabbitMQ and in place of Kinesis.

RabbitMQ Pros:

  • Simple to install and manage
  • Excellent routing capabilities based on rules
  • Decent performance
  • Cloud installation also available (CloudAMQP)

RabbitMQ Cons:

  • Not distributed
  • Unable to scale to high loads (due to its non-distributed nature)

Kafka Pros:

  • Amazingly fast reads and writes (because it only does sequential reads and writes)
  • Does one thing and one thing only, i.e. transferring messages reliably
  • Load-balances consumers across the partitions of a topic, so you get real parallel processing, no ifs and buts
  • No restrictions on the number of transactions, unlike Kinesis

Kafka Cons:

  • Complicated to set up a cluster compared to RabbitMQ
  • Dependency on ZooKeeper
  • No routing

So, the bottom line:

  • Use RabbitMQ for any simple use case
  • Use Kafka if you want insane scalability and you are ready to put effort into learning Kafka topics and partitions
  • Use Kinesis if setting up Kafka is not your cup of tea

| | Kafka | Kinesis | RabbitMQ |
|---|---|---|---|
| Routing | Basic (topic based) | Basic (topic based) | Advanced (exchange based) |
| Throughput | Extremely high | Extremely high | |
| Latency | Very low | Depends on region (not available in some regions, hence an HTTP call) | High (compared to the other 2) |
| Ease of implementation | Moderate, but setting up a cluster requires effort | Moderate (but identifying the number of shards can be tough) | Easy |
| Restrictions on transactions | None | 5 reads/sec and 1,000 writes/sec per shard | None |
| Types of applications | High throughput | High throughput | Low to medium throughput |
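
Since identifying the number of shards is the tough part of Kinesis, here is a rough back-of-the-envelope sketch. It assumes the commonly documented per-shard limits (1,000 records/sec or 1 MB/sec for writes, 2 MB/sec for reads), so check the current AWS quotas before relying on it. The function name `estimate_shards` and its parameters are hypothetical.

```python
import math

# Per-shard limits, assumed from the commonly documented Kinesis quotas.
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Estimate how many shards a stream needs for a given steady load.

    The answer is driven by whichever limit is hit first: record count,
    write bandwidth, or read bandwidth shared by all consumers.
    """
    write_mb = records_per_sec * avg_record_kb / 1024.0
    by_records = records_per_sec / WRITE_RECORDS_PER_SEC
    by_write = write_mb / WRITE_MB_PER_SEC
    by_read = write_mb * consumers / READ_MB_PER_SEC
    return max(1, math.ceil(max(by_records, by_write, by_read)))


small_events = estimate_shards(5000, 0.5)             # limited by record count
fan_out = estimate_shards(1000, 4, consumers=3)       # limited by read bandwidth
```

With Kafka there is no such imposed ceiling, but the same arithmetic is still a useful starting point for choosing a partition count.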

As always, drop me an email if you are still confused about your use case.

Happy Coding !!

Why Vert.x?

In this post I will try to throw some light on the capabilities of, and the need for, the exciting new framework Vert.x.

I have been following this framework for the last few years, and it is great to see that it is now being adopted by many large companies, which means it has stabilized.

Promoted by Red Hat, Vert.x is a very lightweight library for distributed computing.

Today almost all applications are n-tier. This is no longer an invention but the need of the hour. You cannot scale a monolithic application, and the days are long gone when you could just create a scaffolding app in Ruby on Rails and keep modifying the same app to run your business.

There are many reasons why an application should be divided into different components, a.k.a. services (or microservices). But I see the ability to iterate and roll out new features quickly as a very important reason to build a distributed system.

So today the business layer is divided into n different components. This is not something new. Many enterprise applications have been built that way.

What is changing now is the tools that let you create this distributed architecture without actually going into the nitty-gritty of distributed software design.

Let’s consider a scenario. You have a web application (let’s say a Ruby on Rails app).

You start getting a lot of traffic, which your single server cannot handle anymore.

So what do you do?

You put it behind a load balancer and spawn a new instance. When new requests come in, they get redirected to one of the servers using some load-balancing strategy (round-robin?).

Again, if traffic spikes, you repeat the process.
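
The round-robin strategy mentioned above fits in a few lines. This is only a sketch of the idea, not how any particular load balancer is implemented; the class name `RoundRobinBalancer` and the server names are made up for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def add_server(self, server):
        """Scale out: add an instance and restart the rotation from the first server."""
        self._servers.append(server)
        self._cycle = itertools.cycle(self._servers)

    def route(self, request):
        """Return the server that should handle this request."""
        return next(self._cycle)


lb = RoundRobinBalancer(["app-1", "app-2"])
first_four = [lb.route(i) for i in range(4)]

# Traffic spikes again, so we repeat the process and add another instance.
lb.add_server("app-3")
next_three = [lb.route(i) for i in range(3)]
```

Notice that the balancer knows nothing about what each server does. Every instance has to be a full copy of the app, which is exactly where the trouble starts.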

There is a problem here:

a. It works only when you have a monolithic application.

b. It does not scale.

If you are looking to build a high-traffic website, then you just can’t keep adding servers to the load balancer and assume everything will work fine.

So many large applications divide the business layer, the layer which does most of the useful work, into multiple components and expose APIs that are consumed by the web layer. Each of these business components does some specific task, and each one is scaled independently.

Great… seems like a scalable solution. But now we have created another problem.

How do we manage these different components or services? How do we discover them in the system? We can go to our good old load balancer, start assigning each service a DNS name and let the load balancer do the job.

This seems unmanageable. How many DNS names would we like to have, and how many times will we have to reconfigure the load balancer? And why in the world would I want a DNS name assigned to each of these services? I certainly do not want to expose them to the outside world. That job belongs to my web application. So I should have something better. CORBA is out of the question. Java RMI, God save me from it. Thrift is a possibility, but it does not tick all the boxes outlined below.

Is there any other way?

What if we could handle this at the software layer rather than the hardware layer? What if we could do it dynamically? What if these services could interact without the overhead of HTTP? It turns out all this is possible now with frameworks like Vert.x.

This is what we want to achieve with Vert.x:

a. Distributed architecture

b. Fault tolerant applications

c. Highly scalable application layer

d. Dynamic discovery of services (micro?)

e. Remove the overhead of HTTP.

f. Interaction between services without installing separate software like RabbitMQ.
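
To give a feel for objectives d to f before touching the real framework, here is a toy in-process event bus. It is not the Vert.x API, only a sketch of the idea: services register handlers under an address at runtime, and callers send to that address without knowing which instance answers. All names here (`EventBus`, `consumer`, `send`, `publish`, the `pricing.quote` address) are illustrative assumptions.

```python
from collections import defaultdict


class EventBus:
    """Toy address-based message bus: no DNS names, no HTTP, no separate broker."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._next = defaultdict(int)

    def consumer(self, address, handler):
        """A service instance announces itself by registering a handler."""
        self._handlers[address].append(handler)

    def unregister(self, address, handler):
        """A service instance leaves; callers do not need to be reconfigured."""
        self._handlers[address].remove(handler)

    def send(self, address, message):
        """Point-to-point: deliver to one handler, rotating among those registered."""
        handlers = self._handlers[address]
        if not handlers:
            raise LookupError(f"no service registered at {address!r}")
        index = self._next[address] % len(handlers)
        self._next[address] += 1
        return handlers[index](message)

    def publish(self, address, message):
        """Broadcast: deliver to every handler registered at the address."""
        return [handler(message) for handler in list(self._handlers[address])]


bus = EventBus()
bus.consumer("pricing.quote", lambda msg: ("instance-a", msg["sku"]))
bus.consumer("pricing.quote", lambda msg: ("instance-b", msg["sku"]))

replies = [bus.send("pricing.quote", {"sku": "X1"}) for _ in range(3)]
```

Scaling a service here means registering one more handler under the same address. A real framework does the same thing across machines, which is the part you do not want to write yourself.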

Now that we have laid the foundation and defined our objectives we will start writing some code for Vert.x.

Happy coding !!