Microservice Architecture and its Challenges

Microservices architecture is becoming increasingly popular for building large-scale applications because it provides a number of benefits:

  • Each service has its own lifecycle, so services can be deployed independently and evolve separately.
  • Each service can be fine-tuned for different SLAs and scalability requirements.
  • Each service can be developed using a different stack.
  • Each service can be monitored separately.

However, microservice architecture is not a free lunch. It raises many challenges that must be discussed and dealt with even before venturing into this (uncharted) territory.

  • Authorization – How do you make sure that a particular service and its APIs can be called only when the user is authorized to do so?
  • Data Consistency – ACID is no longer available; you have to deal with eventual consistency.
  • Service Discovery – How do services find and interact with new services? If there are lots of services, how do you make sure they are discoverable from other services?
  • Deployment – What if there is a hierarchy in microservice dependencies?
    • A -> B -> C
  • SLA – Multiple hops in a request add latency and affect your SLA.
  • Fault Tolerance – How do you handle cascading failures?
  • Monitoring – How do you monitor a system that has 100s or even 1000s of services?
  • Tracing – Logging and request tracing (a message travels across many boundaries, so how do you pinpoint where it got lost or delayed?)
  • Tech Stack – Selecting a stack (go with a single stack or multiple stacks?)
  • Packaging and Deployment – How do you come up with a uniform way of packaging and delivering services?
  • Debugging – During development, how do you make sure developers can debug code as effectively as they do in a monolithic system?

 

 


Master Worker Architecture using Vert.x


Today I am going to explain how Vert.x can be used to create a distributed master-worker paradigm. In large-scale systems it is applicable to a wide variety of problems.

First, just to refresh our memories about what Vert.x is:

Vert.x, as we know, is a lightweight framework for creating distributed microservices. It can scale up and scale out depending on your needs. It also takes away the pain of dealing with the complexity of heavily multithreaded environments, race conditions and so on.

The primary unit of work in Vert.x is a verticle. Verticles are thread-safe, and they can run locally or remotely. One verticle interacts with another using events, which carry data with them.
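As a quick warm-up, here is a minimal sketch of a verticle and the event bus in action (the address "greetings" and the class names are made up for illustration; this is not part of the master-worker code that follows):

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;

// A tiny verticle that listens on an event bus address and replies.
public class GreeterVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().consumer("greetings", message ->
            message.reply("Hello, " + message.body()));
    }
}

class GreeterDemo {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // Deploy the verticle, then send an event once it is up;
        // the reply handler receives the response asynchronously.
        vertx.deployVerticle(new GreeterVerticle(), deployed ->
            vertx.eventBus().send("greetings", "world",
                reply -> System.out.println(reply.result().body())));
    }
}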

Now, let's take a day-to-day scenario.

We are getting a lot of requests. Each request is independent of the others, but we are unable to process them all on the commodity hardware we have. How do we serve all these requests coming to our cool high-traffic website?

Well, one answer is to serve each request in a new "thread" and keep adding CPU cores (scale up) and hope it will work. This is what your web server does. The problem is you can only increase the number of cores up to a limit (how high can you go?).

Once you reach that limit you add more such machines and leave it to a load balancer to divide the requests equally between them. Sounds familiar?

Well, relying on the load balancer becomes a problem when every service in the system faces the same issue. Every time you scale these services you have to keep re-configuring the load balancer. What if this were possible in the application layer, dynamically? What if we could scale up and out without any of the load-balancer pain? The good news is that it can be achieved using Vert.x, except the load balancing happens inside your application layer. There are lots of benefits to this approach, which I will discuss some other time, but for now let's just focus on how it can be achieved using Vert.x.
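Before getting into the master-worker design, here is a rough sketch of what "scaling inside the application layer" means in Vert.x terms (the class name com.example.WorkVerticle is a placeholder): deploying multiple instances of the same verticle makes the event bus spread messages across them, with no external load balancer involved.

import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class ScaleOutDemo {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // Deploy 8 instances of the same worker verticle. Each instance
        // registers a consumer on the same event bus address, and send()
        // distributes messages across them in round-robin fashion.
        DeploymentOptions options = new DeploymentOptions().setInstances(8);
        vertx.deployVerticle("com.example.WorkVerticle", options);
    }
}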

So this problem has two major challenges:

a. How do we divide the work between different machines so that we can keep up with the load?

b. How do we combine the results from all this processing so that we can return them to the client (the master), which needs answers from all workers before proceeding further (can this be achieved by a load balancer?).

So the master is like management, whose only job is to distribute all the work to developers (like you and me) and, when the work is done, combine all the statuses, create a report, notify the boss and hopefully get a fat pay hike (sounds familiar?).

In Vert.x terms, we have a master verticle which gets a lot of work to do. But the master does not want to do any work. Why? Because it's a "master". So the master wants to assign all this work to workers. Workers are also verticles in Vert.x. But then the master needs to know when all the work is completed, so that it can make the right decision about what to do next. Right?

So here is the high-level architecture we are going to follow.

[Figure: Vert.x master-worker architecture]

OK, so in order to simulate this, first let's create a lot of work.

import io.vertx.core.Vertx;

/**
 * Created by marutsingh on 2/20/17.
 */
public class Application {

    public static void main(String[] args){

        final Vertx vertx = Vertx.vertx();
        //vertx.deployVerticle(new HttpVerticle());

        // Deploy the master verticle that will hand out the work
        vertx.deployVerticle(new MasterWorker());

        // Deploy a handful of workers (alternatively, deploy several
        // instances in one call with DeploymentOptions):
        //DeploymentOptions options = new DeploymentOptions().setInstances(10);
        //vertx.deployVerticle("WorkVerticle",options);
        for (int i = 0; i < 5; i++){
            vertx.deployVerticle("WorkVerticle");
        }

        // Put ten jobs on the event bus for the master to distribute
        vertx.eventBus().send("vrp", "Job1,Job2,Job3,Job4,Job5,Job6,Job7,Job8,Job9,Job10");

        System.out.println("Deployment done");
    }
}
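One caveat worth noting: deployVerticle() is asynchronous, so in the snippet above the "vrp" message could in principle be sent before the consumers have registered. A variation (just a sketch, not the code from the repository) that hands out the work only after all deployments complete could look like this:

import java.util.ArrayList;
import java.util.List;

import io.vertx.core.CompositeFuture;
import io.vertx.core.Future;
import io.vertx.core.Vertx;

public class Application {

    public static void main(String[] args) {
        final Vertx vertx = Vertx.vertx();

        List<Future> deployments = new ArrayList<>();

        Future<String> masterDeployed = Future.future();
        vertx.deployVerticle(new MasterWorker(), masterDeployed.completer());
        deployments.add(masterDeployed);

        for (int i = 0; i < 5; i++) {
            Future<String> workerDeployed = Future.future();
            vertx.deployVerticle("WorkVerticle", workerDeployed.completer());
            deployments.add(workerDeployed);
        }

        // Hand out the work only once every verticle has registered its consumer
        CompositeFuture.all(deployments).setHandler(done -> {
            if (done.succeeded()) {
                vertx.eventBus().send("vrp",
                    "Job1,Job2,Job3,Job4,Job5,Job6,Job7,Job8,Job9,Job10");
            }
        });
    }
}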

Great, we created our own master. Let's see what it looks like.

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;
import io.vertx.core.Handler;
import io.vertx.core.eventbus.Message;

public class MasterWorker extends AbstractVerticle {

    @Override
    public void start(Future<Void> fut) {
       vertx.eventBus().localConsumer("vrp", new Handler<Message<Object>>() {
           @Override
           public void handle(Message<Object> objectMessage) {
               // The incoming message is a comma-separated list of jobs
               String data = objectMessage.body().toString();
               String[] work = data.split(",");
               String jobId = UUID.randomUUID().toString();
               List<Future> futureList = new ArrayList<>();

               // One future per piece of work; it completes when a worker replies
               for (String w : work) {
                   Future<Message<Object>> f1 = Future.future();
                   futureList.add(f1);
                   vertx.eventBus().send("work", w + ":" + jobId, f1.completer());
               }
               // futureList is combined with CompositeFuture.all(...) further below
           }
       });
    }
}

Great, so our master is sending work over the event bus and hoping some worker will pick it up.

Let's see what our worker is doing.

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Future;
import io.vertx.core.Handler;
import io.vertx.core.eventbus.Message;

public class WorkVerticle extends AbstractVerticle {

    @Override
    public void start(Future<Void> fut) {
        final String verticleId = super.deploymentID();

        vertx.eventBus().localConsumer("work", new Handler<Message<Object>>() {
            @Override
            public void handle(Message<Object> objectMessage) {
                // Each message looks like "<work>:<jobId>"
                String[] data = objectMessage.body().toString().split(":");
                String work = data[0];
                String jobId = data[1];

                // Do the actual processing here and reply with the result
                String result = work + "Completed***";
                objectMessage.reply(result);
            }
        });
    }
}

So the worker does the work and replies on the event bus with the result.
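If the actual work were CPU-heavy or blocking (a database call, a slow computation), it should not run on the event loop. A variation of the worker's handler, using vertx.executeBlocking to push the hypothetical heavy part onto a worker pool, could look like this fragment from start():

vertx.eventBus().<String>localConsumer("work", message -> {
    String[] data = message.body().split(":");
    String work = data[0];

    // Run the (pretend) expensive processing off the event loop
    vertx.<String>executeBlocking(future -> {
        String result = work + "Completed***";
        future.complete(result);
    }, asyncResult -> {
        if (asyncResult.succeeded()) {
            message.reply(asyncResult.result());
        } else {
            message.fail(500, asyncResult.cause().getMessage());
        }
    });
});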

Now the master needs to combine all these results. This is where composable futures, a very cool feature introduced in Vert.x 3, make it easy.

// resultSet is e.g. a StringBuilder declared next to futureList in the master
CompositeFuture.all(futureList).setHandler(ar -> {
    if (ar.succeeded()) {
        // All workers replied successfully
        ar.result().list().forEach(result ->
            resultSet.append(((Message) result).body().toString()));
        System.out.println(resultSet.toString());
    } else {
        // All completed and at least one failed
    }
});

That's all! I hope this will be useful in some of your scenarios.

Source code is available at

https://github.com/singhmarut/vertx-master-worker-simulator

Happy coding !!


MicroService Architecture

Here is a microservice architecture deployment diagram. All services are Docker containers registered with the Consul server via Registrator. A client (external: mobile, UI) makes a call to Bouncer, which is our API proxy. Bouncer has all the permissions configured on API URLs. It makes a call to the Auth Server, which authenticates the request; if successful, it passes the service URL to HAProxy. HAProxy has rules configured which route the URL to the exact service.

Services always follow a naming convention, so when a service is registered in Consul, consul-template refreshes the HAProxy configuration in the background.
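For illustration only, a consul-template block that rebuilds an HAProxy backend from Consul's registry could look roughly like this (the service name user-service is a placeholder, not something taken from the diagram):

backend user-service
    balance roundrobin{{ range service "user-service" }}
    server {{ .Node }} {{ .Address }}:{{ .Port }} check{{ end }}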

[Figure: Microservice architecture deployment diagram]

Bouncer – API proxy gateway. All calls come to Bouncer to ensure that only authenticated requests are passed on to the actual services.

Auth Server – Single point of authentication and authorization. All applications create permissions and save them in the Auth Server.

External ELB – All public APIs talk to the external ELB, which in turn passes requests to the HAProxy cluster.

Internal ELB – All internal APIs are routed through internal ELBs. Some URLs will only be exposed to internal services.

HAProxy (Cluster) – The load balancer cum service router.

Consul Server (Cluster) – Centralized service registry.

Registrator – Sidecar application running alongside each service which updates the Consul cluster about service health.

Consul Template – Background application which updates HAProxy whenever there is a change in service configuration.

ECS Cluster – AWS ECS, where all Docker containers are registered. It takes care of dynamically launching new containers based on parameters; autoscaling is handled automatically.

Those are the major parts of the deployment. Please share your comments. Happy coding !!

A friend posted this question on my FB account

 

Question 1.

I deduce from the diagram that all microservice API calls will be proxied through the API gateway. Is that true for internal cluster communication as well? If yes, wouldn't that put too much load on the gateway server when there are many microservices in the infrastructure? Or will it be as lightweight as a load balancer? Can I assume this gateway is a simple reverse proxy that just passes the request/response through? If so, why not use Apache or Nginx?

Answer: First, internal cluster communication may or may not go through Bouncer, depending on what your security requirements are. If you want all API calls to be authenticated, then yes, it will go through Bouncer. The point to note is that Bouncer is a very lightweight Node.js application, so it will have extremely high throughput, and because it sits behind HAProxy you can always add new nodes.

The API proxy is a reverse proxy, but with some logic. Most of the time, when you want to expose an API which interacts with multiple microservices, you will have to aggregate data; that logic resides in the API proxy. It's a common pattern in large-scale architecture.

Question 2.
As per your description, the Auth Server is also responsible for authorisation here? Then how are you ensuring isolation of microservices? If a service's authorisation logic is shared with the Auth Server, how are you ensuring data integrity, i.e. the security measure which protects against disclosure of information to parties other than the intended recipient?

Answer:

The Auth Server is like Google's auth server: all resource permissions reside in the Auth Server. The authorization server has permissions for each application. These permissions can be added either via an API by the app or through an admin UI, so each app can have different permissions. A single user will be given different permissions for different apps, so isolation is guaranteed. For example, I may have the "user-create" permission in UserService but not the "account-create" permission in AccountService.

Who creates permissions, who gives them to users, and when, depends on your design.

 


Why Kafka?

Today many companies in the startup world are completely dependent on AWS infrastructure. It's a good strategy, since you do not have to manage your own infrastructure and it saves you from a lot of headaches.

Today we will discuss the brokers available in AWS infrastructure. AWS mainly has two broker offerings:

a. SQS (Simple Queue Service) – more along the lines of ActiveMQ or RabbitMQ.

b. Kinesis – a distributed, fault-tolerant, highly scalable message broker; fewer features, but optimized for ingesting and delivering a massive number of events at extremely low latency.

The design of Kinesis is inspired by Kafka, which LinkedIn donated to Apache. LinkedIn processes billions of events per day using Kafka, and it is an Apache top-level project used in many highly scalable architectures.

In this post I want to focus on some of the key differences between Kinesis and Kafka. As stated at the beginning, working with AWS infrastructure is a good thing, but over-reliance on it has some major problems:

a. You are vendor locked in, so if tomorrow you want to shift to DigitalOcean or even your own infrastructure, you will not be able to do so.

b. You are limited by the restrictions AWS imposes, such as how many transactions you can do per unit of time.

So, in light of the above two points, I will try to explain where Kafka should be used instead of RabbitMQ and in place of Kinesis.

RabbitMQ Pros:

  • Simple to install and manage
  • Excellent rule-based routing capabilities
  • Decent performance
  • Cloud installation also available (CloudAMQP)

Cons:

  • Not distributed
  • Unable to scale to high loads (due to its non-distributed nature)

Kafka Pros:

  • Amazingly fast reads and writes (thanks to sequential reads and writes only)
  • Does one thing and one thing only: transfers messages reliably
  • Provides load balancing of consumers via partitions of a topic, so real parallel processing, no ifs and buts
  • No restrictions on transaction numbers, unlike Kinesis

Cons:

  • Cluster setup is more complicated than RabbitMQ's
  • Dependency on ZooKeeper
  • No routing

So, bottom line:

  • Use RabbitMQ for any simple use case
  • Use Kafka if you want insane scalability and are ready to put effort into learning Kafka topics and partitions
  • Use Kinesis if setting up Kafka is not your cup of tea
Feature | Kafka | Kinesis | RabbitMQ
Routing | Basic (topic based) | Basic (topic based) | Advanced (exchange based)
Throughput | Extremely high | Extremely high | –
Latency | Very low | Depends on region (not available in some regions, hence HTTP calls) | High (compared to the other two)
Ease of implementation | Moderate, but setting up a cluster requires effort | Moderate (identifying the number of shards can be tough) | Easy
Restrictions on transactions | None | 5 reads per second and 1000 writes/sec per shard | None
Types of applications | High throughput | High throughput | Low to medium throughput
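To make the "load balancing of consumers via partitions" point concrete, here is a minimal sketch using the Kafka Java client (the broker address, the topic "orders" and the group id are placeholders). Every process started with the same group.id gets its own subset of the topic's partitions, which is how Kafka parallelizes consumption:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("orders"));

        // Each instance in the "order-processors" group is assigned a share
        // of the partitions of "orders"; adding instances scales consumption.
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.partition() + " -> " + record.value());
            }
        }
    }
}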

As always, drop me an email if you are still confused about your use case.

Happy Coding !!


Kafka Cluster Setup

From my own experience, setting up a Kafka cluster on AWS raises a few issues, so I just want to highlight them.

a. First set up the ZooKeeper cluster, say a 3-node cluster. Modify zoo.cfg on each node so the ensemble is listed with the internal IP addresses (and make sure each node's myid file matches its server id):

server.1=<internal aws ip1>:2888:3888

server.2=<internal aws ip2>:2888:3888

server.3=<internal aws ip3>:2888:3888

b. Go to Kafka's server.properties and change broker.id on each node:

node1 – broker.id=0

node2 – broker.id=1

node3 – broker.id=2

Change the bind address to the <internal ip> so that the broker is not accessible from outside.

Change advertised.host.name to the <internal ip>.

List all the ZooKeeper nodes of your cluster under the zookeeper.connect setting, roughly as in the sketch below.
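Put together, the relevant server.properties entries on one broker would look roughly like this (placeholders kept as-is; note that newer Kafka versions use listeners/advertised.listeners instead of host.name/advertised.host.name):

broker.id=0
host.name=<internal ip of this node>
advertised.host.name=<internal ip of this node>
zookeeper.connect=<internal aws ip1>:2181,<internal aws ip2>:2181,<internal aws ip3>:2181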

Start all Kafka nodes and they should be able to form a cluster.

Create a topic, publish a message using kafka-console-producer, and check that there are no errors.


Vert.x Cluster

Vert.x is an extremely simple, event-based, non-blocking library for distributed computing that can be easily embedded in any Java framework of your choice.

For some time now I have been exploring Vert.x. I was looking for a Vert.x cluster sample, but I did not find any decent example, so I decided to share one with the community.

I modified one of the samples available in the Vert.x examples and will explain its core parts here.

First the cluster has to be configured.

Config hazelcastConfig = new Config();
hazelcastConfig.getNetworkConfig().getJoin().getTcpIpConfig().addMember("127.0.0.1").setEnabled(true);
hazelcastConfig.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);

ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions().setClusterManager(mgr);
Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        vertx = res.result();
        deploy(vertx,context,mode);
    }
});

This code can be part of your main function. You can also create a cluster config file called "cluster.xml" rather than creating the configuration programmatically. It should be on the classpath of your application, so it can be put inside src/main/resources. I am going to test this application on my local machine, so I am using TCP discovery rather than multicast. Under the hood, Vert.x uses Hazelcast (by default) for all its clustering capabilities. Hazelcast also comes with a special configuration for AWS, so if you are going to deploy your application on AWS, that is the configuration you should use.

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.2.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <properties>
       .....
        <property name="hazelcast.wait.seconds.before.join">0</property>
    </properties>

    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
    <management-center enabled="false">http://localhost:8080/mancenter</management-center>
    <network>
        <port auto-increment="true" port-count="10000">5701</port>
        <outbound-ports>
            <!--
            Allowed port range when connecting to other nodes.
            0 or * means use system provided port.
            -->
            <ports>0</ports>
        </outbound-ports>
        <join>
            <!--<multicast enabled="false">-->
            <!--<multicast-group>224.2.2.3</multicast-group>-->
            <!--<multicast-port>54327</multicast-port>-->
            <!--</multicast>-->
            <tcp-ip enabled="true">
                <interface>192.168.1.8</interface>
            </tcp-ip>
            <aws enabled="false">
                .....
            </aws>
        </join>
        <interfaces enabled="false">
            <interface>10.10.1.*</interface>
        </interfaces>        
    </network>
    <partition-group enabled="false"/>
    <executor-service name="default">
        <pool-size>16</pool-size>
        <!--Queue capacity. 0 means Integer.MAX_VALUE.-->
        <queue-capacity>0</queue-capacity>
    </executor-service>
    <map name="__vertx.subs">

        <!--
            Number of backups. If 1 is set as the backup-count for example,
            then all entries of the map will be copied to another JVM for
            fail-safety. 0 means no backup.
        -->
        <backup-count>1</backup-count>
      
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <!--
            Valid values are:
            NONE (no eviction),
            LRU (Least Recently Used),
            LFU (Least Frequently Used).
            NONE is the default.
        -->
        <eviction-policy>NONE</eviction-policy>
        <!--
            Maximum size of the map. When max size is reached,
            map is evicted based on the policy defined.
            Any integer between 0 and Integer.MAX_VALUE. 0 means
            Integer.MAX_VALUE. Default is 0.
        -->
        <max-size policy="PER_NODE">0</max-size>
        <!--
            When max. size is reached, specified percentage of
            the map will be evicted. Any integer between 0 and 100.
            If 25 is set for example, 25% of the entries will
            get evicted.
        -->
        <eviction-percentage>25</eviction-percentage>
        <merge-policy>
      com.hazelcast.map.merge.LatestUpdateMapMergePolicy
</merge-policy>
    </map>

    <!-- Used internally in Vert.x to implement async locks -->
    <semaphore name="__vertx.*">
        <initial-permits>1</initial-permits>
    </semaphore>

</hazelcast>

I have included only the important parts and left the rest out of the configuration. Once we are done with cluster configuration, all we have to do is create a verticle and deploy it.

public class ServerVerticle extends AbstractVerticle {

    int port;
    public ServerVerticle(int port){
        super();
        this.port = port;
    }
    @Override
    public void start() throws Exception {
        super.start();
        HttpServer server = vertx.createHttpServer();
        server.requestHandler(req -> {
            if (req.method() == HttpMethod.GET) {
                req.response().setChunked(true);

                if (req.path().equals("/products")) {
                    vertx.eventBus().<String>send(SpringDemoVerticle.ALL_PRODUCTS_ADDRESS, "", result -> {
                        if (result.succeeded()) {
                            req.response().setStatusCode(200).write(result.result().body()).end();
                        } else {
                            req.response().setStatusCode(500).write(result.cause().toString()).end();
                        }
                    });
                } else {
                    req.response().setStatusCode(200).write("Hello from vert.x").end();
                }

            } else {
                // We only support GET for now
                req.response().setStatusCode(405).end();
            }
        });

        server.listen(port);
    }
}

Great…let’s deploy this verticle.

Once the configuration has been done, you need to deploy verticles in the cluster.

Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        Vertx vertx = res.result();
       //You should deploy verticles only when cluster has been initialized
        vertx.deployVerticle(new ServerVerticle(Integer.parseInt(args[0])));
    } else {
    }
});

 

You start the application on one port, say 9000, start another instance on a different port, say 9005, and voilà, you can see both of them start communicating. Complete source code can be found at https://github.com/singhmarut/vertx-cluster.git. Happy coding !!


Web Application Scalability Part-3 ES vs Redis

This is the third article in the web scalability series. Looking at the various projects going on in Indian startups these days, I am thrilled by the variety of technologies being used.

While this sounds good, at the same time I feel we are unable to utilize the core strengths of the technologies we use.

How they are used in an architecture will always vary from person to person, depending on experience and comfort level. But some simple things, if adopted at an early stage, can give immense benefits.

One such technology is our humble cache, which is used everywhere but alas not in the right way.

It's called Redis. Yes, the kind of benefit this humble piece of software can bring to a highly scalable website is awesome.

I have seen many startups using Elasticsearch as a "cache". I cannot imagine doing this unless you are totally nuts. ES is an indexing technology, not a "cache". It does not update in real time, which is what is expected of a cache; it will never be fully in sync with the underlying storage and hence will almost always hold stale data.

Using it as a primary data source for anything is simply insane and shows the immaturity of the architect who chose it in the first place. It will not work except for the most trivial projects.

Anyway, ES has its niche, but it's not a cache; Redis is. Redis provides many advanced data structures and can do a hell of a lot in the caching layer itself.

One needs to take a hard look at the data structures it provides and how they can be used to create a "real" caching layer, giving the same output one was hoping to get from ES out of the box.

It has sets, hashes, sorted sets, lists and more.

You can keep putting data in at a rate your site might never experience and retrieve it at blazingly fast speed.

Most importantly, its single-threaded model brings the consistency one might need for creating things like distributed counters.
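As a small illustration (using the Jedis client; the key names are made up), an atomic counter and a sorted-set leaderboard need no extra locking because Redis executes commands on a single thread:

import redis.clients.jedis.Jedis;

public class RedisDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Distributed counter: INCR is atomic, so concurrent clients never race
            long pageViews = jedis.incr("page:views");

            // Sorted set as a leaderboard: update scores and read the top
            // entries back in order, straight from the caching layer
            jedis.zincrby("leaderboard", 10, "user:42");
            jedis.zincrby("leaderboard", 25, "user:7");
            System.out.println("views = " + pageViews);
            System.out.println(jedis.zrevrangeWithScores("leaderboard", 0, 9));
        }
    }
}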

Think about it, and if you are confused about how it can be used for your use case, then just like always, drop me an email. Who knows, I may find your use case interesting enough to help.
