Microservice Architecture and its Challenges

Microservices architecture is becoming increasingly popular for building large-scale applications, as it provides a number of benefits:

  • Each service has a separate lifecycle, so services can be deployed independently and can evolve separately.
  • Each service can be fine-tuned for different SLAs and scalability requirements.
  • Each service can be developed using a different stack.
  • Each service can be monitored separately.

However, microservice architecture is not a free lunch. It throws up many challenges that must be discussed and dealt with even before venturing into this (uncharted) territory.

  • Authorization – How do you make sure that a particular service and its APIs can be called only when the user is authorized to do so?
  • Data Consistency – ACID is no longer available; you must deal with eventual consistency.
  • Service Discovery – How do you find and interact with new services? If there are lots of services, how do you make sure they are discoverable from other services?
  • Deployment – What if there is a hierarchy in the microservice dependencies?
    • A -> B -> C
  • SLA – Multiple hops in a request add latency and affect your SLA.
  • Fault Tolerance – How do you handle cascading failures?
  • Monitoring – How do you monitor a system that has 100s or even 1000s of services?
  • Tracing – Logging and request tracing. A message travels across many boundaries, so how do you nail down where a message got lost or delayed?
  • Tech Stack – Selecting a stack (go with a single stack or multiple stacks?).
  • Packaging and Deploying – How do you come up with a uniform way of packaging and delivering services?
  • Debugging – During the development phase, how do you make sure that developers can debug code as effectively as they do in a monolithic system?

 

 


Rust Concurrency

For a long time I have been thinking about writing a sample program in Rust, “the” new systems language. I did C++ for the initial 5 years of my career before I moved on completely to Java, and recently in one of my products a requirement came up that a low-latency, high-performance component had to be developed.

As always, Java was the default choice, as it is my first choice anyway. However, I realized that this component could not afford the non-deterministic nature of the garbage collector.

So the need was to write a program where I could control exactly when memory is deallocated, without worrying about “stop the world” garbage collection. The natural choice was C++, but programming is all about having fun and I wanted to try out something new, and C++ threading support and syntax are not all that great even in C++11.

So I decided to try out Go. But again, Go has garbage collection, and the same fear of non-determinism crept in.

So time to try out Rust.

The program is simple but can be extended to a lot of other scenarios.

One thread keeps emitting data at regular intervals. A vector keeps track of the generated data.

The other thread keeps ticking at regular intervals (100 ms or so), and whenever there are items whose elapsed time is greater than a threshold, those items are expired. Same as a cache TTL.

    use std::thread;
    use std::sync::mpsc;
    use std::time::{Duration, Instant};

    // An item records its creation time so it can be expired later
    #[derive(Clone)]
    struct Item {
        created_at: Instant,
        id: i64,
        pub description: String,
    }

    impl Item {
        pub fn new(id: i64, description: String) -> Item {
            Item {
                created_at: Instant::now(),
                id,
                description,
            }
        }
    }

    fn main() {
        // Create a multi-producer, single-consumer channel
        let (sender, receiver) = mpsc::channel();
        let sender_pop = sender.clone(); // clone the sender for the expiry thread

        // Thread 1: sends a "Pop" tick every 100 ms
        thread::spawn(move || loop {
            thread::sleep(Duration::from_millis(100));
            sender_pop.send(Item::new(-1, String::from("Pop"))).unwrap();
        });

        // Thread 2: produces a new item every second
        thread::spawn(move || {
            let mut val = 1;
            loop {
                val += 1;
                sender.send(Item::new(val, String::from("New"))).unwrap();
                thread::sleep(Duration::from_millis(1000));
                // (You could break out of the loop here if you wanted to stop.)
            }
        });

        // Mutable vector that tracks the generated items
        let mut vals: Vec<Item> = Vec::new();
        let ttl: u64 = 5; // TTL in seconds

        // The receiver end blocks until the next message arrives
        for received in receiver {
            let new_item = received.clone();
            match received.description.as_str() {
                "Pop" => {
                    println!("Pop");
                    // Expire every item older than the TTL
                    vals.retain(|x| Instant::now().duration_since(x.created_at).as_secs() < ttl);
                }
                _ => vals.push(new_item),
            }
        }
    }

That’s it. You have done synchronisation between threads without any race conditions. That’s how cool Rust is.

In the next blog we will try to send a notification whenever items expire.

Happy Coding !!


Code Quality Guidelines

Coding guidelines are an extremely important part of a professional developer’s day-to-day practice.

Following these guidelines is what differentiates an experienced developer from a rookie.

It surprises me that so many companies still ignore them and produce poor-quality code that results in very expensive maintenance over time and is so fragile that every time you add a new feature, bugs immediately creep in.

I am sharing some of these guidelines. They are far from exhaustive, but they are the most important ones for me. Some people might not agree with everything here, but these come from my own experience, and many of them are borrowed from the classic texts.

Coding Standards

General Coding guidelines

These are general, predefined standards for writing code.

  1. Naming conventions should be descriptive (for variables as well as functions).
  2. Your application must have separate static and dynamic parts.
  3. No hard-coding. Find an appropriate place where you can define constants or enums.
  4. Prefer simplicity over complexity. If your code is turning out to be very complex, most likely you are doing something wrong. As the saying goes, it is “hard to build simple things”.
  5. Avoid premature optimization. Define premature optimization for your own use case. Sounds awkward? Trust me, it is. Only experience can tell you what this really means.
  6. Always look for the possibility of following a standard design pattern, and tweak it for your own use case.
  7. Strictly prohibit repetitive code. If code is repeating, it is a candidate for refactoring.
  8. Always align your code properly before committing it.

Class Design

  1. A class should not be more than 600 lines.
  2. A constructor should not have any complex logic and has to be exception-safe.
  3. Prefer composition over inheritance.
  4. Follow the single responsibility rule everywhere.
  5. Design for extensibility.
  6. In an object-oriented language, always define an interface.
  7. Avoid circular dependencies.

Comments and Error Messages

  1. Write comments at all critical places in your code, including variable names and their usage, and function signatures (input/output/parameters).
  2. Work with an error-message framework. Using raw error codes for displaying error messages is confusing, as it is hard to figure out which error code is coming from which place. To avoid this chaos, it is recommended to use an error-message framework.

If/else Statements

  1. Do not write deeply nested if/else statements.
  2. If nesting is getting deeper, break your code into multiple functions.
  3. Operator precedence rules for your language can introduce nasty bugs that are extremely hard to debug. Follow a policy of using parentheses when writing long if/else conditions, as in the sketch below.
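
A minimal illustration of the precedence point in Java (the class, method, and variable names here are hypothetical): && binds tighter than ||, so an unparenthesized condition may not group the way you read it.

    class PrecedenceExample {
        // Without parentheses this parses as: isAdmin || (isOwner && isActive),
        // because && binds tighter than ||
        static boolean canEdit(boolean isAdmin, boolean isOwner, boolean isActive) {
            return isAdmin || isOwner && isActive;
        }

        // Parentheses make the intended grouping explicit (and here change the result)
        static boolean canEditExplicit(boolean isAdmin, boolean isOwner, boolean isActive) {
            return (isAdmin || isOwner) && isActive;
        }
    }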

Implement OOP

  1. It is recommended to apply OOP principles in your code as much as possible.
  2. Program to an interface (contract), not a class. Avoid changing the interface as much as possible.
  3. Try to make an abstract class for a business service (in the case of Python/C++; an interface in the case of Java).
  4. Follow the DRY principle (Don’t Repeat Yourself). Use design patterns to promote code reusability.

Java Best Practices

  1. Use the interface type when declaring collection variables, e.g. Map<String,Object> map = new HashMap<>(); (see the sketch after this list).
  2. Avoid using Object as much as possible. Strive for type safety.
  3. Use StringBuilder when building strings in loops; it is faster and safer than repeated String concatenation.
  4. Use the java.time package when dealing with dates and times: https://www.programcreek.com/java-api-examples/index.php?api=org.joda.time.format.DateTimeFormatterBuilder
  5. Use the same timezone everywhere in the application.
  6. createNativeQuery also takes a class as a parameter, so try using that overload: https://vladmihalcea.com/the-jpa-entitymanager-createnativequery-is-a-magic-wand/
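
A minimal sketch tying points 1–4 together (the class and variable names are mine, not part of any standard):

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.util.HashMap;
    import java.util.Map;

    class BestPracticesExample {
        public static void main(String[] args) {
            // Point 1: declare against the interface, not the concrete class
            Map<String, Object> settings = new HashMap<>();
            settings.put("timezone", "UTC");

            // Point 3: StringBuilder avoids creating an intermediate String per iteration
            StringBuilder report = new StringBuilder();
            for (Map.Entry<String, Object> entry : settings.entrySet()) {
                report.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
            }
            System.out.print(report);

            // Point 4: java.time instead of legacy java.util.Date or Joda-Time
            LocalDate today = LocalDate.now();
            System.out.println(today.format(DateTimeFormatter.ISO_DATE));
        }
    }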

Functions

  1. A function should not be more than 25 lines.
  2. Always check for valid parameters inside public functions. Throw an exception to report an error in the parameters (see the sketch after this list).
  3. To group statements logically, divide the different sections of a function into smaller functions, e.g. a separate function for initializing values for every possible activity.
  4. Use functional programming capabilities if your stack supports them, i.e. pass around functions to write reusable code.
  5. Follow the Single Responsibility Rule as closely as possible.
  6. Functions have to be testable (I should be able to write a unit test case for the function). In other words, promote loose coupling via Dependency Injection or otherwise.
  7. To continue with loose coupling, follow the rule “prefer composition over inheritance”.
  8. If you are working with Java 8, never return null. Consider returning Optional instead.
  9. Try to avoid multiple return statements. They can put nasty bugs inside programs, so it is best to avoid them as much as possible.
  10. Check the Big-O complexity of the algorithm you are writing, especially when you are writing a lot of code or for functions on the critical path.
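
A minimal sketch of points 2 and 8, parameter validation plus an Optional return (the class and field names are hypothetical):

    import java.util.Map;
    import java.util.Optional;

    class UserLookup {
        private final Map<String, String> emailsById;

        UserLookup(Map<String, String> emailsById) {
            this.emailsById = emailsById;
        }

        // Validate parameters up front and throw to report bad input (point 2);
        // return Optional instead of null for the "not found" case (point 8)
        public Optional<String> findEmail(String userId) {
            if (userId == null || userId.isEmpty()) {
                throw new IllegalArgumentException("userId must not be null or empty");
            }
            return Optional.ofNullable(emailsById.get(userId));
        }
    }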

Function overloading should follow a convention

  1. foo(int), foo(int, double), foo(int, double, Object), i.e. the least-needed parameter comes last.

 

LAYERED ARCHITECTURE

Follow layered architecture in true spirit. Upper layers should call into lower layers, and each layer has to be designed for a specific purpose. E.g. when following MVC, logic in views has to be related to the view, and all heavy lifting shall be done by the service layer.

Package Structure and Naming Conventions

  1. All Java packages should start with com.broctagon. Check for the specific naming convention in your stack, but the topmost package has to be com.broctagon.
  2. Define functions in proper packages instead of utility classes. It is a common malpractice to put every seemingly useful function inside utility classes, and while writing code it becomes difficult to look into these packages. If it is a business utility function, try to find a proper package for it rather than putting the function inside a utility class. Utility classes should generally have functions related to common tasks like string reversal, math helpers, or maybe an email-format check.

 

Logging/Tracing

  1. It is recommended to use logging wherever possible. The purpose of logging is to diagnose potential issues in production. Logging is useful, but it incurs significant overhead on the application, so it must be used wisely and only the information required shall be logged.
  2. Logging should not be cluttered; it must follow the same consistent pattern across the application. Identify a logging pattern for your specific use case.
  3. Logging libraries are incredibly useful. Use their package-level capabilities to switch selective logging on/off at different levels.

 

Exception Handling

  1. Do not suppress exceptions.
  2. If an exception is explicitly raised in a function, it should not be handled in that same function. Create a separate function to handle and process the exception.
  3. Do not suppress the original exception even if you have to create a new one (see the sketch after this list).
  4. Try to use the functions already available in logging libraries.
  5. Comment on bypassed exceptions, i.e. if we are deliberately passing over an exception, mention in a comment why we are doing so.
  6. Follow the naming convention for exceptions in your language, e.g. the Exception suffix in Java.
  7. Do not write complex code in a handler. A lot of the time such a code block throws an exception itself and hides the original one.
  8. Read about exception-handling best practices for your respective language and follow them.
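
A minimal sketch of points 1, 3, and 6, wrapping and rethrowing without losing the original exception (ReportGenerationException is a hypothetical application exception):

    import java.io.UnsupportedEncodingException;

    // Hypothetical application exception; note the Exception suffix (point 6)
    class ReportGenerationException extends Exception {
        ReportGenerationException(String message, Throwable cause) {
            super(message, cause); // keep the original exception as the cause (point 3)
        }
    }

    class ReportService {
        byte[] render(String template) throws ReportGenerationException {
            try {
                return template.getBytes("UTF-8");
            } catch (UnsupportedEncodingException e) {
                // Do not swallow e (point 1); rethrow with the original as the cause
                // so the full stack trace survives
                throw new ReportGenerationException("Failed to render report", e);
            }
        }
    }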

WEB PROGRAMMING

  1. Always follow the MVC pattern.
  2. Do not bloat your controllers.
  3. Make your URLs simple and easily understandable by the end user.
    1. “/admin/orderhistory” should be changed to “/admin/order/history”.
  4. Make your services code “testable”, which means loose coupling.

Spring Best Practices

  • Always follow the builder pattern for ResponseEntity, as in the sketch below.
    • ResponseEntity.ok(“success”);
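
A hypothetical controller showing the builder style (the controller name, route, and header are mine, not from the original):

    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class HealthController {
        @GetMapping("/health")
        public ResponseEntity<String> health() {
            // Builder style: status, headers, and body in one fluent chain
            return ResponseEntity.status(HttpStatus.OK)
                    .header("X-App-Version", "1.0")
                    .body("success");
        }
    }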

Releasing the software

  1. All Java services should follow the microservices architecture and be packaged as a fat JAR.
  2. Docker is used to deploy the software.


Serverless Architecture – AWS Lambda


I want to write this post about my views on serverless architecture (specifically AWS Lambda), which cloud service providers like AWS are promoting as the “holy grail” for solving all problems. This post targets developers who understand that every technology has limitations and that it is wise to make an informed decision.

Before I start the discussion, I want to state some facts so that we can have a fruitful one.

a. All companies, including cloud service providers, have to make money.

b. All companies are afraid of competition and do not want to lose their customers.

c. There is no branch of engineering where a “one size fits all” approach works.

d. No matter what tools an engineer chooses, when you can’t find a solution, “go back to basics” is the best approach.

e. Lambda architecture in the context of AWS is different from the Lambda architecture in general, as many problems with this architecture are AWS-specific.

If you want to understand some issues with the “Lambda architecture” (the data-processing pattern, as distinct from AWS Lambda), see:

https://www.oreilly.com/ideas/questioning-the-lambda-architecture

Coming to the point: many attempts have been made in the past to find one “holy grail” that solves all teething problems. Let’s look at some of these.

Problem 1 – Operating systems. The question is: why do we have multiple operating systems? Why has no one been able to solve “all the problems under the sun”? Why so many?

Problem 2 – Writing multiple programs for different operating systems even if the program does the same thing. Java solved this problem: don’t worry about garbage collection, no worry about performance or underlying platforms. Yet after 20 years of research and billions of dollars, we only have more languages. If Java had solved all the problems it targeted, we would never have had to learn Node.js.

Problem 3 – Learning multiple languages for front-end and back-end development. GWT, from the house of Google, did solve it. Great! Where is it now? Why did Google decide to stick to JavaScript for front-end development and create Angular?

Problem 4 – Integration. With a vast variety of protocols and hundreds of disparate systems in a sizeable organization, integration is a major problem. Hence the birth of the ESB. Where is it now? How many startups use one? I haven’t heard of or found any.

Problem 5 – Modeling business processes. Rule engines tried to solve all the problems under the sun and become the de-facto expert systems. They are no doubt formidable technology, but do they work in most scenarios? The answer is a resounding NO.

I can go on and on, but you know what I am implying. One size does not fit all. Never has, never will. Probably there will one day be enough research into AI that the technology stack and architecture can be determined by a program itself, but I do not see that happening anytime soon.

There are many ways you can architect your applications. Serverless is one of those architectures. In this architecture you do not manage your own resources; someone else does. When your code runs, and how many instances run, are decided by AWS for you. You do not get to choose your operating system, runtime versions, etc.

Serverless architecture also tries to create the “illusion” that all your problems related to understanding process behavior and the difficulties of distributed systems will go away, and that it will scale on its own without you “paying any cost”.

Before you find this approach, or so-called “architecture”, fascinating, give a call to a friend working for Amazon and ask how many Amazon folks actually use it. So far I have found none; or even if they tried, they do not understand its internals. Can you imagine, a lot of people in Amazon use Oracle. So much for their AWS offerings.

Some people have written lengthy articles about Lambda and its problems. Check whether they answer the questions raised here:

https://www.datawire.io/3-reasons-aws-lambda-not-ready-prime-time/

Arguments in favor of Lambda:

http://thenewstack.io/amazon-web-services-isnt-winning-problems-poses/

There is definitely logic in the arguments posted in the above blog, but I see a serious fundamental issue in the writer’s approach of “why do I need to know this”. What kind of approach is that? One that I do not agree with. Knowing things in detail makes you better than the competition and more adept at solving problems. Knowing things helps you solve critical technical issues and innovate. Yes, it is effort, so what? AWS did not come out of thin air. Many of its offerings, like Dynamo, RDS, and Kinesis, are based on cutting-edge research papers.

To be more straightforward, there are some points you need to be careful about while using AWS Lambda.

  • Your application will be difficult to debug.
  • You will encounter bugs that are extremely hard to track down, simply because your production environment is very different from your local machine and you can never replicate it.
  • Poor error reporting. I have already shared a blog on this.
  • Lambda warm-up time. For batch processing Lambda seems absolutely appropriate, but for a scalable real-time application? Well, good luck if it works for you.
  • Timeouts. Lambda always has a timeout, which means that if your code takes more time than the defined timeout, the invocation is killed and keeps throwing exceptions. Keep this carefully in mind, especially when making external HTTP calls. An even more serious problem is the fact that Lambda cools down when there is not much to process, and this cool-down logic is hidden somewhere without much detail.
  • Multiple environments – create multiple Lambda functions… to hell with code reuse.
  • Application state – Well, forget this, as Lambda has to be stateless. You have absolutely no control over when a process runs and stops.
  • JVM optimizations – Keep in mind that techniques like JIT might be of no use with Lambda functions, so all JVM optimizations go out of the window.
  • Throttling – This is something we have faced recently. Throttling limits are absolutely ridiculous for high-traffic apps. Yes, it is a soft limit and you can raise requests, but isn’t that contradictory to the auto-scaling part? I thought this was the problem Lambda was solving in the first place.

The bottom line is pretty simple. AWS Lambda is a great tool, just like any other technology, provided you use it wisely and:

  • you are fine with the limited language support of Lambda;
  • you know how to deal with Lambda’s limitations: timeouts, file paths, limited memory, sudden restarts, multiple copies of the same code, etc.;
  • you want to avoid, and for good reasons, the complexity involved in developing distributed systems;
  • analyzing your process’s performance is beyond your skills, e.g. you have no idea about JVM tuning;
  • you are OK with OpenJDK;
  • you can be reasonably sure that your process will never hang, crash, or leak memory, because you won’t be able to log in to the machine and analyze a dump with some advanced tool.


Apache Kafka – Simple Tutorial


In this post I want to highlight my fascination with Kafka and its usage.

Kafka is a message broker just like RabbitMQ or a JMS broker. So what’s the difference?

The differences are:

  • It is distributed.
  • It is fault-tolerant, because messages are replicated across the cluster.
  • It does one thing and one thing only, i.e. transferring your messages, and does it really well.
  • It is highly scalable due to its distributed nature.
  • Tunable consistency.
  • Parallel processing of messages, unlike others which process sequentially.
  • Ordering guarantee per partition.

How do you set it up?

Kafka is inherently distributed, so that means you are going to have multiple machines forming a Kafka cluster.

Kafka uses ZooKeeper for leader election, among other things, so you need a ZooKeeper cluster already running somewhere; otherwise you can follow:

https://www.tutorialspoint.com/zookeeper/zookeeper_installation.htm

You install Kafka on all the machines which will participate in the Kafka cluster and then open the ports where Kafka is running. Then provide the configuration of all the other machines in the cluster on each machine, e.g. if Kafka is running on machines K1, K2, and K3, then K1 will have the information of K2 and K3, and so on.

Yes, it’s that simple.

How does it work?

The way Kafka works is: you create a topic, send a message, and read the message at the other end.

So if there are multiple machines, how do you send a message to Kafka? Well, you keep a list of all the machines inside your code and then send messages via the high-level Kafka producer (a helper class in the Kafka driver). A high-level Kafka consumer class is available for reading messages.

Before you send a message, create a topic with a “replication factor”, which tells Kafka how many brokers will have a copy of the data.

Some important terminologies related to Kafka are:

Topic – Where you publish messages. You need to create it beforehand.

Partition – The unit of parallelism; the number of partitions determines how many consumers in a group can read a topic in parallel. The default is 1, but you can create hundreds.

Ordering of messages – Guaranteed within a single partition.

TTL – Time to live for messages on disk; the default is 7 days.

Group – Kafka guarantees that a message is only ever read by a single consumer in a group, so if you want a message to be delivered only once, just put all your consumers in the same group. A short producer sketch follows.
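
To tie these terms together, here is a minimal producer sketch in Java (the broker list and topic name are placeholders, and it assumes the org.apache.kafka.clients producer API available from Kafka 0.8.2 onwards, rather than the older client in the 0.8 docs linked below):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // List of brokers in the cluster (placeholder hosts)
            props.put("bootstrap.servers", "K1:9092,K2:9092,K3:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Messages with the same key always land on the same partition,
                // which is what gives you the per-partition ordering guarantee
                producer.send(new ProducerRecord<>("my-topic", "key-1", "hello kafka"));
            } // close() flushes any buffered messages
        }
    }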

If you want to go deeper, here are some useful links:

https://kafka.apache.org/08/design.html

http://www.tutorialspoint.com/apache_kafka/apache_kafka_consumer_group_example.htm


Why Vert.x?

In this post I will try to throw some light on the capabilities of, and the need for, this new and exciting framework, Vert.x.

I have been following this framework for the last few years, and it is great to see that it is now being adopted by many large companies, which means it has stabilized.

Promoted by Red Hat, Vert.x is a very lightweight library for distributed computing.

Today almost all applications are n-tier. This is not an invention anymore but the need of the hour. You cannot scale a monolithic application, and those days are long gone when you could just create a scaffolding app in Ruby on Rails and keep modifying the same app to run your business.

There are many reasons why an application should be divided into different components, a.k.a. services (or microservices). But to my mind, the capability to iterate and roll out new features quickly is the most important reason to create a distributed system.

So today the business layer is divided into n different components. This is not something new; many enterprise applications have been built that way.

What is changing now is the tooling that lets you create this distributed architecture without actually going into the nitty-gritty of distributed software design.

Let’s consider a scenario. You have a web application (let’s say a Ruby on Rails app).

You start getting a lot of traffic, which your single server cannot handle anymore.

So what do you do?

You hide behind a load balancer and you spawn a new instance. When new requests come in, they get redirected to one of the servers using some load-balancing strategy (round-robin?).

Again, if traffic spikes, you repeat the process.

There are problems here:

a. It works only when you have a monolithic application.

b. It does not scale.

If you are looking to build a high-traffic website, you can’t just keep adding servers to the load balancer and assume everything will work fine.

So many large applications divide the business layer, the layer which does most of the useful work, into multiple different components that expose APIs consumed by the web layer. All these business components do some specific task, and each one of them can be scaled independently.

Great… seems like a scalable solution. But now we have created another problem.

How do we manage these different components or services? How do we discover them in the system? We could go to our good old load balancer, assign each service a DNS name, and let the load balancer do the job.

But this seems unmanageable. How many DNS names would we need, and how many times would we have to reconfigure the load balancer? And why in the world would I want a DNS name assigned to each of these services? I certainly do not want to expose them to the outside world; that job belongs to my web application. So I should have something better. CORBA is out of the question. From Java RMI, god save me. Thrift is a possibility, but it does not tick all the boxes outlined below.

Is there any other way?

What if we could handle this at the software layer rather than the hardware layer? What if we could do it dynamically? What if we could interact between these services without the overhead of HTTP? It turns out all of this is possible now with frameworks like Vert.x.

This is what we want to achieve with Vert.x:

a. Distributed architecture

b. Fault tolerant applications

c. Highly scalable application layer

d. Dynamic discovery of services (micro?)

e. Removing the overhead of HTTP.

f. Interaction between services without installing separate software like RabbitMQ.

Now that we have laid the foundation and defined our objectives, we will start writing some code for Vert.x in the next post. As a small taste, here is what service-to-service interaction over the event bus looks like.
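
A minimal sketch, assuming Vert.x 3.8 or later (the address name “greeting.service” and the class names are arbitrary, not part of the framework):

    import io.vertx.core.AbstractVerticle;
    import io.vertx.core.Vertx;

    public class EventBusDemo {
        public static void main(String[] args) {
            Vertx vertx = Vertx.vertx();

            // "Service" verticle: registers a consumer on an event bus address
            AbstractVerticle service = new AbstractVerticle() {
                @Override
                public void start() {
                    vertx.eventBus().consumer("greeting.service",
                            message -> message.reply("Hello " + message.body()));
                }
            };

            // Deploy the service first, then call it over the event bus (no HTTP involved)
            vertx.deployVerticle(service, deployed ->
                    vertx.eventBus().request("greeting.service", "Vert.x", reply -> {
                        if (reply.succeeded()) {
                            System.out.println(reply.result().body()); // prints "Hello Vert.x"
                        }
                        vertx.close();
                    }));
        }
    }

The same request/reply code works when the verticles run on different machines, provided a cluster manager is configured, which is where the dynamic discovery comes from.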

Happy coding !!


Kafka Cluster Setup

From my own experience I find that while setting up a Kafka cluster on AWS we face some issues, so I just want to highlight them.

a. First set up the ZooKeeper cluster; let’s say a 3-node cluster. Modify each node’s zoo.cfg so that the servers publish their internal IP addresses:

server.1=<internal aws ip1>:2888:3888

server.2=<internal aws ip2>:2888:3888

server.3=<internal aws ip3>:2888:3888

b. Go to Kafka’s server.properties on each node and set a unique broker.id:

node 1 — broker.id=0

node 2 — broker.id=1

node 3 — broker.id=2

Change the bind address (host.name) to the <internal ip> so that the broker is not accessible from outside.

Change advertised.host.name to the <internal ip> as well.

List all the ZooKeeper nodes of your cluster under the zookeeper.connect setting, as in the sketch below.
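
Putting it together, each broker’s server.properties would contain something like the following (the IPs are placeholders; host.name and advertised.host.name are the property names used by the older Kafka versions this post targets, while newer versions use listeners/advertised.listeners):

broker.id=0

host.name=<internal aws ip1>

advertised.host.name=<internal aws ip1>

zookeeper.connect=<internal aws ip1>:2181,<internal aws ip2>:2181,<internal aws ip3>:2181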

Start all the Kafka nodes and they should be able to form a cluster.

Create a topic, publish a message using kafka-console-producer, and check that there are no errors.
