Web Application Scalability – Service Layer

In our last post I promised that we would talk about how to turn monolithic code into a distributed SOA (service-oriented architecture).

Well, it's not easy. Once you have decided to re-architect a single chunk of software in a distributed manner, you have to decide which parts of the system can be deployed on different machines while everything still works correctly.

Doing so raises several problems:

a. How will you host these services (the S in SOA)?

b. How will the services communicate, and how will serialization/deserialization happen?

c. How do you make sure you can implement the same workflow you had in the single monolithic component, assuming you were doing things sequentially?
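Problem (b) is easy to underestimate. In a monolith a method call just passes an object reference; once the call crosses a process boundary, both sides must agree on a byte layout. The following minimal Java sketch encodes a hypothetical `PlaceOrder` request by hand with `DataOutputStream`. The class names and fields are illustrative assumptions; a real system would more likely delegate this to Thrift, JSON or a similar format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical request that a caller in another process wants to send.
final class PlaceOrder {
    final String customerId;
    final int quantity;

    PlaceOrder(String customerId, int quantity) {
        this.customerId = customerId;
        this.quantity = quantity;
    }
}

// Both sides must agree on this exact layout: field order, types and encoding.
final class PlaceOrderCodec {
    static byte[] encode(PlaceOrder order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(order.customerId); // 2-byte length prefix + modified UTF-8
            out.writeInt(order.quantity);   // 4 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static PlaceOrder decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String customerId = in.readUTF(); // fields are read in the order they were written
            int quantity = in.readInt();
            return new PlaceOrder(customerId, quantity);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If one side adds or reorders a field and the other does not, decoding silently breaks, which is exactly the kind of agreement the hosting and communication choices below have to manage.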

 

There are many options for answering the first question, but I will list the ones I have personal experience with:

 

a. Design your services as RESTful and deploy them in your preferred container, such as WebLogic

Pros

– Location transparency: you refer to services by URI.

– Interacting with a RESTful service is easy in any language

Cons

– Serious overhead of the HTTP protocol

– Serious maintenance issues with container management, not to mention a huge performance hit

b. Design your services as Java processes and interact with them through a JMS-based service bus (ActiveMQ, Fiorano, etc.)

c. Design your services on Thrift and interact with them through pre-generated client code

Pros

– Serialization/deserialization is taken care of by Thrift. You just interact with POJOs everywhere.

– Lightweight, highly scalable and proven (Thrift was originally developed at Facebook). No more converting from JSON to POJO and vice versa.

Cons

– Every time a service or schema changes, you need to redistribute the client code.

– Schema definition in a Thrift file has some limitations of its own.

d. Design your services as separate Java processes and let a networking library like ZeroMQ take care of communication

Pros

– ZeroMQ is battle-tested and used in many mission-critical projects. It takes care of networking needs.

Cons

– More low-level programming is needed compared to the other approaches

– Many scenarios still have to be handled by hand with this approach, so the code size will swell
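To make the location transparency of option (a) concrete, here is a minimal sketch using only the JDK (Java 11+): `com.sun.net.httpserver.HttpServer` stands in for a service that would normally be deployed in a container, and `java.net.http.HttpClient` calls it by URI. The `/orders` path and the JSON payload are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class RestSketch {
    // Stand-in for a RESTful service; in production it would run on another box.
    static HttpServer startOrderService() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/orders", exchange -> {
                byte[] body = "{\"status\":\"OPEN\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The client knows only a URI; where the service runs is a configuration detail.
    static String fetch(URI uri) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Moving the service to another machine only changes the URI; the cost is a full HTTP round trip, plus JSON parsing, on every call, which is the overhead listed under the cons.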

For a long time I had been dealing with the hub-and-spoke model, using a service bus like MSMQ or ActiveMQ.

The service bus is a popular approach, especially in financial institutions that invest in enterprise-grade products like TIBCO MQ.

The problem with the service bus approach is that you have to write code specific to the messaging bus. So another layer is added, which increases complexity.
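The extra layer is easier to see in code. The sketch below uses a hypothetical in-memory `MessageBus` in place of a real broker (it is not the JMS API). Every service subscribes and publishes through the hub, so topic names and payload formats leak into the business code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory hub; a real deployment would talk to a broker such as ActiveMQ.
final class MessageBus {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String topic, String message) {
        // Every message passes through the hub, never directly from service to service.
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(message);
        }
    }
}

// The business logic (reserving stock) is wrapped in bus-specific plumbing.
final class InventoryService {
    final List<String> reserved = new CopyOnWriteArrayList<>();

    InventoryService(MessageBus bus) {
        bus.subscribe("orders.placed", orderId -> {
            reserved.add(orderId);
            bus.publish("inventory.reserved", orderId);
        });
    }
}
```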

So for some time I had been looking for a way to remove this layer. Based on my experience in the industry, there are some areas where I believe a company should not invest until absolutely necessary, and networking is one of them.

Having said that, one day while learning Scala I came across a project called Akka. Akka is a framework that addresses many of these problems with the simple concept of actors.

It takes care of networking and concurrency and provides an easy way to program local or distributed services. What you get is complete location transparency: you communicate with remote actors the same way you communicate with local actors.

There is no hub-and-spoke model, so no new software has to be installed or maintained, and no bus-specific code has to be written to make things work.

It's P2P: networking complexities are hidden deep inside, and you never have to deal with threads directly. Wow! Sounds too good to be true, yet it is true.

It is completely asynchronous, with a programming paradigm that is easy to understand, and the concept of actors is already proven in the telecom industry.
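The core idea behind actors can be sketched in plain Java. The toy `CounterActor` below is an assumption for illustration, not Akka's API: it keeps private state, queues incoming messages in a single-threaded mailbox, and offers a fire-and-forget `tell` plus an `ask` that returns a future. Callers never lock or manage threads themselves. In Akka the same kind of message send works whether the actor is local or remote, because the framework serializes and routes the messages.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy actor: private state plus a mailbox that processes one message at a time.
final class CounterActor {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int count; // touched only by the mailbox thread, so no locks are needed

    // Fire-and-forget send: the caller returns immediately.
    void tell(int increment) {
        mailbox.execute(() -> count += increment);
    }

    // Request-reply: queued behind earlier messages, answered asynchronously.
    CompletableFuture<Integer> ask() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    void stop() {
        mailbox.shutdown();
    }
}
```

Because the mailbox is FIFO and single-threaded, an `ask` sent after two `tell` calls always sees both increments.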

In the next post we will look at sample code showing how location transparency can be achieved.

 

 


Software architecture and its importance

This post is about understanding the architecture of an application and its role in web application development.

I have worked in various domains writing software. Although each business domain throws up new challenges and the needs differ vastly, many tools and systems are used commonly everywhere.

Most web applications follow what we can call a 3-tier approach.

You have a presentation layer, a business layer and a database layer. When you deploy, you should be able to deploy on 3 boxes. Surprisingly, a lot of times you will end up deploying on only 2 boxes: one with all the logic and the other just running the RDBMS in the background. The reason some companies do this is that they did not distinguish between physical and logical layers. Even sites like Twitter and Groupon had one big chunk of monolithic software where everybody was coding. Later they decided to dismantle it into smaller blocks, as it was impossible to scale this one big piece.

Most seasoned developers understand the meaning of layers, but one important thing they often miss is that a layer can be logical as well as physical.

A logical layer (separation of concerns, modular programming, whatever you call it) sits in the same process space as the other layers and just segregates logic. A physical layer, in contrast, runs in its own process, possibly on a separate machine.

You create a layer and expose its functionality via an interface. Clients of your layer get it injected using a library like Guice or Spring, or simply create it with a "new".
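A logical layer in the sense above might look like the following Java sketch. The names `PriceService`, `FixedPriceService` and `CheckoutController` are hypothetical. The dependency is passed through the constructor, a style that Guice and Spring both support, but here it is wired by hand with `new`.

```java
// Hypothetical business-layer contract; callers depend only on this interface.
interface PriceService {
    long priceInCents(String sku);
}

// One implementation living in the same process as its callers.
final class FixedPriceService implements PriceService {
    @Override
    public long priceInCents(String sku) {
        return sku.startsWith("BOOK-") ? 1_999 : 4_999;
    }
}

// Presentation-layer client that receives its dependency through the constructor.
final class CheckoutController {
    private final PriceService prices;

    CheckoutController(PriceService prices) {
        this.prices = prices;
    }

    long total(String... skus) {
        long sum = 0;
        for (String sku : skus) {
            sum += prices.priceInCents(sku);
        }
        return sum;
    }
}
```

Because `CheckoutController` sees only the interface, turning this logical layer into a physical one would mean supplying a `PriceService` implementation that calls a remote service, without touching the controller.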

Many start-up companies, at least in India, take this approach. They start small, develop code in one big chunk, and when they grow they start dismantling the code into modules.

It might have worked for some companies, but it puts huge pressure on ROI and increases dependency on the existing developers. If one of them leaves, you are doomed. And if I am not wrong, you went without any documentation at all, following agile practices (the wrong way). So all you can do is increase costs further by offering higher wages :-).

On the other hand, new people feel left out because they cannot participate much while multiple activities are going on in parallel: refactoring of the current architecture, new feature development under pressure from management, and bug fixing/maintenance of existing systems.

For refactoring you probably cut a new branch and start working there, but by the time you finish, your production branch and your refactored branch have diverged so much that merging them becomes another exercise.

People do it, though, but almost daily you will get regressions, and all this mess leads to more mess and increases costs significantly. It will hamper the growth of the company.

There is no single solution. Different situations require different solutions, but I have also seen systems that have been managed well over the years: thousands of new developers come and join the team, yet things remain under control.

So how can these two sides of the same coin exist? When the technology stack is the same, shouldn't we end up with the same kind of maintainable system?

What did one company do differently from the others? In my view, the answer lies in the architecture of the systems.

One company got its application's architecture spot on right from the beginning. Another company was doubtful about its growth, quickly wanted to put something on its users' dashboards, and prayed that it would think about architecture once it grew.

This is not a bad approach either; it works in many cases. But getting the design right in the first place does not take as much effort as it looks.

Web frameworks like Ruby on Rails and Grails, and the JavaScript library jQuery, are built on the very concept of plugins. This keeps things under control, and these small pieces of software can be maintained easily. If a component starts behaving badly, you just stop using it by unloading that particular component.
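The plugin idea can be illustrated with a small, hypothetical `PluginHost` (this is not how Rails, Grails or jQuery implement plugins). Each plugin is a self-contained transformation registered by name, and unloading one leaves the others running.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical plugin host: each plugin is a small, independently removable step.
final class PluginHost {
    private final Map<String, UnaryOperator<String>> plugins = new LinkedHashMap<>();

    void load(String name, UnaryOperator<String> plugin) {
        plugins.put(name, plugin);
    }

    // A misbehaving component is simply removed; the rest of the pipeline keeps working.
    void unload(String name) {
        plugins.remove(name);
    }

    String render(String input) {
        String result = input;
        for (UnaryOperator<String> plugin : plugins.values()) {
            result = plugin.apply(result); // plugins run in the order they were loaded
        }
        return result;
    }
}
```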

It is a well-known practice among experts and computer scientists that one should write code as if writing a library. It automatically brings modularity to the system, and maintenance becomes (comparatively) easy. The same is true of architecture: one should develop modules to be consumed by others. Modules or components expose a certain functionality, and others just consume it.

Great, this looks like the holy grail for solving our problems. Not yet. Once we have created different components and decided to deploy them on different machines, we face a host of other issues: deployment, inter-process communication, fault tolerance and centralized logging, to name a few.

We will try to solve these problems one by one in upcoming articles.
