My opinion about PHP

PHP has become one of the primary technologies for developing web applications these days. I recall that when I was working for Talentica there was only one project on PHP and most people were doing Java. After PHP 5 it regained its strength and became one of the primary stacks for developers. I will try to explain my thinking about it.
 
There are four stacks that are very popular these days for building an internet application:

a.) Java world – Spring/Grails (I have worked on both)

b.) .NET world – ASP.NET/C# combination
 
c.) Ruby World – Ruby on Rails
 
d.) PHP – LAMP Stack (Zend/Symfony)
 
Ruby on Rails, the PHP frameworks and Grails are built around the same philosophy, i.e. Rapid Application Development. They follow an approach called convention over configuration.

They all follow the same development style and are backed by dynamic languages (Groovy/PHP/Ruby). For example, MVC is now the universally accepted pattern for developing web applications.
 
An internet application would typically have three layers:
a.) Presentation Layer – All frameworks provide a rendering engine to easily create web pages. Some even provide pluggable engines so you get a choice

b.) Business Layer – Usually written in one primary language, but it often interacts with external services

c.) DB Layer – An ORM like Hibernate/GORM/ActiveRecord/Doctrine.

Almost all stacks provide an easy-to-use ORM layer, and conceptually they all work in more or less the same manner.

I do not see how someone experienced in GORM would have any problem working with Doctrine (the ORM commonly used with Symfony) or Active Record.
 
Consider search, which is a very important feature for a website. You will integrate a tool like Solr or Elasticsearch into your application. It hardly matters what language you use for accessing Solr; you will still be making similar calls.
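
To illustrate the point: a Solr search is in the end an HTTP request to the select handler, so every stack composes much the same URL. Below is a minimal Java sketch. The helper class, host and core name ("products") are hypothetical placeholders; only the q and rows query parameters of Solr are assumed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the URL for a Solr select request.
// Base URL and core name are placeholders, not taken from a real deployment.
final class SolrQueryUrl {
    static String select(String baseUrl, String core, String query, int rows) {
        // The query text must be URL-encoded whatever the client language is.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/solr/" + core + "/select?q=" + q + "&rows=" + rows;
    }
}
```

A PHP or Ruby client would build the same request; only the string handling differs.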
 
As far as MySQL is concerned, it is a database and is totally independent of the language being used, though there may be some differences in the syntax of the ORM layers. I have worked on C++, .NET, Java and Grails, and I find myself equally at ease because the underlying concepts remain the same.

Yes, it is true that you can develop faster if you have someone who is experienced in exactly the stack you are using, but things change when it comes to scalability.
 
Having said that, the largest applications today follow an approach called “right tool for the right job”. For example, Facebook uses PHP for its front-end application, but in the back end it has a whole lot of languages including Java, C++ and Erlang. That is how it achieves scalability.

Long ago Facebook created a project called Thrift just to achieve seamless integration between different languages, and it is widely used today. I have used it myself to call the Neo4j Java API from .NET, as .NET APIs were not available for this graph database.

Even the Twitter story is very famous: they solved their scalability problems on the JVM and not on Ruby on Rails, which had been their primary stack for a long time.
 
PHP by design is not a multithreaded language, and I am sure that when people look for scalability they more often than not go to the JVM and Apache projects. This is called the polyglot style of programming, where you do not stick to just one stack but do the right thing.
PHP is a tremendous tool for creating sites quickly and gives many features/plugins out of the box, but it cannot solve all the problems a successful website will face in future.

I know of some e-commerce companies that quickly created a website using Magento but now find it extremely difficult to scale due to its database design, which follows the EAV (entity-attribute-value) model.

Scalability is again subjective, though, and it depends on what you are looking for. One may argue that many sites will never reach the scale of Facebook and Twitter, but my argument is simple: if using the right tool is your approach, it will not harm you. It is like Test Driven Development. Some people find it a waste of time, but it undoubtedly gives you robustness.
 
Take another scenario: NoSQL. I have used a couple of NoSQL solutions. For a start-up it is important to save cash, so they do not try to scale up, as bigger machines are quite expensive, but scale out using commodity hardware. Hence NoSQL. NoSQL is almost synonymous with big data these days.

Again, this is called polyglot persistence and it is heavily in use today. It is easier to keep using MySQL, but there are problems where MySQL does not scale well, hence you look to add a new stack to your development process.

I have introduced new technologies from time to time, very systematically, and that is how I have been able to solve scaling problems. At the same time these technologies have saved the team considerable time. For example, using MongoDB we were able to solve long-standing problems of sending and storing bulk emails for campaigns.

So I am of the opinion that different jobs require different tools, and they should be introduced systematically and in a timely fashion. If the right strategy is adopted they can give tremendous results in productivity and scalability.
 
I have also written a small article on my blog about Apache Camel, one of the tools that I used.

As far as my experience with PHP is concerned, I have not done much beyond fixing some bugs in an existing PHP application.

As part of my job I would not mind using or learning any technology, including PHP, but my effort would always be in the direction of doing the right thing, keeping time and budget constraints in mind. Having worked on different stacks in my career, I think this is something I really like about myself.

NoSQL and its importance

I just attended a conference at the ThoughtWorks office in Delhi. It was a great talk. Neal Ford was phenomenal and he really showed how technical presentations should be given.

They do not have to be boring. To my surprise he has also written a book about presentations.

Anyway, coming to the point. The talk started with an introduction to NoSQL: what it is and what kinds of use cases it might fit. As expected, a lot of people were from an RDBMS background, so initially it was very hard for them to understand the concept of NoSQL.

Fortunately that was not the case with me, as I have been exploring these technologies for the last couple of years and have delivered some successful projects using Neo4j and MongoDB.

So I would like to put my thought process forward. 

NoSQL means that people in the SQL world should look for alternative persistence technologies when the need arises. A lot of the time, SQL does not provide a natural way of storing the data that needs to be stored.

Take for example hierarchical data and unstructured data. Many-to-many relationships are not a pretty sight either.

I found SQL to be very limited in features and capabilities when it comes to storing hierarchical data.

All you can do is create a child-parent relationship and run recursive queries. As we all know, 7 out of 10 times in a big application the database is the first culprit, and you need real experts to fine-tune SQL queries when you start feeling that the application is not behaving up to expectations and users are drifting away.
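
To make the child-parent approach concrete, here is a small in-memory Java sketch of a table whose rows only know their parent. The class and the sample ids are hypothetical; the point is that collecting all descendants needs a recursive walk, which in SQL means a recursive query or one extra self-join per level.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a table with (id, parent_id) rows.
final class CategoryTree {
    private final Map<String, List<String>> childrenByParent = new HashMap<>();

    void addRow(String id, String parentId) {
        childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
    }

    // Depth-first walk; every level of recursion is one more lookup by parent id.
    List<String> descendants(String id) {
        List<String> result = new ArrayList<>();
        for (String child : childrenByParent.getOrDefault(id, new ArrayList<>())) {
            result.add(child);
            result.addAll(descendants(child));
        }
        return result;
    }
}
```

A graph database stores the relationship itself, so the same walk becomes a traversal instead of repeated lookups.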

NoSQL can be divided primarily into four categories

a.) Document Based (Mongo, Couch)

b.) Key Value (Redis, Memcached, Dynamo, Riak)

c.) Columnar Database (Cassandra, HBase)

d.) Graph (InfiniteGraph, Neo4j).

Of these four, graph databases have a unique place and are the easiest to decide on, at least in my experience. Whenever data is hierarchical and relations cannot be modeled easily in an RDBMS, one can go for Neo4j. Hierarchical data may require deep traversals, and an RDBMS definitely does not rock at this.

Document databases are the easiest to use, and MongoDB is a sheer pleasure to work with. It gets up and running very easily and, when it comes to querying, has the richest feature set of the NoSQL databases I have tried.

So I will divide this post into some headings

When to use NoSQL

My answer is: always. There is hardly any application today which does not have unstructured data. Everybody wants to grow, so it is most likely that sooner or later you are going to generate data that will be large, be it from social media, your own clickstream capture, stored web logs or whatever. You want a lot of users to come to your site. The more the merrier, so yes, you will generate a lot of data.

So having polyglot persistence built into the application right from the beginning is going to help you at a later stage.

It is easier to define the use cases for which NoSQL is not a good fit than to find good use cases (except big data).

When you need strong ACID support (financial information specifically), as with payments and user registration, I will never think about storing the data in a NoSQL database. The risk is just too great.

Some people argue, like one gentleman at the conference, that Amazon is using Dynamo for storing user cart information. Maybe it can be used, but I do not agree with this 100%. The reason is simple. Many NoSQL databases are eventually consistent. That means that due to replication there is a delay in syncing the data across multiple machines.

So when you run a query you do not know which copy of the data will be returned, whether it is the latest information or old information. If you use Mongo and read from a replica, there is a chance, at least in theory, that the user will see his old cart and not the latest one, and the next time he looks at his cart he might see the latest one. I would not want this. So I consider this use case out of scope for MongoDB.
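
The stale cart scenario can be sketched with a toy model in Java. This is not how any particular database implements replication; the class and method names are hypothetical, and replicate() simply stands in for the delay before a replica catches up.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of asynchronous replication: writes go to the primary,
// and the replica only catches up when replicate() is called.
final class ReplicatedCart {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();

    void write(String userId, String cart) { primary.put(userId, cart); }

    String readFromPrimary(String userId) { return primary.get(userId); }

    // May return an older cart if replication has not happened yet.
    String readFromReplica(String userId) { return replica.get(userId); }

    // Stands in for the replication delay between machines.
    void replicate() { replica.putAll(primary); }
}
```

Between a write and the next replicate() call, a read from the replica returns the old cart, which is exactly the situation I would not accept for this use case.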

Some NoSQL databases like Riak provide vector clocks, but they have their own problems:

http://docs.basho.com/riak/latest/references/appendices/concepts/Vector-Clocks/

So one has to be very careful in such scenarios. 

Take another use case: promotional campaigns. A lot of companies run promotional campaigns; they need to store these huge volumes of email and even track their performance, so it is a definite use case for NoSQL. The data is huge, it does not have to be transactional, and if we lose some data due to a node failure we will not lose our job.

In the NoSQL world two principles are very prevalent

a.) Prefer redundancy over normalization. Disk is cheap and theoretically infinite, and NoSQL databases, thanks to built-in horizontal scalability, have no problem handling the data. So when you have to optimize a query, do not change your schema; instead store redundant data in a separate table suited to this query only.

 Design schema for your queries. Write down the use cases and design your data storage accordingly. Do not do it the other way round, as in the SQL world.

b.) Design your app for consistency. Relations, rules and data quality are all handled in the application, as NoSQL does not guarantee them. There are no joins, and locks are at row level.
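
Both principles can be sketched in a few lines of Java. The two maps below stand in for two NoSQL tables; the class, the ids and the "orders of a customer" query are hypothetical examples. The second table exists only to serve one query, and it is the application, not the database, that keeps the two copies in step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderStore {
    // Primary "table": order id -> customer id.
    private final Map<String, String> customerByOrder = new HashMap<>();
    // Redundant "table" kept only to answer "orders of a customer" with a single lookup.
    private final Map<String, List<String>> ordersByCustomer = new HashMap<>();

    void save(String orderId, String customerId) {
        customerByOrder.put(orderId, customerId);
        // No join and no foreign key: the application writes the redundant copy itself.
        ordersByCustomer.computeIfAbsent(customerId, k -> new ArrayList<>()).add(orderId);
    }

    String customerOf(String orderId) { return customerByOrder.get(orderId); }

    List<String> ordersOf(String customerId) {
        return ordersByCustomer.getOrDefault(customerId, Collections.emptyList());
    }
}
```

Rules such as "the same order must not be saved twice" would also have to live in this class, since nothing underneath enforces them.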

Let’s look at some of the use cases for each database

MongoDB

It should be the first choice by default, more so when you do not have much experience with the NoSQL world. It is the closest to an RDBMS, is supported by excellent client drivers, and integrates easily with any stack: PHP, Java, Python, Node.js, you name it.

It can be used as a general-purpose database. It supports secondary indexes and shards easily.

The only problem I find with MongoDB is versioning. I never know what version of the data is going to be returned to me.

Neo4J

Most suitable for social graphs, deep traversals and recommendations. I implemented a subject hierarchy using it and traversals were damn fast. It provides excellent APIs in Java and supports REST. No other fully open source graph database comes close to this one.

HyperGraphDB is comparable but is let down by its APIs compared to Neo4j, especially for traversals.

http://www.neo4j.org/

Column Oriented

Cassandra and HBase

Both are column based with minor differences here and there. Cassandra was developed by Facebook and later became an Apache Incubator project. HBase sits on top of Hadoop.

I see only one use case for them: when you have lots of data, hundreds of TB to PB, and you just want to do MapReduce, though Cassandra also provides CQL. You will know when you have that much data.

Key Value Pair

e.g. Memcached/Redis – They are damn good at what they do. Primarily used in the caching layer, they can serve data to your clients insanely fast. They can shard across hundreds of servers easily, and Redis even provides many useful built-in data structures.
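
How a key-value store sits in the caching layer can be sketched as follows. This is a cache-aside sketch in plain Java where a ConcurrentHashMap plays the role of Memcached/Redis and a loader function plays the role of the slower database query; the class name and the loader are hypothetical, and no real client library is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: look in the cache first, fall back to the loader on a miss.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    CacheAside(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        // The loader is only called when the key is not cached yet.
        return cache.computeIfAbsent(key, loader);
    }

    // Call this when the underlying data changes so the next read reloads it.
    void invalidate(K key) { cache.remove(key); }
}
```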

So here, in short, I have shared my experience with NoSQL.

Comments are welcome 


Software Architecture – How to approach (A simplistic view) – Part 1

While architecting software systems, all we have to do is work out what will change and what will stay static, what can be enhanced and what can be removed. I find this principle at the core of design patterns.

Whatever can change (and this question needs to be answered on a regular basis) needs to be put in its own process space. If these processes are well defined then you can scale, maintain and enhance them independently. Each component exposes some functionality which we can call a “Service”. Most developers these days relate Service to the S in SOA, and then immediately assume it is a “Web Service”. This is a wrong understanding from my point of view.

Any piece of software that exposes some functionality can be termed a service. It is a generic term, and a web service is just one way of creating and consuming services. More about this in later articles.
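
A small Java sketch of this view of a service, with hypothetical names throughout: the interface is the exposed functionality, and the implementation shown here runs in the same JVM with no HTTP involved.

```java
// A "service" is just exposed functionality; nothing here is a web service.
interface GreetingService {
    String greet(String name);
}

// In-process implementation: callers in the same JVM use it directly.
final class LocalGreetingService implements GreetingService {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// The caller depends only on the interface, so a remote implementation
// (web service, messaging) could replace the local one without changes here.
final class WelcomeScreen {
    private final GreetingService service;

    WelcomeScreen(GreetingService service) { this.service = service; }

    String render(String user) { return "[" + service.greet(user) + "]"; }
}
```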

Once we answer the above questions (which will require deep domain thinking) we can start developing small systems (divide and conquer, eh?) and wire them together. Once this wiring infrastructure is in place, a new developer can create a new module when new functionality is needed, and you can be at peace that even if he messes up you can just stop his system.

Easier said than done. Software architecture has to be developed scientifically and the bigger picture needs to be kept in mind. There can be thousands of factors that go into determining how the architecture for an application should be designed, but in this blog we will focus on how things can be simplified.

Before we start looking at these principles we need to accept that today a lot of design decisions are being influenced by what Twitter/Facebook/LinkedIn are doing.

This is not wrong, but it influences our approach. These websites were not built in a day; it takes years to reach that stage. When that time comes you will automatically know what to do.

For now we focus on a systematic approach.

Key guidelines

  1. Separation of concerns
  2. Single Responsibility principle.
  3. Don’t repeat yourself (DRY)

There can be many more, but these 3 rules are, in my opinion, the guiding light for a maintainable application.

1 & 2 are complementary. The 3rd principle was made more popular by frameworks like RoR, but the concept is simple: “do not write the same kind of code again and again”.

Use a shared library, create a web service, deploy in a different container instance, etc.

We have to remember that discipline is the most important factor in engineering, and software development is no different.

If we follow the basics we leave very little room for error. In case of any confusion we fall back on the basics again. In practice you will find many people preaching these principles but very few following them, as that comes only with practice.

1 & 2 are self-explanatory for a seasoned programmer. You try to think as small as possible and expose that functionality. For example, in an e-commerce site the Payment Gateway module has nothing to do with the OrderProcessing module. There can be a dependency between them, but their logic is totally independent.

So there is a strong reason for separating them out.
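
As a rough illustration of that separation in Java (the interface, method names and status strings are hypothetical): order processing depends on the payment module only through a narrow interface and knows nothing about how a charge is actually made.

```java
// Payment logic lives behind its own interface...
interface PaymentGateway {
    boolean charge(String orderId, long amountInCents);
}

// ...and order processing depends on that interface only, not on how payment works.
final class OrderProcessing {
    private final PaymentGateway gateway;

    OrderProcessing(PaymentGateway gateway) { this.gateway = gateway; }

    String placeOrder(String orderId, long amountInCents) {
        return gateway.charge(orderId, amountInCents) ? "CONFIRMED" : "PAYMENT_FAILED";
    }
}
```

Either side can now be changed, tested or replaced without touching the other.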

If we think small there are many instances where we can make small modules. Don’t worry for now about how the modules will work together. We will talk about that later.

Apache Camel Sample Project

In this very simple project we will show how Camel actually works. We will try to download tweets from our own account and print them to the console.

We can write them to a file if we like. We will also show how polling works. I have created a sample Maven project which you can download here:

https://github.com/singhmarut/camelSample

Camel’s fundamental unit is the route. You define routes, and once correctly configured they start doing their work of integration.

A route can only exist between two endpoints. So what we want to do here is download tweets from a “Twitter account” and “show them in the console”. We need a route which actually does this.

So we define a route by extending the RouteBuilder class

package org.marut.camel;

import org.apache.camel.builder.RouteBuilder;

//Create a route in Camel Context. A route is a new thread. The Java DSL defines the route.
//Our objective is to create a route which reads tweets from your twitter account at 1 minute intervals
//and sends them to the console
public class TwitterRoute extends RouteBuilder {

    static final String consumerKey = "<Your Key>";
    static final String consumerSecret = "<Your secret>";
    static final String accessToken = "<Your access token>";
    static final String accessSecret = "<Your access secret>";

    @Override
    public void configure() throws Exception {
        //Camel's Java DSL to define a global Exception handler
        onException(Exception.class).logStackTrace(true).handled(true);

        //Time interval of polling
        int delay = 60; //in seconds

        //The delay variable is now passed into the URI instead of being hard-coded
        String twitterUrl = String.format("twitter://timeline/home?type=polling"
                + "&delay=%d&consumerKey=%s&consumerSecret=%s&accessToken=%s&accessTokenSecret=%s",
                delay, consumerKey, consumerSecret, accessToken, accessSecret);

        //You can redirect tweets to a file or to a bean
        from(twitterUrl)//.to("file://home/mytweets.txt");
            //Bean is instantiated by type; the named method gets called for each message
            .bean(TwitterStore.class, "storeTweet");
    }
}

 

Here is the TwitterStore class, which actually prints the message to the console. Needless to say, you can have your own implementation here.

public class TwitterStore {

    public void storeTweet(String tweet) {
        System.out.println(tweet);
    }
}

 

That’s it. Once your route is defined you need to register it with the Camel context and start the context

 

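//Needs imports: org.apache.camel.CamelContext and org.apache.camel.impl.DefaultCamelContext
TwitterRoute twitterRoute = new TwitterRoute();
try {
    CamelContext myCamelContext = new DefaultCamelContext();
    myCamelContext.addRoutes(twitterRoute);
    myCamelContext.start();
    //Keep the JVM alive until Enter is pressed so the route can keep polling
    System.in.read();
    myCamelContext.stop();
} catch (Exception e) {
    e.printStackTrace();
}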

The above code goes into main.

You can download the Maven project from the above-mentioned link. I have hosted the project on GitHub. Just drop me a comment if you need more help. In future we will see more sample projects on Apache Camel.

Software architecture and its importance

This post is about understanding the architecture of an application and its role in web application development.

I have worked in various domains writing software. All of us know that although developing software for each business domain throws up new challenges and their needs differ vastly, there are many tools and systems which are commonly used everywhere.

So most web applications follow an approach we can call 3-tier.

You have a presentation layer, a business layer and a database layer. When you deploy, you should be able to deploy on 3 boxes. Surprisingly, a lot of the time you will end up deploying on only 2 boxes: one with all the logic and the other with just the RDBMS running in the background. The reason a company might do this is that it did not distinguish between physical and logical layers. Even sites like Twitter and Groupon had one big monolithic chunk of software where everybody was coding. Later on they decided to dismantle it into smaller blocks, as it was impossible to scale this one big piece.

Most seasoned developers understand the meaning of layers, but one important thing they often do not understand is that a layer can be logical as well as physical.

A logical layer (separation of concerns, modular programming, whatever you call it) sits in the same process space as the other layers and just segregates logic.

You create a layer and expose its functionality via an interface. A client of your layer injects the implementation using some fancy library like Guice or Spring, or simply does a “new”.
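
A minimal sketch of such a logical layer in Java, wired with a plain “new”. All names are hypothetical, and the in-memory repository merely stands in for whatever the data layer really is; a container like Guice or Spring would supply the same constructor argument.

```java
import java.util.HashMap;
import java.util.Map;

// Contract exposed by the data layer.
interface UserRepository {
    String findName(int id);
}

// One possible implementation; an ORM-backed class could sit behind the same interface.
final class InMemoryUserRepository implements UserRepository {
    private final Map<Integer, String> rows = new HashMap<>();

    InMemoryUserRepository() { rows.put(1, "Asha"); }

    @Override
    public String findName(int id) { return rows.get(id); }
}

// Business layer: receives the data layer through its constructor,
// whether a DI container or a plain "new" supplies it.
final class UserService {
    private final UserRepository repository;

    UserService(UserRepository repository) { this.repository = repository; }

    String displayName(int id) {
        String name = repository.findName(id);
        return name == null ? "unknown" : name.toUpperCase();
    }
}
```

Both layers live in the same process here; moving the repository to another box later would change the implementation class, not the business layer.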

Many start-up companies, at least in India, are taking this approach. They start small, develop code in one big chunk, and when they grow they start dismantling their code into various modules.

It might have worked for some companies, but it puts huge pressure on ROI and increases the dependency on the existing developers. If one of them leaves you are doomed. And if I am not wrong, you went with no documentation at all, following the practices of agile programming (in the wrong way). So all you can do is increase the costs further by offering higher wages :-).

On the other hand, new people feel bad because they are not able to participate much, as multiple activities are going on in parallel: refactoring of the current architecture, new feature development, pressure from management to deliver new features, and bug fixing/maintenance of existing systems.

For refactoring you have probably cut a new branch and started working there, but by the time you are finished, the branch in production and your refactored branch have diverged so much that merging them becomes another exercise.

People do it, though, but you will get regressions on an almost daily basis, and all this mess will lead to another mess and increase the costs significantly. It will hamper the growth of the company.

There is no one solution to this. Different situations require different solutions, but I have also seen systems which have been managed pretty well over the years: thousands of new developers come and join the team, yet things remain under control.

So how come these two sides of the same coin exist? When the technology stack is the same, aren’t we supposed to have the same kind of maintainable system?

What did one company do differently from the others? In my view the answer lies in the architecture of the systems.

One company was able to get its application’s architecture spot on right from the beginning. Another company was doubtful about its growth, quickly wanted to put something on the users’ dashboard, and prayed that it could think about architecture once it grew.

This is not a bad approach either, and it works in many cases, but getting the design right in the first place does not take as much effort as it seems.

Web frameworks like Ruby on Rails and Grails, and the JavaScript library jQuery, are built on the very concept of plugins. It keeps things under control, and these small pieces of software can be maintained easily. If some component starts behaving badly, you just stop using it by unloading that particular component.

It is a well-known practice adopted by experts and computer scientists that one should write code as if one were writing a library. It automatically brings modularity to the system, and maintenance becomes very easy (comparatively). The same is true of architecture. One should develop modules to be consumed by others. Modules or components are supposed to expose certain functionality. Others just consume it.

Great, looks like this is the holy grail for solving our problems. Not yet. Once we have created different components and decided to deploy them on different machines, we face a host of other issues: deployment, inter-process communication, fault tolerance and centralized logging, to name a few.

We will try to solve these problems one by one in upcoming articles.

Apache Camel – Why should you learn it?

Recently I encountered a problem related to integration with some third-party APIs. These days it is pretty common for a company to outsource some activity, integrate with a social networking website, or integrate with third-party tools installed on company premises.

These tools can be built on various technologies, and integration between different components becomes an issue when people start writing code for individual components and do not treat integration as one cohesive process.

The result is cluttered, tightly coupled development based on many assumptions. What is needed is a comprehensive, uniform approach to tackle this problem.

Various open source and commercial ESB solutions are available for this, but in my experience using an ESB right away is difficult for developers to accept, as every ESB forces a certain style of development.

So the approach should be to slowly introduce the concepts involved in an ESB during the development process.

One such core ingredient of an ESB is Enterprise Integration Patterns. Introduce them into core modules and let people learn the concepts and come to appreciate them.

Enterprise Integration Patterns (EIP) are patterns just like the GoF patterns, but the context is “integration” between disparate systems.

These patterns are well documented in the book by Gregor Hohpe and Bobby Woolf.

If you want to have a quick look at what these patterns are then go to http://camel.apache.org/enterprise-integration-patterns.html

Apache Camel is an out-of-the-box solution which implements most of the patterns described in EIP and provides integration with many more components, sparing developers the pain of writing boilerplate code. It also gives many useful implementations which bring a consistency to your large code base that everyone will be able to understand easily.

Patterns are a proven way of building robust, scalable and easily maintainable software; following them religiously can make a developer’s life easier and save the organization a huge amount of money in the long term.

Camel not only provides implementations of these EIPs but, more importantly, its syntax is so fluent that it brings a paradigm shift in the way one writes integration code.

It is an important framework for a Java developer to learn, and I do not find any direct alternative in the Microsoft world.

Just to give you an idea, consider a simple scenario. You want to integrate with a third-party API such as a social network, a courier service, a text-to-speech service or something else. It may so happen that a network glitch occurs while calling this API and your call does not succeed. What do you want to do?

You guessed it right: all we can do is retry. So you will probably write a while loop which checks the HTTP status of the API call and retries if it is not 200. OK, you can do that, but the problem is common enough to be faced by every developer in your team, so all of them end up writing the same type of code.
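
The hand-rolled loop typically looks something like this Java sketch. The class name is hypothetical, and the IntSupplier is an assumed stand-in for the third-party API call that returns its HTTP status code.

```java
import java.util.function.IntSupplier;

// Hand-rolled retry: call, check the HTTP status, try again.
final class NaiveRetry {
    // 'call' is a hypothetical stand-in for the API call; it returns an HTTP status code.
    static int callWithRetry(IntSupplier call, int maxAttempts) {
        int status = -1; // returned as-is if maxAttempts is zero or negative
        int attempt = 0;
        while (attempt < maxAttempts) {
            attempt++;
            status = call.getAsInt();
            if (status == 200) {
                break; // success, stop retrying
            }
        }
        return status;
    }
}
```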

If you are a well-managed team, you will probably create a shared library, publish it to your local Maven repo and tell everyone about it.

Great, you did that. Let’s add another twist to the tale: you want to retry, but after some time. What do you do? Put a sleep somewhere in your while loop, enhance your library again and roll out a new version. Over time you will run into many such problems and end up writing the same code that is already there in Camel (only better implemented).
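
The “new version” of such a library might look like the sketch below: the same loop with a pause between attempts. Again the class name is hypothetical and the IntSupplier stands in for the real API call.

```java
import java.util.function.IntSupplier;

// Retry with a fixed delay between attempts.
final class DelayedRetry {
    static int callWithRetry(IntSupplier call, int maxAttempts, long delayMillis) {
        int status = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = call.getAsInt();
            if (status == 200 || attempt == maxAttempts) {
                break; // success, or no attempts left: do not sleep again
            }
            try {
                Thread.sleep(delayMillis); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                break;
            }
        }
        return status;
    }
}
```

Camel’s error handler exposes the same two knobs in its Java DSL (maximumRedeliveries and redeliveryDelay), so this is exactly the kind of code you no longer have to maintain yourself.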

Camel, on the other hand, provides a nice fluent interface which promotes a consistent coding style across the board, and you have a tool in your hand which solves a lot of your problems out of the box.

In the next article we will try to build a small fault-tolerant application using Apache Camel.