Monday, November 22, 2010

Groovy and Grails for prototyping

There is much debate about dynamic typing vs. static typing, and about newer languages (Ruby/Groovy/Erlang/Scala) vs. established, general-purpose languages (Java, C++).
Some people hold very strong opinions on these subjects while others are more open to different approaches.

Surprisingly, I have found relatively little disagreement on the usefulness of dynamic languages such as Ruby or Groovy for prototyping. Ease of use and speed of development are the most important benefits of these languages. Their potential weaknesses, such as performance and maintainability of the code, matter much less in prototypes (more on these potential weaknesses below).

These rapid development tools are perfect for fleshing out the vision and model of the product without spending too much time on the underlying technological “plumbing”. Business people are impressed to see a mocked-up prototype of the product a few days after the initial discussion. Only by seeing it can they come up with useful feedback and refine the product in their minds. Implementing these changes is as easy as doing the initial prototyping. This rapid feedback-and-adjustment loop is not only agile but also very cost-efficient, because only a minimal amount of effort and time is spent.

Grails really shines in the creation of domain models, because a model can often be described through a set of CRUD screens and their relationships. This representation is a lot friendlier than dry, “tech-y” entity diagrams and is much more understandable to business folks.
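As an illustration, here is a minimal sketch of what such a domain model might look like in Grails (the class and field names are hypothetical); pointing a scaffolded controller at these classes gives clickable CRUD screens with no hand-written plumbing:

// grails-app/domain/Customer.groovy
class Customer {
    String name
    String email
    static hasMany = [purchases: Purchase]   // one-to-many relationship
    static constraints = {
        name  blank: false
        email email: true
    }
}

// grails-app/domain/Purchase.groovy
class Purchase {
    Date placedOn = new Date()
    BigDecimal total
    static belongsTo = [customer: Customer]
    static constraints = {
        total min: 0.0G
    }
}

// grails-app/controllers/CustomerController.groovy
// Dynamic scaffolding generates the list/create/edit/delete screens.
class CustomerController {
    def scaffold = Customer
}

Walking business people through these generated screens (and tweaking the constraints on the spot) tends to produce far better feedback than walking them through an entity diagram.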

As for the possible problems that arise when developing with Groovy and Grails – I believe they can be solved, and the cost of solving them is far outweighed by the benefits of speed and agility. But I am not trying to sell or pitch yet another technology. I believe it is extremely well suited for certain types of applications, while it’s probably not the best choice for others. Like anything else, it is just another tool in your professional toolbox.

Of course, all of this applies not only to Grails but also to Ruby/Rails, Python/Django and other dynamic-language frameworks.


Thursday, November 11, 2010

SQL now, NoSQL later


This post is about trade-offs of choosing a NoSQL database vs. a traditional relational database specifically in the context of a new business or a start-up.

In recent years there has been more and more interest in NoSQL databases. One of the main motivations for using such databases is their ability to provide the massive scale needed in large applications. Naturally, the biggest internet sites such as Facebook, LinkedIn and the like heavily use, and even develop, NoSQL solutions.

As with everything else in life, there are always trade-offs. A famous triangle – the CAP theorem’s consistency, availability and partition tolerance – helps to visualize the main competing forces when implementing a distributed system. People going for a NoSQL solution are usually aware of these trade-offs and adjust their systems accordingly.

Less often mentioned is that NoSQL lacks the rich tooling that has been developed for the more traditional relational databases: IDE support, code libraries, ORMs, entire CRUD generators and reporting solutions. These tools significantly expedite the development of new software. Just think of Rails' Active Record or Java's Hibernate.

This brings us to the main point: a new company, possibly a start-up, is usually more concerned with getting the product out the door as soon as possible and validating the business model than with preparing for “inevitable” Facebook-like success.
Many successful companies tell the same story about iterating between various ideas and models before finally hitting a successful one. In such rapid iteration and evaluation, tools are crucial in providing the necessary productivity and speed. Using ready-made tools for relational databases, ideas can be prototyped, deployed and evaluated much faster than with custom-made solutions.
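As a rough sketch of how little code such a prototype needs, this is the kind of thing Groovy's groovy.sql.Sql allows against a plain MySQL instance (the connection details and the users table are assumptions made up for illustration, and the MySQL JDBC driver is assumed to be on the classpath):

import groovy.sql.Sql

// Connection details are placeholders for a local prototype database.
def sql = Sql.newInstance(
        'jdbc:mysql://localhost:3306/prototype',
        'dev', 'secret', 'com.mysql.jdbc.Driver')

// Insert a row and read it back - enough to exercise an idea end to end.
sql.execute 'INSERT INTO users (name, email) VALUES (?, ?)',
            ['Alice', 'alice@example.com']

sql.eachRow('SELECT id, name, email FROM users') { row ->
    println "${row.id}: ${row.name} <${row.email}>"
}

sql.close()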

Developers from Facebook and LinkedIn emphasize that their initial solutions were the simplest ones possible. LinkedIn used to run a relational DB for most of its data. Facebook used to host photos as separate files on a simple NFS. Only once they became successful did they invest in scalable NoSQL solutions.

It is therefore not an unreasonable approach to start with the simplest relational DB (MySQL?) and rapidly create a working product that can validate the business model. If the model is right, it will generate enough interest and money to evolve the solution towards more scalable NoSQL DBs. If the idea is not successful, however, far fewer resources were spent on it, and more resources and time can be spent on trying the next idea.

The main objection to this approach is that it requires throwing away the solution and re-creating it with NoSQL. While that is true to some extent, it is not necessary to throw away all the code. Moreover, after a first release there is usually enough new knowledge about the system and the business that a significant part of the code will change anyway.

In conclusion: while NoSQL solutions provide huge benefits for massively scalable applications, developing with NoSQL might not be the best option when trying out a new business idea. Relational databases provide an easier and faster route for experimentation and evolution of a business model.

Sunday, October 31, 2010

Some NoSQL alternatives


From a very superficial view - this is my understanding of some NoSQL alternatives.

CouchDB – stores JSON documents. Lightweight, with good replication/synchronization; sometimes used for offline storage and on mobile devices. Can be used for low write-intensity web sites. Implemented in Erlang.

MongoDB – stores JSON-like documents. A fast DB implemented in C++. Can be used for relatively high write-intensity web sites. Supports a multi-master mode and basic SQL-like queries. Recently added auto-sharding.
 
Cassandra – a full-blown implementation of Amazon’s Dynamo combined with column-based storage. A relatively new product designed for massive scale. Of course, complexity and “raw”-ness are potential downsides. Used by Facebook.

Voldemort – a massively scalable, replicated, redundant DB with pluggable storage engines. Also based on Amazon’s Dynamo. Provides the most sophisticated features and scale at the expense of complexity. Used by LinkedIn.

Hadoop – can store huge amounts of data that is accessed through map/reduce functions. Frequently used in analytics and data-mining.
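To give a feel for the document model behind CouchDB and MongoDB, here is a small made-up Groovy example (all field names and content are hypothetical): one nested document holds what a relational schema would normally spread over several joined tables.

import groovy.json.JsonOutput

// One self-contained document; in a relational schema this would be
// posts, authors, tags and comments tables joined together.
def post = [
    _id     : 'post-42',
    title   : 'SQL now, NoSQL later',
    author  : [name: 'J. Doe', email: 'jdoe@example.com'],
    tags    : ['nosql', 'startups'],
    comments: [
        [user: 'alice', text: 'Nice overview'],
        [user: 'bob',   text: 'What about the tooling?']
    ]
]

println JsonOutput.prettyPrint(JsonOutput.toJson(post))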

Tuesday, October 26, 2010

Managing interruptions

There is no debate that interruptions and frequent context switches significantly affect productivity. This is especially true of programming activities, which require high levels of concentration.

On the other hand, interruptions are an unavoidable fact of modern working environments. Emails, instant messages, phone calls and old-fashioned knocks on the door are everywhere. Furthermore, people in managerial positions are interrupted constantly (I read somewhere that statistically it’s something like every 5 minutes). Some people facing a high frequency of interruptions simply give up on focused effort and work on an “interrupt basis”, abandoning any serious technical work such as code development or even code review (which also requires a high level of concentration).

So what’s a tech-lead or low-level manager to do in order to optimize his time while still being responsive to the needs and requests of others?

So far I’ve discovered two ways of dealing with this major problem:
  1. Minimize interruptions where possible (what a revolutionary idea…)
  2. Combine interruptions and handle them in batches
Minimize interruptions where possible.
Sometimes people get “too connected”: they allow others to constantly come to them with questions, send instant messages, etc. In such cases it’s important to encourage people to interrupt you only when they can’t find an answer by themselves. If they do otherwise, point out the way in which they could have handled the problem themselves.

As for instant messaging, I got off it completely (call me crazy). The problem with instant messaging is that it’s way too easy to communicate and interrupt. People don’t even have to leave their comfortable chairs. That tempts people into abusing instant messaging for unreasonably simple and trivial matters.
If you’re not available on instant messenger, having to walk to your office or write you an email will force people to think twice about whether they really need your help. If there are some people who absolutely require your presence on instant messenger (like your boss), create another account and don’t share it with everybody. I tried it, and after the initial shock people adjusted and even got used to the new mode of communication.

Combine interruptions and handle them in batches
The idea is to create slices of time during which you simply ignore interruptions. This means not responding to emails, telling people to come back later, etc. One technique for this is the Pomodoro technique: you stay focused for 25 minutes, followed by 5-10 minutes used to handle the accumulated interruptions.
Personally, I even shut down the email client (Outlook) so that I don’t see any indication of new emails or alerts. I know the notifications can be configured, but having the client not run at all provides much stronger isolation from interruptions.
I installed a scheduler program that automatically launches Outlook every half hour. Then I spend some time dealing with the accumulated emails.
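The scheduling part is trivial to script; here is a minimal Groovy sketch of the idea (the path to OUTLOOK.EXE and the 30-minute interval are assumptions, and a dedicated scheduler program or the OS task scheduler works just as well):

// Launch the mail client every half hour instead of keeping it running.
// Path and interval are assumptions; adjust to your installation.
def outlookPath = 'C:\\Program Files\\Microsoft Office\\Office14\\OUTLOOK.EXE'
def halfHourMs  = 30 * 60 * 1000L

while (true) {
    new ProcessBuilder(outlookPath).start()   // bring Outlook up for the batch of mail
    Thread.sleep(halfHourMs)                  // stay quiet until the next batch
}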
Before I implemented this technique I used to get interrupted by emails every 3-5 minutes. I was receiving 100-130 emails per day, and these were not just dummy notifications; most of them required real attention. After I implemented this, I deal with the same amount of email much more efficiently and with less effort. Most importantly, it does not break my concentration.

Now, using a combination of these techniques, I have managed to significantly increase my attention span and focus, and therefore improve my overall productivity.

In conclusion:
Try to avoid interruptions whenever you can. Sometimes this means changing your habits and the habits of people around you.
When you cannot avoid interruptions, at least try to combine them and handle them in batches.

Good luck.

Tuesday, September 21, 2010

Bad Attitudes of Agile

Article: Bad Attitudes of Agile: "Christopher Goldsbury explores some 'bad attitudes' of Agility - assertions about management, documentation, testing, teams, and schedules that are commonly encountered, but contrary to reality. These bad attitudes find refuge and justification in Agile despite the fact they are false. Addressing these viewpoints before they, potentially, darken a good movement is essential. By Christopher Goldsbury"

Wednesday, September 15, 2010

Big Ball of Mud, Still the Most Popular Software Design

Big Ball of Mud, Still the Most Popular Software Design:
"Big Ball of Mud, is a code jungle which is haphazardly structured, sprawling, sloppy and connected by duct-tape. Over the years we have been introduced to various guidelines such as SOLID, GRASP and KISS amongst age old, high cohesion and low coupling to deal with this Mud. However, the situation still remains bleak and Big Ball of Mud seems to be a popular way to design and architect software.
...

Interestingly, as per FJ, Yoder felt that many aspects of Agile directly lead to mud. These included,
1) Lack of upfront design
2) Late changes to the requirements
3) Late changes to the architecture
4) Piecemeal growth

      By Vikas Hazrati"

Monday, September 13, 2010

Optimize by measuring, not guessing.


Although it sounds obvious, it’s amazing how many times people try to deal with performance/scalability problems by guessing where the problem is coming from. I have seen it countless times. People come up with something like “it must be the XYZ module that’s taking so long, let me go and tune it”. Or “the serialization is slowing everything down – we need to switch to a binary format”.

I understand that the people who developed the application justifiably believe that they know its good spots and its bad spots. But performance degradation can come from so many other areas that it’s very difficult to guess the root cause accurately, even if you know the application inside and out.

In my experience the best way to improve performance is to treat the application like a black box. This means abandoning all preconceptions about the internal workings of the system. Just put the application under a good profiler (well, duh!) and observe. Many times I was surprised how unrelated the real bottleneck was to the suspected one. Often the bottlenecks are found in very low-level areas of the code that are not even considered when thinking about potential hot spots. I remember how once, after exhaustive attempts to improve the performance of certain areas of application logic, the problem turned out to be in the low-level message routing loop.
Other times the problem comes from 3rd-party components, such as JDBC drivers and databases, that are usually not even on the radar.

Even if people insist on “knowing” where the problem is coming from, what they actually have is a hypothesis, not an established fact. To prove the hypothesis they need to collect empirical data supporting it before proceeding on the assumption that it is correct.
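Collecting that data does not have to be heavyweight; even crude timing beats guessing, as in this Groovy sketch (the two “suspects” below are stand-ins for real code paths, and a real profiler gives far richer information):

// Crude timing harness: measure the suspected code paths instead of guessing.
def time(String label, int iterations, Closure work) {
    def start = System.nanoTime()
    iterations.times { work() }
    def elapsedMs = (System.nanoTime() - start) / 1000000
    println "${label}: ${elapsedMs} ms for ${iterations} iterations"
}

// The two operations below are stand-ins for real application code paths.
def message   = [id: 1, body: 'hello']
def serialize = { msg -> msg.toString().bytes }                 // stand-in for serialization
def route     = { msg -> (1..100).inject(0) { a, b -> a + b } } // stand-in for the routing loop

time('serialization', 100000) { serialize(message) }
time('routing loop',  100000) { route(message) }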

In conclusion: resist the temptation to guess about the root causes of performance or scalability problems. Be scientific about it. Be methodical about it. It will save you the time and frustration of chasing the wrong problem.