Thursday, April 28, 2011

Economics of Amazon Outage

As if the world needs another rant about the Amazon EC2 outage, I’m going to risk adding my 2 cents to the debate.

On the one hand, you have apocalyptic visions of the cloud’s future, focusing on the inherent unreliability of public clouds. On the other hand, you have people explaining, reasonably enough, that there is nothing new about hardware failures, and talking about the necessity of preparing for them by means of redundancy and other fail-safe methods.

I, obviously, see the logic in the latter view, and understand that “outage happens :)” and that you need to prepare for it. However, the ingredient often missing from this line of reasoning is the cost of that preparation. In the end, everything boils down to economics. There is a very good description of how to estimate the cost of preparation in this article: “The most straightforward approach is to estimate the cost of a failure and then multiply by the probability it will occur.”
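As a back-of-the-envelope sketch of the quoted approach (all dollar figures and probabilities below are made up purely for illustration, not taken from the article):

```python
def expected_failure_cost(probability_per_year, cost_of_failure):
    """Expected annual cost of a failure: probability times impact."""
    return probability_per_year * cost_of_failure

# Suppose an outage would cost $500,000 in lost business and recovery
# (hypothetical figure).
outage_cost = 500_000

# Assumed annual outage probabilities (also hypothetical).
cloud = expected_failure_cost(0.05, outage_cost)       # 5% chance per year
private_dc = expected_failure_cost(0.02, outage_cost)  # 2% chance per year

print(f"Cloud expected cost:      ${cloud:,.0f}/year")
print(f"Private DC expected cost: ${private_dc:,.0f}/year")
```

Under these made-up numbers the cloud carries a higher expected failure cost; add a higher price for redundancy on top of it, and the gap widens further.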

Now, if the probability of such a failure is higher in the cloud than in a private data center, and the cost of insuring against it is also higher, then the overall costs, compared to a private data center, rise significantly. Many cloud deployments do not include the cost of redundant infrastructure in their economic models. Adding that cost, which often doubles the numbers, can make the cloud strategy much harder to justify.

In addition to the cost of insurance against widespread outages, there is the cost of bullet-proofing your software against local “mishaps”. Anyone working with the clouds has surely experienced disappearing instances, diminished processing resources caused by the heavy loads of other tenants, etc. Dealing with all this takes time and effort, i.e. money. That money also belongs in the financial models used to estimate cloud worthiness.

Of course, the elasticity of the cloud and its on-demand availability have virtually no replacement, and will therefore always be needed by small or growing businesses. But for stable and predictable IT needs, it remains to be seen how the economics play out and what makes more sense from a financial perspective.

Thursday, April 14, 2011

How much code coverage do you really need?

This post was prompted by reading a number of categorical tweets from @unclebobmartin. In case you’re not familiar with Uncle Bob: he’s one of the most prominent software industry experts, author of Clean Code, and a signatory to the Agile Manifesto. In the late nineties he did profound work documenting best OO practices (SRP, open/closed, interface segregation, etc.). So when he speaks, it’s at least worth considering.

He takes a maximalist approach to TDD and unit testing in general, as can be clearly seen from his tweets:
“Two things. Repeatability and cost. Manual tests are horrifically expensive compared to automated tests.”
“Manual tests aren't tests; they are trials. And since there are humans involved, they are suspect.”
“What you are telling me is that I should be open to the possibility that some code shouldn't be tested. Hmmm..”
“100% code coverage isn't an achievement, it's a minimum requirement. If you write a line of code, you'd better test it.”

He goes on to compare software testing with other mundane but critical activities that are considered mandatory in other fields:
“A surgeon on the battlefield may not have time to wash thoroughly, but the risk of death and cost of treatment will be high.”
“Do accountants cover only 80% of their spreadsheets with double entry bookkeeping?”
“How many times have you seen major outages that were due to some silly code that some silly programmer thought wasn't worth testing?”
 
While all these points certainly have merit, they show only one side of the picture. The reality is that not all applications require such meticulous testing. Not all applications are as important as surgeries on a battlefield or the accounting of big $$$ (not to mention the “creative” accounting employed in many cases :).

An even more important point is that thorough code coverage does not guarantee the absence of bugs. Even Uncle Bob admits that:
“Tests cannot prove the absence of bugs. But tests can prove that code behaves as expected.”
This is obvious, considering that the same misconceptions and logical mistakes that the developer put into the code are not likely to be discovered by that same developer when testing his own code.
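A contrived sketch of how that failure mode looks in practice (the leap-year example below is my own illustration, not from the tweets): the developer carries one wrong belief, so both the code and the test encode it, and the suite passes with 100% coverage.

```python
# Hypothetical developer's misconception: every year divisible by 4 is a
# leap year. (The actual Gregorian rule also excludes most century years.)

def is_leap_year(year):
    return year % 4 == 0  # encodes the misconception

def test_is_leap_year():
    # Written by the same developer, so it encodes the same misconception:
    assert is_leap_year(2012)        # correct
    assert is_leap_year(1900)        # wrong: 1900 was NOT a leap year
    assert not is_leap_year(2011)    # correct

test_is_leap_year()  # passes, with 100% coverage of is_leap_year
```

Every line of `is_leap_year` is exercised and every assertion holds, yet the bug ships anyway.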

In the end it all boils down to ROI and pragmatism. Some apps need more testing than others. Some modules need more testing than others. Some bugs need more fixing than others. There will always be a judgment call about whether additional time and money spent on automated testing and coverage are justified or are just a premature optimization.