Thursday, December 23, 2010

Trends of 2010

The year is almost over and it’s hard to refrain from a nostalgic retrospective of the major trends that emerged during the past 12 months. None of these trends started in 2010, but each of them seems to have reached an important inflection point.

Clouds are now pretty much at the pinnacle of their hype. Everybody is doing something cloudy now. There is a good reason for it: clouds do seem to provide real value in certain situations. Their sweet spot is powering development and testing environments, offloading temporary load spikes, and serving as a low-investment test-bed for startup ideas. The key to their success is their on-demand elasticity and provisioning. So far Amazon is the clear leader among cloud providers. Other PaaS and SaaS providers like Google App Engine and SalesForce are much more niche and restrictive.
There are still big unanswered questions with the clouds, including security, regulatory compliance, vendor lock-in, etc. The next few years will probably see a rapid evolution of cloud computing that addresses some of those problems.

Mobile Development is capturing huge attention and interest. The recent success of iPhone apps, and of smart mobile devices in general, looks like the inflection point of a broader shift in computing devices away from traditional desktops. It is true that many smartphones today pack more processing power than desktop computers did 10-15 years ago. If this trend continues, new tools and methods, which are still in their infancy, will need to be developed and adjusted for mobile applications.

NoSQL databases are becoming ever more widespread. Their ability to scale to today’s information and communication needs often outweighs the compromises made when moving away from traditional RDBMSs. I wrote a short blog post about various popular NoSQL options. An interesting and very recent trend is the support that major frameworks like Spring are beginning to provide for NoSQL DBs. Grails GORM now carries support for a number of NoSQL databases.
While many people read NoSQL as “NO SQL”, a better reading is probably “Not Only SQL”, because relational databases are not going away any time soon and still provide very valuable services.

Functional Programming is getting more and more publicity. The functional renaissance seems to have started from the necessity to write programs targeted at multi-core computers: functional programming paradigms lend themselves to better distribution of computations among CPU cores. A proliferation of functional languages is now emerging: from the grand-daddy of all functional languages, Lisp, to the new kids on the block - Scala, Clojure, Erlang, Haskell and F#. Some languages such as Scala attempt to bridge object-oriented and functional programming by allowing the developer to choose the most suitable tools within the language.
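The core idea is easy to show even in a few lines of Groovy: a side-effect-free transformation over a collection touches no shared state, so in principle each element could be processed on a different core (the actual parallel scheduling would come from a library; this is just an illustrative sketch):

    // a pure function applied to each element - no shared mutable state,
    // so the elements could safely be computed in parallel
    def squares = (1..10).collect { n -> n * n }
    assert squares[3] == 16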

There are many more interesting developments in areas such as social programming and Agile methodologies, but they are somewhat less pronounced than the trends I have mentioned.

Certainly, 2010 feels like an important stage in the development of computing and software. Major paradigm shifts are occurring across the whole information-processing universe. It will be interesting to see how these trends evolve and what new trends are in store for 2011 and beyond.


Wednesday, December 8, 2010

Vertical silos vs. Horizontal affinity

It is safe to say that Agile methodologies are considered mainstream in this day and age. One of their main precepts is the use of cross-functional teams that deliver applications in vertical slices, for rapid feedback and early exercise of integration.

There can be no debate that the feedback loop and early integration are critical to the success of a project. But what happens in systems where infrastructure is a significant, perhaps the biggest, part of the system? It is hard to slice the infrastructure along the vertical slices of functionality being developed for the current iteration/sprint/user story.

Such low-level, application-specific infrastructure has much higher affinity within itself than with any of the vertical silos where it might be used. It therefore makes sense to develop this infrastructure separately from the vertical slices and leverage it later within the usual Agile iterations.

An important point: by infrastructure I don’t mean a DAO layer or a service layer. In typical web apps such layers are nothing but an amalgamation of parts of vertical pieces of functionality. By infrastructure I mean something like low-level networking, threading models and abstractions, core messaging abstractions, or other low-level application-specific services.

Sometimes parts of an infrastructure can be harvested through evolutionary architecture and emergent design. But in any big system, especially one that is not a plain web system, a significant amount of upfront infrastructure needs to be in place before development of the “end-user” functionality can start.

Focusing on a single level of abstraction (SLAP, the Single Level of Abstraction Principle), in this case the low-level infrastructural services, increases productivity and helps create a consistent system.
It is therefore important to carefully examine which parts of the system exhibit the highest levels of affinity and to develop them together as a fundamental block upon which the rest of the system will be built.

Friday, December 3, 2010

Is Java suitable for today’s web-development?

I know it’s a provocative subject and I might get flamed for it, but I have to share my thoughts on the suitability of Java for today’s web development.

As always, applications are developed to solve business or other human problems. Therefore, the closer the application implementation is to the problem domain, the faster and better the development goes. If you’re programming a banking application, you want to operate on bank accounts, not object arrays. If you’re developing a game, you want to deal with player profiles and game rules, not low-level 3D rendering.

If you’re developing a web application, for example a Facebook app, you want to deal with the RESTful Facebook Graph API, with social features, and with at least somewhat interactive pages. What you don’t want to deal with is low-level plumbing like the mechanics of reading a URL’s content. Such details are nothing but accidental complexity within the system.

A simple example: let’s say you need to read the content of a RESTful URL, something like https://graph.facebook.com/me. All you really want is a string with the URL's content. In Groovy it looks like this:

    "https://graph.facebook.com/me".toURL().getText()

After spending a few hours trying to find something similar in Java, I reached the conclusion that the best I can do is one of the following:
  1. Use Apache HttpClient and deal with the hierarchy of its objects. This is quite a bit more than a few lines of code.
  2. Try to find another, more obscure library.
  3. Write custom code like this:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static String getText(String url) throws Exception {
    URL website = new URL(url);
    URLConnection connection = website.openConnection();
    // note: uses the platform default charset - yet another detail to get right
    BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()));
    try {
        StringBuilder response = new StringBuilder();
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            // readLine() strips line terminators, so put them back
            response.append(inputLine).append('\n');
        }
        return response.toString();
    } finally {
        in.close();   // close the stream even if reading fails
    }
}

Now, of course, I can write this once and encapsulate it in some helper utility object.
But the problem is that this code has nothing to do with the problem I’m trying to address in my application. It represents a much lower level of abstraction. And yet I’m forced to spend time researching, implementing, testing and maintaining such basic functionality.

For comparison, a Ruby/Groovy/Python developer does not need to know anything about BufferedReaders, InputStreams, or using StringBuilder in loops. All he really needs to know is the URL and the variable that receives the content.

It might seem that I’m blowing this issue out of proportion, but I think it exemplifies a much deeper problem. Java was invented and evolved in a different era. Today’s web applications use different, higher abstractions. This is the normal cycle in the evolution of programming languages: each new generation abstracts away the low-level details of the previous one, allowing developers to focus more on their problem domain. Fifteen years ago Java simplified development by freeing programmers from pointer arithmetic, destructors and other C++ plumbing.

I know that Java is a general-purpose language and it will continue to be used for a very long time. But guess what: C++, and shockingly even COBOL, are still in use too, but only in very narrow domains (embedded and low-level programming for C++, ancient legacy systems for COBOL).

I suspect that Java is heading in the same direction. Today I would be hard pressed to start development of a decent web application in pure Java.

By the way, I had the exact same argument about C++ and Java some 10 years ago with my former colleagues.

Monday, November 22, 2010

Groovy and Grails for prototyping

There is much debate about dynamic typing vs. static typing. There is much debate about new languages (Ruby/Groovy/Erlang/Scala) vs. old and generic languages (Java, C++).
Some people hold very strong opinions on these subjects while others are more open to different approaches.

Surprisingly, I have found relatively little disagreement about the usefulness of dynamic languages such as Ruby or Groovy for prototyping. Ease of use and speed of development are the most important benefits of these languages, while their potential weaknesses, such as performance and maintainability of the code, are not that important in prototypes (more on these potential weaknesses below).

These rapid development tools are perfect for fleshing out the vision and model of the product without spending too much time on the underlying technological “plumbing”. Business people are impressed to see a mocked-up prototype of the product a few days after the initial discussion. Only by seeing it can they come up with useful feedback and refine the product in their minds. Implementing these changes is as easy as the initial prototyping. This rapid feedback-and-adjustment loop is not only agile but also very cost-efficient, because only a minimal amount of effort and time is spent.

Grails really shines in the creation of domain models, because a model can often be described through a set of CRUD screens and their relationships. This representation is a lot friendlier than dry, “tech-y” entity diagrams and is much more understandable to business folks.
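As a minimal sketch (the class and property names here are hypothetical), a couple of Grails domain classes plus a scaffolded controller are enough to get working CRUD screens that business folks can click through:

    // grails-app/domain/Author.groovy
    class Author {
        String name
        static hasMany = [books: Book]        // one-to-many relationship
    }

    // grails-app/domain/Book.groovy
    class Book {
        String title
        static belongsTo = [author: Author]
    }

    // grails-app/controllers/BookController.groovy
    class BookController {
        def scaffold = Book   // Grails generates list/create/edit/delete screens
    }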

As for the possible problems that arise when developing with Groovy and Grails, I believe they can be solved, and the cost of solving them is far outweighed by the benefits of speed and agility. But I am not trying to sell or pitch yet another technology. I believe it is extremely well suited for certain types of applications and probably not the best choice for others. Like anything else, this is just another tool in your professional toolbox.

Of course, all of this applies not only to Grails but also to Ruby/Rails, Python/Django and other dynamic languages and frameworks.


Thursday, November 11, 2010

SQL now, NoSQL later


This post is about trade-offs of choosing a NoSQL database vs. a traditional relational database specifically in the context of a new business or a start-up.

In recent years there has been more and more interest in NoSQL databases. One of the main motivations for using them is their ability to provide the massive scale needed by large applications. Naturally, the biggest internet sites, such as Facebook, LinkedIn and the like, are heavily using and even developing NoSQL solutions.

As with everything else in life, there are trade-offs. A famous triangle, the CAP triangle of consistency, availability and partition tolerance, helps visualize the main competing forces when implementing a distributed system. People going for a NoSQL solution are usually aware of the trade-offs and adjust their systems accordingly.

Less often mentioned is that NoSQL lacks the rich tooling that has been developed over the years for traditional relational databases. Such tools include IDE support, code libraries, ORMs, entire CRUD generators and reporting solutions. These tools significantly expedite the development of new software. Just think of Rails' Active Record or Java's Hibernate.
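As a rough illustration of what such tooling buys you, here is a sketch of persistence with GORM, Grails' Hibernate-based ORM (the domain class and its properties are made up for the example):

    class Customer {
        String name
        String email
    }

    // GORM provides persistence and dynamic finders - no SQL, no mapping files:
    new Customer(name: 'Ada', email: 'ada@example.com').save()
    def found = Customer.findByEmail('ada@example.com')    // dynamic finder
    def firstTen = Customer.list(max: 10, sort: 'name')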

This brings us to the main point: a new company, possibly a start-up, is usually more concerned with getting the product out the door as soon as possible and validating the business model than with preparing for “inevitable” Facebook-like success.
Many successful companies tell the same story of iterating over various ideas and models before finally hitting a successful one. In such rapid iteration and evaluation, tools are crucial for providing the necessary productivity and speed. With ready-made tools for relational databases, ideas can be prototyped, deployed and evaluated much faster than with custom-made solutions.

Developers from Facebook and LinkedIn emphasize that their initial solutions were the simplest ones possible. LinkedIn used to run a relational DB for most of its data. Facebook used to host photos as separate files on a simple NFS share. Only when they became successful did they invest in scalable NoSQL solutions.

It is therefore not an unreasonable approach to start with the simplest relational DB (MySQL?) and rapidly create a working product that can validate the business model. If the model is right, it will generate enough interest and money to evolve the solution with more scalable NoSQL DBs. If the idea is not successful, far fewer resources were spent on it, and more time remains for trying the next idea.

The main objection to this approach is that it requires throwing away a solution and re-creating it with NoSQL. While that is true to some extent, it is not necessary to throw away all the code. Besides, after a first release there is usually enough new knowledge about the system and the business that a significant part of the code will change anyway.

In conclusion: while NoSQL solutions provide huge benefits for massively scalable applications, developing with NoSQL might not be the best option when trying out a new business idea. Relational databases provide an easier and faster route for experimentation and evolution of a business model.

Sunday, October 31, 2010

Some NoSQL alternatives


From a very superficial viewpoint, this is my understanding of some NoSQL alternatives.

CouchDB – stores JSON documents. Lightweight, with good replication/synchronization; sometimes used for offline storage and on mobile devices. Can be used for low write-intensity web sites. Implemented in Erlang.

MongoDB – stores JSON documents. A fast DB implemented in C++. Can be used for relatively high write-intensity web sites. Supports a multi-master mode and basic SQL-like queries. Recently added auto-sharding.
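For a taste of those SQL-like queries, here is a sketch in Groovy against the 2010-era MongoDB Java driver (the database, collection and field names are hypothetical):

    import com.mongodb.BasicDBObject
    import com.mongodb.Mongo

    def mongo = new Mongo('localhost')                      // default port 27017
    def users = mongo.getDB('test').getCollection('users')

    // roughly: SELECT * FROM users WHERE age > 21
    def cursor = users.find(new BasicDBObject('age', new BasicDBObject('$gt', 21)))
    cursor.each { println it }
    mongo.close()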
 
Cassandra – a full-blown implementation of Amazon’s Dynamo combined with column-based storage. A relatively new product designed for massive scale; of course, complexity and “raw”-ness are potential downsides. Used by Facebook.

Voldemort – a massively scalable, replicated, redundant DB with pluggable storage engines. Also based on Amazon’s Dynamo. Provides the most sophisticated features and scale at the expense of complexity. Used by LinkedIn.

Hadoop – can store huge amounts of data that are accessed through map/reduce functions. Frequently used in analytics and data mining.
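To show the map/reduce access pattern in miniature (plain Groovy rather than the actual Hadoop API; real jobs define mapper and reducer classes and run across a cluster):

    def words = ['to', 'be', 'or', 'not', 'to', 'be']
    // map: emit (word, 1) pairs; reduce: sum the counts per word
    def counts = words.inject([:]) { acc, w ->
        acc[w] = (acc[w] ?: 0) + 1
        acc
    }
    assert counts == [to: 2, be: 2, or: 1, not: 1]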

Tuesday, October 26, 2010

Managing interruptions

There is no debate that interruptions and frequent context switches significantly affect productivity. This is especially true of programming activities, which require high levels of concentration.

On the other hand, interruptions are an unavoidable fact of modern working environments. Emails, instant messages, phone calls and old-fashioned knocks on the door are everywhere. Furthermore, people in managerial positions are interrupted constantly (I read somewhere that statistically it’s something like every 5 minutes). Some people with a high frequency of interruptions simply give up on the usual focused effort and work on an “interrupt basis”, abandoning any serious technical work such as code development or even code review (which also requires a high level of concentration).

So what’s a tech-lead or low-level manager to do in order to optimize his time while still being responsive to the needs and requests of others?

So far I’ve discovered two ways of dealing with this major problem:
  1. Minimize interruptions where possible (what a revolutionary idea…)
  2. Combine interruptions and handle them in batches
Minimize interruptions where possible.
Sometimes people get “too connected”: they allow others to constantly come to them with questions, send instant messages, and so on. In such cases it’s important to encourage people to interrupt you only when they can’t find an answer by themselves. If they do otherwise, show them how they could have handled the problem on their own.

As for instant messaging, I got off it completely (call me crazy). The problem with instant messaging is that it makes it way too easy to communicate and interrupt: people don’t even have to leave their comfortable chairs. That tricks people into abusing instant messaging with unreasonably simple and trivial matters.
If you’re not available on instant messenger, having to walk to your office or write you an email forces people to think twice about whether they really need your help. If some people absolutely require your presence on instant messenger (like your boss), create another account and don’t share it with everybody. I tried it, and after the initial shock people adjusted and even got used to the new mode of communication.

Combine interruptions and handle them in batches
The idea is to create slices of time in which you simply ignore interruptions. This means not responding to emails, telling people to come back later, etc. One technique for this is the Pomodoro Technique: you focus for 25 minutes, followed by 5-10 minutes for handling the accumulated interruptions.
Personally, I even shut down the email client (Outlook) so that I don’t see any indication of emails or alerts. I know its notifications can be configured, but not having it run at all provides much stronger isolation from interruptions.
I installed a scheduler program that automatically launches Outlook every half hour. Then I spend some time dealing with the accumulated emails.
Before I implemented this technique I used to get interrupted by emails every 3-5 minutes. I was receiving 100-130 emails per day, and these were not just dummy notifications; most of them required real attention. Now I deal with the same volume of email much more efficiently and with less effort. Most importantly, it does not break my concentration.

Using a combination of these techniques I have managed to significantly increase my attention span and focus, and therefore improve my overall productivity.

In conclusion:
Try to avoid interruptions whenever you can. Sometimes that means changing your habits and the habits of people around you.
When you cannot avoid interruptions, at least try to combine them and handle them in batches.

Good luck.

Tuesday, September 21, 2010

Bad Attitudes of Agile

Article: Bad Attitudes of Agile: "Christopher Goldsbury explores some 'bad attitudes' of Agility - assertions about management, documentation, testing, teams, and schedules that are commonly encountered, but contrary to reality. These bad attitudes find refuge and justification in Agile despite the fact they are false. Addressing these viewpoints before they, potentially, darken a good movement is essential. By Christopher Goldsbury"

Wednesday, September 15, 2010

Big Ball of Mud, Still the Most Popular Software Design

Big Ball of Mud, Still the Most Popular Software Design:
"Big Ball of Mud, is a code jungle which is haphazardly structured, sprawling, sloppy and connected by duct-tape. Over the years we have been introduced to various guidelines such as SOLID, GRASP and KISS amongst age old, high cohesion and low coupling to deal with this Mud. However, the situation still remains bleak and Big Ball of Mud seems to be a popular way to design and architect software.
...

Interestingly, as per FJ, Yoder felt that many aspects of Agile directly lead to mud. These included,
1) Lack of upfront design
2) Late changes to the requirements
3) Late changes to the architecture
4) Piecemeal growth

      By Vikas Hazrati"

Monday, September 13, 2010

Optimize by measuring, not guessing


Although it sounds obvious, it’s amazing how many people try to deal with performance and scalability problems by guessing where the problem comes from. I have seen it countless times. People come up with something like “it must be the XYZ module that’s taking so long, let me go and tune it”, or “the serialization is slowing everything down – we need to switch to a binary format”.

I understand that the people who developed the application justly believe that they know its good spots and its bad spots. But performance degradation can come from so many other areas that it is very difficult to guess the root cause of the problem accurately, even if you know the application inside and out.

In my experience the best way to improve performance is to treat the application like a black box. This means abandoning all preconceptions about the internal workings of the system. Just put the application under a good profiler (well, duh!) and observe. Many times I have been surprised by how unrelated the real bottleneck was to the suspected one. Often the bottlenecks are found in very low-level areas of the code that are not even considered when thinking about potential hot spots. I remember how once, after exhaustive attempts to improve the performance of certain areas of application logic, the problem turned out to be in a low-level message-routing loop.
Other times the problem comes from third-party components such as JDBC drivers and databases that are usually not even on the radar.

Even if people insist on “knowing” where the problem comes from, what they actually have is a hypothesis, not an established fact. Before proceeding on the assumption that the hypothesis is correct, they need to collect empirical data that supports it.
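Even a crude measurement is a better start than a confident guess (though a real profiler remains the right tool). A minimal Groovy sketch, where suspectedHotSpot() is a hypothetical stand-in for the code someone “knows” is slow:

    // stand-in for the code under suspicion
    def suspectedHotSpot() {
        def sb = new StringBuilder()
        100000.times { sb.append(it) }
    }

    long start = System.nanoTime()
    suspectedHotSpot()
    def elapsedMs = (System.nanoTime() - start).intdiv(1000000)
    println "suspectedHotSpot() took ${elapsedMs} ms"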

In conclusion: resist the temptation to guess about the root causes of performance or scalability problems. Be scientific about it. Be methodical about it. It will save you the time and frustration of chasing the wrong problem.

Monday, August 23, 2010

Importance of independent testing

In this age of TDD, people are getting the impression that TDD or other forms of developer testing virtually guarantee the absence of bugs. They think that 100% code coverage means that bugs have nowhere to hide.

The problem is that developer tests are written with the same assumptions and logical errors as the code itself. If you don’t think of a particular scenario that would cause your code to malfunction, you are not going to write a test for it.

Of course, modules/classes/services can and should be unit- and integration-tested. That guards against basic errors and allows safe refactoring by verifying that existing functionality is not broken. Personally I’m a big fan of integration testing and intend to blog about it later.

People should also keep in mind that despite widespread belief in the extreme effectiveness of developer testing, empirical studies show a moderate rate of bug detection through developer tests. In Code Complete 2, Steve McConnell reports that “Individual testing steps (unit test, component test, and integration test) typically find less than 50 percent of the errors present each. The combination of testing steps often finds less than 60 percent of the errors present (Jones 1998).” [McConnell, 2004]
That compares with a detection rate of about 80 percent for modeling and prototyping, and about 75 percent for formal code reviews. You might not agree with the conclusions, but the numbers are accumulated over a large number of projects, and chances are they are at least somewhat reflective of reality.

I truly believe that in addition to all developer testing (including unit tests, reviews and prototyping), independent QA/functional testing is indispensable for finding real bugs that would otherwise make it into production.

First of all, a tester or a team of testers is free from the assumptions that led to the bugs in the first place. Second, a fresh pair of eyes, and experience (and talent) in finding faults in a system, are exactly what is needed to uncover the dark corners of the software.
Third, I’m a firm believer that division of labor leads to higher productivity. Without going into long economic theories, it is known that people who specialize in a particular skill tend to be more productive at it; from this stems the increased productivity of the team and the company. A tester is a specialist, well… in testing. That’s what he does best. Just like a developer, he has accumulated a bag of techniques and tricks that help him solve testing puzzles. Because of this he can find more bugs, more boundary conditions and more unexpected inputs than the developer can.

For these reasons I believe that rigorous testing by an independent team of testers is absolutely necessary in addition to all forms of developer testing.

Sunday, August 15, 2010

Steve McConnell rules!

As a person with over 15 years in the software business, I must admit that the biggest mistake I made was assuming that general programming knowledge, and later management knowledge, just comes with experience.

Even if that is partially so, you are inevitably exposed to a relatively small sample of problems and solutions. Furthermore, you are bound to learn only from your own mistakes. In many cases people see those mistakes, and the mistakes made by others around them (including management), and assume that they are somehow unique to their project or their company.

But as the saying goes, “there is nothing new under the sun” (Ecclesiastes 1:9). Most of the mistakes that you or others around you make have already been made countless times before. It pays to read serious literature about avoiding such mistakes through the use of best software development practices. Furthermore, this literature should preferably be read before engaging in big development projects.

After reading quite a few books on the general subject of software development (besides technology- and language-specific books), I can say that I found the most valuable information in the books of Steve McConnell.

When I started reading “Code Complete 2”, it was love at first sight. The book made me completely rethink my approach to writing code. It includes both low-level advice, such as recommended naming conventions and module sizes, and high-level concepts, such as managing complexity through consistent levels of abstraction, which is essential to the intelligibility of systems. I believe that any serious software professional ought to read this book, preferably before spending years making and correcting his own mistakes.

The next McConnell book, intended more for team leads and architects, is “Rapid Development”. Although it was published in 1996, it is still very relevant today. Its concepts and ideas apply to today’s “agile” world as much as they did to the “waterfall” world of past decades. It contains an invaluable list of practices and classic mistakes that every developer and manager has encountered at some point. As with Code Complete 2, this book is essential reading for any professional software practitioner.

A more specialized McConnell book is “Software Estimation”. It contains an exhaustive description of various estimation methodologies, along with advice on the applicability of each within a given context. It also covers estimation of “iterative” (aka agile) projects. I believe that serious software estimation is impossible without basic knowledge of the techniques described in this book.

A common feature of McConnell’s books is their reliance on large studies of projects and code-bases. This is particularly valuable because it takes the information from the realm of personal experience and opinion into the realm of scientific and empirical knowledge, which is so sorely missing in many books on software. I don’t want to start a flame war or invite angry feedback, but books such as Uncle Bob’s “Clean Code” lack this kind of empirical backing and are mostly based on the author’s opinion (with which I mostly agree, btw). I believe the software profession has evolved long enough that we can use the information accumulated in countless successful and failed projects to better plan and execute new ones.

Lastly, I regret that very few colleges and universities include McConnell’s books in their mandatory learning material. IMHO they contain infinitely more useful information for future software engineers than many abstract and outdated subjects in computer science.

Wednesday, August 11, 2010

Caffeine Dependency is Easier to Develop Than You Think [Caffeine]

This article confirms my personal observation. It looks at “Does Coffee Work” from Seed Magazine and comes to mostly the same conclusion I did: yes, it works, if you don’t drink it regularly and you go slow with it.



Java technology zone technical podcast series

Java technology zone technical podcast series: "For years, the Java zone has brought you top-quality technical content by some of the best minds in the industry. But taking the time to read an in-depth, code-heavy article can be difficult, even if it's about a topic that's critical to your day job. This new podcast series, led by the engaging and technically curious Andrew Glover, provides a new way to get information from the sources you trust most. Each week, we'll publish a new discussion with an expert on the topics that are important to your job."