Development estimation is hard

By Barnaby Golden, 13 December 2012

...there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.

- Donald Rumsfeld, US Secretary of Defense

Estimating the resource use and likely end date of a development project is notoriously difficult. This is particularly true when the project is large. Let's take a look at why that is.

What You Know

The first type of estimating error occurs in the estimates for work that you know is in scope.

For example:

Task 1 estimate 2 hours (actual 3 hours)

Task 2 estimate 5 hours (actual 4.5 hours)

Task 3 estimate 1 hour (actual 2 hours)

Total error: +1.5 hours
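The arithmetic above can be sketched in a few lines of Python. The `Task` class and its field names are hypothetical, chosen only for this illustration. Each task's error is its actual time minus its estimate, and the total error is the sum of those signed values.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimate_hours: float
    actual_hours: float

    @property
    def error_hours(self) -> float:
        # Positive means the task took longer than estimated
        return self.actual_hours - self.estimate_hours


tasks = [
    Task("Task 1", 2.0, 3.0),
    Task("Task 2", 5.0, 4.5),
    Task("Task 3", 1.0, 2.0),
]

total_error = sum(t.error_hours for t in tasks)
print(f"Total error: {total_error:+} hours")  # Total error: +1.5 hours
```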

Typically these errors arise in the estimating phase itself. For example, we thought it would take 2 hours to write the login page, but it took 3 hours.

The size of this type of error should decrease with accumulated experience (particularly if historical metrics are being used). However, it is unlikely ever to be eliminated completely, no matter how experienced the estimation team. One reason is that random events occur, such as a server crashing or development tools having to be re-installed.
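One simple way to use historical metrics, assuming past estimates and actuals have been recorded, is to compute the team's historical ratio of actual to estimated effort and scale new raw estimates by it. The function names and the sample history below are hypothetical. Note that this only corrects a consistent bias; it cannot remove the random events described above.

```python
def calibration_factor(history: list[tuple[float, float]]) -> float:
    """Ratio of total actual hours to total estimated hours.

    Each history entry is (estimate_hours, actual_hours) for a completed task.
    """
    total_estimate = sum(est for est, _ in history)
    total_actual = sum(act for _, act in history)
    if total_estimate <= 0:
        raise ValueError("history must contain positive estimates")
    return total_actual / total_estimate


def calibrate(raw_estimate: float, history: list[tuple[float, float]]) -> float:
    # Scale a fresh estimate by how far past estimates were off overall
    return raw_estimate * calibration_factor(history)


# Hypothetical history: the team has tended to underestimate by about 25%
past_tasks = [(4.0, 5.0), (8.0, 10.0), (2.0, 2.5)]
print(calibration_factor(past_tasks))  # 1.25
print(calibrate(6.0, past_tasks))      # 7.5
```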

An experienced team might get these errors down to +/- 50% or lower. Because individual task estimates may be too high as well as too low, the errors partly cancel out, so the overall error should be smaller than the sum of the individual errors (unless there is a systematic problem with estimating).
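To illustrate why over- and under-estimates partly cancel, the sketch below compares the net error (the signed sum) with the gross error (the sum of absolute errors), first for the example tasks and then for a hypothetical team with a systematic bias where every task overruns. The function names are illustrative only.

```python
def net_error(pairs):
    # Signed sum: over- and under-estimates cancel each other out
    return sum(actual - estimate for estimate, actual in pairs)


def gross_error(pairs):
    # Sum of magnitudes: the error you would see if nothing cancelled
    return sum(abs(actual - estimate) for estimate, actual in pairs)


# (estimate_hours, actual_hours) from the example above
example = [(2.0, 3.0), (5.0, 4.5), (1.0, 2.0)]
print(net_error(example), gross_error(example))  # 1.5 2.5

# Hypothetical systematic bias: every task overruns, so nothing cancels
biased = [(2.0, 3.0), (5.0, 6.0), (1.0, 2.0)]
print(net_error(biased), gross_error(biased))  # 3.0 3.0
```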

What You Don’t Know

The second type of error again concerns work that you know is in scope, but where you did not fully appreciate what was involved.

For example:

Task 1 estimate 2 hours (actual 3 hours)

Task 2 estimate 5 hours (actual 4.5 hours)

Task 3 estimate 1 hour (actual 2 hours)

Task 4 (unplanned) actual 2 hours

Total error: +3.5 hours
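Unplanned work can be folded into the same calculation by treating its estimate as zero hours. This is a minimal sketch; recording an unplanned task with an estimate of `0.0` is an assumption made for illustration, not a standard convention.

```python
# (name, estimate_hours, actual_hours); unplanned work was never estimated,
# so it is recorded here with an estimate of 0 hours
tasks = [
    ("Task 1", 2.0, 3.0),
    ("Task 2", 5.0, 4.5),
    ("Task 3", 1.0, 2.0),
    ("Task 4 (unplanned)", 0.0, 2.0),
]

planned_error = sum(act - est for _, est, act in tasks if est > 0)
unplanned_hours = sum(act for _, est, act in tasks if est == 0)
total_error = planned_error + unplanned_hours

print(f"Planned-task error: {planned_error:+} hours")  # +1.5 hours
print(f"Unplanned work: {unplanned_hours:+} hours")    # +2.0 hours
print(f"Total error: {total_error:+} hours")           # +3.5 hours
```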

This type of error is usually related to the need to estimate before all unknowns are resolved (e.g. design, architecture, infrastructure, or the relationship with the customer). As an example, you might only realise when you start implementation that a data export is required from a legacy system.

The unplanned work can be substantial, although you would hope that it is less than a third of the initial estimate.
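Using the figures from the example, a quick check of the unplanned work against the one-third figure might look like this. The `ONE_THIRD` constant simply encodes the author's hoped-for ceiling; it is not a standard threshold.

```python
initial_estimate_hours = 2.0 + 5.0 + 1.0  # Tasks 1-3 as originally estimated
unplanned_hours = 2.0                      # Task 4, discovered during implementation

unplanned_fraction = unplanned_hours / initial_estimate_hours
ONE_THIRD = 1 / 3  # hoped-for ceiling on unplanned work

print(f"Unplanned work is {unplanned_fraction:.0%} of the initial estimate")
if unplanned_fraction < ONE_THIRD:
    print("Within the hoped-for limit")
else:
    print("Above the hoped-for limit")
```

For the example this reports 25%, comfortably under a third.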

In the past, this type of error led to the classic waterfall development approach, in which an attempt is made to pin down all the architecture and design up front.

What They Didn’t Tell You

A third type of error is introduced by the customer (internal or external).

Classic examples of this would be a change of scope or a failure to meet dependencies.

In a contractual arrangement these errors can result in a change request and additional payment. They still represent errors on the initial estimate, but can be viewed more as a delta on the original estimate.
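One way to treat customer-driven changes as a delta, sketched here with hypothetical figures, is to keep the original estimate as a baseline, add approved change requests to form a revised baseline, and measure the actual effort against both. The variable names and numbers are illustrative only.

```python
original_estimate = 8.0       # hours, the initial estimate
change_requests = [3.0, 1.0]  # hypothetical approved change requests, in hours
actual = 14.5                 # hypothetical actual effort, in hours

revised_baseline = original_estimate + sum(change_requests)

# Error against the original estimate includes the customer-driven changes
error_vs_original = actual - original_estimate
# Error against the revised baseline isolates the team's own estimating error
error_vs_revised = actual - revised_baseline

print(revised_baseline, error_vs_original, error_vs_revised)  # 12.0 6.5 2.5
```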

Secondary Errors

The final type of error is what I would describe as a secondary error.

These errors cover additional work that results from the development process itself. For example, a team completes some new functionality but then realises the changes have made the application run too slowly. It is then forced to re-architect the application to bring performance back to acceptable levels.

These are often the most damaging errors, as they are very difficult to plan for and usually occur in the latter stages of a project. Secondary errors can be huge, perhaps even doubling the original estimate.

Sum of All Errors

Add all the errors together and you get the total error on the estimate.
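A sketch of that sum, with one field per error type described above. The class name `EstimateErrors` is hypothetical; the first two values come from the earlier examples, while the customer and secondary figures are made up for illustration (the secondary error here equals the original estimate, i.e. the "doubling" case).

```python
from dataclasses import dataclass, fields


@dataclass
class EstimateErrors:
    """Error in hours for each type of estimating error (positive = overrun)."""
    known_work: float        # what you know
    unplanned_work: float    # what you don't know
    customer_changes: float  # what they didn't tell you
    secondary: float         # rework caused by the development itself

    def total(self) -> float:
        # Add up every field of the dataclass
        return sum(getattr(self, f.name) for f in fields(self))


errors = EstimateErrors(
    known_work=1.5,        # from the first example
    unplanned_work=2.0,    # Task 4 in the second example
    customer_changes=3.0,  # hypothetical
    secondary=8.0,         # hypothetical re-architecture
)
original_estimate = 8.0
print(errors.total())                      # 14.5
print(errors.total() / original_estimate)  # 1.8125
```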

One way to reduce development estimation errors is to reduce the size of the work being estimated. This is one reason why agile frameworks often encourage the use of short iterations.

In his book Software Estimation: Demystifying the Black Art, Steve McConnell describes the cone of uncertainty. At the beginning of a piece of work, development estimation errors are large and uncertainty is high. As the project progresses, the uncertainty gradually narrows, and that is when the real value of estimation comes through.
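To make the cone concrete, the sketch below applies range multipliers to a single point estimate at successive project milestones. The milestones and multipliers are the figures commonly quoted for the cone (from 0.25x to 4x at the initial concept, down to 0.9x to 1.1x once detailed design is complete); treat them as illustrative rather than as calibrated ranges for any particular team.

```python
# Commonly quoted cone-of-uncertainty multipliers (low, high) by milestone.
# Illustrative only; a real team should calibrate its own ranges.
CONE = [
    ("Initial concept", 0.25, 4.0),
    ("Approved product definition", 0.5, 2.0),
    ("Requirements complete", 0.67, 1.5),
    ("User interface design complete", 0.8, 1.25),
    ("Detailed design complete", 0.9, 1.1),
]


def estimate_range(point_estimate: float) -> list[tuple[str, float, float]]:
    """Return (milestone, low, high) effort ranges around a point estimate."""
    return [(name, point_estimate * lo, point_estimate * hi)
            for name, lo, hi in CONE]


for name, low, high in estimate_range(100.0):
    print(f"{name:32} {low:7.1f} to {high:7.1f} hours")
```

For a 100-hour point estimate, the range shrinks from 25 to 400 hours at the start to roughly 90 to 110 hours once detailed design is complete.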