Estimating work effort in agile projects is fundamentally different from traditional methods of estimation. The traditional approach is to estimate using a “bottom-up” technique: detail out all requirements and estimate each task to complete those requirements in hours/days, then use this data to develop the project schedule. Agile projects, by contrast, use a “top-down” approach, using gross-level estimation techniques on feature sets, then employing progressive elaboration and rolling-wave planning methods to drill down to the task level on a just-in-time basis, iteratively uncovering more and more detail each level down. This paper will elaborate on two common techniques for agile estimation (planning poker and affinity grouping), as well as touch on how the results of these exercises provide input into forecasting schedule and budget.
Top-down vs. Bottom-up
The traditional method for estimating projects is to spend several weeks or months at the beginning of a project defining the detailed requirements for the product being built. Once all the known requirements have been elicited and documented, a Gantt chart can be produced showing all the tasks needed to complete the requirements, along with each task estimate. Resources can then be assigned to tasks, and actions such as loading and leveling help to determine the final delivery date and budget. This process is known as a bottom-up method, as all detail regarding the product must be defined before project schedule and cost can be estimated.
In the software industry, the bottom-up method has severe drawbacks because of today's speed of change. New development tools and new knowledge become available so quickly that any delay in delivery leaves a project open to competitive alternatives and in danger of delivering an obsolete product (Sliger, 2010).
The top-down method addresses this key issue by using the information currently available to provide gross-level estimates. Rolling-wave planning is then used to incorporate new information as it's learned, further refining estimates and iteratively elaborating with more detail as the project progresses. This method of learning just enough to get started, with a plan to incorporate more knowledge as work outputs evolve, allows the project team to react quickly to adversity and changing market demand.
Gross-level estimation techniques are in use by teams using agile approaches such as Scrum and Extreme Programming, and this paper will cover two of the most popular techniques: Planning Poker and Affinity Grouping. Estimation units used will also be examined, as these units should be such that they cannot be confused with time.
The most popular technique of gross-level estimation is Planning Poker, or the use of the Fibonacci sequence to assign a point value to a feature or item (Grenning, 2002). The Fibonacci sequence is a mathematical series of numbers that was introduced in the 13th century and used to explain certain formative aspects of nature, such as the branching of trees. The series is generated by adding the two previous numbers together to get the next value in the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, and so on. For agile estimation purposes, some of the numbers have been changed, resulting in the following series: 1, 2, 3, 5, 8, 13, 20, 40, 100.
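As a minimal sketch of the relationship between the two series, the following Python snippet generates the Fibonacci sequence by the rule described above and compares it with the modified card values. The function name `fibonacci` and the variable names are illustrative assumptions, not part of any planning tool.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 0, 1."""
    sequence = []
    a, b = 0, 1
    for _ in range(count):
        sequence.append(a)
        a, b = b, a + b  # next value is the sum of the two previous values
    return sequence

# Modified series used for agile estimation, as listed in the text.
PLANNING_DECK = [1, 2, 3, 5, 8, 13, 20, 40, 100]

fib = fibonacci(12)
# Card values taken directly from the Fibonacci sequence.
shared = [value for value in PLANNING_DECK if value in fib]
# Card values that were changed for estimation purposes.
changed = [value for value in PLANNING_DECK if value not in fib]

print("Fibonacci:", fib)
print("Unchanged card values:", shared)
print("Changed card values:", changed)
```

The lower card values match the Fibonacci sequence exactly, while the three highest are rounded figures that replace 21 and the values beyond it.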
These numbers are represented in a set of playing cards (see Exhibit 1). Team members play “Planning Poker” (Exhibit 2) to provide an estimate in the form of a point value for each item. Here are the steps:
- Each team member gets a set of cards.
- The business owner (who does NOT get to estimate) presents the item to be estimated.
- The item is discussed.
- Each team member privately selects a card representing his/her estimate.
- When everyone is ready, all selected cards are revealed at the same time.
- If all team members selected the same card, then that point value is the estimate.
- If the cards are not the same, the team discusses the estimate with emphasis placed on the outlying values:
  - The member who selected the lowest value explains why he/she selected the value.
  - The member who selected the highest value explains why he/she selected the value.
- Select again until estimates converge.
- Should lengthy or “in-the-weeds” conversations result, team members may use a two-minute timer to timebox the discussion, selecting again each time the timer runs out, until the estimates converge.
- Repeat for each item (Cohn, 2006, pp. 56–57).
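The reveal-and-discuss steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function `evaluate_round`, the vote dictionary, and the team member names are hypothetical, and the sketch covers only a single reveal, not the discussion or the timer.

```python
PLANNING_DECK = [1, 2, 3, 5, 8, 13, 20, 40, 100]

def evaluate_round(votes):
    """Evaluate one simultaneous reveal.

    `votes` maps a team member's name to the card he/she selected.
    Returns (estimate, lowest_voters, highest_voters). The estimate is
    None when the cards differ; the lowest and highest voters then explain
    their reasoning and the team selects again.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    for member, card in votes.items():
        if card not in PLANNING_DECK:
            raise ValueError(f"{member} played {card}, which is not in the deck")
    cards = set(votes.values())
    if len(cards) == 1:
        # Everyone selected the same card: that value is the estimate.
        return cards.pop(), [], []
    low, high = min(cards), max(cards)
    lowest = sorted(m for m, c in votes.items() if c == low)
    highest = sorted(m for m, c in votes.items() if c == high)
    return None, lowest, highest

# Hypothetical team: the first reveal differs, the second converges.
first_reveal = evaluate_round({"Ana": 3, "Ben": 8, "Chris": 5})
second_reveal = evaluate_round({"Ana": 5, "Ben": 5, "Chris": 5})
print(first_reveal)
print(second_reveal)
```

In the first reveal the function returns no estimate and names the members who should explain their low and high cards; in the second the shared card becomes the estimate.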
There are several reasons Fibonacci numbers are used, and used in this format. First is the notion that once teams eliminate time as the estimate base, they are less likely to demand more detail and pad estimates. These numbers instead represent relative size, not time. As a result, the estimation exercise goes quite quickly. Teams generally spend roughly two minutes on each item, allowing a backlog of 30 items to be estimated in an hour. The fact that teams are limited to only 9 choices (i.e., point values or cards) also helps speed up the process.
The sequence also provides the right level of detail for smaller and better-understood features, while avoiding a false sense of accuracy for higher estimates. For example, an item with a high estimate (20 or higher) means the item is large and not yet well understood. Debating whether the item is a 20 or a 19 or a 22 would be a waste of time as there simply isn't enough data available. Once the item gets closer to the iteration in which the item will be worked, it can be broken down into smaller pieces and estimated in more granular numbers (1–13). Items with point estimates from 1–13 can generally be completed within a single iteration (1–4 weeks).
It is important to note that points do not have the same meaning across teams; for example, one team's “five” does not equal another team's “five.” Thus team velocity, which is derived from points, should not be used to compare productivity across teams.
An even faster way to estimate, and one used when the number of items to estimate is large, is affinity grouping. Team members simply group items together that are like-sized, resulting in a configuration similar to the one in Exhibit 3. The method is simple and fast:
- The first item is read to the team members and placed on the wall.
- The second item is read and the team is asked if it is smaller or larger than the first item; placement on the wall corresponds to the team's response (larger is to the right, smaller is to the left).
- The third item is read and the team is asked if it is smaller or larger than the first and/or second items; the item is placed on the wall accordingly.
- Control is then turned over to the team to finish the affinity grouping for the remainder of the items.
Teams may choose to continue in the same fashion, placing one item at a time on the wall after group discussion. However, a faster way is to have each team member select an item and place it based on their own best understanding. This is done with all team members working in parallel until all items have been assessed and placed on the wall. Several hundred items can be estimated in a relatively short time. Once all items are on the wall, the team reviews the groupings. Items that a team member believes to be in the wrong group are discussed and moved if appropriate.
Once affinity grouping is complete, estimation unit values such as points can be assigned. In Exhibit 3, the first set on the far left would be labeled as having a value of 1 point, the second set would be 2 points, the third set 3 points, the fourth set 5 points, and the last set 8 points.
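The labeling step can be sketched as follows, assuming the groups are recorded left to right (smallest to largest) as lists of item names. The function `label_groups` and the item names are hypothetical; only the point scale comes from the text.

```python
POINT_SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 100]

def label_groups(groups, scale=POINT_SCALE):
    """Assign each group, ordered smallest to largest, the next scale value.

    `groups` is a list of lists of item names, as read off the wall from
    left to right. Returns a dict mapping each item to its point value.
    """
    if len(groups) > len(scale):
        raise ValueError("more groups than values on the scale")
    return {item: value for value, group in zip(scale, groups) for item in group}

# Hypothetical wall with five groups, as in the description of Exhibit 3.
wall = [["Login", "Logout"], ["Search"], ["Profile"], ["Checkout", "Reports"], ["Billing"]]
points = label_groups(wall)
print(points)
```

Passing a different `scale`, for example a list of T-shirt size labels, would label the same groups with another estimation unit.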
Affinity grouping can also be done for other estimation units, such as T-shirt sizes. Exhibit 4 shows an example of affinity grouped items labeled with T-shirt sizes instead of points.
The use of T-shirt sizes (Extra Small [XS], Small [S], Medium [M], Large [L], Extra Large [XL]) is another way to think of relative sizes of features. This is an even greater departure from the numeric system, and like all good gross-level estimation units can in no way be associated with a specific length of time.
Other arbitrary tokens of measurement include Gummi Bears, NUTS (Nebulous Units of Time), and foot-pounds. Teams may create their own estimation units, and as you can see, they often have a bit of fun in doing so.
This paper does not cover the use of time-based units such as ideal development days and/or hours. These are already common and well understood, so their explanations were not included. It is worth noting, however, that gross-level estimating has the potential to be more successful when decoupled from the notion of time. Because time estimates are often turned into commitments by management and business, team members feel more pressure to be as accurate as possible. As a result, they request more and more detail about the item being estimated. This turns gross-level estimation into the more time-consuming detail-level estimation and defeats the original intent and purpose.
Forecasting Schedule and Budget
Once gross-level estimates and team velocity are determined, schedule and budget can be forecast. Teams determine their velocity by adding up the total number of points for all the items they completed in an iteration. For example, a team may have selected five items with a total point value of 23 points (see Exhibit 5). At the end of their two-week iteration, they were only able to complete four of the five items. Their velocity is 15, or the sum of the point values of items 1–4. Teams do not get “partial credit” for completing portions of an item, so even if they had started on item 5, it would not count, as it was not completed.
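A minimal sketch of the velocity rule follows. The individual point values are hypothetical, since Exhibit 5 is not reproduced here; they are chosen only so that the five items total 23 points and the four completed items total 15, as in the example above.

```python
def velocity(items):
    """Sum the points of completed items only (no partial credit).

    `items` is a list of (points, completed) pairs for one iteration.
    """
    return sum(points for points, completed in items if completed)

# Hypothetical point values; item 5 was started but not completed.
iteration = [
    (3, True),   # item 1
    (5, True),   # item 2
    (2, True),   # item 3
    (5, True),   # item 4
    (8, False),  # item 5
]

planned = sum(points for points, _ in iteration)
achieved = velocity(iteration)
print(f"Planned {planned} points, velocity {achieved}")
```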
Determining the Schedule
A team's average velocity is used in forecasting a long-term schedule. Average velocity is calculated by summing the velocity measurements from the team's last three iterations, and dividing that total by three. So if a team completed 15 points in its first iteration, and 20 points in each of two subsequent iterations, the team's average velocity is 18 ((15 + 20 + 20) / 3 ≈ 18.3, rounded to 18). If a team can do 18 points in one iteration on average, and there are 144 points worth of work to be completed in the project, it will take the team eight iterations to complete the work (144 / 18). If each iteration is two weeks, then the forecast completion is 16 weeks. This method allows us to answer the question, “When will we be done with all this work?”
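The arithmetic in this example can be sketched in Python. The function names are illustrative assumptions; rounding the average to a whole number and rounding the iteration count up are choices made for this sketch.

```python
import math

def average_velocity(velocities, window=3):
    """Average the most recent `window` velocity measurements."""
    recent = velocities[-window:]
    return sum(recent) / len(recent)

def iterations_needed(total_points, velocity):
    """Whole iterations required to complete the work (rounded up)."""
    return math.ceil(total_points / velocity)

history = [15, 20, 20]                  # points completed per iteration
avg = round(average_velocity(history))  # 18.33... rounded to 18
iterations = iterations_needed(144, avg)
weeks = iterations * 2                  # two-week iterations

print(f"Average velocity: {avg}")
print(f"Forecast: {iterations} iterations, {weeks} weeks")
```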
If the team has a track record of velocity data, it is possible to determine the most optimistic completion date, the most pessimistic, and the most likely. The team's average velocity number is used to calculate the most likely scenario, while velocity numbers from the team's worst-performing iterations are used to calculate the most pessimistic forecast completion date. Using velocity from iterations where the team was able to complete more than expected provides the most optimistic forecast.
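One way to sketch the three forecasts is shown below. It assumes the pessimistic case uses the single lowest recorded velocity and the optimistic case the single highest, which simplifies the description above. The velocity history and the 144-point backlog are hypothetical values.

```python
import math

def completion_forecast(total_points, velocity_history, weeks_per_iteration):
    """Return forecast weeks to completion for three scenarios.

    Assumption for this sketch: worst and best single iterations stand in
    for the pessimistic and optimistic velocities.
    """
    scenarios = {
        "optimistic": max(velocity_history),
        "most_likely": sum(velocity_history) / len(velocity_history),
        "pessimistic": min(velocity_history),
    }
    return {
        name: math.ceil(total_points / v) * weeks_per_iteration
        for name, v in scenarios.items()
    }

# Hypothetical history: average 20, worst 12, best 25.
forecast = completion_forecast(144, [12, 20, 25, 23], weeks_per_iteration=2)
print(forecast)
```

A team might prefer to average its two or three worst iterations instead of taking the minimum, which would soften the effect of a single unusual iteration.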
We can also use these numbers to answer the question, “We must deliver something by this date—of these features, how many will we have done by then?” See Exhibit 6 for an example of the most likely amount forecast to be complete, the pessimistic forecast, and the optimistic forecast. This example is for a team whose average velocity is 20, and which has a worst-performance velocity of 12 and a best-performance velocity of 25. Given this and only six weeks (three iterations), how much can be completed? The pessimistic forecast is that only items 1–8 will be done in six weeks. The optimistic forecast is that items 1–18 will be completed. And the most likely forecast, based on the team's average velocity of 20, is that items 1–13 will be completed in six weeks.
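The same question can be sketched in code. The item sizes below are invented for illustration, since Exhibit 6 is not reproduced here; they were chosen so that the cut lines fall where the text describes. The sketch assumes items are worked strictly in priority order and that an item counts only if it fits entirely within the forecast capacity.

```python
from itertools import accumulate

def items_completed(backlog_points, velocity, iterations):
    """Count how many prioritized items fit within velocity * iterations."""
    capacity = velocity * iterations
    # Point values are positive, so the running total only increases.
    return sum(1 for total in accumulate(backlog_points) if total <= capacity)

# Hypothetical point values for items 1-20, in priority order.
backlog = [5, 3, 8, 5, 2, 5, 3, 5, 5, 8, 3, 5, 3, 5, 3, 2, 3, 2, 5, 8]

pessimistic = items_completed(backlog, velocity=12, iterations=3)
most_likely = items_completed(backlog, velocity=20, iterations=3)
optimistic = items_completed(backlog, velocity=25, iterations=3)
print(pessimistic, most_likely, optimistic)
```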
If teams use non-numeric estimation units such as T-shirt sizes, the algorithms for forecasting are more complex. It is recommended that the sizes be converted to a numeric system to more easily generate similar data. For example, a Small could be converted to a NUT of 3, a Medium to a NUT of 5, and so on. These may also be converted to time ranges (a Small could be 1–3 days, for example) but this is inherently risky, due to issues already cited in the Estimation Units section.
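A conversion of this kind could be sketched as a simple lookup. Only the values for Small (3) and Medium (5) come from the text; the values for XS, L, and XL are assumptions made here by continuing the same scale, and a team would choose its own.

```python
# S and M follow the example in the text; XS, L, and XL are assumed values.
SIZE_TO_NUTS = {"XS": 2, "S": 3, "M": 5, "L": 8, "XL": 13}

def total_nuts(sizes):
    """Convert a list of T-shirt sizes to a numeric total."""
    return sum(SIZE_TO_NUTS[size] for size in sizes)

# Hypothetical set of sized items.
sized_items = ["S", "M", "M", "XL"]
total = total_nuts(sized_items)
print(total)
```

Once converted, the totals can feed the same velocity and forecast calculations used for points.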
In this section we look at answering, “We only have this much money—how long will it last and how much will we have done before we run out?” First, a simple formula is used to determine the cost per point:
Σ (loaded team salaries for period n) / points completed in period n
Take the sum of the team's salaries (loaded) for a period of time, say three two-week iterations, and divide that by the number of points the team completed in the same time frame. So a team whose total loaded salaries are $240,000 over six weeks, and completed 60 points of work in those three iterations, would have a cost per point of $4,000. Now use the following formula to determine budget:
(Cost per point × total point value of items to be completed) + other expenses = forecast budget
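The two formulas can be sketched as follows, using the salary and point figures from the example above. The 144-point backlog and the $25,000 in other expenses are hypothetical values added for illustration.

```python
def cost_per_point(loaded_salaries, points_completed):
    """Loaded team salaries for a period divided by points completed in it."""
    return loaded_salaries / points_completed

def forecast_budget(point_cost, total_points, other_expenses=0):
    """(Cost per point x total points to be completed) + other expenses."""
    return point_cost * total_points + other_expenses

point_cost = cost_per_point(240_000, 60)  # three two-week iterations
budget = forecast_budget(point_cost, total_points=144, other_expenses=25_000)

print(f"Cost per point: ${point_cost:,.0f}")
print(f"Forecast budget: ${budget:,.0f}")
```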
Quite often not all features for a product are defined at the outset of a project, which is as expected for agile projects. So budget estimates are based on what we know today, plus a forecast algorithm that is based on historic data or expert guidance. For example, say there are only 20 features listed so far, but the business won't be able to provide any additional feature requests or refinements until after seeing how the first release is received by the customer. The budget for the project, which is slated for three releases, would only have forecast data available for the first release and not the entire project. The team could use the algorithm above to forecast budget for the first release, then assume an additional 20% for the second release and an additional 5% for the last release, based on past experience.
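The multi-release forecast might be sketched as below. The text does not state what the 20% and 5% are percentages of, so this sketch assumes each is a percentage of the first release's forecast budget; that interpretation, the function name, and the $576,000 first-release figure are all assumptions.

```python
def project_budget(first_release_budget, later_release_percents):
    """Forecast each release and the project total.

    Assumption: each later release is budgeted as a percentage of the
    first release's forecast, based on past experience.
    """
    releases = [first_release_budget]
    for percent in later_release_percents:
        releases.append(first_release_budget * percent / 100)
    return releases, sum(releases)

# Hypothetical first-release forecast, with 20% and 5% for later releases.
releases, total = project_budget(576_000, [20, 5])
print(releases, total)
```

Because forecasts are revised each iteration, the later-release figures would be replaced with point-based forecasts once those features are defined.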
Like velocity, budget forecasts and schedule forecasts are revised each iteration. This is part of the rolling-wave planning process that agile projects subscribe to.
Not covered in this paper are lean approaches such as Kanban, which do not use structured estimation techniques. Rather than spend (waste) time estimating items, teams calculate average cycle and lead times based on their actual throughput rates. Kanban uses the mathematical theorem Little's Law as the basis for its formulas. Using lead time calculations derived from cumulative flow diagrams, teams forecast project schedules without spending any up-front time preparing estimates. The reader is encouraged to do independent research on this topic, which could be a separate paper by itself.
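For orientation only, Little's Law states that average lead time equals average work in progress divided by average throughput. The sketch below applies that relationship with hypothetical numbers. It is not a description of how any particular Kanban team builds its forecasts, and the simple backlog forecast assumes throughput stays stable.

```python
def average_lead_time(avg_work_in_progress, avg_throughput):
    """Little's Law: lead time = work in progress / throughput."""
    return avg_work_in_progress / avg_throughput

def weeks_to_finish(remaining_items, avg_throughput):
    """Simple forecast assuming throughput stays at its historical average."""
    return remaining_items / avg_throughput

# Hypothetical team: 12 items in progress on average, 4 items finished per week.
lead_time_weeks = average_lead_time(12, 4)
forecast_weeks = weeks_to_finish(40, 4)
print(lead_time_weeks, forecast_weeks)
```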
Agile projects are intended to deliver a product or product increments early and often, in order to incorporate customer feedback and other learnings into the next release. By spending more time on experimenting, executing, and learning, and less time on speculation, the cycle time for delivery is reduced. Agile teams are better able to compete in the marketplace and keep pace with the ever-increasing speed of change.