Combining critical chain planning and incremental development in software projects
Cutting content seems to be the prevalent way of meeting deadlines in projects that are running late. But why waste effort and time working on things that, most likely, are going to go at the first sign of trouble? Why not make the decisions about what is important and what is not up-front, and only start work on the latter if we have the necessary time to do it?
By combining Critical Chain (CC) and Incremental Development (ID), two well-known techniques, we can create a new approach to planning and executing projects, which guarantees, with a set probability, the delivery of an agreed subset of the total functionality by a stipulated date.
A time-bound project is a project that is constrained by hard deadlines. Hard deadlines are those in which the date of delivery is as important as the delivery itself. If the project delivers after the deadline, the delivery loses much of its value. Examples of hard deadlines are exhibition dates, government regulations, a competitor's announcement and the customer's own business plans.
Most of these projects start with more requirements than can realistically be handled within the imposed time constraints and, consequently, midway through the development, they find it necessary to start slashing some of them. These unplanned cuts result in customer frustration and wasted effort. A much better approach would be to define the requirements’ priorities up-front, allocating their development to successive releases of the project in such a way that we could be almost sure that the project will deliver all the important requirements, that the second most important ones will still have a fair chance of being delivered, and that the gold-plated ones will only be done if there is any time left.
While the lack of requirements prioritization is one of the reasons most of these projects are late, it is certainly not the only one. The inability of traditional planning methods to deal with the uncertainty present in the estimates on which the plans are based, and the failure to recognize that development work does not progress in a linear fashion (the infamous 90% complete syndrome), are also to blame.
As will be explained later, traditional critical path calculations involving uncertainty produce considerably shorter schedules than those that should be realistically expected. With a shorter schedule as a starting point, being late is a tautology.
The second problem, assuming that a task progresses at a constant rate, prevents project managers from seeing the early signs of delay in tasks until it is too late to take any other action than trim down features, compromise on quality or re-schedule the project.
The method (Miranda, 2002) presented here addresses these problems by combining ideas from critical chain planning (Goldratt, 1997; Newbold, 1998), incremental development (McConnell, 1996) and rate monitoring (Pisano, 1995) into a practical approach for planning and executing time-bounded projects.
This method is not a one-stop solution for all software development problems. It just focuses on how to best schedule work to guarantee that a working product, with an agreed subset of the total functionality, could be delivered by a required date.
The sections that follow explain the fundamentals of planning under uncertainty, the use of rate monitoring to track progress, and, finally, the application of these concepts in planning and executing projects.
Uncertainty in the planning and execution of projects
Probabilities in Project Management
Uncertainty is the reason project management is needed. Things are neither black nor white; things always depend on other things. In this context, we shall think of a probability as a numerical measure of the strength of a belief in a certain proposition. By convention, probability ranges from 0 to 1, where 0 means that the belief in question is certainly false and 1 means that it is certainly true. A probability of .5 means that there is no reason to favor one belief over another. For example, if a project manager assigns to a task a probability of .7 of finishing in 30 days, he is saying that his belief in finishing the task on time is stronger than if he had assigned a .5 probability, but that he is not completely certain of it, which would imply a probability of 1.
The mathematical theory of probability specifies how the probability on one belief should be constructed from the probabilities of other beliefs in order for them to be consistent with one another. For example if the project manager says that the probability of finishing a task in 30 days is .7, the probability of not finishing on time would be .3 (1 - .7) and not some other arbitrary value like .4 or .5.
The estimates on which project schedules and resource allocations are based are never single numbers; whether spoken or not, there are many assumptions behind each of them. Some of these assumptions concern the complexity of the tasks, others our ability to carry them out. Some of them, if true, will contribute to an early completion of a task, others will add to the execution time. Intuitively, we can see that for a task to finish at the earliest possible time, all the “favorable” assumptions must be true and all the “inauspicious” ones false. The probability of this happening is very low. The same could be said for the latest possible date. The most likely date corresponds then to a situation in which the most probable “good” assumptions are true and the most probable “bad” ones are false. Numerically, the situation can be expressed by a triangular probability distribution such as the one shown by Exhibit 1. (Strictly speaking, the caption for the “y” axis should read f(x), since this is a continuous distribution. The term probability is used instead for its intuitive appeal.)
Since the actual probability distribution function for the duration of the task is unknown, the choice of a simple triangular distribution is a sensible one (Pisano, 1995). Its right skew captures the fact that while there is a limited number of things that will shorten the duration of a task, the number of things that can go wrong is virtually unlimited.
From the project manager's point of view, the probability of completing the task on or before a certain date is more important than the probability of finishing on a specific date. This probability, called the on-time probability of the task, can be derived from the cumulative distribution shown in Exhibit 2.
Exhibit 1. If all the favorable assumptions are true and all the gloomy ones are false, the task will be completed in 10 days; this is the Earliest Completion Date. The Most Likely duration is 20 days. If everything that can go wrong, short of abandoning the task, goes wrong, the task could be completed in 40 days. This is the Latest Completion Date.
Exhibit 2. Cumulative probabilities. The Most Likely completion date has an on-time probability of less than 40%. The Expected completion date is around 23 days. If we want to be 75% sure of completing the task on time we would have to schedule 27 days.
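The figures quoted for Exhibits 1 and 2 can be approximated with the closed-form expressions of the triangular distribution. The following is a minimal sketch, assuming the 10, 20 and 40 day values of Exhibit 1; the function names are illustrative and are not part of the original method.

```python
import math

def triangular_cdf(x, low, mode, high):
    """On-time probability P(duration <= x) for a triangular distribution."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1.0 - (high - x) ** 2 / ((high - low) * (high - mode))

def triangular_quantile(p, low, mode, high):
    """Duration whose on-time probability equals p (0 <= p <= 1)."""
    p_at_mode = (mode - low) / (high - low)
    if p <= p_at_mode:
        return low + math.sqrt(p * (high - low) * (mode - low))
    return high - math.sqrt((1.0 - p) * (high - low) * (high - mode))

def triangular_mean(low, mode, high):
    """Expected duration of a triangular distribution."""
    return (low + mode + high) / 3.0

# Earliest, Most Likely and Latest durations taken from Exhibit 1.
low, mode, high = 10, 20, 40
p_most_likely = triangular_cdf(mode, low, mode, high)
expected = triangular_mean(low, mode, high)
days_for_75 = triangular_quantile(0.75, low, mode, high)
print(f"on-time probability of the Most Likely date: {p_most_likely:.2f}")
print(f"expected duration: {expected:.1f} days")
print(f"duration for 75% confidence: {days_for_75:.1f} days")
```

Under these assumptions the Most Likely date has an on-time probability of about one third, the expected duration is a little over 23 days, and 75% confidence requires between 27 and 28 days, which is in line with the values read from Exhibit 2.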
In general, the larger the number of assumptions behind the estimated task duration, the larger the spread between the earliest and the latest completion dates. The effect of such an uncertainty results in very different on-time probabilities, as shown by Exhibit 3.
Exhibit 3. Two tasks with the same Earliest and Most Likely, but different Latest Completion dates have different levels of risk. The Expected completion date for the less risky task is 17 days, while for the other it is 23 days. By the same token, the on-time probability of the Most Likely date is around 37% in the first case and under 20% in the second.
From tasks to projects
A common approach used to assess uncertainty in projects is to calculate the expected duration of the project as the sum of the expected durations of the tasks along the critical path, with a standard deviation equal to the square root of the sum of the squares of the standard deviations of the same tasks, and then to use a normal distribution to calculate the on-time probability for the project. This approach is based on the central limit theorem, which states that the distribution of the sum of a number of independent random variables approaches a normal distribution as the number of variables (tasks) grows larger.
Assuming independent task durations, as required by the central limit theorem, although a very common assumption, is perhaps one of the most dangerous a project manager can make. In practical terms, this assumption expresses the belief that the lateness of some tasks is compensated by the early completion of others and that in the end everything balances out. This might be a valid assumption when dealing with events such as rain in a construction project or a late delivery from a supplier, but not in situations such as the underestimation of the system's complexity or the overestimation of the team's capabilities, which will affect the duration of most tasks and in the same direction. If there is an underlying cause that could shift the duration of several tasks in the same direction, the tasks are not independent but correlated. The practical consequence of dealing with correlated task durations is an increase in the project's standard deviation, which translates into higher risks. See Exhibits 4-a and 4-b.
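The calculation just described can be sketched as follows. This is an illustration only: it assumes triangular task durations, and the four-task critical path and the 100-day deadline are hypothetical values, not data from the paper.

```python
import math
from statistics import NormalDist

def triangular_mean_std(low, mode, high):
    """Mean and standard deviation of a triangular distribution."""
    mean = (low + mode + high) / 3.0
    variance = (low**2 + mode**2 + high**2
                - low * mode - low * high - mode * high) / 18.0
    return mean, math.sqrt(variance)

def path_on_time_probability(tasks, deadline):
    """Traditional estimate that assumes INDEPENDENT task durations."""
    stats = [triangular_mean_std(*task) for task in tasks]
    path_mean = sum(mean for mean, _ in stats)
    # Root of the sum of squares: only valid if tasks are independent.
    path_std = math.sqrt(sum(std**2 for _, std in stats))
    probability = NormalDist(path_mean, path_std).cdf(deadline)
    return path_mean, path_std, probability

# Hypothetical critical path: four tasks with the Exhibit 1 estimates.
tasks = [(10, 20, 40)] * 4
path_mean, path_std, p_on_time = path_on_time_probability(tasks, deadline=100)
print(f"expected duration {path_mean:.1f} days, std {path_std:.1f} days")
print(f"on-time probability for a 100-day deadline: {p_on_time:.2f}")
```

The root-sum-of-squares step is exactly where the independence assumption enters the calculation.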
Exhibit 4-a. Relationship between the number of tasks summed, the coefficient of correlation (ρ) and the standard deviation of the sum.
Exhibit 4-b. Simulation of a simple project showing the difference between the assumptions of independent and correlated task durations. Note the difference in the shape of the distribution.
Another problem not addressed by traditional critical path calculations is that of merging paths (see Exhibit 5), where the earliest start of the integration task always corresponds to the end of the latest development path. This results in a mechanism that passes on delays, but seldom passes on savings!
Exhibit 5. In the presence of uncertainty, the expected project duration is not equal to the sum of the expected durations of the tasks in the critical path. The integration task cannot begin until both development tasks have been completed.
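The relationship plotted in Exhibit 4-a can be illustrated for the simplest case. The sketch below assumes that all tasks have the same standard deviation and share a single pairwise coefficient of correlation ρ; this simplification, and the sample values of 10 tasks and 6.24 days, are assumptions made here for illustration.

```python
import math

def std_of_sum(n_tasks, task_std, rho):
    """Standard deviation of the sum of n equally variable tasks
    whose durations share a common pairwise correlation rho."""
    variance = task_std**2 * (n_tasks + n_tasks * (n_tasks - 1) * rho)
    if variance < 0:
        raise ValueError("rho is too negative for this number of tasks")
    return math.sqrt(variance)

# Ten tasks, each with a standard deviation of about 6.24 days.
for rho in (0.0, 0.5, 1.0):
    print(f"rho = {rho:.1f}: std of the sum = {std_of_sum(10, 6.24, rho):.1f} days")
```

With ρ = 0 the standard deviation grows with the square root of the number of tasks; with ρ = 1 it grows in direct proportion to it, which is why ignoring correlation understates the risk.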
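The effect of merging paths can be checked with a small simulation. The sketch below assumes two parallel development tasks that both follow the triangular estimates of Exhibit 1 and are independent of each other; the number of trials and the seed are arbitrary choices.

```python
import random

def expected_integration_start(n_trials=20000, seed=1):
    """Average day on which integration can start when it must wait
    for the later of two parallel development tasks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode), in that order.
        path_a = rng.triangular(10, 40, 20)
        path_b = rng.triangular(10, 40, 20)
        total += max(path_a, path_b)
    return total / n_trials

single_path_mean = (10 + 20 + 40) / 3.0
merged_mean = expected_integration_start()
print(f"expected duration of one path: {single_path_mean:.1f} days")
print(f"expected start of integration: {merged_mean:.1f} days")
```

Although each path is expected to take a little over 23 days, the expected start of the integration task is several days later, because an early finish on one path is wasted whenever the other path is late.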
Measuring Progress Using Rate of Changes
When measuring the progress of a task in terms of its main output, e.g. requirements defined, LOC, errors found, pages of documentation written, etc., it is possible to observe that the rate of growth of the output is not constant throughout the duration of the task and that it more closely resembles the shape of Exhibit 6. This “S” pattern (Pillai & Nair, 1997; Putnam, 1992; Martino, 1993; Miranda, 1998; Gaffney, 1984; Miranda, 2003), typical of many intellectual activities, could be explained by the existence of a number of actions and thought processes at the beginning and end of the task which, although value adding, do not contribute directly to the quantity being measured. Examples of such actions and thought processes are: learning, team formation and work reviews. Whatever the true reasons for this effect, it is so common and noticeable that it has a name of its own: “the 90% complete syndrome”.
Exhibit 6. The “S” curve. Production does not grow at a constant rate. At the peak of productivity, between weeks 3 and 5, the percentage complete soars 20% in just one week. Towards the end of the task it takes three times as long to go from 80 to 100%.
The result of extrapolating completion dates from the rates of progress observed through the half-life of the task using a straight line is the announcement of optimistic completion dates that are never met. Exhibit 7 shows some examples of work progress from real projects, and Exhibit 8 the error incurred in forecasting the task completion by using a linear model instead of the “S” curve paradigm.
Exhibit 7. Progress, measured in terms of its visible output, is not constant throughout the duration of a task or project.
Exhibit 8. Assuming that the task output is 250 units of production (Requirements, FP, Errors detected, etc) a linear projection would forecast its completion by week 7.5 while the “S” curve will put it at week 9. Assuming the task duration was originally estimated to be 7 weeks, according to the linear projection it will be completed almost on time, but according to the “S” curve it will be 2 weeks late.
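The optimism of the linear projection can be reproduced with any S-shaped model. The sketch below uses a logistic curve purely for illustration; its midpoint and steepness are made-up parameters, so the resulting weeks do not match those of Exhibit 8. Only the total of 250 units is taken from the exhibit, and completion is taken, by assumption, as 98% of the total.

```python
import math

TOTAL = 250.0     # units of output, as in Exhibit 8
MIDPOINT = 4.5    # hypothetical week of peak productivity
STEEPNESS = 1.0   # hypothetical growth parameter

def s_curve(week):
    """Cumulative output after `week` weeks under a logistic S curve."""
    return TOTAL / (1.0 + math.exp(-STEEPNESS * (week - MIDPOINT)))

def linear_forecast(week_now):
    """Completion week obtained by extrapolating last week's rate."""
    done = s_curve(week_now)
    weekly_rate = done - s_curve(week_now - 1.0)
    return week_now + (TOTAL - done) / weekly_rate

def s_curve_forecast(fraction=0.98):
    """Week at which the S curve reaches `fraction` of the total."""
    return MIDPOINT + math.log(fraction / (1.0 - fraction)) / STEEPNESS

linear_week = linear_forecast(5.0)   # forecast made at peak productivity
s_week = s_curve_forecast()
print(f"linear projection: week {linear_week:.1f}")
print(f"S-curve projection: week {s_week:.1f}")
```

Because the linear projection is made while productivity is at its peak, it assumes that the remaining work will proceed at a rate that the final part of the task never sustains.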
Combining Critical Chain and Incremental Development
Exhibit 9 illustrates the proposed project model. The Increment Planning task uses statistical techniques to break down the project scope into a series of Development Increments in such a way that it is almost certain that all requirements allocated to the first increment will be implemented on time; that there is a fair chance to implement those allocated to the second increment, and so on. System Engineering encompasses requirements, value and tradeoff analysis from a user perspective; this is the activity where the prioritization takes place. System Architecting is responsible for the general form of the solution, interface definitions and the analysis of dependencies between requirements. The system architecting activity shall take an all-encompassing view in order to prevent the surfacing of inconsistencies later in the development process. All three activities take place concurrently, as there is a need to balance what needs to be done from the user perspective with what could be done from a technical perspective. Each Increment Development is a self-contained mini-project. We do not assume or impose any particular approach beneath this level, so development could be organized according to a waterfall or an iterative life cycle as deemed appropriate. All increments but the last are isolated from the project delivery date by a buffer whose purpose is to absorb any overrun in their execution.
During execution, work progress is forecasted using models that more closely resemble the way people work than a simple extrapolation of last week's results. As shown by Exhibit 10, the output from the models is used to forecast the activities’ completion dates and to take corrective actions. Work in one increment does not start until the previous one is completed. This prevents people from wasting time developing things that might never be finished anyway.
Exhibit 9. Combining CC and ID in a single project Model
Exhibit 10. Project tracking
Once the feasibility of the project has been established, the next step is to define the duration of the development tasks in terms of their Best, Most Likely and Worst case scenarios as functions of the increment's scope. Second, the content of the increment is adjusted so that it has a high probability, e.g. 95%, of being completed in the allotted time. Third, the tasks are re-scheduled using the durations that correspond to a 50% on-time probability, allocating the difference between the high and the low confidence dates to a buffer. The next increment is then planned using the length of the buffer as the time allotted.
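The scheduling and buffer-sizing steps can be sketched for a single increment. The example below assumes that the increment's total duration follows a triangular distribution with the 10, 20 and 40 day values of Exhibit 1; in practice the three values depend on the scope allocated to the increment, and the function names are illustrative.

```python
import math

def triangular_quantile(p, low, mode, high):
    """Duration whose on-time probability equals p (0 <= p <= 1)."""
    p_at_mode = (mode - low) / (high - low)
    if p <= p_at_mode:
        return low + math.sqrt(p * (high - low) * (mode - low))
    return high - math.sqrt((1.0 - p) * (high - low) * (high - mode))

def plan_increment(low, mode, high, safe=0.95, aggressive=0.50):
    """Return (scheduled duration, buffer) for one increment."""
    safe_duration = triangular_quantile(safe, low, mode, high)
    scheduled = triangular_quantile(aggressive, low, mode, high)
    # The buffer is the gap between the high and low confidence dates.
    return scheduled, safe_duration - scheduled

scheduled, buffer = plan_increment(10, 20, 40)
print(f"increment 1 is scheduled for {scheduled:.1f} days")
print(f"buffer, and time allotted to increment 2: {buffer:.1f} days")
```

Under these assumptions the scope of the first increment must fit within the scheduled duration plus the buffer with 95% confidence, and the second increment would then be sized to fit within the buffer.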
Two aspects that need to be considered in the selection of the requirements to be developed in each increment are: the technical dependencies that might exist between them and the need to provide functionally complete subsets to the user.
Exhibit 11 illustrates the overall process.
Exhibit 12 shows the approximate probabilities of delivering the content of each increment when planned according to the proposed approach. Compare this to a conventional plan, in which every requirement has the same probability, let's say 50% irrespective of its importance to the user.
Exhibit 12 - Success Probabilities
Exhibit 11. Increments are planned to fit within the allotted time
Estimating the Minimum, Most Likely and Maximum durations
Although the specific techniques for estimating the minimum, most likely and maximum duration of the tasks will depend on whether the estimation is done using a cost model, an expert approach or a Delphi process, it is crucial to the success of the method that all completion dates that could reasonably be expected be included between the minimum and the maximum duration.
In the case of a parametric cost model like COCOMO, this could be done, for example, by changing the value of key cost drivers such as SLOC, PCAP or CPLX and, in the case of the Delphi process, by recording not only the converging value, but the optimistic and pessimistic estimates as well. Exhibit 13 shows a calculator implemented at Ericsson Canada to help with the required calculations.
Exhibit 13. Buffer calculator
In a time-bound project there is very little room for recovery, so once a problem manifests itself, it is almost too late. Controlling a project under these circumstances requires a mechanism that:
- Identifies the early signs of a delay;
- Minimizes false alarms;
- Minimizes disturbances to ongoing work;
- Provides a clear definition of what will be delivered and by when.
While the first three properties are important to the people working and managing the project, the fourth is of utmost importance to the customer who depends on the project's deliverables to execute his own business plan.
The early identification of a delay is achieved by updating the buffers, not with the actuals but with the estimates at completion (EAC) of the individual tasks. The estimates are computed by fitting a Rayleigh curve to the progress reported, and then projecting it into the future.
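One way of producing such an estimate is sketched below. It assumes that cumulative progress follows the curve W(t) = K(1 - exp(-a t²)), that the total scope K is known, and that the task counts as complete at 97% of K. The linearized least-squares fit, the 97% threshold and the weekly figures are assumptions made here for illustration and are not taken from the paper.

```python
import math

def fit_rayleigh(weeks, cumulative, total):
    """Least-squares estimate of a in W(t) = total * (1 - exp(-a * t**2)),
    using the linear form -ln(1 - W/total) = a * t**2."""
    numerator = denominator = 0.0
    for t, done in zip(weeks, cumulative):
        if 0 < done < total:
            y = -math.log(1.0 - done / total)
            numerator += t**2 * y
            denominator += t**4
    if denominator == 0:
        raise ValueError("no usable progress reports")
    return numerator / denominator

def estimated_completion_week(a, fraction=0.97):
    """Week at which the fitted curve reaches `fraction` of the total."""
    return math.sqrt(-math.log(1.0 - fraction) / a)

# Hypothetical progress reports: cumulative units done after each week.
weeks = [1, 2, 3, 4]
cumulative = [12, 45, 91, 138]
a = fit_rayleigh(weeks, cumulative, total=250)
eac_week = estimated_completion_week(a)
print(f"fitted a = {a:.4f}, estimated completion in week {eac_week:.1f}")
```

The resulting week, rather than the progress reported to date, is what would be compared against the scheduled date when updating the buffer.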
False alarms and disturbances to on-going work are prevented by the use of buffers, which isolate workers from overreactions to small variations, by absorbing up to a 25% variance before sending a signal.
Exhibit 14 describes the control approach. Depending on the specific task being monitored, the units in which the work performed is measured will be Requirements Defined, LOC produced per week, number of errors detected, etc.
The re-planning of the next increment, if necessary, should take into consideration whether the factors that affected the development of the current increment will also have an effect on it, and its duration and effort should be adjusted accordingly.
Exhibit 14. Monitoring progress and triggering of re-planning
Rewards, recognition and price incentives
How can all project stakeholders be sure that the best effort will be applied towards implementing all requirements and that people will not just get by implementing those in the first increment? The answer lies with the reward and recognition system.
Whether employees' rewards or price incentives in contracts, the incremental model provides a clear criterion by which performance can be evaluated and rewarded. The delivery of the first increment has no reward associated with it: everybody is just doing their job; subsequent increments result in increased recognition of the extra effort put into the task. The on-time probabilities shown in Exhibit 12 can be used to calculate the expected value of the reward. This calculation is important because a large amount, with a very small probability, will result in a low expected value and could be perceived as a lottery by the employees, thus failing to act as a motivator.
As an example, a $5,000 reward for “Increment 2” has an expected value of $2,375. The same amount applied to “Increment 3” has an expected value of $625. Clearly, the motivational value of the reward is not the same in both cases.
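The arithmetic behind this example is a simple product of amount and probability. In the sketch below, the probabilities 0.475 and 0.125 are not quoted in the text; they are the values implied by the two amounts above and are assumed to correspond to those of Exhibit 12.

```python
def expected_reward(amount, on_time_probability):
    """Expected value of a reward paid only if the increment is delivered."""
    return amount * on_time_probability

# Probabilities implied by the figures in the text (an assumption).
implied_probabilities = {"Increment 2": 0.475, "Increment 3": 0.125}
for increment, probability in implied_probabilities.items():
    value = expected_reward(5000, probability)
    print(f"{increment}: expected value ${value:,.0f}")
```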
As mentioned at the beginning of the paper, the proposed approach brings together several existing techniques. Its value resides precisely in this. Specifically, we combine a general project management approach like Critical Chain with a well-known software development method, the incremental model, to realize a new approach specially conceived to deal with time-bounded projects. We also provide a decision rule to calculate the size of the increments to be developed, a reward model based on the expected value of the increments and a recommendation to track the project based on forecasts rather than on actual progress. Furthermore, we do not presume independent task durations, which leads to significant differences in the size of the buffers and addresses one of the main issues raised by the critics of the Critical Chain approach.
The premise on which the method is based is that businesses are better off when they know what could, realistically, be expected than when they are promised the moon, but no assurances are given as to when they could get it.
By taking a probabilistic, rather than a deterministic approach, the method recognizes that in any development project there are hundreds of things that can go right and thousands that can go wrong and makes them an intrinsic part of the planning and control processes.
Although still in an experimental stage, the method proposed in this paper has received a warm welcome when presented both within and outside Ericsson.
To date, the main obstacles to the wider acceptance of the techniques proposed have nothing to do with the validity of the arguments cited or the rationale behind the method, but rather with a “can do” attitude that rejects the existence of things over which we have limited control, and with the prevalence of a business culture which seems to reward wild promises over bounded rationality.
Gaffney J. (1984) On Predicting Software Related Performance of Large-Scale Systems, CMG XV, San Francisco, California.
Goldratt E. (1997) Critical Chain, Boston, MA: The North River Press.
Grey S. (1995) Practical Risk Assessment for Project Management, Indianapolis, IN: John Wiley & Sons.
Martino J. (1993) Technological Forecasting for Decision Making, Columbus, OH: McGraw-Hill.
McConnell S. (1996) Rapid Development: Taming Wild Software Schedules, Microsoft Press.
Miranda E. (1998, November) The Use of Reliability Growth Models in Project Management, 9th International Symposium in Software Reliability, IEEE, Paderborn, Germany.
Miranda E. (2002, March) Planning time bounded projects, IEEE Computer 35(3), 73-79.
Miranda E. (2003) Running the Successful Hi-Tech Project Office, Norwood, MA: Artech House.
Newbold R. (1998) Project Management in the Fast Lane, Boca Raton, FL: St. Lucie Press.
Pillai K. & Nair S. (1997) A model for software development effort and cost estimation, IEEE Transactions on Software Engineering 23(8).
Pisano N. (1995) Technical Performance Measurement, Earned Value and Risk Management: An Integrated Diagnostic Tool for Program Management, http://www.acq.osd.mil/pm/paperpres/nickp/nickpaso.htm
Putnam L. (1992) Measures for Excellence – Reliable Software On Time, Within Budget, Upper Saddle River, NJ: Prentice-Hall.
© 2004, Eduardo Miranda
Originally published as a part of 2004 PMI Global Congress Proceedings – Europe