Origins of schedule overrun
Schedule risk modelling has proved a useful aid to the planning and management of major projects, yet it seems rarely to expose or help to explain the widely held expectation that schedules will be skewed to the right, i.e., that there will be more scope for a schedule to overrun than to complete early. This paper explores several possible mechanisms that might lie at the heart of such behavior and concludes that a link between schedule slip and progressive loss of productivity may be at work. The implications for major projects are discussed, and recommendations are made for explicit analysis of such behavior.
Over the past 30 years, interest in project risk assessment has expanded from a simple examination of individual risks to increasingly sophisticated quantitative modelling of total costs and durations, usually based on Monte Carlo simulation. The basic principles of modelling uncertainty in costs and schedules are fairly well established. This paper addresses a dilemma that has emerged from many years' application of simulation modelling to project schedules and has benefitted from discussions with members of the PMI Risk Management SIG (RiskSIG), the author's colleagues in Broadleaf Capital International, and Broadleaf's clients.
While schedule risk modelling is generally found to improve project planning and management, one feature of model forecasts is often felt to be unrealistic. This is the absence of marked skew or asymmetry in forecast schedule distributions. Models built up from realistic inputs with experienced people tend to show relatively little skew, while the same experienced people feel that there should be a marked skew to the right, as illustrated in Exhibit 1. Experience suggests that large or complex projects have substantial scope to overrun their forecast durations and little scope to improve upon them, yet this is not predicted by simple schedule risk models.
Exhibit 1: Difference between expectations and model forecasts.
This paper considers common attempts to explain the dilemma, and reports on modelling exercises that have tested alternative explanations. The potential use of these findings in project planning is then discussed. The models used here do not represent real projects. The purpose of examining simple models is to see if there may be a mechanism at work in the real world that is being overlooked in standard schedule risk modelling since, if there is, this might be exploited to make more realistic quantitative assessments of schedule risk and improve the management of projects.
The Usual Suspects
When a project's schedule risk assessment shows little or no skew and the project owners believe there should be a significant amount, the usual response is to assume that something has been left out or modelled incorrectly. When seeking the reason for symmetrical forecast distributions where skew is expected, suspicion commonly falls on four possible sources of the discrepancy: failure to recognize skew in the inputs to the model, nodal bias and structural effects, inadequate correlation, or the presence of extreme events that can cause a significant extension of time.
The distributions of individual task durations input to a schedule risk assessment are usually skewed in themselves and there is an expectation that this skew will show through in the output. The effect of skewed inputs is to make the mean duration exceed that estimated from the most likely input values. Despite this, the sum of a series of independent distributions along a critical path will tend towards a Normal distribution no matter how skewed the inputs might be. This is illustrated in Exhibit 2, which shows the highly skewed distribution of a single task and the distribution of ten such tasks in series. The distributions are normalized to a mean of one to focus on the shape of the curves rather than absolute values.
Exhibit 2: Skew disappears when several tasks are in series.
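The loss of skew when tasks are placed in series can be reproduced with a few lines of simulation. The following is a minimal sketch rather than the model behind Exhibit 2: the task distribution (triangular with a minimum of 8, most likely value of 10, and maximum of 20), the number of trials, and the function names are all assumptions made for illustration.

```python
import random
import statistics


def skewness(samples):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(((x - mean) / sd) ** 3 for x in samples) / len(samples)


def simulate_series(n_tasks, n_trials=20000, seed=1):
    """Total duration of n_tasks independent tasks in series, one value per trial."""
    rng = random.Random(seed)
    # Assumed task distribution: min 8, most likely 10, max 20 (right-skewed).
    # Note that random.triangular takes its arguments as (low, high, mode).
    return [
        sum(rng.triangular(8, 20, 10) for _ in range(n_tasks))
        for _ in range(n_trials)
    ]


if __name__ == "__main__":
    for n in (1, 10):
        totals = simulate_series(n)
        print(f"{n:2d} task(s) in series: skewness = {skewness(totals):.2f}")
```

For independent, identically distributed tasks, the skewness of the total falls in proportion to one over the square root of the number of tasks, so ten tasks in series retain only about a third of the skew of a single task.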
Skewed inputs alone therefore cannot be the source of positive skew in a project duration distribution. If the tasks' durations were heavily correlated, their total would have a similar skew to the inputs. However, this will show up in a total project output only if there is one dominant source of uncertainty driving all activities. In this case, the analysis might be better addressed by an examination of that underlying factor rather than by a simulation modelling approach.
Nodal Bias and Structural Effects
One of the defining characteristics of critical path analysis, which is at the heart of schedule risk modelling, is the interaction between alternative paths through an activity network. In probabilistic modelling, this gives rise to a phenomenon known as nodal bias. Because the start time of a task with two or more predecessors will be the end of the latest of the predecessors, the duration of a network subject to uncertainty will, on average, be longer than the deterministic duration of the same network. It is worth asking if, through nodal bias, the structure of a network can generate skew and whether the problem might arise from the way risk model networks are constructed.
Exhibit 3 shows the overall duration of several tasks in parallel, which highlights the effect of nodal bias. Each individual task has a minimum duration of 8 and a maximum of 20 time units with a most likely value of 10 time units, highly skewed to the right. The results are normalized to the mean of a single task's duration to make it easy to see the amount of skew. In passing, although it is not the main focus of this paper, they demonstrate the power of multiple parallel paths in a network to drive out the mean duration through nothing more than the presence of nodes on critical and near-critical paths.
Exhibit 3: Nodal bias causes negative skew.
Clearly, nodes within schedule models might be very influential, but they do not appear to have the capacity to produce positive skew, i.e., skew to the right. Rather, as the number of predecessors at a node increases, the fact that the overall duration is driven by the longest predecessor means that the overall duration becomes increasingly likely to fall close to the maximum duration of any one task, so the distribution bunches up against its right-hand limit. Any skew that nodal bias does generate is to the left!
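Nodal bias can be illustrated by simulating the finish time of a node as the latest of several independent parallel tasks. This is a minimal sketch using the task parameters quoted for Exhibit 3 (minimum 8, most likely 10, maximum 20); the trial count, seed, and function names are assumptions, and it is not the model used to produce the exhibit.

```python
import random
import statistics


def skewness(samples):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(((x - mean) / sd) ** 3 for x in samples) / len(samples)


def simulate_parallel(n_parallel, n_trials=20000, seed=2):
    """Finish time of a node fed by n_parallel independent tasks, per trial."""
    rng = random.Random(seed)
    # The node is released only when its latest predecessor finishes,
    # so the node time is the maximum of the predecessor durations.
    return [
        max(rng.triangular(8, 20, 10) for _ in range(n_parallel))
        for _ in range(n_trials)
    ]


if __name__ == "__main__":
    single_mean = statistics.fmean(simulate_parallel(1))
    for n in (1, 2, 5, 10):
        node_times = simulate_parallel(n)
        ratio = statistics.fmean(node_times) / single_mean
        print(f"{n:2d} parallel: mean / single-task mean = {ratio:.2f}, "
              f"skewness = {skewness(node_times):.2f}")
```

Under these assumptions the mean rises as parallel paths are added while the skewness falls and eventually turns negative, because the maximum is squeezed against the upper limit of 20 time units.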
Correlation is often handled poorly in risk models. Some models are constructed so that strongly correlated components are artificially separated, or overlapping sources of correlation are incorporated in a way that makes them difficult to model realistically. Even when these structural faults are avoided, correlation is often ignored, resulting in artificially narrow distributions of the forecast overall duration. It is appropriate to ask, though, if mishandled correlations could be the cause of underestimating skew.
Correlation can have unexpected effects in schedules. Correlation between parallel activities with a shared successor can actually reduce the mean duration of a simulated project, as it reduces the probability of one task pushing out the project duration. Correlation between predecessors to a node removes the effect of multiple predecessors, as the correlated tasks effectively merge into a single source of uncertainty instead of two or more independent factors.
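The way correlation between parallel predecessors can shorten the mean duration at a node can be seen in a small experiment. In this sketch, dependence is dialed up with a hypothetical `share` parameter, the probability that both predecessors are driven by one common draw; this is only one simple way to induce correlation and is not taken from the paper's models. The task distribution (8, 10, 20) is borrowed from the nodal bias example.

```python
import random


def mean_merge_duration(share, n_trials=20000, seed=3):
    """Mean finish time of a node with two predecessors.

    share is the probability that both predecessors take the same duration
    (0.0 = independent, 1.0 = perfectly correlated). Hypothetical parameter.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        first = rng.triangular(8, 20, 10)
        if rng.random() < share:
            second = first  # common driver: the two tasks move together
        else:
            second = rng.triangular(8, 20, 10)  # independent draw
        total += max(first, second)  # node waits for the later predecessor
    return total / n_trials


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        print(f"share = {share:.1f}: mean node time = "
              f"{mean_merge_duration(share):.2f}")
```

When the two predecessors are perfectly correlated they behave as a single task, so the node's mean time drops back to the mean of one task, (8 + 10 + 20) / 3, or roughly 12.7 time units.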
In principle, correlation between sequential tasks can allow input skew to show through in the output. However, as mentioned earlier, for this to become significant and to overwhelm any remaining nodal bias, which tends to cause skew the other way, all of the activities in a project need to be very strongly correlated with one another. In such circumstances, one phenomenon effectively drives the entire schedule. Such phenomena are not unknown. The Gulf War affected titanium supply and delayed some mineral processing projects that needed special corrosion-resistant materials; the resources boom of a few years ago soaked up skilled labor, industrial production capacity, and materials, leading to a general loss of productivity as unskilled personnel entered the market, unprecedented escalation in equipment prices, and shortages of structural steel and other materials. Looking further back, there was a time when IT resources came under such pressure that historical expectations of staff availability and productivity were overturned and both costs and schedules blew out even more than is common in this sector.
Despite these examples of pervasive influences driving entire projects, this does not happen on every project, and yet there is a belief that every complex project has an inherent tendency towards a skewed distribution of outcomes. It is difficult to see how incorrect modelling of correlation could be the hidden factor predisposing a wide range of projects to schedule overrun.
Historically, risk assessment was often addressed in terms of major events such as task or equipment failure, external disruptions, or the discovery of unforeseen challenges—that is, events that might happen or might not happen, and which, if they do not happen, have no effect. Being one-sided, with some chance of making things worse and no chance of making things better, event risk is a candidate for the source of skew, at least in principle.
Some events are regularly addressed in project schedule risk modelling. For instance, extreme weather and industrial relations disruption are almost always considered in major construction projects within Australia and included in risk models. High-technology projects will often include some danger of a novel element failing to perform as forecast, and this too will usually be included in such a project's schedule risk assessment. However, most of the uncertainties affecting projects are better characterized as continuous variations rather than as "on-off" events. Uncertainties in productivity, work rates, quantities of material, and similar factors are the main sources of schedule risk, and these are generally each characterized by uncertainty within a range rather than by events that will either happen or not.
In principle, a large delay event can create a large amount of positive skew. In practice, events that have the capacity to create so much delay that they will skew a project's distribution markedly to the right are unusual. Examples of projects running a long way over time due to such events are much fewer than those that appear to “die a death of a thousand cuts” with many individual problems combining to drive the schedule far beyond what was expected.
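The in-principle effect of a single large delay event can be sketched by adding a one-sided event to a chain of tasks. The event probability (10%) and delay (30 time units) below are invented purely for illustration, as are the task distribution and function names; they do not represent any real project or the models discussed in this paper.

```python
import random
import statistics


def skewness(samples):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(((x - mean) / sd) ** 3 for x in samples) / len(samples)


def simulate_with_event(event_probability, event_delay,
                        n_tasks=10, n_trials=20000, seed=4):
    """Ten tasks in series plus one optional delay event, per trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        duration = sum(rng.triangular(8, 20, 10) for _ in range(n_tasks))
        # One-sided event: it either happens and adds delay, or has no effect.
        if rng.random() < event_probability:
            duration += event_delay
        results.append(duration)
    return results


if __name__ == "__main__":
    no_event = simulate_with_event(0.0, 0.0)
    with_event = simulate_with_event(0.1, 30.0)  # illustrative values only
    print(f"no event:   skewness = {skewness(no_event):.2f}")
    print(f"with event: skewness = {skewness(with_event):.2f}")
```

A rare event that is large relative to the spread of the underlying schedule adds a distinct right-hand tail; an event that is small relative to that spread would leave the shape almost unchanged.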
Cost risk modelling is relatively straightforward, not least because it is rarely necessary to go far beyond basic arithmetic functions in combining inputs. Schedule modelling is fundamentally different because of the logical connections between activities. Precedence effects are more difficult to understand than arithmetic relationships and can give rise to unexpected outcomes, such as the way increasing correlation can reduce the mean duration of a schedule. It is worth exploring the way that a project and its task structure can interact to see if a mechanism can be found to explain the inherent propensity for duration distributions to be skewed to the right.
Two mechanisms have been considered. One focuses on the effect of a network structure, which represents the interaction between task duration uncertainty and the nodal bias effect described earlier. The other looks at the decline in productivity as work slips and efficient sequencing and work practices are overtaken by a fire-fighting approach to project implementation.
Many projects take the form of parallel lines of work proceeding one behind the other so that, for instance, each area in a major construction site is occupied by civil works, structural steel work, electrical works, and then precommissioning activity. As each discipline completes its work, an area becomes free for the next discipline to move in. So long as the earlier disciplines move on at the expected rate, the later disciplines can proceed more or less independently and the project can be modelled as four long tasks in parallel. However, if the earlier tasks run slowly, nodes come into play, and perhaps this could accentuate the slippage by compounding it with nodal bias. Furthermore, the common practice of focusing attention on tasks that are on or very near the critical path for risk modelling can, if it is not handled very carefully, remove some of this structure from the analysis. If the structure is a source of skewed outcomes, this might explain the difference between models and expectations.
To examine this mechanism, a model network has been constructed as shown in Exhibit 4. The civil works are followed by structural tasks and on through electrical activities to precommissioning in each of ten areas. Each task has a minimum duration of 8 units, a maximum of 15 units, and a most likely value of 10. The mobilization milestones, which are usually linked to external dependencies or contracted dates rather than driven by other tasks within the project, are each set to occur 12 time units after the start of the preceding line of work. This means that, when the preceding line of work runs early, there is no interaction from, say, civil to structural work; however, as the preceding lines of work slip, nodes should start to come into play.
Exhibit 4: Model network to examine structural effects.
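One possible reading of this network can be sketched as a grid of four disciplines by ten areas. The logic here is an assumption, since the exhibit itself is the only full definition: each discipline is taken to work through the areas in order, a task starts when both the same discipline's previous area and the preceding discipline's work in the same area are complete, and each mobilization milestone is treated as a fixed date 12 time units after the previous one. All names are hypothetical.

```python
import random
import statistics

N_DISCIPLINES = 4        # civil, structural, electrical, precommissioning
N_AREAS = 10
MOBILISATION_LAG = 12.0  # assumed fixed offset between successive lines of work


def simulate_network_once(rng):
    """Return (project duration, number of handover delays) for one trial."""
    finish = [[0.0] * N_AREAS for _ in range(N_DISCIPLINES)]
    handover_delays = 0
    for d in range(N_DISCIPLINES):
        mobilisation = d * MOBILISATION_LAG
        for a in range(N_AREAS):
            # When this discipline's own crew becomes free.
            own_ready = finish[d][a - 1] if a > 0 else mobilisation
            # When the preceding discipline hands over this area.
            area_ready = finish[d - 1][a] if d > 0 else 0.0
            if area_ready > own_ready:
                handover_delays += 1  # the node "comes into play"
            start = max(own_ready, area_ready)
            finish[d][a] = start + rng.triangular(8, 15, 10)
    return finish[-1][-1], handover_delays


def simulate_network(n_trials=5000, seed=5):
    rng = random.Random(seed)
    return [simulate_network_once(rng) for _ in range(n_trials)]


if __name__ == "__main__":
    results = simulate_network()
    durations = [duration for duration, _ in results]
    delays = [count for _, count in results]
    print(f"mean duration = {statistics.fmean(durations):.1f}")
    print(f"mean handover delays per run = {statistics.fmean(delays):.1f}")
```

Recording the number of handover delays alongside each duration allows the two to be plotted against one another, in the manner of the second plot described for Exhibit 5.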
This network has been simulated and its duration distribution has been normalized to the mean to focus on its shape rather than on its absolute value. The output is shown in Exhibit 5, along with a plot of the number of nodes at which the preceding discipline delayed a line of work, such as civil works delaying structural works in one area. Although more nodal bias would arise if slip were to increase, there is very little skew in the output, and a statistical analysis of the output confirms this assessment. The second plot shows that there is no systematic relationship between the effect of structure in the network and the overall duration. It is, therefore, hard to imagine that structural mechanisms could be a cause of schedule skew.
Exhibit 5: Effect of structure on skew.
Progressive Loss of Productivity
A common theme of project schedule overrun is the progressive development of delays that accumulate into an irrecoverable loss while work is rearranged to try to compensate for the loss and overcome the circumstances that have caused it. A construction project might set out with a well-ordered sequence of work and arrangements that maximize the productivity of the construction labor force. As individual tasks slip, due to delays in predecessor activities onsite or late delivery of materials and equipment, the resources that would have been applied to them may be diverted to other work that is taken out of sequence or fragmented into smaller tasks. A similar pattern can be seen in IT projects. As tasks are delayed and prevent the completion of successor tasks, resources will be devoted to any work available to try to maintain progress.
When tasks are fragmented, the number of mobilization and demobilization events grows and eats into productivity. When work is taken out of sequence, facilities, equipment, or personnel who had been expected to be available to support the work might not be on hand, and so productivity will again decline.
In addition to the direct effect that task fragmentation and changing the sequence of activities have on productivity, more rework will usually be incurred than was anticipated when the original plan was drawn up. In an effort to make some progress, work will be based on assumptions about the outcome of predecessor activities rather than on solid outputs. In construction projects, temporary work will be undertaken to provide physical access where it would not otherwise have been required. In IT projects, code stubs will be developed that permit software to be partially tested before the components with which it is to interface have been completed. Rework amounts to a further loss of productivity.
By these means and others, as work slips from its initial schedule and pressure is applied to recover lost time, tasks are likely to be executed less efficiently than initially assumed. The productivity factor (PF), which describes the ratio of a task's level of effort to a benchmark or estimating metric, will grow and tasks will take longer unless additional resources can be brought to bear. Even if more resources can be used to save the schedule, unplanned costs will rise. Worsening PF is a common feature of projects that slip from their initial plans. It rarely strikes from the first day but is more likely to grow as schedule performance slips more and more.
The question for this paper arising from these phenomena is whether a progressive decline in productivity, arising from mechanisms such as those just described, can cause a skewed duration distribution. To test this, the model shown in Exhibit 4 has been extended to apply a productivity penalty to each activity depending on the relationship between its actual and planned start dates. This relationship is shown in Exhibit 6, with a plot of normalized durations showing the effect on the output distribution's shape.
Exhibit 6: Effect of productivity loss arising from slippage.
If a task starts early, on time, or only slightly late (taken here to mean no more than 10% late), it is assumed that there is no loss of productivity. The penalty for slippage then grows in proportion to the additional slippage up to a level at which it is assumed it can get no worse. In the case shown in the graph, this limit, or saturation point, is set to 50% slippage, at which the PF is assumed to have grown by 50%, so tasks take half as long again as had been planned. The model was run with the threshold set at 10%, the saturation point set at 50%, and the maximum increase in PF set to 0%, 20%, 50%, and then 100%.
The effect is dramatic. With the increase capped at 20%, the duration distribution is already visibly skewed. When it is allowed to rise to 50%, the skew is very marked and when it can reach 100%, which is not unheard of, the distribution appears to be bimodal. This mechanism, which is simple and grounded in the behavior of real projects, can explain skew in duration distributions. It can explain the skew that experienced teams expect to see without invoking any extraordinary or exotic behavior.
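The penalty rule and its feedback on later tasks can be sketched as follows. This is a simplified illustration under several assumptions, not the extended Exhibit 4 model: it uses a single line of ten tasks in series, measures slip as the lateness of a task's start expressed as a fraction of a hypothetical planned task duration of 11 time units (the mean of the assumed 8, 10, 15 triangular distribution), and applies the penalty as a multiplier on the sampled duration. The function and parameter names are invented for the example.

```python
import random
import statistics


def productivity_multiplier(slip, threshold=0.10, saturation=0.50,
                            max_increase=0.50):
    """Duration multiplier: 1.0 up to the threshold, then rising linearly
    to 1.0 + max_increase at the saturation point, and flat beyond it."""
    if slip <= threshold:
        return 1.0
    if slip >= saturation:
        return 1.0 + max_increase
    return 1.0 + max_increase * (slip - threshold) / (saturation - threshold)


def simulate_line(max_increase, n_tasks=10, planned_duration=11.0,
                  n_trials=5000, seed=7):
    """Total duration of one line of work with a slippage-driven penalty."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        clock = 0.0
        for k in range(n_tasks):
            planned_start = k * planned_duration
            # Assumed slip measure: lateness as a fraction of planned duration.
            slip = (clock - planned_start) / planned_duration
            base = rng.triangular(8, 15, 10)
            clock += base * productivity_multiplier(
                slip, max_increase=max_increase)
        totals.append(clock)
    return totals


def skewness(samples):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(((x - mean) / sd) ** 3 for x in samples) / len(samples)


if __name__ == "__main__":
    for cap in (0.0, 0.2, 0.5, 1.0):
        totals = simulate_line(cap)
        print(f"max PF increase {cap:.0%}: mean = "
              f"{statistics.fmean(totals):.1f}, "
              f"skewness = {skewness(totals):.2f}")
```

Because a late start lengthens the task, which in turn makes the next start later still, the penalty feeds on itself. The shape of the resulting distribution depends heavily on how slip is measured and on the planned durations against which it is judged, so the output of this sketch should not be read as a reproduction of Exhibit 6.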
Implications for Projects
It is worth considering whether the gap between expectations and model forecasts simply indicates systematic underestimation of project durations. Benchmarking and critical review techniques appear to have improved the realism of plans along with front-end loading (FEL) and detailed project execution planning (PEP). Nevertheless, examples still abound of projects with well-founded plans that dissolve into unexpected schedule extensions not through one single event or dominant influence but through what is generally seen as a loss of control.
The potential to lose productivity as work slips from its initial plan is generally understood but rarely considered explicitly in planning and estimating. The potency of this mechanism, indicated by a simple yet realistic modelling exercise, suggests that it might warrant further attention. It raises the hypothesis that some projects lose control over their schedule because insufficient effort was devoted to understanding the effect of small deviations from the plan and to preparing to prevent them from snowballing into a progressive decline. This is consistent with the move towards FEL and detailed PEP over the past few years, which is credited with improving the success rate of major projects, although many still embark on execution with limited planning and are poorly placed to deal with the normal course of events, let alone deviations from their nominal plans. At present, FEL and PEP are generally framed within the baseline plan for implementation, but there is no reason why they should not extend into scenario and contingency planning to prepare projects to respond well if they slip away from that well-ordered view of the world.
The propensity for project schedules to overrun is generally felt to be an inherent characteristic of major or complex projects. However, none of the elements of normal schedule modelling that could explain this is commonplace. If the existence of a significant amount of skew in normal projects is real, something not usually included in schedule modelling must be at work.
Several potential sources of skew have been examined to see if they could explain the effect. Only a mechanism that results in individual tasks taking longer as a project as a whole slips showed the capacity to do so. Several drivers might lie behind such a mechanism. They include a progressive loss of control as schedule slippage stimulates uncoordinated behavior, the activation of constraints that were irrelevant so long as work proceeded close to the original schedule, or being forced into inefficient work practices and higher-than-expected levels of rework arising from these work practices.
It is too early to say if nonlinear effects such as these can be modelled realistically or if doing so will assist in managing project risk. A strategic overview of a project should allow for such factors to be considered, however. The insights arising from such an analysis might be integrated with routine model-based analysis or be used to ensure that project plans are ready and able to identify any drift into this nonlinear form of behavior and to react to control it. Large and complex projects can only benefit from a consideration of how they will respond if progress slips. Planning to respond to serious deviations from the base plan should enable projects to react in a coordinated manner, just as scenario planning provides large organizations and businesses with the capacity to weather major disruptions without losing control.
The ideas that gave rise to this work have been developed over several years, through engagements with a large number of complex projects and in professional discussions with clients, members of the RiskSIG, and colleagues in Broadleaf. Special thanks are due to Mr. Paul Radford, strategic governance advisor at the Victorian Department of Transport, and Dr. Dale Cooper, director of Broadleaf Capital International Pty Ltd, who both read a draft of this paper and provided valuable comments and ideas that have been incorporated into the text.
© 2009, Dr Stephen Grey
Originally published as a part of 2010 PMI Global Congress Proceedings – Melbourne, Australia