Abstract
Many projects fail for reasons that appear unlikely when they begin. Such failures grow more common as project complexity, the geographic distribution of project teams, and time pressures increase. This paper examines some of the many causes of catastrophic project failure, dubbed by some “black swans,” seeking some of the principal sources of project risk.
The analysis is based on a collection of about 650 specific project failure modes collected from all over the world over the past decade and assembled into the Project Experience Risk Information Library (PERIL) database. This database formed the foundation for Identifying and Managing Project Risk (Kendrick, 2003), a reference book used extensively by practitioners and for project risk management courses. The expanded PERIL database serves as the basis for the book’s second edition, scheduled to be published in early 2009.
The PERIL Database
Good project management is based on experience. Fortunately, the experience and pain need not all be personal; you can also learn from the experience of others, avoiding the aggravation of seeing everything first-hand. The Project Experience Risk Information Library (PERIL) database provides a step in that direction.
For more than a decade, I have collected anonymous data from hundreds of project leaders on their past project problems. I have compiled this data in the PERIL database, which summarizes both a description of what went wrong and the amount of impact it had on each project. The database provides a sobering perspective on what future projects may face and is valuable in helping to identify at least some of what might otherwise be invisible risks.
Projects in the PERIL Database
Slightly more than half the projects in the PERIL database are product development projects, with tangible deliverables. The rest are information technology, customer solution, or process improvement projects. The projects in the database are worldwide, with a majority from the Americas. As with most modern projects, whatever the type or location, they share a strong dependence on new or relatively new technology. There are both longer and shorter projects represented here, but the typical project in the database had a planned duration between six months and one year. While there are some very large programs in PERIL, typical staffing on these projects was rarely larger than about 20 people. The raw project numbers in the PERIL database are:
Exhibit 1 – Number of data programs in PERIL
While the PERIL database represents many projects and their risks, with only about 650 examples, it is far from comprehensive. The database contains only a small fraction of the tens of thousands of projects undertaken by the project leaders from whom it was collected, but it does focus on major project risks.
Measuring Impact in the PERIL Database
The problem situations that make up the PERIL database resulted in a wide range of adverse consequences, including missed deadlines, significant overspending, scope reductions, and a long list of other undesirable outcomes that were not so easily quantified. However, the most prevalent and serious impact reported in this data, by far, was deadline slip. To ensure data comparability, I either excluded cases reporting no time impact from the database or I used very conservative assumptions to normalize the consequences to an equivalent time impact. The average impact for all records was roughly seven weeks, representing almost a 20 percent slip for a typical nine-month project. This is the average impact data (in weeks of slip) by region and project type:
Exhibit 2 – Impact for Records by Region
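The "almost 20 percent" figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes a nine-month project runs about 39 weeks (52 weeks × 9/12); the function name `slip_fraction` is illustrative only and is not part of the PERIL database.

```python
WEEKS_PER_YEAR = 52

def slip_fraction(slip_weeks, planned_months):
    """Return schedule slip as a fraction of the planned duration."""
    planned_weeks = planned_months * WEEKS_PER_YEAR / 12
    return slip_weeks / planned_weeks

# Figures quoted in the text: roughly seven weeks of slip on a nine-month project.
typical = slip_fraction(slip_weeks=7, planned_months=9)
print(f"{typical:.0%}")  # prints "18%"
```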
Risk Causes in the PERIL Database
While the consequences of the risks in the PERIL database are consistently quantified based on time, risk causes were varied and abundant. I structured the database using a Risk Breakdown Structure for causes based on hierarchies headed by scope, schedule (or time), and resource (or cost). These categories are further subdivided using the primary perceived root cause of each risk. Across the board, risks related to scope issues were dominant. They were both most frequent and, on average, most damaging. While schedule risks were next most numerous, on average resource risks were slightly more harmful.
Exhibit 3 - Breakdown of Risk Sources
The total impact of all the risks is a bit over 4,500 weeks—almost 90 years—of slippage. Within each of these three categories the data is further subdivided based on root-cause categories, using these definitions:
Exhibit 4 – Root Causes of Risks
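For readers who want to reuse this hierarchy in their own risk identification, one possible way to hold it is as a nested mapping. The sketch below is an assumption about layout, not the database's actual schema. The labels are paraphrased from the categories this paper names in its scope, schedule, and resource sections, and the single "funding" leaf under money is a placeholder because the text names no finer split there.

```python
# Risk Breakdown Structure sketched from the categories named in this paper.
# Labels are paraphrased; the nested-dict layout is an assumption.
RBS = {
    "scope": {
        "change": ["scope gap", "scope creep", "external dependency"],
        "defect": ["software", "hardware", "integration"],
    },
    "schedule": {
        "delay": ["information", "parts", "hardware", "decision"],
        "estimate": ["learning curve", "judgment", "imposed deadline"],
        "dependency": ["project", "infrastructure", "legal"],
    },
    "resource": {
        "people": ["motivation", "permanent loss", "temporary loss",
                   "queuing", "late start"],
        "outsourcing": ["late or poor output", "delayed start"],
        "money": ["funding"],  # placeholder leaf; no finer split is named
    },
}

def leaf_paths(rbs):
    """Yield (top level, category, root cause) for every leaf in the structure."""
    for top, categories in rbs.items():
        for category, causes in categories.items():
            for cause in causes:
                yield (top, category, cause)

paths = list(leaf_paths(RBS))
print(len(paths))  # number of root-cause leaves in this sketch
```

Walking the full path rather than looking up a leaf by name matters here, because a label such as "hardware" appears under both scope defects and schedule delays.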
Big Risks—Black Swans
Most writing on project risk management spends a lot of time on theory and statistics. While theory is sometimes constructive, it is far more useful to understand the significant sources of actual project risk, and the PERIL database lets us do just that. It is quite eye-opening to consider the most serious problems in the database: the black swans.
Calling such risks “black swans” has been popularized recently by the writings of Nassim Nicholas Taleb. The notion of a black swan originated in Europe before there was much knowledge of the rest of the world. In the study of logic, the statement “All swans are white” was used as the example of something that was incontrovertibly true. Because all the swans observed in Europe were white, a black swan was deemed impossible. It came as something of a shock when a species of black swans was later discovered in Australia. This realization gave rise to the metaphorical use of the term “black swan” to describe something erroneously believed to be impossible.
Taleb’s primary subject matter (discussed in depth in his very good book from 2001, Fooled by Randomness) is financial risk, but his concept of a black swan as a “large-impact, hard-to-predict, rare event” is easily applied to project risk management. It is a mistake to consider a situation impossible merely because we think it won’t happen, or to assume that it happens so seldom we can afford to forget about it. These risks do occur; the PERIL database is full of them. When they happen, the same project managers who initially dismissed them often come to see them in retrospect as much more likely, sometimes even inevitable.
The definition of a “large-impact, hard-to-predict, rare event” is a useful starting point, but as the PERIL database shows, these most damaging risks may not actually be that rare, and they need not be so difficult for project managers to predict if they can be made more visible. To that end, this paper focuses on the most severe 20% of the risks in the database, singling out these project-destroying black swans. This subset of the database contains the 127 cases that caused the projects in the PERIL database to slip three months or more, and they account for over half of the total damage, almost 2,500 weeks of accumulated slip.
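The selection rule described above (keep every case that slipped three months or more, then ask what share of total slip those cases represent) can be sketched as follows. The record layout, the names `RiskRecord`, `black_swans`, and `impact_share`, and the 13-week reading of "three months" are all assumptions made for illustration. The sample records are invented and are not PERIL data.

```python
from dataclasses import dataclass

@dataclass
class RiskRecord:
    description: str
    slip_weeks: float

BLACK_SWAN_WEEKS = 13  # assumption: "three months" taken as 13 weeks

def black_swans(records, threshold=BLACK_SWAN_WEEKS):
    """Return the records whose slip meets or exceeds the threshold."""
    return [r for r in records if r.slip_weeks >= threshold]

def impact_share(subset, records):
    """Return the subset's slip as a fraction of the total slip."""
    total = sum(r.slip_weeks for r in records)
    return sum(r.slip_weeks for r in subset) / total if total else 0.0

# Invented records for illustration only; these are not PERIL data.
sample = [
    RiskRecord("late requirement change", 20),
    RiskRecord("key developer resigned", 14),
    RiskRecord("test server delivered late", 4),
    RiskRecord("vendor report delayed", 2),
    RiskRecord("minor rework after review", 3),
]
swans = black_swans(sample)
share = impact_share(swans, sample)
print(len(swans), f"{share:.0%}")  # prints "2 79%"
```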
Scope Risks
Scope risks in the PERIL database as a whole account for more than one-third of the data and nearly half of the total schedule impact. The two broad categories of scope risk in PERIL relate to changes and defects. A Pareto chart of overall impact by more detailed subcategories is summarized in Exhibit 5.
Exhibit 5 – Total Project Impact by Scope Root-Cause Subcategories
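A Pareto ordering of this kind is straightforward to reproduce for your own project data: sort the subcategories by total impact, largest first, and track the cumulative share. The sketch below uses placeholder numbers that are not the values from Exhibit 5, and the function name `pareto` is hypothetical.

```python
def pareto(impact_by_category):
    """Sort categories by impact (largest first) and attach the cumulative share."""
    total = sum(impact_by_category.values())
    if total == 0:
        return []
    rows, running = [], 0.0
    ordered = sorted(impact_by_category.items(), key=lambda kv: kv[1], reverse=True)
    for name, weeks in ordered:
        running += weeks
        rows.append((name, weeks, running / total))
    return rows

# Placeholder weeks of slip, not values from the PERIL database.
example = {
    "software defect": 10,
    "scope creep": 30,
    "integration defect": 2,
    "scope gap": 50,
    "hardware defect": 8,
}
rows = pareto(example)
for name, weeks, cumulative in rows:
    print(f"{name:20s} {weeks:4d} {cumulative:.0%}")
```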
The root-cause subcategories are:
Exhibit 6 – Root Cause Subcategories
Scope Black Swans
Of the most damaging 127 risks in the PERIL database, 64—just over half—were scope risks. In the database as a whole, the black swans accounted for slightly more than half of the total risk impact. The top scope risks exceeded this with nearly 60 percent of the aggregate scope risk impact. The details are:
Exhibit 7 – Details of Aggregate Scope Risk Impact
As the table shows, the black swan scope risks were dominated by change risk: changes caused about three quarters of them, by both quantity and total impact. When major change risks occur, their effects are very painful. Black swan defect risks were less common as well as somewhat less damaging overall, because recovery from these risks is generally more straightforward.
There were 47 black swan scope risks associated with change, dominated by scope gaps with a total of 25. Scope gaps are the result of committing to a project before the project requirements are complete. When legitimate needs are uncovered later in the project, change is unavoidable. While some of the scope gaps may have been due to the novelty of the work and were inevitable, in most of the cases these gaps were due to incomplete or rushed analysis. A more thorough scope definition and project work breakdown would have revealed the missing or poorly defined portions of the project scope. Examples of scope gaps were:
- The project manager expected the solution to be one item, but it proved to be four.
- New technology required unanticipated changes in order to function.
- Development plans failed to include all of the 23 required applications.
- End users were too little involved in defining the new system.
- Scope initially proposed for the project did not receive upper management sign-off.
- Some countries involved provided incomplete initial requirements.
- The architect determined late that the new design plan would be considerably more complex than expected.
- A mid-project review turned up numerous additional regulations.
- Manufacturing problems were not anticipated in the original analysis.
Most of the rest of the change risk black swans, with 21 incidents, were attributable to scope creep. Scope creep plagues most projects, especially technical projects. New opportunities, interesting ideas, undiscovered alternatives, and other new information emerge as the project progresses, providing enormous temptation to redefine the project and to make it “better.” While some project change might be justified, far too many of these non-mandatory changes sneak into projects because the consequences either are never analyzed or are drastically underestimated. To make matters worse, the purported benefits of the change are usually unrealistically overestimated. Scope creep is most damaging when entirely new requirements are piled on as the project runs. Such additions not only make projects more costly and more difficult to manage, they can also significantly delay delivery of the originally expected benefits. Among the scope creep risks were:
- New technology was introduced late in the project.
- The project team agreed to new requirements, some of which proved to be impossible.
- The contract required “state-of-the-art” materials, which changed significantly over the project’s duration.
- Volume requirements increased late in the project, requiring extensive rework.
- A system for expense analysis expanded into redesign of most major internal systems.
- One partner on a Web design project expanded scope without getting approval from others.
- Late change required new hardware and a second phase.
- Application changed midproject to appeal to a prospective Chinese customer (who never bought).
There was a single black swan change risk caused by an external dependency. The case in this category involved a pharmaceutical project where a significant study was unexpectedly mandated.
There were fewer black swans, 17 total, in the scope defect categories. Software and hardware defects each caused eight, and one was a consequence of poor integration. Software problems and hardware failures were caused by the project deliverable failing to work or not working as specified. Integration defects were related to system problems, generally in multi-project programs. Examples of scope defect risks included:
- The system being developed had 20 major defects and 80 additional problems that had to be fixed.
- Redesign was required in a printer development project that failed to meet print quality goals.
- In user acceptance testing, a flaw sent the deliverable back to development.
- During unit testing, performance issues arose with volume loads.
- Contamination of an entire batch of petri plates required redoing them all.
- A server crashed with four months of information, none of it backed up, requiring everything to be re-entered.
- Hardware failed near the end of a three-month final test, necessitating refabrication and retest.
- A purchased component failed, and continuing the project depended on a brute-force, annoying workaround.
- A software virus destroyed interfaces in two required languages, requiring rework.
Schedule Risks
Schedule risks are the second most numerous in the PERIL database after scope risks, representing almost a third of the records. They fall into three categories: delays, estimates, and dependencies. The overall impact of these schedule risk subcategories is summarized in Exhibit 8.
Exhibit 8 – Total Project Impact by Schedule Root-Cause Subcategories
Schedule root-cause categories are:
Exhibit 9 – Schedule Root Cause Categories
Schedule Black Swans
As with the black swans as a whole, the most severe of the schedule risks account for slightly more than one-half of the total measured impact. The details are:
Exhibit 10 – Most Severe Schedule Risks
As can be seen in the table, the black swan schedule risks were distributed relatively evenly, with a slight edge to risk associated with estimates.
There were 13 estimating risks, with 8 related to learning curve issues. Estimates were poor when new technology or new people (or, even worse, both) were involved. Learning-curve risks were caused by cases such as the following:
- The complexity of new software was significantly underestimated.
- The development team was staffed with no regard for needed skills or knowledge.
- The neophyte project staff was inexperienced and had inadequate training.
- A key developer proved to be incompetent.
- A remote team did not have the expertise for key intermediate testing.
Judgment in estimating was the next most common estimating problem in the PERIL database. For most of these cases the work was significantly underestimated, often exacerbated by a lack of metrics. There were three cases of major project slippage due to estimating judgment, all related to inordinately optimistic assessment of project work.
Imposed deadlines were the third subcategory of estimating risks. The root cause for these is outside the project. Technical projects frequently have aggressive deadlines set in advance by managers or sponsors with little or no input from the project team. Two black swan risks were caused by imposed deadlines:
- Even after adding project staff, it was not possible to cut the schedule in half.
- Commitments for a construction project were based on promises to customers, not planning.
Schedule delays caused within the projects accounted for another 10 black swans in the PERIL database.
Information delays caused the most trouble, with five examples stemming from time differences on distributed global teams, poor access to information, and late delivery of needed reports. The worst cases included these:
- Merging multiple standards was required, and a lack of common definitions delayed the project.
- Software was developed in a country where a war broke out, limiting travel and inhibiting teleconferencing.
- Poorly defined procedures for acceptance, quality, and communications inhibited distributed development.
- A legacy application to be modified had no documentation, requiring reconstruction of the original code.
Delivery and availability problems for parts were also damaging, often in connection with international shipping. Delays also resulted from parts that arrived on time but were found to be defective. Four significant risks were caused by parts:
- A component ordered was too long for international shipment, so it was cut and shipped in pieces. What arrived was not usable and replacing it locally was time-consuming and very expensive.
- The required quantity of a new integrated circuit chip was unavailable, resulting in delivery delay.
- A critical software component was delivered late.
- Insufficient material was sent to the contract lab to complete testing.
Hardware, systems, and other equipment needed to perform project work also caused delay risks. One black swan was hardware-related, caused by a shipment of required servers that got stuck in customs. (None of the black swan risks were due to tardy decision-making, showing that even the slowest managers will eventually make up their minds.)
Black swan dependency risks, which arise from factors external to the project, also inflicted a good deal of pain.
Dependency on other projects led to significant damage. The need to synchronize project schedules in a large program is complex, and timing problems may only surface late in the project. There were four black swan risks associated with programs:
- The manager of a related project allowed stakeholders to make frequent scope changes that caused delays.
- Interdependencies in a complex program were detected late.
- The work between related projects was poorly coordinated.
- Firmware needed for a key project component was dropped by another project.
Infrastructure dependencies also interfered with project schedules. These situations included interruption of technical services, such as computer systems or networks required by the project, and inadequate access to resources such as help desks, system support, and people who understood older but necessary applications. The two most significant infrastructure examples were:
- Development platforms had six-month validations; because of a small slip, recertification was required.
- The operating environment was upgraded to a new version, requiring rework and significant overhead.
Legal and regulatory dependencies can also be problematic. Legal and other mandatory requirements can change abruptly. There was one project in the database that encountered regulatory delay because of a process change that required an unexpected lengthy recertification.
Resource Risks
Resource risks are also a substantial part of the PERIL database. There are three categories of resource risk: people, outsourcing, and money. A Pareto chart of overall impact by type of risk is in Exhibit 12.
Resource root-cause categories are:
Exhibit 11 – Resource Root Cause Categories
Exhibit 12 – Total Project Impact by Resource Root-Cause Subcategories
Resource Black Swans
As with the other categories, the most severe of the resource risks account for about one half of the total impact. The details are:
Exhibit 13 – Most Severe Resource Risks
As can be seen in the table, the black swan resource risks were distributed unevenly. The money category represents a higher portion of the total, with outsourcing about as expected, and people-related risks much lower.
Not surprisingly, money issues were a substantial portion of the black swan resource risks. When funding was a problem, it was usually a very big problem. Eight cases, more than half of the risks reported in the money category, were in this group, including such problems as:
- The project budget was limited to the bare minimum estimated.
- Important parts of scope were dropped due to insufficient resources.
- Not enough staff was funded to cover the workload.
- Major cutbacks delayed fixes that lost time (and ultimately also cost a lot more money).
- Only half the resources required were assigned to the project.
There were also 10 outsourcing black swan risks. Nine were due to late or poor output from outsource partners. The growth of outsourcing in the recent past has been driven primarily by a desire to save money, and often it does. There is a trade-off, though, between this and predictability. Work done at a distance is out of sight, and problems that might easily be detected within a local team inside the organization may not surface as an issue in outsourced teams until it is too late. These cases are compounded by the added element of surprise; the problem may be invisible until the day of the default (after weeks of reports saying, “Things are going just fine…”), when it is too late to do much about it. These were some of the problems:
- The vendor was unable to complete their subproject and the work had to be redone.
- The supplier was purchased and reorganized; the project had to start over with a new supplier.
- The supplier shipped late and then the delayed deliverable failed.
- The subcontractor failed to understand technology and requirements.
- The partner on the project defaulted.
Delayed starts are also fairly common with outsourced work. Before any external work can begin, contracts must be negotiated, approved, and signed. All these steps can be very time-consuming. Beginning a new, complex relationship with people outside your organization can require more time than expected. For projects with particularly unusual needs, just finding an appropriate supplier may cause significant delays. There was one black swan outsourcing risk in this category caused when settling the terms of the agreement and negotiating the contracts took months and caused the project to begin very late.
There were 15 additional black swan people risks, the largest number of severe risks in any of the resource categories.
Motivation issues had among the highest average impacts of any subcategory in the PERIL database. They generally result from waning interest on very long projects or from interpersonal conflicts. The three black swan risks associated with motivation were:
- Management mandated the project but never got team buy-in.
- The staff got along poorly and frequently quarrelled.
- The product manager disliked the project manager.
Permanent staff loss also caused a lot of pain and led the list of black swan people risks. The reasons for permanent staff loss included resignations, promotions, reassignments to other work or different projects, and staffing cutbacks. These are the five most severe examples:
- A key staff member resigned.
- The committed medical expert was no longer available.
- Staffing suffered cutbacks.
- Specialists were lost, including designers, business analysts, and QA/testers.
- There was a company-wide layoff.
Temporary loss of project staff was also a very common people-related risk. It was often due to illness, a customer “hot site” problem, travel problems, or reorganizations. This led to only one black swan risk, due to an unexpectedly early start of conflicting peak-season support work that resulted in a protracted loss of project staff.
Queuing problems originate when organizations optimize operations by investing the bare minimum in specialized (and expensive) expertise, and in costly facilities and equipment. This leads to contention among projects for access. Optimizing organizational resources needed for projects based only on cost drives out necessary capacity and results in project delay. There were also three black swan project risks due to queuing, where projects were slowed by lack of access to specific resources:
- Insufficient QA resources were available to cover the auditing tasks and training tasks.
- Key decisions were stalled when no system architect was available.
- Several projects shared a single subject matter expert.
Unavailability of necessary staff at the beginning of a project also caused delay. Whenever a prior project is late, contributors are unavailable. When people finally do become available, they tend to be burned out. There were three more major people risks caused by late staffing availability, all caused by people trapped on a prior project.
Conclusions
Reading these lists can be pretty depressing, if all too familiar. Such lists can be fertile sources for helping to identify risks for almost any project. Once listed, though, the risks are no longer black swans, that is, “large-impact, hard-to-predict, rare events.” They can be predicted, and through effective project planning and risk management you can reduce their impact and work to make them even rarer.
Dealing with scope-related risks requires effective planning. Specifically, work to fill in scoping gaps through clear deliverable definition and development of a thorough project work breakdown structure. Minimize scope creep by freezing requirements when you set the project baseline, and then impose a very strong and sufficiently formal specification change control process. Deal with scope dependencies, especially on longer projects, by including a detailed review of the project’s constraints, assumptions, and environment during periodic project planning reviews. Hardware or software defects may be better anticipated by ensuring measurable deliverable completion criteria and establishing the particulars of all required tests and evaluations as part of initial scoping (not as an afterthought at project end). Integration issues on large, multi-project programs depend on adequate architecture and systems analysis, with particular attention to project interdependencies.
Managing schedule risks also depends on thorough project analysis. Estimating flaws can be dealt with better using what-if and scenario planning, maintaining and using project metrics, and always probing for worst-case estimates. Learning curve issues require a healthy skepticism of anything new and integration of needed training and ramp-up time for new skills into the project timeline. Adjusting to unrealistic top-down imposed deadlines relies on development of solid, fact-based bottom-up planning data, and using it to revise project objectives through principled negotiation. Detecting delays due to both internal and external dependencies is a matter of understanding the workflows for the project as a whole, with particular attention to types of activity inputs that have been problematic in the past. In multi-project programs, formally identify and get sign-off for all project interdependencies. Also, obtain commitments for decisions and, when necessary, escalations from management.
Resource risks also yield to better planning. Ensure adequate support and resources by obtaining reliable commitments for sponsorship and funding continuity. Work to develop contracting competence for dealing with outside service providers, and include terms in all contracts that support your project objectives, including incentives and penalties. Strive to reuse outsource partners who have performed well in the past and seek reliable and relevant reference information on any new providers. For staffing issues, develop activity ownership and contributor assignments by name, not by function (or “TBA”). Whenever possible, co-locate project contributors and strive to build and maintain trust and motivation. Detect overcommitments due both to work within the project and to outside responsibilities. Replan your work to minimize them, or resolve them by lowering other priorities. Also analyze your plans to detect situations where queuing for limited resources could be a problem. If you cannot identify better alternatives, at least plan to notify those involved in advance and work to ensure that you will have priority.
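One simple way to look for queuing exposure is to list the planned bookings of each scarce resource and flag any that overlap. The sketch below is a minimal illustration under stated assumptions: bookings are tuples of (resource, project, start week, end week) with the end week exclusive, and the function name `find_contention` and the sample bookings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def find_contention(bookings):
    """Return (resource, project_a, project_b) for overlapping bookings.

    Each booking is (resource, project, start_week, end_week), end exclusive.
    """
    by_resource = defaultdict(list)
    for resource, project, start, end in bookings:
        by_resource[resource].append((project, start, end))
    clashes = []
    for resource, items in by_resource.items():
        for (p1, s1, e1), (p2, s2, e2) in combinations(items, 2):
            if s1 < e2 and s2 < e1:  # half-open intervals overlap
                clashes.append((resource, p1, p2))
    return clashes

# Invented bookings for illustration only.
bookings = [
    ("system architect", "Project A", 1, 5),
    ("system architect", "Project B", 4, 8),
    ("QA lab", "Project A", 6, 9),
    ("QA lab", "Project C", 9, 12),
]
clashes = find_contention(bookings)
print(clashes)  # prints [('system architect', 'Project A', 'Project B')]
```

Back-to-back bookings, such as the two QA lab entries, are not flagged, so the check reports only true overlaps and says nothing about how much slack separates adjacent work.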
Overall, develop a realistic risk register. Where possible, avoid or mitigate all severe preventable risks. Establish contingency plans for prompt recovery from all other severe risks. Set up reserves for project schedule and/or budget based on your risk analysis, and when your project is overconstrained based on fact-based project planning data, negotiate changes that will result in credible project objectives.
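As a starting point, a risk register can be as simple as a list of entries, each with an estimated probability, an estimated impact, and a planned response. The sketch below sizes a schedule reserve as the sum of probability times impact, a common expected-value convention that I am assuming here for illustration; the paper itself prescribes no formula. The field names, the entries, and every number are made up.

```python
from dataclasses import dataclass

@dataclass
class RegisterEntry:
    risk: str
    probability: float   # estimated chance of occurrence, 0 to 1
    impact_weeks: float  # estimated slip in weeks if the risk occurs
    response: str        # "avoid", "mitigate", or "contingency"

def schedule_reserve(register):
    """Expected slip in weeks: sum of probability x impact over all entries."""
    return sum(e.probability * e.impact_weeks for e in register)

# Invented entries for illustration only.
register = [
    RegisterEntry("requirements incomplete at baseline", 0.5, 12, "mitigate"),
    RegisterEntry("supplier delivers late", 0.25, 8, "contingency"),
    RegisterEntry("key specialist reassigned", 0.125, 16, "contingency"),
]
reserve = schedule_reserve(register)
print(reserve)  # prints 10.0
```

An expected-value sum understates the exposure to any single large risk, so treat the result as a floor for negotiation, not a guarantee.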