Titanic lessons for IT projects
This paper takes hard-learned lessons from a “nuts-and-bolts” historical project and applies them to today's IT projects. It analyzes the project that designed, built, and launched R.M.S. Titanic, showing how compromises during the project design, construction, and testing phases, made to accommodate various business interests, led to serious flaws in this supposedly “perfect ship.” In addition, this paper explains how major mistakes made at the outset of the ship's operation led to disaster. All of these compromises and mistakes reduced the effectiveness of safety systems and provided faulty operational data upon which to base management decisions. While no one could predict that the ship was going to run into ice, the compromises and mistakes almost guaranteed that such a collision was going to be a serious one and result in a catastrophic failure. This historical example cuts away the layers of modern IT jargon and complexity, illustrates the consequences of seemingly innocuous project and operational decisions, and helps the reader avoid repeating the mistakes of the past.
Imagine yourself on one of Titanic's lifeboats being picked up by Carpathia. It is 1912 and you have just survived the most infamous disaster of the 20th century. You question how this could happen. Why was the ship traveling so quickly? Where were the lookouts? Wasn't Titanic supposed to be the safest ship ever built? The answer to these questions lies as much in the four-year construction project as in the four days of operation. Many attribute the disaster to bad luck and incompetence at sea, but examination of the evidence presented at the two subsequent inquiries shows that, because prestige overtook safety as the primary principle in Titanic's design, the ship many thought invincible had a fate that was inevitable.
In today's world, IT project failures still hit the headlines and are more common than most people suspect. The most expensive project failures are those that occur after implementation, days, weeks, or months into operation. Many stem from compromises made during the project that seemed so innocuous at the time that they barely registered in the project logbook.
Project Failure Rates Today
In 1994 the Standish Group brought attention to the success and failure rates of IT projects through a landmark report, Chaos, which kicked off much research that continues today with the Standish Chronicles. In the last decade numerous studies and surveys on IT projects have shown that the success rate is around 25%, the failure rate is about 25%, and partial successes and failures fall somewhere in the middle.
Success and Failure Rates of Today's IT Projects
Typically, there are two types of project failures:
- A project that consumes resources but fails to deliver an acceptable return on investment (ROI), is terminated before completion, or is poorly scoped, so that resource allocation is insufficient. This results in low adoption or produces insufficient value and no lessons learned.
- A project that consumes resources but fails to deliver as proposed, exceeds budget, exceeds time, or doesn't meet specifications.
The following failures occurred within the project itself and support the Standish Group's claim that close to 50% of projects are seriously challenged:
- The IRS project on taxpayer compliance took over a decade to complete and cost taxpayers an unanticipated $50 billion.
- The Oregon DMV conversion to new software took eight years to complete, the budget grew by 146% ($123 million), and public outcry eventually killed the entire project.
- In September 2006, the Department of Homeland Security admitted project failure and closed the Emerge2 program, a $229 million new financial IT system.
- In May 2006, the Australian Navy Seasprite helicopters were grounded due to software problems, with billion spent.
- In April 2005, interdepartmental warfare played a significant role in the failure of a $64 million Federal IT project.
- In 2005, the U.S. Justice Department Inspector General report stated that the $170 million FBI Virtual Case File project was a failure, after five years and $104 million in expenditures. Over one 18-month period, the FBI gave its contractor nearly 400 requirements changes.
- In 2005, the UK Inland Revenue produced $3.45 billion in tax overpayments because of software errors.
- In May 2005, a major hybrid car manufacturer installed a software fix on 160,000 vehicles. The automobile industry spends $2 billion to $3 billion per year fixing software problems.
- In July 2004, a new $200 million government welfare management system in Canada was unable to handle a simple benefits rate increase. The contract allowed for 6 weeks of acceptance testing and never tested the ability to handle a rate increase.
- In 2004, Avis cancelled an ERP system after $54.5 million was spent.
A more critical failure occurs after the implementation, days, weeks, or months into operation, when the project has been deemed completed and therefore successful. These failures are unpredictable, unexpected, and by far the most costly, because of their impact on services and on those who rely on them, such as customers or employees. With the arrival of the Internet and e-commerce, businesses have become increasingly dependent on their operational systems, to the point where any unavailability can have a massive impact on the organization at different levels. The following failures happened during and after implementation:
- In March 2007, US Airways struggled with a faulty reservation and ticketing system, and kept lines down by adding workers and asking travelers to use the Internet for check-ins.
- In December 2006, a computer systems outage made it difficult for air traffic controllers in Florida to identify and track more than 200 flights in the air, allowing some planes to come too close together, according to officials.
- In January 2006, Tokyo Stock Exchange Inc. was forced to halt trading 20 minutes earlier than normal because its system was close to capacity. The previous month, the software had been questioned after an erroneous order to sell 610,000 shares of J-Com Co.
- In December 2004, the airline Comair had to cancel over 1,000 flights on Christmas Day after its computer reservations systems crashed.
- In November 2004, a computer failure at the Department for Work and Pensions (DWP) stopped 80,000 staff from processing new pensions and benefits claims for several days.
- In October 2004, Avis Europe took a €45 million hit due to problems with a new ERP system. Development halted with delays and higher costs due to implementation and design problems.
According to Hollaway (2005),
“KPMG International's survey of 600 organizations across 22 countries revealed that 86% of respondents reported the loss of up to a quarter of their targeted benefits across their project portfolios. Nearly half of respondents reported at least one project failure in the past year, an improvement from KPMG's 2003 survey where 57% experienced one or more project failures in the previous 12 months. 86% of projects have a business case but over 60% ignore it.”
Sainsbury's $526 Million Project Failure
In October 2005, giant British food retailer Sainsbury's wrote off $526 million invested in an automated supply-chain management system. Merchandise was stuck in the company's depots and warehouses and not getting through to many of its stores. Sainsbury's was forced to hire about 3,000 additional clerks to stock shelves manually (Why software fails, 2005). “If an ERP project costs more than $10 million, your chances of coming in on time and on budget are statistically zero,” said Jim Johnson, chairman of Standish Group International, which surveyed more than 8,000 software-application projects over the past few years. “You also have a 50/50 chance of its being canceled before it's completed after you've spent 200% of your budget.” (Standish, 1994)
UK Air Traffic Control Upgrade Project
In June 2004, flights across the UK were grounded after an air traffic control computer failure at West Drayton control center. National Air Traffic Services' Flight Data Processing System failed at around 0600 BST for an hour, after overnight testing of an upgrade. Thousands of passengers experienced delays as airlines worked to clear the backlog of flights. Planes had to be grounded at airports including Gatwick, Heathrow, Manchester, and Inverness. By mid-afternoon, delays at Heathrow and Gatwick were still 90 minutes, while at Stansted and Scottish airports, the delay was about 30 minutes. (BBC, 2004a)
Root Causes of Project Failures
There are many reasons why projects fail, from the lack of a business case to not having a contingency plan. The Titanic case study below will highlight some of these.
Analysis of Titanic's Construction Project
To date, most of the research on Titanic has been on the four-day maiden voyage and the disaster. To extract lessons for today's business world requires going back to 1909, the outset of Titanic's project, and examining each stage of the project.
In 1909, White Star was facing stiff business pressures no different from those facing organizations today. For several years, White Star had been losing ground to competition. Its executives responded with a strategy based on replacing the aging fleet with three super-liners, using the latest in emerging technology. Larger ships meant larger accommodation and public space, and therefore more luxury. The Atlantic crossing would be slower by a day but the quality of the trip and the customer experience were paramount. No expense was spared, and this became a mantra for the project. Director Bruce Ismay initiated the construction project. In the requirements stage, Europe's best craftsmen were contracted at Harland and Wolff. The business case was solid, with a two-year payback, admirable by today's standards.
Today, IT projects are rarely cancelled at this stage of a project. However, many projects fall into the trap of not looking at the solution implementation and operation closely enough and assessing the risk of something going wrong. Understanding the user dependency on a solution and what the loss of services would mean to the organization requires creating a business case to accurately establish the potential lost revenue. The organization can then start to define mitigating strategies and where to apply resources to the project. It can also set some service-level targets that will guide the architect in the subsequent stages.
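The link between a service-level target and potential lost revenue can be made concrete with simple arithmetic. The following is a minimal sketch, not a method from this paper: the `ServiceCase` fields, the function names, and the assumption that revenue is lost evenly for every hour of downtime are all illustrative.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


@dataclass
class ServiceCase:
    """Hypothetical business-case inputs for one business service."""
    name: str
    revenue_per_hour: float     # revenue that depends on the service being up
    availability_target: float  # e.g. 0.999 means 99.9% uptime


def allowed_downtime_hours(availability_target: float) -> float:
    """Hours per year the service may be down and still meet the target."""
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    return HOURS_PER_YEAR * (1.0 - availability_target)


def revenue_at_risk(case: ServiceCase) -> float:
    """Yearly revenue exposed if the service only just meets its target.

    Assumes, for illustration, that each hour of downtime loses exactly
    revenue_per_hour; real losses are rarely this uniform.
    """
    return allowed_downtime_hours(case.availability_target) * case.revenue_per_hour
```

Under these assumptions a 99% availability target allows roughly 87.6 hours of downtime a year, so comparing `revenue_at_risk` at two candidate targets gives a rough ceiling on what it is worth spending to move from one to the other.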
The architects transferred the business requirements into functional and non-functional requirements, that is, what the solution does versus how it does it. The former (what) defined transportation and hospitality. The latter (how) defined the operational characteristics and included safety, performance, stability, security, maintainability, and the environment to deliver the ship's functions. White Star invested in a shipbuilder's model (IT pilot) which was used to analyze all exposures to the possibility of loss. Based on the mantra of the project and the solid business case, the architects went with the highest level of safety and incorporated the latest safety technologies: a double-skin hull (the bottom space divided into 73 watertight compartments), 15 bulkheads and electric doors, 48 lifeboats, and advanced water pump technology. However, these were undermined by executive pressure from Bruce Ismay, who pushed for the ultimate passenger experience. For example, the need for a spacious 200-foot ballroom cut straight across bulkheads in the centre of the ship. Similarly, a desire to give a clear ocean vista to the first class suites on the promenade/lifeboat deck was at odds with triple-stacked lifeboats. Titanic's overconfident architects conceded and four bulkheads barely reached 10 feet above the water line (Exhibit 1), while the 48 triple-stacked lifeboats were reduced to a single stack of 16, too few for the numbers on board. The double-skin hull ended well under the waterline so as not to infringe on the all-important capacity of the ship.
Today, this stage of the IT project is centered on the how, or the non-functional requirements, and needs to pay close attention to critical application dependencies, interdependency of data, and enterprise application integration. Things can start to go wrong if the project is under tightening financial constraints. Non-functional requirements get “chopped” in favor of functional requirements deemed more important to the business, if the solution architects do not push back on these pressures or the project manager does not intervene.
Although the ship's non-functional requirements had been severely compromised, there was little acknowledgement that anything was seriously wrong. Titanic's architects still believed Titanic was practically unsinkable and could survive any situation because of the aggregated effect of safety features, the broad hull design, sheer size, and the use of the latest technologies. This belief was used actively in the marketing. The lifeboats were viewed as an added safety feature, should Titanic have to rescue another ship in distress.
Today, IT projects can continue to go wrong at this stage if compromises to the non-functional requirements are not acknowledged or mitigated. Very often, the non-functional requirements are sacrificed because they are less visible and their importance is not highlighted to the executive steering committee and decision makers. With hundreds of micro-decisions being made weekly, project managers need to aggregate the impact of these and the risks for the business groups to understand.
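One way to aggregate the impact of many small decisions is a simple risk register that sums expected exposure per non-functional area. This is a minimal sketch under stated assumptions: the `Decision` record, the probability-times-impact scoring, and the review threshold are illustrative choices, not something the Titanic project or this paper prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Decision:
    """One project decision that trades away a non-functional requirement."""
    description: str
    area: str           # e.g. "safety", "performance"
    probability: float  # estimated chance the risk materializes (0..1)
    impact: float       # estimated cost if it does, in any consistent unit


def exposure_by_area(decisions):
    """Sum probability x impact for each non-functional area."""
    totals = defaultdict(float)
    for d in decisions:
        totals[d.area] += d.probability * d.impact
    return dict(totals)


def areas_over_threshold(decisions, threshold):
    """Areas whose combined exposure exceeds the (hypothetical) review threshold."""
    totals = exposure_by_area(decisions)
    return sorted(area for area, total in totals.items() if total > threshold)
```

The point of summing per area is that two compromises that each look tolerable alone, such as lowered bulkheads and fewer lifeboats, can together push the safety total past the threshold and so become visible to the steering committee.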
Plan Test Stage
This stage was compromised by Titanic's sister ship, R.M.S. Olympic. In service since June 1911, she had a track record deemed adequate for launching an identical ship into service without extensive sea trials. But the track record was spotty, with several incidents, the most serious being a collision with a British cruiser. The cruiser pierced Olympic's outer skin and caused considerable damage that required four weeks of repair: plating and a bent propeller shaft were replaced, at one-sixth of the original total cost. Work stopped on Titanic as workers and components were transferred. The business pressures for Titanic to sail were enormous, considering the large investments in the four-year construction. The scope of Titanic's sea trials (Exhibit 2) was dramatically reduced so the launch window could still be realized; after all, the world's richest people had booked for the social event of 1912. If you were one of the financial elite you had to be on board. White Star had completed a brilliant marketing job.
Today, IT projects at this stage need to adequately assess the risk and determine an appropriate test strategy to ensure that not just the functional requirements are tested, but more importantly, the non-functional requirements are. Moreover, this testing should be done by motivated testers (not developers), in an adequate test environment, and supported by robust processes. The project should also ensure that adequate change and problem management processes are in place.
A perception grew with both White Star and the public that Titanic was invincible. The ship underwent one day of sea trials in April, 1912. With the staged delivery of three ships, Ismay saw a marketing opportunity to promote each ship as an improvement over the last. By beating Olympic's best crossing time of six days, he could market Titanic as superior. To promote this, he published a shipping announcement in the New York Times that Titanic would arrive a day earlier than the published schedule. This was a publicity stunt, but in reality, Ismay was writing out a new service-level objective without verifying it with his captain and officers. This was fateful, as it pushed the ship to her operational limits.
Today, IT projects at this stage need to ensure that a battery of tests is invoked and completed successfully. Also, metrics should be collected and service-level objectives and agreements need to be refined. The team should look at the operation and test any early warning systems and base automation.
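Checking collected metrics against service-level objectives can be as simple as the comparison below. This is a minimal sketch: the metric names, the `(limit, direction)` format, and the rule that a missing measurement counts as a miss are assumptions made for illustration.

```python
def check_objectives(measured, objectives):
    """Return the names of metrics that miss their objective.

    objectives maps a metric name to (limit, direction), where direction is
    "max" (the measured value must not exceed limit) or "min" (it must not
    fall below limit). A metric with no measurement is reported as a miss,
    on the assumption that an untested objective cannot be claimed as met.
    """
    misses = []
    for name, (limit, direction) in objectives.items():
        if direction not in ("max", "min"):
            raise ValueError(f"unknown direction for {name}: {direction}")
        value = measured.get(name)
        if value is None:
            misses.append(name)
        elif direction == "max" and value > limit:
            misses.append(name)
        elif direction == "min" and value < limit:
            misses.append(name)
    return misses


# Hypothetical objectives and test-stage measurements.
objectives = {
    "response_ms": (500, "max"),
    "uptime": (0.999, "min"),
    "alerts_tested": (1, "min"),
}
measured = {"response_ms": 620, "uptime": 0.9995}
print(check_objectives(measured, objectives))
```

Treating an unmeasured objective as a miss mirrors the lesson of Titanic's one-day sea trials: an objective that was never exercised, such as the early warning check here, should not be assumed to hold in operation.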
By the end of the Titanic project, the arrogant view evolved that Titanic was a huge lifeboat. In short, the people “who should have gotten it”—the architects—allowed the compromises to pass. The Titanic project team made the mistake of believing the initial design assumptions, and not testing these far enough. This set a high level of confidence for the maiden voyage (or production). Such was the confidence in the safety of the ship, that by the end of the project, disaster recovery and business continuity plans were considered superfluous. As the ship went into operation, a perception emerged that even if things did go wrong operationally, the ship had enough safety features to protect it. This instilled a mindset in the crew and passengers that the ship was unsinkable. Why else were 53 millionaires aboard?
Similarly, many of today's IT project failures are typically due to compromises made throughout the project, almost innocuously, where they are not picked up, understood, and identified as risks.
Analysis of Titanic's Voyage
On leaving Southampton, Titanic had a near collision, similar to the incident between Olympic and Hawke. The steamer New York broke her own moorings and came within four feet of Titanic, indicating the challenges in handling the large ship. At Queenstown, the last port before the Atlantic crossing, Board of Trade inspectors checked Titanic for safety. A lifeboat drill was performed to determine crew readiness, with two lifeboats lowered. The poorly executed test failed to highlight that the crew was not prepared for a disaster that would require the launch of all 16 lifeboats, as would be necessary in a calamity.
Today, the business executives and project managers of IT projects need to know the impact of the implementation on business services and the risk of remaining live with it, and they need a well-laid-out plan of alternatives. The project manager needs to have at least reviewed the disaster recovery plan and business continuity planning prior to implementation.
At sea, Titanic's maiden voyage, or operations stage, was riddled with problems. First, an ice detection test was fudged because a mariner failed to report a problem with it. Second, Titanic received eight reports warning of icebergs and icefloes. However, the radio operators sporadically relayed these to the bridge, because they were preoccupied with the flood of outgoing commercial radio messages. The radio operators were employed and paid by Marconi to transmit messages for first class passengers. Third, the lookouts were missing binoculars, and had repeatedly reported that since leaving Southampton, but were ignored. On top of all this, Ismay was patrolling the ship and ignoring operational procedures by pushing the crew to reach maximum speed. The ship's officers failed to piece together the extent of the ice field and understand the true danger, as the feedback systems went awry.
The collision was probably inevitable, with the compromised safety features, the failure of feedback systems, and the belief that Titanic was invincible. But what was scandalous was that bad management turned what could have been mere embarrassment into an outright disaster. Passenger evidence at the inquiries was consistent: hundreds described Titanic innocuously coming to a halt with a quiver or grinding noise that lasted a few seconds, like rolling over a thousand marbles (Brown, 2003). For example:
- “At the time of the collision I was awake and heard the engines stop, but felt no jar. My husband was asleep.” – Emily Boise Ryerson
- “It was like a heavy vibration. It was not a violent shock.” – Walter Brice, Seaman
- “I was dreaming, and I woke up when I heard a slight crash. I paid no attention to it until the engines stop[ped].” – C. E. Henry Stengel
- “We felt it under the smoking room. We felt a sort of stopping, a sort, not exactly a shock, but a sort of slowing down.” – Hugh Woolner (Brown, 2003, pp. 79, 92)
There was no “crash stop” and no fatalities, broken bones, or even minor injuries. There was no violent jolt sideways or repeated strikes along the ship's length, or rebound effect, as are common with a side-swipe against an ice spur when a ship is turning very hard away from it. The breakfast cutlery in the dining salons barely trembled and drinks remained unspilled in first class smoking rooms. All the evidence indicates a grounding onto an underwater ice-shelf at the base of the iceberg, or a “soft landing” (Exhibit 3).
Ismay was hell-bent on saving face, and what greater feat than Titanic saving herself? His anxiety over White Star's reputation created an atmosphere where mistakes were easily made. With inaccurate information compounding the problem, bad decisions followed: Ismay telegraphed the engine room “dead slow ahead” in the hope of recovering the situation, believing Titanic could limp back to Halifax. He only succeeded in turning the situation into a horror. Engineers later testified the ship sailed forward at three knots with a grinding noise. This forward motion further ruptured Titanic's double hull and the design flaws compromised the ship, as it could not handle the increased rate of flooding.
Post Mortems Through the U.S. and U.K. Inquiries
Following the disaster, the U.S. and British authorities conducted post mortems in competition with each other. The U.S. inquiry forced Ismay to stay in the U.S., and grilled him over his role. Ismay and the remaining officers concocted the ice spur cover-up. The U.S. inquiry came close to uncovering the truth. It recommended lifeboat space for every person on all ships from U.S. ports, lifeboat drills, adequate manning of boats, and 24-hour operation of radiotelegraph. The British government assisted in the cover-up and saved White Star from bankruptcy. After all, with the Great War looming, Britain needed large ships for transportation. The British inquiry condemned Captain Lord for not responding to flares and criticized the British Board of Trade for not updating lifeboat regulations.
Review of Main Concepts and Why IT Projects Fail
There are many comparisons to modern projects. For example:
- Non-functional requirements can get overshadowed by functional requirements if not maintained in a healthy balance
- The executive sponsors can unknowingly and unwittingly negatively influence and compromise the project
- Project over-confidence can negate diligence in subsequent stages, such as not ensuring testing is adequately completed
- Warning signals from earlier projects can be ignored
- The project chain of command can be compromised by exuberant executive sponsors
- Project problems and issues often surface days, months, or even years after the project is completed and in production
Today's projects may be successful on deployment and pass a broad number of “standard” tests (system, performance, and acceptance), yet still fail catastrophically when in operation. After all, only 25% of all projects are successful. The success of projects should not be measured at deployment, but rather after the solution has been in production for a while and carefully measured. Metrics should be closely tied to the overall impact on the business.
The roots of Titanic's disaster were in the project, compromises to safety features, and elevation of expectations that allowed business pressures to override operational procedures. This led to numerous violations of the “rules of good seamanship,” and the probability of failure increased because of the inability to recognize introduced risks.
BBC News (2004, June 3a). Massive air disruption across UK. Retrieved from http://news.bbc.co.uk/2/hi/uk_news/3772077.stm
BBC News (2004, June 3b). UK Air Traffic Control Upgrade Project. Retrieved August 15, 2007 from http://news.bbc.co.uk/2/hi/uk_news/3772663.stm
Bonsall, T. E. (1988). Great Shipwrecks of the 20th Century. New York: Gallery Books.
Bristow, D. (1995). Titanic: Sinking the Myths. South Africa: Katco Literary Group
Brown, D. (2003). The Last Log of the Titanic. Columbus, OH: McGraw-Hill.
Davie, M. (1987). The Titanic: The Full Story of a Tragedy. New York, NY: HarperCollins Publishers.
Hollaway, K. (2005, 25 Nov). KPMG highlights IT project failures [Electronic version]. Accountancy Age, Retrieved July 10, 2007 from http://www.accountancyage.com/accountancyage/news/2146792/kpmg-highlights-project
Hyslop, D., Forsyth, A. & Jemima, S. (1998). Titanic Voices. New York: St. Martin's Press.
Kozak-Holland, M. (2006). Avoiding Project Disaster: Titanic Lessons for IT Executives. Oshawa, Ontario Canada: Multi-Media Publications Inc.
Lord, W. (1955). A Night to Remember. New York: Holt, Rinehart, & Winston.
Lord, W. (1985). The Night Lives On. New York: Holt, Rinehart, & Winston.
Lessons from History (2006a). Functional versus non-functional requirements. Retrieved July 10, 2007 from http://www.lessons-from-history.com/Level%203/functional%20vs%20non-functional.html
Lessons from History (2006b). What Determines a Projects Success or Failure. Retrieved July 10, 2007 from http://www.lessons-from-history.com/Level%202/Project%20Success%20or%20Failure.html
Spignesi, S. (1998). The Complete Titanic. New York: Birch Lane Press Group.
Newswise (2005a). Sainsbury's $526 Million Project Failure. Retrieved August 15, 2007 from http://www.newswise.com/articles/view/513919/
Newswise (2005b). Why software fails. Retrieved July 10, 2007 from http://www.newswise.com/articles/view/513919/
The Standish Group. (1994). Chaos. Retrieved July 6, 2007 from http://www.standishgroup.com/chaos_resources/index.php
Thompson, H. (2000). Customer Value Management. New York, NY: McGraw-Hill.
Wade, W. C. (1979). The Titanic: End of a Dream. New York: Rawson, Wade.
Wels, S. (2000). Titanic: Legacy of the World's Greatest Ocean Liner. Alexandria, VA: Time-Life Books.
Illustrations were used courtesy of the Ulster Folk & Transport Museum
© 2007, Mark Kozak-Holland
Published as a part of 2007 PMI Global Congress Proceedings – Atlanta, GA, USA