Do most project managers still live under the bell curve?
Recent developments in mathematics and research results of the last few decades have shown that experts using the most statistically sophisticated models are usually no more accurate than a novice using the simplest models when managing risks or making decisions.
These findings raise a number of questions about the role and usefulness of forecasting when uncertainty cannot be measured with our usual Gaussian probabilistic models and the bell curve. What can we do to face future uncertainty realistically and rationally, and how should we manage project and program risks and decisions in the 21st century?
In this paper, I will present the advantages and disadvantages of using a traditional approach when the timeframe is limited (e.g., in projects). I will then explore alternate methods of decision-making at the program level, where uncertainty rises because of the increased number of variables.
Hassett and Stewart (1999) stated that “Probability theory is used for decision-making and risk management throughout modern civilization. Individuals use probability daily, whether or not they know the mathematical theory…” (p. 1). Indeed, most of us make daily personal and professional decisions in regard to weather, finances and health based on probability statements. But how much do we really know about statistics and the theory of probability?
Taleb (2007), the author of The Black Swan, goes as far as stating that one cannot be a modern intellectual without thinking probabilistically. However, we will see that it is often impossible to use the teaching of this very young science in a wise way. As with any other tool, poor use or misuse will reduce its effectiveness or worse still, might lead to serious harm.
Many of us use statistics to make small decisions, such as what to wear given weather predictions, but also more important ones, such as how we invest our money. Most people would not be able to explain the validity of the mathematics behind the claims they use to make these decisions. Even as a PhD student, I remember most of my colleagues feeling bound by the statisticians’ recommendations without really understanding the basis of the statistical analysis necessary to support their research projects. This led to interesting situations during which I was too often the quiet observer of poorly designed questionnaires being subjected to the most rigorous mathematical analysis. But if the statistical analysis looked good, this was considered “good research.” Statistics and probability theories have partially shaped our way of thinking and understanding problems; however, most people admit to having very restricted knowledge and understanding of the field at large. Often, from fear of looking “dumb,” many, from layperson to PhD, will refrain from asking questions, let alone contradicting “the statistical expert.” Many of the tools used in business decision-making use statistical analysis, and decisions are often based on quantitative data. I sometimes feel that I have left academia only to find many professional colleagues faced with the same difficulties as those I had witnessed among my PhD colleagues and students.
This paper is an effort to clarify when statistical knowledge is useful, when it can help in risk analysis and decision-making activities, but also how it is sometimes misunderstood and why sometimes it is best left aside to the advantage of other tools better suited to the situation at hand.
Probability—A Young Discipline
One trip to Macau is enough to convince anybody that humans are fascinated with chance games. Archeologists tell us that ancient civilizations were playing dice games using the heel bones of animals (Bernstein, 1998). We could reasonably assume that our ancestors did not grasp the concepts of odds or probabilities. In fact, the astragali bone has two narrow sides and two wider sides. A basic statistical analysis would award a greater number of points to the more difficult task of landing on the narrow side. However, according to David (1962), studies of classical writings point to the contrary.
A number of theories have tried to explain the late development of probability as a science. The most widely accepted cause is that ancient people may have lacked basic arithmetic because the Greeks and Romans used letters for numbers and did not have a zero (Seife, 2007). It was only in the 13th century that the Hindu-Arabic system started facilitating calculations (Hacking, 1975).
Pascal was one of the first to solve a gaming problem with the help of Fermat (Hassett & Stewart, 1999). However, their contribution was limited to cases in which a finite number of equally likely possible outcomes could be enumerated. It was only in the 18th century that Bernoulli examined statistical sampling when dealing with uncertainty. Bernoulli also looked at problems with a potentially infinite number of outcomes. He first described the Law of Large Numbers (LLN) in 1713, and in 1835 it was published by Poisson under the name “La loi des grands nombres” (Hacking, 1983). This is often referred to as the first law or theorem on which the discipline is based and as proof of the idea that uncertainty decreases as the number of observations increases.
During the 18th and 19th centuries, the famous names that surface in the history of probability include Bernoulli, de Moivre, Legendre, Gauss, and Poisson. It was during this period that the “bell curve” was developed, indicating that random drawings distribute symmetrically around their average value. Ultimately, the 19th century saw the first clear definition of probability with Laplace’s “classical definition” in his “Théorie analytique des probabilités” (Hassett & Stewart, 1999; Stigler, 1986).
Probability and Decision-Making
While the science of probability was initially developed by people who wanted to gamble intelligently, statistics is based on probability but goes beyond it to try to solve problems involving inferences based on sample data (Hassett & Stewart, 1999, p. 3). Hassett and Stewart (1999) attributed the application of probability to risk management to Markowitz in the early 1950s, with the publication of his portfolio theory, and to the probabilistic study of mortgage prepayments that was developed in the late ’80s (to study financial instruments that were created in the ’70s and early ’80s).
Many scholars point to 1662 as the real start date of modern statistics, with Graunt’s publication of Observations on the Bills of Mortality. In its early applications, statistical thinking mainly focused on the needs of governments and states to base their policies on demographic and economic data (Heyde, 2001). It is only recently that statistics has been used in business and the natural and social sciences. Because of its empirical roots, it is considered a branch of applied mathematics, not pure mathematics.
One of the pillars of statistics is the Law of Large Numbers (LLN) described by Bernoulli in 1713 (Hacking, 1983). The LLN is important because it “guarantees” stable long-term results for random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend toward a predictable percentage over a large number of spins. There is no principle that a small number of observations will converge to the expected value or that a streak of one value will immediately be “balanced” by the others.
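The roulette example can be sketched as a small simulation. This is only an illustration, not something taken from the cited sources: it assumes a single-zero wheel with 37 equally likely pockets, a player who stakes one unit on red at every spin, and a hypothetical helper named casino_average_take. Under those assumptions the casino's expected take is 1/37 of each stake, about 2.7%.

```python
import random

# Assumption: a single-zero wheel with 37 equally likely pockets, 18 of them red.
POCKETS = 37
RED_POCKETS = 18
HOUSE_EDGE = (POCKETS - 2 * RED_POCKETS) / POCKETS  # 1/37, about 0.027


def casino_average_take(spins, seed=0):
    """Average casino profit per one-unit bet on red over `spins` spins."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    profit = 0
    for _ in range(spins):
        player_wins = rng.randrange(POCKETS) < RED_POCKETS
        profit += -1 if player_wins else 1
    return profit / spins


if __name__ == "__main__":
    for spins in (10, 1_000, 1_000_000):
        average = casino_average_take(spins)
        print(f"{spins:>9} spins: average take {average:+.4f} (edge {HOUSE_EDGE:.4f})")
```

With only a few spins the average can sit far from 1/37 and may even be negative; with many spins it should settle close to it, which is all the LLN promises.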
However, as in the well-documented “gambler’s fallacy,” knowledge, or sometimes partial understanding, of LLN can affect people’s understanding of life in general, and more particularly their decision-making processes and their attitude toward risk. For example, it is often the belief among gamblers that if a coin is tossed repeatedly and tails comes up more than half the time, heads is more likely in future tosses (Baron, 1998). But, statistics tell us that the probability of getting heads on a single toss is always one in two. Kahneman and Tversky (1984) interpreted this to mean that although many may have been taught statistics and probability in school, most people believe short sequences of random events are representative of longer ones, which is false.
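The gambler's fallacy can be checked with a short simulation. The sketch below assumes a fair coin and independent tosses; the function name and the streak length of five are hypothetical choices made for illustration. It looks only at tosses that come right after a run of tails and measures how often they come up heads.

```python
import random


def heads_rate_after_tails_streak(tosses, streak=5, seed=0):
    """Share of heads among tosses that follow at least `streak` tails in a row."""
    rng = random.Random(seed)
    tails_run = 0    # length of the current run of consecutive tails
    followers = 0    # tosses made right after a qualifying run
    heads_after = 0  # how many of those tosses came up heads
    for _ in range(tosses):
        is_heads = rng.random() < 0.5
        if tails_run >= streak:
            followers += 1
            heads_after += is_heads
        tails_run = 0 if is_heads else tails_run + 1
    return heads_after / followers if followers else float("nan")


if __name__ == "__main__":
    rate = heads_rate_after_tails_streak(400_000)
    print(f"share of heads right after five or more tails: {rate:.3f}")
```

If heads were really "due" after a run of tails, the share would sit well above one half. Under the independence assumption it should stay close to 0.5.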
Recently, authors have explained this attitude in the light of what they have labeled “Naïve theories” of belief. These involve systems of beliefs that result from incomplete thinking. They are analogous to scientific theories, but what makes them naïve is that they have been superseded by better theories and many modern theories will become naïve in the light of future theories (Baron, 1998).
In his book, The Black Swan, Taleb (2007) warns us about the pitfalls of the tendency to base the significance and reliability of results on the amount of data. The author gives the practical example of a turkey that is fed for 1,000 days: every day confirms to its statistical department that the human race cares about its welfare “with increased statistical significance.” On day 1,001, the turkey has a surprise: it is Thanksgiving! In this evocative case, the turkey would have been better off with one single documented Thanksgiving Day than with its 1,000 other “normal days.” This single-day event of Thanksgiving is what Taleb has labeled a “Black Swan.”
As he explains:
“Go ask your portfolio manager for his definition of “risk,” and odds are that he will supply you with a measure that excludes the possibility of the Black Swan hence one that has no better predictive value for assessing the total risks than astrology (we will see how they dress up the intellectual fraud with mathematics).” (p. xviii)
Another great pillar of statistics is the central limit theorem (CLT). The first version of this theorem was postulated by the French-born mathematician de Moivre in 1733, who used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin (Tijms, 2007). This theorem expresses the fact that any sum of many independent identically distributed random variables will tend to be distributed according to a particular “attractor distribution.” Because many real populations yield distributions with finite variance (height, weight…), this explains the prevalence of the normal probability distribution. However, Taleb (2008) warns of an important distinction that needs to be made between two classes of probability domains that are very distinct qualitatively and quantitatively.
▪ In the first class, variables such as human weight or height have a finite range of values (no human has ever measured 10 metres…). Exceptions occur, but they do not carry large consequences. If you add the heaviest person on the planet to a sample of 1,000, the total weight would barely change, nor would your overall results.
▪ In the second class, variables have no known range of results (for example, wealth can be infinite). Here, one exception can change all your results and in time represent everything. For example, add Bill Gates to your sample of 1,000 and the wealth will jump by a factor of >100,000.
The first class of probability is often well documented in textbooks and popular books on randomness, research, and business texts. It concerns issues that are primarily “Gaussian – Poisson” problems. The second class of probability is rarely mentioned.
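The contrast between the two classes can be made concrete with a few lines of arithmetic. All figures below are hypothetical and chosen only for illustration (they are not taken from Taleb): 1,000 people weighing 70 kg each plus one 600 kg newcomer, and 1,000 people worth 100,000 each plus one newcomer worth 50 billion.

```python
def newcomer_share(sample, newcomer):
    """Fraction of the new total that comes from one added observation."""
    return newcomer / (sum(sample) + newcomer)


# Hypothetical figures, chosen only to illustrate the contrast.
weights_kg = [70.0] * 1_000       # class 1: bounded variable
heaviest_kg = 600.0               # an extreme weight, assumed for illustration
wealth_usd = [100_000.0] * 1_000  # class 2: no known upper bound
richest_usd = 50e9                # one extremely wealthy newcomer, assumed

weight_share = newcomer_share(weights_kg, heaviest_kg)
wealth_share = newcomer_share(wealth_usd, richest_usd)

print(f"weight: the newcomer accounts for {weight_share:.2%} of the total")
print(f"wealth: the newcomer accounts for {wealth_share:.2%} of the total")
```

With these assumed numbers the extreme weight contributes under 1% of the total, while the extreme fortune contributes over 99% of it. The exact factor depends entirely on the figures chosen.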
Around 1794, Karl Friedrich Gauss, the mathematical genius, was the first to describe the method of least squares and later the “normal” or “bell” curve. It seems that the discovery of the planet Ceres by Piazzi led to his discovery. Mathematical tools of the time were not able to extrapolate the future position of a planet from a small amount of data. Gauss, who was 23 at the time, heard about the problem and tackled it. Least squares is now often applied in statistical contexts, particularly regression analysis and can be interpreted as a method of fitting data.
The method corresponds to the maximum likelihood criterion if the experimental errors have a normal distribution, and Gauss justified the method under that assumption. The related Gauss-Markov theorem states that in a linear model in which the errors have expectation zero, are uncorrelated, and have equal variances, the best linear unbiased estimator of the coefficients is the least-squares estimator. These results hold only under their starting assumptions, that is, in particular circumstances.
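For readers who want to see least squares at work, here is a minimal sketch for the simplest case, a straight line fitted to paired observations. The data points and function names are hypothetical, and the closed-form expressions used are the standard ones for a single explanatory variable, not a reconstruction of Gauss's own calculations.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that least squares minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations scattered around y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = fit_line(xs, ys)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
```

No other straight line gives a smaller sum of squared residuals on these points. Whether the fitted line is also a good predictor depends on the error assumptions described above.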
Best linear unbiased prediction is used in linear models to predict random effects. These best linear unbiased predictions are often referred to as “empirical Bayes estimates.” Empirical Bayes methods are used to estimate quantities (probabilities, averages…) about one member of a population by combining information from their measurements and those of the entire population (Robinson, 1991).
The Bayesian approach is a popular business application of statistics in which we consider the problem of estimating some probability (such as a future outcome), based on measurements of our data, a model for these measurements, and some model for our prior beliefs about the system. A popular example is how insurance companies evaluate accident rates.
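A minimal sketch of the insurance example follows. It rests on assumptions that are not in the sources cited here: each policy-year either has an accident or does not, the prior belief about the accident rate is a Beta distribution, and all numbers and function names are hypothetical. Under those assumptions the update reduces to adding counts.

```python
def update_accident_rate(prior_a, prior_b, accidents, exposures):
    """Beta posterior parameters after `accidents` in `exposures` policy-years."""
    if not 0 <= accidents <= exposures:
        raise ValueError("accidents must be between 0 and exposures")
    return prior_a + accidents, prior_b + (exposures - accidents)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior: roughly a 10% accident rate, held with modest confidence.
prior = (2, 18)
# Hypothetical data for one driver: 3 accidents in 10 policy-years.
posterior = update_accident_rate(*prior, accidents=3, exposures=10)

print(f"prior mean     {beta_mean(*prior):.3f}")
print(f"observed rate  {3 / 10:.3f}")
print(f"posterior mean {beta_mean(*posterior):.3f}")
```

The posterior mean lands between the prior belief and the driver's own record, and it moves closer to the record as more policy-years are observed. Two analysts who start from different priors will reach different posteriors from the same data.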
Unfortunately, most things in life occur with rare but consequential jumps, while most things are studied in the light of the “normal” or “bell curve.” This can be quite risky because the bell curve ignores large deviations, and can make us confident that we know more than we actually know.
Limits of Probability
The limits of the “classical definition” of probability by Laplace have been outlined by many authors to date. As we have seen, probability is a relatively recent area of study. One of these limits, as reported by Spanos and Hendry (1986), is that “…probability is used to define the idea of probability” (p. 33). It is not within the scope of this paper to retrace a full history of statistics or probability, but rather to try and appreciate the impact of the domain on both our risk management and decision-making attitudes and skills.
As we have all noticed, the word probability is often used in contexts where it has little to do with physical randomness, and we often hear claims such as: “global warming is probably caused by pollution….” In such statements, “Hypothesis A is probably true” stands for “presently available empirical evidence supports A to a high degree.” This is called the “logical or epistemic or inductive” probability of A given the evidence.
The main point of disagreement between the different views, logical, epistemic, or inductive, warrants some caution. Bayesians or followers of epistemic probability, give the notion of probability a subjective status by regarding it as a measure of the “degree of belief” of the individual assessing the uncertainty of a particular situation.
This difference in point of view has implications both for the methods by which statistics is practiced, and for the way in which conclusions are expressed. When comparing two hypotheses and using some information, frequentist methods would typically result in the rejection or non-rejection of the original hypothesis at a particular significance level, and frequentists would all agree that the hypothesis should be rejected or not at that level of significance. Bayesian methods may suggest that one hypothesis was more probable than the other, and individual Bayesians might differ about which was the more probable and by how much (by virtue of having used different priors) (Cox, 2006). It is something of an understatement to say that probability can be confusing in terms of its theory, its concepts and even its vocabulary. The very simple example of evaluating percentages against odds makes this obvious (from blog: http://www.childrensmercy.org/stats/journal/oddsratio.asp):
- There is some confusion about the use of the odds ratio versus the relative risk. Can you explain the difference between these two numbers?
- Both the odds ratio and the relative risk compare the likelihood of an event between two groups. Consider the following data on survival of passengers on the Titanic. There were 462 female passengers: 308 survived and 154 died. There were 851 male passengers: 142 survived and 709 died.
- Clearly, a male passenger on the Titanic was more likely to die than a female passenger. But how much more likely? You can compute the odds ratio or the relative risk to answer this question.
- The odds ratio compares the relative odds of death in each group. For females, the odds were exactly 2 to 1 against dying (154/308=0.5). For males, the odds were almost 5 to 1 in favor of death (709/142=4.993). The odds ratio is 9.986 (4.993/0.5). There is a ten-fold greater odds of death for males than for females.
- The relative risk (sometimes called the risk ratio) compares the probability of death in each group rather than the odds. For females, the probability of death is 33% (154/462=0.3333). For males, the probability is 83% (709/851=0.8331). The relative risk of death is 2.5 (0.8331/0.3333). There is a 2.5 greater probability of death for males than for females.
- There is quite a difference. Both measurements show that men were more likely to die. But the odds ratio implies that men are much worse off than the relative risk. Which number is a fairer comparison?
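The two measures in the Titanic example can be recomputed directly from the four counts quoted above. The sketch below uses only those counts; the function and variable names are hypothetical.

```python
def odds(events, non_events):
    """Odds of an event: events divided by non-events."""
    return events / non_events


def risk(events, total):
    """Probability of an event: events divided by the group total."""
    return events / total


# Counts as quoted in the example above.
female_died, female_survived = 154, 308
male_died, male_survived = 709, 142

odds_ratio = odds(male_died, male_survived) / odds(female_died, female_survived)
relative_risk = (
    risk(male_died, male_died + male_survived)
    / risk(female_died, female_died + female_survived)
)

print(f"odds ratio    = {odds_ratio:.3f}")
print(f"relative risk = {relative_risk:.3f}")
```

The results agree with the quoted figures of roughly 10 and 2.5. The gap arises because death was a common outcome in both groups; the odds ratio only approximates the relative risk when the event is rare.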
Implications for Risk Management
There is certainly a big part of the limit of probability theory that is not due to its mathematics or tools, but simply due to life and the way our brain understands things.
Taleb (2007) has labeled single random events such as Thanksgiving for the turkey a “Black Swan” event. Until Australia was discovered, people believed that all swans were white. Fundamental logic dictates that no number of observations of white swans can permit us to affirm with certainty that “all swans are white” unless we are sure that we have observed the entire population of swans in the world. However, it takes only one black swan to contradict the “all swans are white” theory. Risk management is mainly about trying to predict the future, and we can extend the swan reasoning to our turkeys. No matter how many observations are made about their first 1,000 days, they are not informative about the future, because these observations could lead us to conclude that the turkey is immortal with a high degree of significance!
Taleb (2004) gives another example that is interesting for us in business because it relates more precisely to past performance as a predictor for future performance, which is also a popular belief, especially among project managers:
“If one puts an infinite number of monkeys in front of typewriters, and lets them clap away, there is a certainty that one of them would come up with the Iliad.” (p. 135)
Put another way, there are so many people out there doing business that some are bound to get it right. The next question is: “How much can past performance be relevant in forecasting future performance?” Common wisdom among people with a budding knowledge of probability laws is to base their decision-making on the principle that it is unlikely for someone to perform considerably well in a consistent fashion without doing something right. This is a perfect example of how a small knowledge of probability can lead to worse results than no knowledge at all!
In terms of probability, it all depends on the randomness content of the profession and the number of monkeys in operation. The greater the number of businessmen, the greater the likelihood of one of them performing to a very high standard just by luck, not to mention that in real life the other monkeys cannot be accounted for. They are hidden away, as one sees only the winners. If reading a book on how to become a millionaire could make you rich, we would all be millionaires today. These biases can be outlined as follows: (a) the survivorship bias arising from the fact that we see only winners and get a distorted view of the odds, (b) the fact that luck is most frequently the reason for extreme success, and (c) the biological handicap of our inability to understand probability (for the detailed counterintuitive properties of performance records and historical time series using a Monte Carlo simulation see Taleb, 2004, chap. 9, p. 149). The concept is well known in some of its variations under the names survivorship bias, data mining, data snooping, over-fitting, regression to the mean, etc.: basically, situations where the performance is exaggerated by the observer, owing to a misperception of the importance of randomness. Clearly, this concept has rather unsettling implications. It extends to more general situations where randomness may play a part, such as the choice of a medical treatment or the interpretation of coincidental events.
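The survivorship argument can be illustrated with a very small Monte Carlo sketch. It is not Taleb's own simulation: it assumes that every manager has a 50% chance of a "good year" purely by luck, independently from year to year, and the function names and figures are hypothetical.

```python
import random


def count_perfect_records(managers, years, seed=0):
    """Count managers with a good result every year by coin-flip luck alone."""
    rng = random.Random(seed)
    return sum(
        all(rng.random() < 0.5 for _ in range(years))
        for _ in range(managers)
    )


def expected_perfect_records(managers, years):
    """Expected number of perfect records when skill plays no role."""
    return managers * 0.5 ** years


if __name__ == "__main__":
    managers, years = 100_000, 10
    lucky = count_perfect_records(managers, years)
    print(f"{lucky} of {managers} skill-free managers had {years} good years in a row")
    print(f"expected by luck alone: {expected_perfect_records(managers, years):.1f}")
```

Under these assumptions about one manager in a thousand ends up with a flawless ten-year record. An observer who sees only those winners, and not the much larger group that dropped out, would be tempted to credit skill.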
Conclusions: Risk and Decision-Making in Projects and Programs
We can now ask ourselves: “How does this lead to us better understanding risk and decision-making in projects and programs?”
The first thing is to avoid being gullible like our turkeys and beware of stats presented to us by the “experts.” Taleb (2008) stated that there are two distinct types of decisions, and, as discussed earlier, two distinct classes of probability domains.
▪ The first type of decision is a simple “binary” (true or false) format. Someone is either pregnant or not pregnant. A statement is “true” or “false” with some confidence interval.
▪ The second type of decision is more complex. You do not just care for the frequency, but for the impact as well. For example, when you invest you do not care about how many times you win or lose money, but about the amount won or lost.
The two classes of probability domains discussed earlier are as follows:
▪ Class 1, where variables have a finite range of values
▪ Class 2, where the variable being studied has no known range of results.
Taleb (2008) then combines these distinctions in a four-quadrant matrix intended to show the limits of the usefulness of probability:
- First Quadrant: Simple binary decisions in class 1 probability: Statistics does wonders.
These situations are, unfortunately, more common in academia, laboratories, and games than real life. These are the situations in casinos, games, dice, and we tend to study them because we are successful in modeling them.
- Second Quadrant: Simple decisions in class 2 probability: some well-known problems are studied in the literature.
- Third Quadrant: Complex decisions in class 1: Statistical methods work surprisingly well.
- Fourth Quadrant: Complex decisions in class 2: Welcome to the Black Swan domain, the scope of program management. Here you cannot base your decisions on statistically based claims. Alternatively, try to move your exposure type to make it third-quadrant style.
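The four quadrants above can be summarized as a simple lookup. The sketch below is only a restatement of the list in code; the function name, the two yes/no inputs, and the wording of the labels are illustrative choices rather than Taleb's.

```python
# Keys are (decision is complex, variable is class 2 with no known range).
QUADRANTS = {
    (False, False): "First quadrant: simple decision, class 1; statistics does wonders",
    (False, True): "Second quadrant: simple decision, class 2; known problems, studied in the literature",
    (True, False): "Third quadrant: complex decision, class 1; statistical methods work surprisingly well",
    (True, True): "Fourth quadrant: complex decision, class 2; Black Swan domain, do not rely on statistical claims",
}


def quadrant(complex_decision, unbounded_variable):
    """Return the quadrant label for a decision type and a probability class."""
    return QUADRANTS[(bool(complex_decision), bool(unbounded_variable))]


if __name__ == "__main__":
    # Hypothetical examples: a dice game, and a long-running program.
    print(quadrant(complex_decision=False, unbounded_variable=False))
    print(quadrant(complex_decision=True, unbounded_variable=True))
```

Answering the two questions honestly is the hard part. The lookup itself only makes explicit that confidence in statistical claims should drop sharply once both answers are yes.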
This area has been tackled by Thiry (2004a, 2004b), who describes the uncertainty/ambiguity relationship in the program management chapter of the Wiley Guide to Managing Projects. Thiry says that programs are often ongoing or long-term and are subject to both uncertainty and ambiguity. Strategic decision-making and change situations often involve multiple stakeholders with conflicting, competing needs and expectations; this creates ambiguity. The situation requires a strategic decision management paradigm with a systems view and an ambiguity reduction approach. Like Taleb (2008), Thiry recommends resolving ambiguity first, then uncertainty. Whereas projects are essentially planned strategies, which can often be dealt with using traditional decision-making and risk methods, programs combine planned and “emergent” strategies. They therefore cannot rely on statistical and historical data, but should rely more on intuitive and experiential decision-making, which much recent research has associated with the way managers make decisions (Thiry, 2004a, 2004b).
Effective strategic decision-making requires an “ambiguity reduction” process, based on a learning paradigm, to take place before any attempt is made at uncertainty reduction; otherwise, it will lead to results that are not necessarily in line with stakeholders’ needs. This process can effectively be supported by value management (VM), which uses a range of “soft” methodologies and techniques like sensemaking, stakeholder analysis, functional analysis, ideation, soft systems analysis, and others.
Like Taleb, Thiry (2004a, 2004b) suggested a four-quadrant map, with uncertainty and ambiguity as variables. I have superimposed both maps in Exhibit 1 as a guide to the use of a quantitative normative paradigm for risk management and decision-making in both project and program environments.
Exhibit 1: Combined Ambiguity-Uncertainty and Probability-Decision Matrix
References
Baron, J. (1998). Thinking and deciding (3rd ed.). Cambridge, UK: Cambridge University Press.
Bernstein, P. L. (1998). Against the Gods: The remarkable story of risk. New York: Wiley & Sons.
Cox, D. R. (2006). Principles of statistical inference. Cambridge, UK: Cambridge University Press.
David, F. N. (1962). Games, Gods and gambling: The origins and history of probability and statistical ideas from the earliest times to the Newtonian era. New York: Hafner Pub. Company.
Hacking, I. (1975). The emergence of probability: A philosophical study of early ideas about probability, induction and statistical inference. London: Cambridge University Press.
Hacking, I. (1983). 19th-century cracks in the concept of determinism. Journal of the History of Ideas, 44(3), 455–475.
Hassett, M. J., & Stewart, D. (1999). Probability for risk management. Winsted, CT: Actex Publications.
Heyde, C. C. (2001). John Graunt. In C. C. Heyde & E. Seneta (Eds.), Statisticians of the centuries (pp. 14–16). New York: Springer.
Kahneman, D., & Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39, 341-350.
Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science, 6(1), 15–32.
Seife, C. (2007). Zero: The biography of a dangerous idea. London: Souvenir Press.
Spanos, A., & Hendry, D. (1986). Statistical foundations of economic modelling (p. 33). Cambridge, UK: Cambridge University Press.
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: Belknap Press of Harvard University Press.
Taleb, N. N. (2004). Fooled by randomness. London: Penguin Books.
Taleb, N. N. (2007). The black swan: The impact of the highly improbable. London: Penguin Books.
Taleb, N. N. (2008). The fourth quadrant: A map of the limits of statistics. Edge: The Third Culture. Retrieved December 11, 2008, from http://www.edge.org/3rd_culture/taleb08/taleb08_index.html
Thiry, M. (2004a). Program management: A strategic decision management process. In P. W. G. Morris & J. K. Pinto (Eds.), The Wiley guide to project management (chap. 12, p. 257). New York: John Wiley and Sons.
Thiry, M. (2004b). Value management. In P. W. G. Morris & J. K. Pinto (Eds.), The Wiley guide to project management (chap. 36, p. 876). New York: John Wiley and Sons.
Tijms, H. (2007). Understanding probability: Chance rules in everyday life (p. 17). Cambridge, UK: Cambridge University Press.
© 2009, Manon Deguire & Valense Ltd.
Originally published as a part of 2009 PMI Global Congress Proceedings – Kuala Lumpur, Malaysia