REPRESENTING JUDGMENTS about uncertainty is key to using stochastic (probabilistic) project models. Usually, the most qualified people available are asked to provide their opinions about values that go into the model. Where a significant parameter is not known with confidence, a probability distribution can completely express an expert's judgment about uncertainty.
Probability is the language of uncertainty. Here we examine the most popular distributions and situations where and why they apply.
Probability Distributions. Probability refers to a number from 0 to 1 that represents the chance of occurrence. Most people are comfortable with this everyday idea. A phrase such as “a 40 percent (.4) chance of delivery delay” seldom causes confusion so long as we are clear about what “delivery delay” means.
Often a risk is a binary event. It will either be true or false. A contingency event either happens or not.
Probability distributions represent the range of possible values and the probabilities of these values within this range. The binary event represents the simplest possible probability distribution. We'll soon look at the distribution shapes that are most popular in project management.
John Schuyler, PMP, of Decision Precision in Aurora, Colo., provides training and assistance in economic decision analysis and in project risk management. Questions about this article should be directed to [email protected]. Comments on this series should be directed to [email protected].
Exhibit 1. A collection of independent events, each with the same probability of success, exhibits a binomial distribution.
Exhibit 2. Sometimes events happen randomly in a time or other interval. A Poisson distribution, with the average (mean) number of events as the parameter, describes these events.
Exhibit 3. The simplest continuous distribution is the uniform.
Exhibit 4. Just three values fully describe the triangle distribution: minimum, most likely (mode), and maximum.
Discrete Distributions. Does the risk event have two or perhaps several distinct possible outcomes? If so, we're talking about a discrete probability distribution. The potential outcomes are frequently integers, such as “number of work interruptions.” The parameter name often begins with “number of.” Examples include number of equipment breakdowns, number of people available, number of units needed, and so forth. Each possible outcome is assigned a probability, and these probabilities must sum to 1.
Simple systems often produce discrete distributions. Exhibits 1 and 2 show the most popular discrete distributions, after the binary risk event. To save space, I don't present the distribution formulas here, though they are simple enough. Software tools, even some hand calculators, are widely available with functions for the distributions described in this article.
A collection of independent events, each with the same probability of success, would exhibit a binomial distribution, as shown in Exhibit 1. For example, suppose we have 10 research projects, and each is judged as having a 0.2 chance of success. Though elements in actual systems may not be fully independent or have the same probability of success, the binomial distribution is often a reasonable approximation for a simple portfolio model.
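As a minimal sketch of the research-portfolio example, the standard binomial formula gives the chance of each possible number of successes. The function and variable names are illustrative, not from any particular tool.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.2  # ten research projects, each with a 0.2 chance of success
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"{k:2d} successes: {prob:.4f}")

expected_successes = n * p  # mean of a binomial distribution
prob_no_success = pmf[0]    # chance that every project fails
```

With these inputs the expected number of successes is 2, the chance of exactly two successes is about 0.30, and the chance that no project succeeds is about 0.11.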
Sometimes events happen randomly, such as defects and work interruptions. A Poisson distribution describes the number of such events in a time or other interval, as in Exhibit 2. The only parameter needed to specify this distribution is the average (mean) number of events during the interval.
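A short sketch shows how the single parameter determines every probability in a Poisson distribution. The mean of three work interruptions per month is a hypothetical value chosen only for illustration.

```python
from math import exp, factorial

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the average count is `mean`."""
    return mean**k * exp(-mean) / factorial(k)

mean_interruptions = 3.0  # hypothetical average per month
probs = [poisson_pmf(k, mean_interruptions) for k in range(11)]

prob_none = probs[0]                   # no interruptions at all
prob_more_than_5 = 1 - sum(probs[:6])  # six or more interruptions

for k, prob in enumerate(probs):
    print(f"{k:2d} interruptions: {prob:.4f}")
```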
Continuous Events. In project management and other types of planning and analysis, uncertainties will most often be continuous. Cost, time, and quality metrics are prime examples. Exhibits 3 through 8 show the most popular continuous distribution types.
Normally a scale is omitted from the y-axis. It is possible to show a scale, though the units are of little use. You should remember that the y-axis is scaled such that the total area under the curve, which represents probability, equals 1. For a continuous distribution, the height of the curve shows the relative likelihood of outcomes near an x-axis value, and the probability of falling within a range is the area under the curve over that range.
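The link between curve area and probability can be checked numerically. This is a sketch under assumed inputs: a normal curve with a hypothetical mean of 100 and standard deviation of 15, integrated with a simple midpoint rule.

```python
from statistics import NormalDist

dist = NormalDist(mu=100.0, sigma=15.0)  # hypothetical cost uncertainty

def area(lo: float, hi: float, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of the area under the density curve."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

total_area = area(100.0 - 8 * 15.0, 100.0 + 8 * 15.0)  # effectively the whole curve
prob_90_to_110 = area(90.0, 110.0)                     # probability of a range

print(f"total area: {total_area:.6f}")
print(f"P(90 <= cost <= 110): {prob_90_to_110:.4f}")
```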
The simplest continuous distribution is the uniform distribution. If we know the upper and lower bounds of the range of possible values and every value within that range is equally probable, then we are describing a uniform distribution, as shown in Exhibit 3. Many people incorrectly assign this distribution shape to uncertainties. However, it is a rare system that generates values that are uniformly distributed. More often, values in a central part of the range are more probable. Examples of uniform distributions include position of a rotating shaft that stops randomly, position at which a cable breaks, and a small section of a broad distribution.
The triangle distribution in Exhibit 4 is very popular, though I've never seen a system in business or nature generate values with a triangular shape. Simplicity is the appeal. Just three values fully describe the triangle distribution: minimum, most likely (mode), and maximum.
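A sketch of sampling from a triangle distribution using Python's standard library follows. The three duration values are hypothetical. Note that `random.triangular` takes its arguments in the order low, high, mode.

```python
import random
from statistics import mean

low, mode, high = 8.0, 10.0, 16.0  # hypothetical activity duration, in days

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [rng.triangular(low, high, mode) for _ in range(50_000)]

analytic_mean = (low + mode + high) / 3          # mean of a triangle distribution
sample_mean = mean(samples)
prob_at_most_mode = (mode - low) / (high - low)  # area to the left of the mode

print(f"analytic mean: {analytic_mean:.3f}, sampled mean: {sample_mean:.3f}")
print(f"P(duration <= most likely): {prob_at_most_mode:.2f}")
```

With a long right tail like this one, the mean sits well above the most likely value, a point that is easy to miss when only the mode is reported.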
The normal distribution shown in Exhibit 5 is the most commonly observed distribution. Galileo found the familiar bell-shaped distribution when measuring the positions of stars and planets. When many independent, continuous chance events are summed together, the total is approximately normally distributed. Simple project models often assume that activities are independent. If the activities also have similar durations, the distribution for time to complete a chain of activities is approximately normal. Remarkably, this holds almost regardless of the shape of the component distributions. In statistics, this key behavior is described by the central limit theorem.
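A small simulation illustrates the central limit theorem for a chain of activities. It is a sketch under stated assumptions: 12 independent activities, each uniformly distributed between 4 and 8 days, which are hypothetical figures chosen because a uniform shape looks nothing like a bell curve.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable
N_ACTIVITIES = 12       # hypothetical chain length
N_TRIALS = 20_000

def chain_duration() -> float:
    """Total time for a chain of independent, uniformly distributed activities."""
    return sum(rng.uniform(4.0, 8.0) for _ in range(N_ACTIVITIES))

totals = [chain_duration() for _ in range(N_TRIALS)]
fitted = NormalDist(mean(totals), stdev(totals))

# For a normal distribution, about 68 percent of values fall within one
# standard deviation of the mean.
share_within_one_sd = (
    sum(abs(t - fitted.mean) <= fitted.stdev for t in totals) / N_TRIALS
)

print(f"mean {fitted.mean:.2f}, standard deviation {fitted.stdev:.2f}")
print(f"share within one standard deviation: {share_within_one_sd:.3f}")
```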
We often observe data where the frequency distribution is positively skewed, that is, having a longer tail to the right, as in Exhibit 6. When two or more distributions are multiplied, such as time × cost, the product is typically positively skewed. As more components are multiplied together, the product becomes more lognormal in shape (also a result of the central limit theorem, applied to the logarithms of the components). When judging uncertainties that are positively skewed, the lognormal distribution is often used.
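The effect of multiplication can be seen in a short simulation. This sketch assumes six independent factors, each uniform between 0.5 and 1.5 (hypothetical values). It compares the skewness of the products with the skewness of their logarithms.

```python
import math
import random
from statistics import mean, median

rng = random.Random(3)  # fixed seed so the run is repeatable

def product_of_factors(n_factors: int = 6) -> float:
    """Multiply several independent, symmetric uncertainty factors."""
    value = 1.0
    for _ in range(n_factors):
        value *= rng.uniform(0.5, 1.5)
    return value

def skewness(xs: list[float]) -> float:
    """Third standardized moment; positive means a longer right tail."""
    m = mean(xs)
    n = len(xs)
    var = sum((x - m) ** 2 for x in xs) / n
    return (sum((x - m) ** 3 for x in xs) / n) / var**1.5

products = [product_of_factors() for _ in range(20_000)]
logs = [math.log(v) for v in products]

print(f"mean {mean(products):.3f} vs. median {median(products):.3f}")
print(f"skewness of products: {skewness(products):.2f}")
print(f"skewness of log(products): {skewness(logs):.2f}")
```

Each factor is symmetric, yet the product has a long right tail and a mean above its median. The logarithms are much closer to symmetric, which is the sense in which the product approaches a lognormal shape.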
Exhibit 5. This is the most commonly observed distribution. When many independent, continuous chance events are summed together, the total is approximately normally distributed.
Exhibit 6. We often observe data where the frequency distribution is positively skewed, that is, having a longer tail to the right.
Exhibit 7. The exponential distribution is most commonly used to represent time between arrivals of random events.
Exhibit 8. This mathematically simple distribution can assume a variety of shapes depending upon two shape parameters.
Both the normal and lognormal distributions are completely described by a mean and standard deviation. However, some tools provide alternate statistics to specify the distribution, such as the 10 percent and 90 percent confidence points.
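Converting between the two ways of specifying these distributions is straightforward. The sketch below assumes that the "10 percent and 90 percent points" are the 10th and 90th percentiles (values with a 10 percent and 90 percent chance of not being exceeded). Some tools use the opposite convention, so check before relying on it. The function names and the 80 and 120 inputs are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile, about 1.28

def normal_from_p10_p90(p10: float, p90: float) -> NormalDist:
    """Normal distribution whose 10th and 90th percentiles match the inputs."""
    return NormalDist((p10 + p90) / 2, (p90 - p10) / (2 * Z90))

def lognormal_from_p10_p90(p10: float, p90: float) -> tuple[float, float]:
    """Mean and standard deviation of a lognormal with the given percentiles."""
    log_dist = normal_from_p10_p90(log(p10), log(p90))  # distribution of ln(x)
    mu, sigma = log_dist.mean, log_dist.stdev
    mean = exp(mu + sigma**2 / 2)
    sd = mean * (exp(sigma**2) - 1) ** 0.5
    return mean, sd

cost = normal_from_p10_p90(80.0, 120.0)
ln_mean, ln_sd = lognormal_from_p10_p90(80.0, 120.0)

print(f"normal:    mean {cost.mean:.1f}, standard deviation {cost.stdev:.1f}")
print(f"lognormal: mean {ln_mean:.1f}, standard deviation {ln_sd:.1f}")
```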
The exponential distribution, shown in Exhibit 7, is a counterpart to the discrete Poisson distribution. The exponential distribution is most commonly used to represent time between arrivals of random events. It is widely used in discrete event simulation of queuing systems. A common use is representing the time until the next arrival of a service request.
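The relationship between the two distributions can be sketched with a small simulation: draw exponential times between service requests, then count how many arrive in a fixed interval. The two-hour average gap and eight-hour shift are hypothetical. Note that `random.expovariate` takes the rate, which is 1 divided by the mean gap.

```python
import random
from math import exp
from statistics import mean

rng = random.Random(11)  # fixed seed so the run is repeatable
MEAN_GAP_HOURS = 2.0     # hypothetical average time between service requests
SHIFT_HOURS = 8.0        # hypothetical interval of interest

def arrivals_in_shift() -> int:
    """Count requests arriving during one shift, given exponential gaps."""
    count = 0
    clock = rng.expovariate(1 / MEAN_GAP_HOURS)
    while clock <= SHIFT_HOURS:
        count += 1
        clock += rng.expovariate(1 / MEAN_GAP_HOURS)
    return count

counts = [arrivals_in_shift() for _ in range(20_000)]
average_count = mean(counts)
share_with_no_arrivals = counts.count(0) / len(counts)

# A Poisson distribution with mean 8 / 2 = 4 predicts exp(-4) for zero arrivals.
poisson_prob_zero = exp(-SHIFT_HOURS / MEAN_GAP_HOURS)

print(f"average arrivals per shift: {average_count:.2f}")
print(f"shifts with no arrivals: {share_with_no_arrivals:.4f} "
      f"(Poisson predicts {poisson_prob_zero:.4f})")
```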
The beta distribution, shown in Exhibit 8, is a mathematically simple distribution that can assume a variety of shapes depending upon two shape parameters. The distribution is repositioned on the x-axis and resized as needed. Beta distributions have been used since the 1950s in representing time-to-complete activities in PERT project models.
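The repositioning and resizing can be sketched as follows. The shape-parameter formulas are one common "beta-PERT" convention, assumed here for illustration and not taken from this article. The three duration estimates are hypothetical.

```python
import random
from statistics import mean

def pert_sample(rng: random.Random, low: float, mode: float, high: float) -> float:
    """Draw from a beta distribution stretched to span low..high.

    Shape parameters follow a common beta-PERT convention (an assumption).
    """
    span = high - low
    alpha = 1 + 4 * (mode - low) / span
    beta = 1 + 4 * (high - mode) / span
    # betavariate returns a value in 0..1; resize and reposition it.
    return low + span * rng.betavariate(alpha, beta)

rng = random.Random(5)  # fixed seed so the run is repeatable
low, mode, high = 8.0, 10.0, 16.0  # hypothetical duration estimates, in days

samples = [pert_sample(rng, low, mode, high) for _ in range(50_000)]
pert_mean = (low + 4 * mode + high) / 6  # classic PERT expected-time formula
sample_mean = mean(samples)

print(f"PERT formula mean: {pert_mean:.3f}, sampled mean: {sample_mean:.3f}")
```

Under this convention the sampled mean agrees with the classic PERT weighted average of the three estimates.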
Which Distribution is Best? If you are the expert, the best distribution is the one that completely expresses your belief about the uncertainty. Often we have data that suggest that a system is producing a common distribution shape. Mathematicians have described hundreds of distributions. Fortunately, the ones listed above usually suffice. Some software tools allow the user to draw custom distribution shapes.
Without data, perhaps the best way to obtain a distribution is to model the subsystem that causes the uncertainty. If the expert can explain how the process works, we can model it and use the model to generate a distribution shape.
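As an illustration of modeling a subsystem, suppose an expert explains that defects surface at random during a 60-day activity, about one every 10 days on average, and that each takes between half a day and four days to rework, most likely one day. Every number here is hypothetical. A minimal simulation sketch turns that explanation into a distribution for total rework time.

```python
import random
from statistics import mean, quantiles

rng = random.Random(7)    # fixed seed so the run is repeatable
ACTIVITY_DAYS = 60.0      # hypothetical activity length
MEAN_DAYS_BETWEEN = 10.0  # hypothetical average gap between defects

def simulate_rework_days() -> float:
    """One trial: total rework days caused by randomly arriving defects."""
    rework = 0.0
    clock = rng.expovariate(1 / MEAN_DAYS_BETWEEN)
    while clock <= ACTIVITY_DAYS:
        rework += rng.triangular(0.5, 4.0, 1.0)  # low, high, mode
        clock += rng.expovariate(1 / MEAN_DAYS_BETWEEN)
    return rework

trials = [simulate_rework_days() for _ in range(20_000)]
deciles = quantiles(trials, n=10)  # nine cut points: 10th through 90th percentile
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

print(f"mean rework: {mean(trials):.1f} days")
print(f"10th, 50th, 90th percentiles: {p10:.1f}, {p50:.1f}, {p90:.1f}")
```

The resulting set of trials is itself the distribution. It can be charted as a histogram or summarized by percentiles for use in the project model.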
Usually our best available experts provide the judgments that go into the project models. Deciding who provides these inputs is the responsibility of the decision-maker.
THE NEXT ARTICLE in this series describes a project risk management process with increased emphasis on using practices from decision analysis. ■