BACKGROUND
Decision analysis, sometimes called risk analysis, is the discipline for helping decision makers choose wisely under conditions of uncertainty. The techniques are applicable to all types of project decisions and valuations. Committing to fund a project does not end the decision making, for decisions continue to be made throughout project execution. The quality of these decisions has important impacts on cost, timing, and performance.
This tutorial series describes the approach and principal techniques of decision analysis. These methods explicitly recognize uncertainties in project forecasts. This analysis technology, on the leading edge in the 1970s and earlier, is headed toward becoming mainstream practice. The methodology is proven, accessible, and, I hope you'll agree, easily understood.
Figure 1. Frequency Histogram of 50 Independent Project Cost Estimates
The expected value concept is fully established and is credited to 18th-century mathematician Daniel Bernoulli. This is the cornerstone of decision analysis and modern evaluation practice.
Decision analysis provides the only logical, consistent way to incorporate judgments about risks and uncertainties into an analysis. Thus, when uncertainties are significant, these techniques are the best route toward credible project decisions.
GOALS OF CREDIBLE ANALYSIS
Your job may involve estimating project or activity costs. How would you evaluate the quality of your estimates? Most forecast users will recognize two principal desirable characteristics:
- Objectivity: Lack of bias. On average, over a number of projects, estimates proving neither too high nor too low.
- Precision: Reasonable closeness of a set of values, minimizing random “noise” in the estimates.
Forecast accuracy is a composite of low bias and high precision. Objectivity tells us about the estimate quality regarding where the values are located on a scale, and precision tells us about how values are dispersed about their central location. In competitive situations, precision is certainly desirable. However, for most internal purposes, objectivity is more important. Understanding the expected value concept is essential to understanding objectivity. Before introducing expected value, it is essential that you understand probability distributions.
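The distinction between objectivity and precision can be illustrated with a small calculation. The sketch below is a minimal illustration using made-up numbers: the function name `bias_and_spread` and both sets of estimates are hypothetical. It takes the average error as a gauge of bias and the standard deviation of the errors as one possible gauge of precision, with error defined as actual minus estimate.

```python
from statistics import mean, stdev

def bias_and_spread(estimates, actuals):
    """Return (average error, standard deviation of errors); error = actual - estimate."""
    errors = [a - e for e, a in zip(estimates, actuals)]
    return mean(errors), stdev(errors)

# Hypothetical actual costs ($ million) for five projects.
actuals = [10.0, 12.0, 8.0, 11.0, 9.0]

# Estimator A (hypothetical): always $2 million low. Precise but biased.
estimates_a = [8.0, 10.0, 6.0, 9.0, 7.0]

# Estimator B (hypothetical): errors scatter around zero. Unbiased but noisy.
estimates_b = [7.0, 15.0, 6.0, 13.0, 9.0]

bias_a, spread_a = bias_and_spread(estimates_a, actuals)
bias_b, spread_b = bias_and_spread(estimates_b, actuals)
print(f"A: average error {bias_a:+.2f}, spread {spread_a:.2f}")
print(f"B: average error {bias_b:+.2f}, spread {spread_b:.2f}")
```

Estimator A's errors never vary, yet its forecasts are consistently low. Estimator B's errors are larger individually but average out to zero, which is the property the text calls objectivity.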
FREQUENCY AND PROBABILITY DISTRIBUTIONS
Suppose you and 49 other professionals estimate total cost for a project. Your estimates range from $26.7 million to $76.0 million. The 50 estimates can be displayed as a frequency histogram such as Figure 1.
Figure 2. Frequency Histogram of 500 Independent Project Cost Estimates
A frequency histogram is constructed by dividing the value range into intervals, and then counting the number of occurrences within each interval. Vertical bars are drawn with heights proportional to the number (frequency) of occurrences within each interval.
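The construction just described can be sketched in a few lines. This is only an illustration: the 50 values are randomly generated stand-ins rather than the estimates behind Figure 1, and the interval boundaries ($25 million to $80 million in 11 intervals) are assumptions chosen for the example.

```python
import random

def frequency_histogram(values, low, high, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value exactly on the top edge is counted in the last interval.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical sample: 50 made-up cost estimates ($ million).
random.seed(1)
estimates = [random.triangular(26.7, 76.0, 40.0) for _ in range(50)]

edges, counts = frequency_histogram(estimates, 25.0, 80.0, 11)
for left, right, n in zip(edges, edges[1:], counts):
    # Bar length is proportional to the frequency in the interval.
    print(f"{left:5.1f} to {right:5.1f} | {'#' * n}")
```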
More data points and smaller value partitions provide the additional detail of Figure 2. A larger number of observations leads to a more accurate estimate of the mean.
If we could obtain a great many data points and use ever smaller partitions, this frequency distribution would become a smooth, continuous curve. This is called a probability (density) distribution when the curve represents the population of all possible events. Figure 3 shows the probability distribution for this project's cost estimates. The y-axis, whose units are not important, shows the relative likelihood of estimate values along the x-axis. This curve represents our best judgment about how additional cost estimates would be distributed.
There are two especially important statistics annotated on Figure 3:
- The most likely value is about $40 million. Statisticians call this the mode. More values are near this point than any other point.
- The expected value, $45 million, is the probability-weighted average. Statisticians call this the mean. This would be the average value sampled from a population of project cost estimates represented by this probability distribution.
Figure 3. Probability Distribution of Cost Estimates. This is the frequency distribution as the number of sample values becomes very large. The y-axis scale is chosen so that the area under the curve equals 1 (unitless).
Figure 4. Cumulative Probability Distribution of Project Cost Estimates. The y-axis is the probability that a sample value is less than the x-axis amount.
The term “expected” misleads some people. It comes from the mathematical concept of expectation. It is very unlikely that the precise expected value will be realized.
Sometimes applicable historical or test data are available. More often, decisions must be made with sparse and incomplete data. Most assessments are made with judgments which are at least partially subjective. Key uncertain variables should enter the project model in the form of probability distributions. These distributions are elicited from professionals and represent their judgments about uncertainties. A probability distribution, represented either graphically or mathematically, succinctly expresses everything about how a variable is distributed.
The probability density curve, Figure 3, can easily be converted into a cumulative probability distribution curve, Figure 4. To make the translation, simply sum the area under the curve, moving from left to right; this is equivalent to integrating the probability density function.
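The left-to-right summing of area can be sketched numerically. The density used here is an assumption: a triangle running from $25 million to $70 million with its peak at $40 million, chosen only as a rough stand-in for Figure 3, whose actual curve is not reproduced in the text. The function names are likewise hypothetical.

```python
def triangle_density(x, low, mode, high):
    """Density of a triangle distribution (assumed shape, for illustration only)."""
    if x < low or x > high:
        return 0.0
    if x <= mode:
        return 2.0 * (x - low) / ((high - low) * (mode - low))
    return 2.0 * (high - x) / ((high - low) * (high - mode))

def cumulative_from_density(xs, density):
    """Running area under the density curve, using the trapezoid rule."""
    cum = [0.0]
    for i in range(1, len(xs)):
        area = 0.5 * (density[i] + density[i - 1]) * (xs[i] - xs[i - 1])
        cum.append(cum[-1] + area)
    return cum

# Evaluate the assumed density on a fine grid of cost values ($ million).
xs = [25.0 + 0.05 * i for i in range(901)]
density = [triangle_density(x, 25.0, 40.0, 70.0) for x in xs]
cumulative = cumulative_from_density(xs, density)

print(f"Total area under the density curve: {cumulative[-1]:.4f}")
print(f"Cumulative probability at $40 million: {cumulative[300]:.4f}")
```

The final cumulative value should be very close to 1, which is the check that the density has been scaled correctly.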
The cumulative probability curve is exactly equivalent to the probability density curve in its information content. The cumulative format is useful because you can directly read confidence levels and intervals. Examples from Figure 4:
- There is an equal likelihood of a randomly-selected estimate being either above or below $43.5 million. This centermost value is called the median.
- About 74 percent of the project cost estimates lie below $50 million. The 74 percent confidence limit or confidence level is $50 million.
- About 80 percent of the project cost estimates lie between $33 million and $57 million. This $33 million to $57 million range is an 80 percent confidence interval.
Sometimes it is desirable to flip the cumulative curve vertically. The curve form is then the “exceedance” or “greater than” form of cumulative probability distribution.
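Readings of this kind can also be taken from a large sample rather than from a drawn curve. The sketch below assumes a hypothetical population of estimates drawn from a triangle distribution ($25 million to $70 million, peak at $40 million); it is not the distribution behind Figure 4, so the numbers will only roughly resemble those quoted above. The helper names are illustrative.

```python
import random

def fraction_below(values, x):
    """Empirical cumulative probability: share of values less than x."""
    return sum(v < x for v in values) / len(values)

def percentile(values, p):
    """Value below which about fraction p of the sample lies (linear interpolation)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

# Hypothetical population of cost estimates ($ million).
random.seed(7)
costs = [random.triangular(25.0, 70.0, 40.0) for _ in range(20000)]

median = percentile(costs, 0.50)
p10, p90 = percentile(costs, 0.10), percentile(costs, 0.90)
below_50 = fraction_below(costs, 50.0)
exceed_50 = 1.0 - below_50  # the "greater than" (exceedance) form

print(f"Median: {median:.1f}")
print(f"80 percent confidence interval: {p10:.1f} to {p90:.1f}")
print(f"Probability below 50: {below_50:.2f}; probability above 50: {exceed_50:.2f}")
```

The exceedance value is simply one minus the cumulative value, which is why flipping the curve vertically loses no information.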
Table 1. Example Evaluation Record of Estimate and Actual Costs
| Project No | $MM Estimate | $MM Actual | $MM Error | Actual/Estimate | Cumulative Error | Average Estimate | Average Actual | Average Error |
|---|---|---|---|---|---|---|---|---|
| 1 | 4.12 | 3.56 | -0.56 | 0.864 | -0.56 | 4.12 | 3.56 | -.561 |
| 2 | 3.27 | 3.12 | -0.15 | 0.953 | -0.71 | 3.70 | 3.34 | -.357 |
| 3 | 3.54 | 4.30 | 0.76 | 1.215 | 0.05 | 3.64 | 3.66 | 0.016 |
| 4 | 4.91 | 2.86 | -2.05 | 0.583 | -2.00 | 3.96 | 3.46 | -.500 |
| 5 | 33.15 | 14.40 | -18.75 | 0.434 | -20.75 | 9.80 | 5.65 | -4.150 |
| 6 | 3.39 | 2.42 | -0.97 | 0.714 | -21.72 | 8.73 | 5.11 | -3.620 |
| 7 | 10.63 | 13.58 | 2.95 | 1.278 | -18.76 | 9.00 | 6.32 | -2.680 |
| 8 | 24.89 | 50.90 | 26.01 | 2.045 | 7.25 | 10.99 | 11.89 | 0.906 |
| 9 | 1.12 | 0.98 | -0.14 | 0.873 | 7.11 | 9.89 | 10.68 | 0.790 |
| 10 | 9.74 | 14.76 | 5.02 | 1.515 | 12.12 | 9.88 | 11.09 | 1.212 |
| 20 | | | | | 33.88 | 12.82 | 14.51 | 1.694 |
| 30 | | | | | 73.11 | 13.69 | 16.13 | 2.437 |
| 40 | | | | | 88.47 | 12.97 | 15.19 | 2.212 |
| 50 | | | | | 76.74 | 12.27 | 13.81 | 1.535 |
| 60 | | | | | 59.32 | 12.31 | 13.30 | 0.989 |
| 70 | | | | | 25.12 | 12.26 | 12.62 | 0.359 |
| 80 | | | | | 6.36 | 15.68 | 16.76 | 0.079 |
| 90 | | | | | 15.81 | 14.56 | 14.74 | 0.176 |
| 100 | | | | | 9.49 | 13.78 | 13.87 | 0.095 |
Figure 5. Distribution of Project Actual Costs
Figure 6. Frequency Distribution of the Ratio Actual/Estimate
EVALUATION ACCURACY
One key to improving project estimates is performance feedback. It is important to compare your forecasts to what actually occurs. Your forecasts are objective if:
- The average error approaches zero over many projects, and
- The above relationship holds for any data subset, e.g., stratified by project size or type.
Suppose the population of all 100 projects (or activities) that your firm has estimated and then performed is represented in Table 1. Values for the first ten projects are shown in detail. Thereafter, only the cumulative error and the averages are shown for selected projects. The actual costs (column 3) are shown as a probability distribution in Figure 5. Clearly, most of this firm's projects are in the $25 million to $80 million range. The accuracy of estimating can be represented by an “Accuracy Ratio” as
\[ \text{Accuracy Ratio} = \frac{\text{Actual Cost}}{\text{Estimated Cost}} \]
These are shown in Table 1 in column 5 and plotted as a frequency distribution in Figure 6. This same data is shown as a cumulative frequency distribution in Figure 7. Reading from the cumulative curve, one finds as examples:
- There is a 60 percent confidence that the actual value will be within the range 0.75 to 1.2 times your estimate. The width of this range gauges your evaluation precision.
- There is a 56 percent chance that the actual value will be lower than your estimate.
Figure 7. Cumulative Frequency Distribution of the Actual/Estimate Ratio
This ratio, plotted as a time series as shown in Figure 8, is useful in monitoring and improving project estimates. Since the actual/estimate ratio is nearly 1, the estimates appear to be unbiased. With perfectly objective analysis, the ratio will approach 1 as the number of evaluations approaches infinity. This behavior is shown in Figure 8. This characteristic can be used to define evaluation objectivity. Equivalently, the average error approaches zero with perfect objectivity, i.e., lack of bias.
Even with objective evaluations, Cumulative Error, column 6 in Table 1, will usually diverge in either direction. This is contrary to what many people expect and is caused by chance deviations. However, with objective analysis the error is diluted as the number of observations (projects) in the sample increases, shown by decreasing Average Error in column 9.
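The running columns of Table 1 can be reproduced from the estimate and actual columns alone. The sketch below uses the first ten projects from the table and takes error as actual minus estimate, which is the sign convention the table's values imply. The function name and dictionary keys are illustrative, and results may differ from the printed table in the last digit because the table was presumably built from unrounded figures.

```python
# Estimates and actual costs ($ million) for projects 1 to 10 in Table 1.
estimates = [4.12, 3.27, 3.54, 4.91, 33.15, 3.39, 10.63, 24.89, 1.12, 9.74]
actuals = [3.56, 3.12, 4.30, 2.86, 14.40, 2.42, 13.58, 50.90, 0.98, 14.76]

def evaluation_record(estimates, actuals):
    """Build per-project error, ratio, and running totals and averages."""
    rows = []
    cum_error = sum_est = sum_act = 0.0
    for n, (est, act) in enumerate(zip(estimates, actuals), start=1):
        error = act - est
        cum_error += error
        sum_est += est
        sum_act += act
        rows.append({
            "project": n,
            "error": error,
            "ratio": act / est,
            "cumulative_error": cum_error,
            "average_estimate": sum_est / n,
            "average_actual": sum_act / n,
            "average_error": cum_error / n,
        })
    return rows

rows = evaluation_record(estimates, actuals)
for r in rows:
    print(f"{r['project']:3d} {r['error']:8.2f} {r['ratio']:6.3f} "
          f"{r['cumulative_error']:8.2f} {r['average_error']:7.3f}")
```

Note how the two large projects (5 and 8) swing the cumulative error sharply in opposite directions, while the average error moves far less.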
BEST EVALUATION ESTIMATOR
Objectivity is the foremost goal of good evaluation practice. This means we want to make forecasts without biases. These forecasts will have a long-run average error of zero. As it happens, only one very specific forecast statistic has this characteristic of objectivity. This measure is the expected value, or mean.
Figure 8. Average Evaluation Error. If evaluations are objective, the average evaluation error trends toward zero over the long run.
Suppose you are estimating material cost. In your judgment, the outcome could range from $2.2 million to $2.7 million, with a most likely value of $2.3 million. You feel the probability density can be interpolated linearly between these points. Your assessment can be represented completely and uniquely by the triangle distribution of Figure 9.
Ideally, it is this curve, or its cumulative equivalent, which should be conveyed to express the results of your evaluation. The curve may be obtained from purely subjective judgment, comparable historical data, modeling, mathematical formulation, or other means. Regardless of how it comes about, the probability distribution fully discloses your complete judgment about the project cost variable.
Figure 9. Triangle Probability Distribution Expressing a Judgment About Material Cost Uncertainty
But, what if your client insists on a single value estimate? What number should you provide? The best point estimate is the expected value, because it is objective. You would provide a $2.4 million cost estimate in this case, because you are confident that:
- Performing many similar projects would result in an average material cost of approximately $2.4 million.
- Over the long run, your average estimate error will approach zero.
Note that using the most likely or median points would result in a systematic bias toward understating material costs.
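These three candidate point estimates can be checked directly for the material cost example. The sketch below applies the standard closed-form results for a triangle distribution, which are textbook formulas rather than anything stated in this article; the function name is illustrative.

```python
import math

def triangle_stats(low, mode, high):
    """Return (mean, median) of a triangle distribution."""
    mean = (low + mode + high) / 3.0
    # Cumulative probability at the mode decides which side holds the median.
    prob_at_mode = (mode - low) / (high - low)
    if prob_at_mode >= 0.5:
        median = low + math.sqrt((high - low) * (mode - low) / 2.0)
    else:
        median = high - math.sqrt((high - low) * (high - mode) / 2.0)
    return mean, median

# Material cost judgment from the text ($ million).
low, mode, high = 2.2, 2.3, 2.7
mean, median = triangle_stats(low, mode, high)

print(f"Most likely (mode): {mode:.2f}")
print(f"Median:             {median:.2f}")
print(f"Expected (mean):    {mean:.2f}")
```

For this right-skewed distribution the mode is lowest, the median falls in between, and the mean is highest, which is why quoting either of the first two would tend to understate the cost.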
CALCULATING EXPECTED VALUE
There are several ways to calculate the expected value of a variable's probability distribution. If your distribution is expressed as a mathematical formula, you might be able to solve the integral equation
\[ EV = \int_{-\infty}^{\infty} x \, p(x) \, dx \]
where x is the value of concern and p(x) is the probability density at x. For a discrete distribution, the integral becomes a sum, and the expected value is simply the probability-weighted average of the possible values.
Fortunately, we seldom have to attempt mathematical integration. Alternatively, empirical observations can be converted to an expected value by
\[ EV \approx \frac{1}{n} \sum_{i=1}^{n} x_i \]
where the x_i are the n observed values.
For input distributions, we can easily obtain expected values by numeric integration or graphical methods. I usually calculate the expected values of input variables and use these for the base case analysis. Note that the base case cost or value should not be used for decision making because it is often substantially different from the expected value result.
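The three routes to an expected value mentioned in this section (probability-weighted average, numeric integration, and averaging observations) can each be sketched briefly. The function names, the three-outcome discrete distribution, and the list of observations are all hypothetical; the triangle density reuses the material cost judgment of $2.2 million, $2.3 million, and $2.7 million.

```python
def expected_value_discrete(outcomes, probabilities):
    """Probability-weighted average of a discrete distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))

def expected_value_numeric(density, low, high, steps=10000):
    """Midpoint-rule approximation of the integral of x * p(x) over [low, high]."""
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        x = low + (i + 0.5) * width
        total += x * density(x) * width
    return total

def expected_value_sample(observations):
    """Simple average of empirical observations."""
    return sum(observations) / len(observations)

def material_cost_density(x, low=2.2, mode=2.3, high=2.7):
    """Triangle density for the material cost example ($ million)."""
    if x < low or x > high:
        return 0.0
    if x <= mode:
        return 2.0 * (x - low) / ((high - low) * (mode - low))
    return 2.0 * (high - x) / ((high - low) * (high - mode))

# Hypothetical discrete judgment: three outcomes with assumed probabilities.
ev_discrete = expected_value_discrete([2.2, 2.4, 2.6], [0.3, 0.5, 0.2])
ev_numeric = expected_value_numeric(material_cost_density, 2.2, 2.7)
# Hypothetical observed costs from similar past work.
ev_sample = expected_value_sample([2.31, 2.45, 2.28, 2.52, 2.39])

print(f"Discrete: {ev_discrete:.3f}")
print(f"Numeric integration: {ev_numeric:.3f}")
print(f"Sample average: {ev_sample:.3f}")
```

The numeric integration should agree closely with the $2.4 million expected value quoted earlier for the triangle distribution.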
For decision making, what we desire is the expected value of project cost (or value). Except in simple situations, this requires that the input probability distributions be processed through a mathematical or simulation model to generate the resultant outcome distribution. This will be a topic for later discussion.
In the next installment, I will discuss specifically how expected value is used in project decisions. The expected value decision rule should guide all decisions under uncertainty. There are important implications for the cost contingency practice and for trade-offs between cost, time, and performance.
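A small simulation can show why a base case built from expected input values may differ from the expected value of the outcome. Everything in this sketch is an assumption made for illustration: the cost model (a monthly run rate plus a penalty for finishing late), its parameters, and the triangle distribution for duration. It is not the modeling approach the series goes on to present.

```python
import random

def project_cost(duration_months):
    """Hypothetical cost model ($ million): run rate plus a late-finish penalty."""
    run_rate = 1.0      # assumed cost per month of work
    deadline = 20.0     # assumed month after which penalties accrue
    penalty_rate = 0.5  # assumed extra cost per month beyond the deadline
    late_months = max(0.0, duration_months - deadline)
    return run_rate * duration_months + penalty_rate * late_months

# Assumed judgment about duration: triangle distribution in months.
low, mode, high = 10.0, 14.0, 30.0

# Base case: run the model once with the expected duration.
expected_duration = (low + mode + high) / 3.0
base_case_cost = project_cost(expected_duration)

# Simulation: run the model for many sampled durations and average the results.
random.seed(42)
trials = 50000
simulated = [project_cost(random.triangular(low, high, mode)) for _ in range(trials)]
expected_cost = sum(simulated) / trials

print(f"Base case cost: {base_case_cost:.2f}")
print(f"Expected cost from simulation: {expected_cost:.2f}")
```

Because the penalty applies only in the late outcomes, the base case misses it entirely, while the simulated average includes it in proportion to how often and how badly the project runs late.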