This paper discusses the use of artificial neural networks (ANNs) to model aspects of the project budget for which traditional algorithms and formulas are not available or not easy to apply. Neural networks use a process analogous to that of the human brain: a training component takes place with existing data, and the trained neural network subsequently becomes an “expert” in the category of information it has been given to analyze. This “expert” can then be used to provide projections for new situations based on adaptive learning (Stergiou & Siganos, 1996).
The article also presents a fictitious example of the use of neural networks to determine the cost of project management activities based on the complexity, location, budget, duration, and number of relevant stakeholders. The example is based on data from 500 projects and is used to predict the project management cost of a given project.
ARTIFICIAL NEURAL NETWORKS (ANN)
Some categories of problems and challenges faced in the project environment may depend on so many subtle factors that a computer algorithm cannot be created to calculate the results (Kriesel, 2005). Artificial Neural Networks (ANN) are a family of statistical learning models inspired by the way biological nervous systems, such as the brain, process information. They process records one at a time, and “learn” by comparing their classification of the record with the known actual classification of the record.
The errors from the initial classification of the first record are fed back into the network and used to modify the network's algorithm on the next pass. This cycle is repeated for a large number of iterations so that the network learns to predict reliable results from complicated or imprecise data (Stergiou & Siganos, 1996) (see Exhibit 1).
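The feedback loop described above can be illustrated with a minimal sketch. It assumes a single linear neuron trained with the delta rule, which is a deliberate simplification of a full network. The function name, learning rate, and the four scaled cases are hypothetical and are not taken from the paper's data set.

```python
import random


def train_single_neuron(cases, epochs=200, learning_rate=0.05, seed=0):
    """Fit target ~ w.x + b by feeding each record's error back into the weights."""
    rng = random.Random(seed)
    n_inputs = len(cases[0][0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0
    history = []  # mean squared error observed during each pass over the data
    for _ in range(epochs):
        squared_error = 0.0
        for inputs, target in cases:  # records are processed one at a time
            prediction = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - prediction  # comparison with the known output
            squared_error += error ** 2
            # Feedback step: each weight moves in proportion to the error
            # and to the input it is attached to.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
        history.append(squared_error / len(cases))
    return weights, bias, history


# Hypothetical, already-scaled cases: (inputs, known output).
cases = [([0.0, 0.0], 0.1), ([1.0, 0.0], 0.6),
         ([0.0, 1.0], 0.4), ([1.0, 1.0], 0.9)]
weights, bias, history = train_single_neuron(cases)
print(f"MSE in first pass: {history[0]:.4f}, in last pass: {history[-1]:.6f}")
```

The error recorded in the last pass should be far smaller than in the first, which is the convergence behavior the text describes.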
Some typical applications of ANN include the following:
- Handwriting recognition,
- Stock market prediction,
- Image compression,
- Risk management,
- Sales forecasting, and
- Industrial process control.
The mathematical process behind the calculation uses different neural network configurations to give the best fit to predictions. The most common network types are briefly described below.
Probabilistic Neural Networks (PNN): These are statistical algorithms in which the operations are organized in a multilayered feedforward network with four layers (input, pattern, summation, and output). Training is fast, but execution is slow and requires a large amount of memory. PNNs are also less general than multilayer feedforward networks (Cheung & Cannons, 2002).
Multilayer Feedforward Networks (MLF): MLF neural networks are trained with a back-propagation learning algorithm (Exhibit 2). They are the most popular neural networks (Svozil, Kvasnička, & Pospíchal, 1997).
Generalized Regression Neural Networks (GRNN): Closely related to PNN networks, these are memory-based networks that provide estimates of continuous variables. They represent a one-pass learning algorithm with a highly parallel structure. This algorithmic form can be used for any regression problem in which an assumption of linearity is not justified (Specht, 2002).
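To make the memory-based, one-pass nature of a GRNN concrete, the sketch below treats the prediction as a Gaussian-kernel weighted average of the stored training outputs, which is how the GRNN is commonly formulated. The smoothing parameter `sigma`, the function name, and the three sample cases are assumptions for illustration only.

```python
import math


def grnn_predict(train_inputs, train_outputs, query, sigma=0.5):
    """Return the kernel-weighted average of the stored training outputs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for inputs, output in zip(train_inputs, train_outputs):
        squared_distance = sum((a - b) ** 2 for a, b in zip(inputs, query))
        # Stored cases close to the query receive a weight near 1,
        # distant cases a weight near 0.
        weight = math.exp(-squared_distance / (2.0 * sigma ** 2))
        weighted_sum += weight * output
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("query is too far from every stored case for this sigma")
    return weighted_sum / weight_total


# Hypothetical scaled inputs and outputs; "training" is just storing them.
train_inputs = [[0.0], [0.5], [1.0]]
train_outputs = [10.0, 20.0, 40.0]

near_stored_case = grnn_predict(train_inputs, train_outputs, [0.5], sigma=0.1)
between_cases = grnn_predict(train_inputs, train_outputs, [0.75], sigma=0.5)
print(near_stored_case, between_cases)
```

A small `sigma` makes the prediction follow the nearest stored case closely, while a larger value smooths across neighboring cases. Every prediction stays between the smallest and largest stored output.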
ANALOGY PROCESS AND DATA SET
One of the key factors in neural networks is the data set used in the learning process. If the data set is not reliable, the results of the network's calculations will not be reliable either. The use of artificial neural networks can be considered one kind of analogy (Bailer-Jones & Bailer-Jones, 2002).
Analogy is a comparison between two or more elements, typically for the purpose of explanation or clarification (Exhibit 3). One of the most relevant uses of analogy is to forecast future results based on similar results obtained in similar conditions (Bartha, 2013). The challenge is to understand what a similar condition is. Past projects can be a reference for future projects if the underlying conditions under which they were developed still exist in the project being analyzed.
One of the most relevant aspects of the analogy is related to the simple process of estimation based on similar events and facts. This process reduces the granularity of all calculations, where the final project costs can be determined by a set of fixed, finite variables.
DATA SET, DEPENDENT AND INDEPENDENT CATEGORIES, AND NUMERIC VARIABLES
The first step in developing an artificial neural network is to prepare the basic data set that will be used as a reference for the “training process” of the neural network. It is important to highlight that the right data set is usually expensive and time-consuming to build (Ingrassia & Morlini, 2005). A data set is composed of a set of variables filled with information that will be used as a reference. These references are called cases (Exhibit 4).
The most common variable types are:
- Dependent Category: Dependent or output variable whose possible values are taken from a set of possible categories; for example, yes or no, or red, green, or blue.
- Dependent Numeric: Dependent or output variable whose possible values are numeric.
- Independent Category: Independent variable whose possible values are taken from a set of possible categories; for example, yes or no, or red, green, or blue.
- Independent Numeric: Independent variable whose possible values are numeric.
In the project environment, several variables can be used to calculate the project budget. Some common examples are:
- Complexity: Level of complexity of the project (low, medium, high). Usually, this is an independent category.
- Location: Where the project works will happen. This is associated with the complexity of the works and logistics. Most of the time it is an independent category.
- Budget: Planned budget of the project. It is a numeric variable that can be independent or dependent (output).
- Actual Cost: Actual expenditure of the project. Most of the time, it is an independent numeric variable.
- Cost Variance: The difference between the budget and the actual cost. It is a numeric variable that can be independent or dependent (output).
- Baseline Duration: Duration of the project. This is an independent numeric variable.
- Actual Duration: Actual duration of the project. Usually, it is an independent numeric variable.
- Duration Variance: The difference between the baseline duration and the actual duration.
- Type of Contract: Independent category variable that defines the type of contract used for the works in the project (e.g., firm fixed price, cost plus, unit price).
- Number of Relevant Stakeholder Groups: Independent numeric variable that reflects the number of relevant stakeholder groups in the project.
Some examples of input variables are presented in Exhibits 5, 6, and 7.
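As a rough illustration of how one case could be represented and turned into network inputs, the sketch below one-hot encodes a category variable and passes numeric variables through unchanged. The class, its field names, the sample values, and the sign convention for the variances (planned minus actual) are all assumptions, not definitions from the paper.

```python
from dataclasses import dataclass

COMPLEXITY_LEVELS = ("low", "medium", "high")


@dataclass
class ProjectCase:
    """One hypothetical case (row) of the data set."""
    complexity: str           # independent category
    budget: float             # independent numeric
    actual_cost: float        # independent numeric
    baseline_duration: float  # independent numeric, e.g., in months
    actual_duration: float    # independent numeric, e.g., in months
    stakeholder_groups: int   # independent numeric

    @property
    def cost_variance(self):
        # Assumed convention: budget minus actual cost.
        return self.budget - self.actual_cost

    @property
    def duration_variance(self):
        # Assumed convention: baseline duration minus actual duration.
        return self.baseline_duration - self.actual_duration


def one_hot(value, categories):
    """Encode a category as a list with a single 1.0 in its position."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


def to_input_vector(case):
    """Category variables are one-hot encoded; numeric ones pass through."""
    return one_hot(case.complexity, COMPLEXITY_LEVELS) + [
        case.budget,
        case.baseline_duration,
        float(case.stakeholder_groups),
    ]


case = ProjectCase("medium", 500000.0, 520000.0, 12.0, 14.0, 5)
print(to_input_vector(case))   # [0.0, 1.0, 0.0, 500000.0, 12.0, 5.0]
print(case.cost_variance)      # -20000.0
print(case.duration_variance)  # -2.0
```

In practice the numeric inputs would usually also be scaled to a common range before training, which this sketch leaves out.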
TRAINING ARTIFICIAL NEURAL NETWORKS
When the data set is ready, the network is ready to be trained. Two approaches can be used for the learning process: supervised or adaptive training.
In supervised training, both inputs and outputs are provided and the network compares the results with the provided output. This allows the monitoring of how well an artificial neural network is converging on the ability to predict the right answer.
In adaptive (unsupervised) training, only the inputs are provided. Using self-organization mechanisms, the neural network benefits from continuous learning in order to face new situations and environments. This kind of network is usually called a self-organizing map (SOM) and was developed by Teuvo Kohonen (2014).
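The self-organization idea can be sketched with a very small one-dimensional map. The example assumes a Gaussian neighborhood and linearly decaying learning rate and radius. The node count, decay schedule, and input values are hypothetical choices, and this is not presented as Kohonen's exact formulation.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weights are closest to the input x."""
    return min(range(len(nodes)),
               key=lambda j: sum((n - v) ** 2 for n, v in zip(nodes[j], x)))


def train_som(inputs, n_nodes=4, epochs=50, seed=1):
    """Train a 1-D self-organizing map; no target outputs are used."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    # Start every node on a randomly chosen input.
    nodes = [list(rng.choice(inputs)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        progress = epoch / epochs
        # Learning rate and neighborhood radius shrink as training progresses.
        learning_rate = 0.5 * (1.0 - progress)
        radius = max(0.5, (n_nodes / 2.0) * (1.0 - progress))
        for x in inputs:
            winner = best_matching_unit(nodes, x)
            for j, node in enumerate(nodes):
                # Nodes near the winner on the map are pulled more strongly.
                influence = math.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
                step = learning_rate * influence
                for d in range(dim):
                    node[d] += step * (x[d] - node[d])
    return nodes


# Hypothetical scaled inputs forming two loose groups.
inputs = [[0.10], [0.15], [0.20], [0.80], [0.85], [0.90]]
nodes = train_som(inputs)
print([round(node[0], 3) for node in nodes])
print(best_matching_unit(nodes, [0.12]), best_matching_unit(nodes, [0.88]))
```

Because each update moves a node only part of the way toward an input, the node weights always remain within the range of the data.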
Two of the biggest challenges in training are deciding which network type to use and managing the computation time. Some networks can be trained in seconds, but complex data sets with many variables and cases can require hours for the training process alone.
The results of the training process are complex formulas that relate the inputs (independent variables) to the outputs (dependent variables), like the graph presented in Exhibit 2.
Most commercial software packages test the results of the training with a subset of the data points to evaluate the quality of the training. Around 10–20% of the sample is typically used for testing purposes (Exhibit 8).
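A test sample of this size can be reserved with a simple random split. The sketch below is one possible way to do it and is not the procedure of any particular software package. The function name, the fixed seed, and the use of integers as stand-in cases are assumptions.

```python
import random


def split_cases(cases, test_fraction=0.2, seed=42):
    """Randomly reserve a fraction of the cases for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(cases)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Integers stand in for 500 hypothetical cases.
all_cases = list(range(500))
training_cases, testing_cases = split_cases(all_cases, test_fraction=0.2)
print(len(training_cases), len(testing_cases))  # 400 100
```

Shuffling before splitting avoids reserving only the most recent or otherwise ordered cases for testing.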
PREDICTION RESULTS
After the training, the model is ready to predict future results. The most relevant information that should be a focus of investigation is the contribution of each individual variable to the predicted results (Exhibit 9) and the reliability of the model (Exhibit 10).
It is important to highlight that a trained network that fails to produce a reliable result in 30% of the cases is far less reliable than one that fails in only 1% of the cases.
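One way to quantify this kind of reliability is the share of test cases whose prediction misses the known value by more than a chosen tolerance. In the sketch below, the 10% relative tolerance, the function name, and the sample values are assumptions for illustration.

```python
def bad_prediction_rate(actual, predicted, tolerance=0.10):
    """Fraction of cases whose prediction is off by more than the tolerance.

    The tolerance is relative to the actual value (0.10 means 10%).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    misses = sum(1 for a, p in zip(actual, predicted)
                 if abs(p - a) > tolerance * abs(a))
    return misses / len(actual)


# Hypothetical test cases: known costs and the network's predictions.
actual_costs = [100.0, 200.0, 300.0, 400.0]
predicted_costs = [105.0, 260.0, 310.0, 390.0]
rate = bad_prediction_rate(actual_costs, predicted_costs)
print(rate)  # 0.25
```

Only the second prediction misses by more than 10%, so one case in four counts as a bad prediction.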
EXAMPLE OF COST MODELING USING ARTIFICIAL NEURAL NETWORKS
To illustrate the process, a fictitious example was developed to predict project management costs based on historical data from 500 cases. The variables used are described in Exhibit 11.
The profiles of the cases used for the training are presented in Exhibits 12, 13, 14, 15, and 16, and the full data set is presented in the Appendix.
The training and tests were executed using the software Palisade Neural Tools. The test was executed on 20% of the sample using a GRNN Numeric Predictor. The summary of the training of the ANN is presented in Exhibit 17.
The training and tests were used to predict the project management cost of a fictitious project with the variables as shown in Exhibit 18.
After running the simulation, the project management cost prediction based on the patterns in the known data is US$24,344.75, approximately 3% of the project budget.
Another aspect of the analysis is to provide insights into how each independent variable affects the output (Exhibit 19). In this example, more than 50% of the impact on the project management cost is attributable to the project budget.
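A rough way to estimate such relative impacts is to vary one input at a time across its observed range while holding the others at their average, and then compare the resulting swings in the prediction. This is a generic sensitivity sketch and is not claimed to be the method used by the software in the example. The `toy_predict` function and the three cases are hypothetical stand-ins for a trained network and its data.

```python
def relative_variable_impact(predict, cases):
    """Share of the total prediction swing caused by each input varied alone."""
    n_inputs = len(cases[0])
    baseline = [sum(case[i] for case in cases) / len(cases)
                for i in range(n_inputs)]
    swings = []
    for i in range(n_inputs):
        values = [case[i] for case in cases]
        low, high = list(baseline), list(baseline)
        low[i], high[i] = min(values), max(values)
        # Change in the prediction when only input i moves across its range.
        swings.append(abs(predict(high) - predict(low)))
    total = sum(swings)
    if total == 0.0:
        return [0.0] * n_inputs
    return [swing / total for swing in swings]


def toy_predict(inputs):
    """Hypothetical stand-in for a trained network: [budget, duration] -> cost."""
    budget, duration = inputs
    return 0.03 * budget + 500.0 * duration


# Hypothetical cases: [budget, duration in months].
sample_cases = [[100000.0, 2.0], [900000.0, 10.0], [500000.0, 6.0]]
shares = relative_variable_impact(toy_predict, sample_cases)
print([round(share, 3) for share in shares])
```

With these made-up numbers the budget dominates the swing, which mirrors the kind of conclusion drawn from the variable impact analysis.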
CONCLUSIONS
Artificial neural networks can be a helpful tool for determining aspects of the project budget such as the cost of project management, the estimated bid value of a supplier, or the insurance cost of equipment. Neural networks support a precise decision-making process in situations where no algorithm or formula is available.
With the recent development of software tools, the calculation process becomes very simple and straightforward. However, the biggest challenge in producing reliable results lies in the quality of the known information. The whole process is based on actual results, and most of the time the most expensive and laborious part of the process is related to getting enough reliable data to train and test the process.