Measuring and managing performance improvement in application outsourcing contracts



In today's competitive economic climate, IT projects have to prove their worth. In other words, you have to deliver products on time and with maximum value and demonstrate those two qualities with metrics. Are you able to quantitatively manage your outsourcing application services contract and satisfy your customer? Typically, articles and research concentrate on individual projects rather than the entire portfolio of work within a contract. Application portfolios vary in business complexity and technology platforms, causing some to be more difficult and costly to develop or enhance than others. Regardless of the complexity of the portfolio, customers still want measurable performance improvements and cost savings across the board.

Where do you go for ideas to help you propose a measurement of Contract Performance? The objective of this paper is to provide a standardized approach for measuring performance of an Application Services Contract within an Outsourcing Agreement. Customers want help in making delivery performance more predictable and repeatable. Measuring performance provides the necessary information to improve contract performance over the life of the contract, and brings value to both the outsourcer and the service provider.

It is assumed that work within the application services contract is packaged into projects that follow a structured Project Management approach.

This paper presents technical details of a model that encompasses both the customer's and the service provider's views of contract performance for application development and enhancement services in an outsourcing agreement (application maintenance and production support are not addressed in this paper).

Building the Model

Building a measure of contract performance is not an easy task. The process of deriving the proposed measure of performance consists of the following steps:

  • Step 1. Determine attributes/components of project performance
  • Step 2. Set rules to establish a baseline of current portfolio performance
  • Step 3. Create a weighting scheme for evaluating project performance based on project drivers:
    • Cost
    • Schedule
    • Quality
  • Step 4. Establish project technical and business complexity
  • Step 5. Define performance scores of project attributes:
    • Technical Productivity
    • Schedule Productivity
    • Reliability
  • Step 6. Define how to measure overall contract performance
  • Step 7. Measure performance improvement over the life of the contract.

Through the above process, Project Performance Scores (sometimes called a Performance Index) will be calculated for each project in the contract. Overall contract performance can then be measured through the aggregation of these project scores and analysis across the portfolio(s).

These process steps provide the logical structure for the rest of this paper.

Attributes/components of project performance (Step 1)

There are a number of project attributes that impact project and contract performance, and we begin by identifying the attributes included in the model.

The model presented here is based on three application project attributes:

  • Cost
  • Timeliness
  • Quality

In the context of this model:

  • Cost is measured by project Technical Productivity = Effort/Size
  • Timeliness is measured by Schedule Productivity = Duration/Size
  • Quality is measured by Reliability = (#Defects in the first 90 days in production)/Size.
  • Application (new development projects) or Project (enhancement projects) Size is measured in Function Points.
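As an illustration only, the three raw measures can be computed directly from the data collected for each project. The sketch assumes a hypothetical `ProjectRecord` structure holding effort in hours, duration in days, the 90-day defect count, and size in Function Points; these field names are not part of the model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ProjectRecord:
    """Hypothetical container for the data collected on one completed project."""
    effort_hours: float     # total project effort
    duration_days: float    # elapsed project duration
    defects_90_days: int    # defects reported in the first 90 days in production
    function_points: float  # application or project size


def raw_measures(p: ProjectRecord) -> Tuple[float, float, float]:
    """Return (Technical Productivity, Schedule Productivity, Reliability).

    All three are "the lower the better": hours/FP, days/FP and defects/FP.
    """
    if p.function_points <= 0:
        raise ValueError("size in Function Points must be positive")
    return (p.effort_hours / p.function_points,
            p.duration_days / p.function_points,
            p.defects_90_days / p.function_points)
```

For a hypothetical 200 FP project with 4,650 hours of effort, a duration of 614 days, and no defects, `raw_measures(ProjectRecord(4650, 614, 0, 200))` gives 23.25 hours/FP, 3.07 days/FP, and 0 defects/FP.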

NOTE: All three measures as defined above (Technical Productivity, Schedule Productivity, and Reliability) are “the lower the better” measures. This is the reverse of the intuitive expectation that higher values are associated with increased productivity or improved performance. A common scale that addresses this issue is introduced in Step 5.

Current performance baseline (Step 2)

One needs to measure current performance, or establish a baseline, in order to measure any change in future performance. To accomplish this, select a sample of completed projects representative of the work performed within the contract using past/current work processes.

Recommended rules for selecting a sample of projects to baseline current performance:

  • Select a specific baseline period, for example, one year.
    NOTE: Special care should be taken in choosing a baseline period to ensure good representation of work performed by the service provider. Selection of the baseline period depends on a typical business cycle, software engineering process stability, and mixture of technological solutions used by the projects. It should be agreed upon by both the client and the service provider. For example, six months' worth of completed projects may be enough if the majority of the projects are short.
  • Only completed projects within the baseline period should be included in the sample (regardless of when they started).
  • All projects included in the sample should follow the same data collection process and collect information on effort, duration, size, defects, and project complexity. Project complexity will be defined in Step 4 below.
    NOTE: Capturing the specific work processes or organizational maturity (e.g., CMM level) driving each project's work may be useful for additional analysis and for generating corrective actions that improve future performance.
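The selection rules above can be expressed as a simple filter. This is a minimal sketch, assuming a hypothetical `Project` record in which a missing completion date marks a project still in progress and `None` marks a data item that was not collected; treating both ends of the baseline period as inclusive is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Project:
    """Hypothetical project record; None means 'not available'."""
    project_id: str
    completed: Optional[date]  # None while the project is still in progress
    effort_hours: Optional[float] = None
    duration_days: Optional[float] = None
    function_points: Optional[float] = None
    defects_90_days: Optional[int] = None
    # (technical, business, testing/implementation) complexity ratings
    complexity: Optional[Tuple[str, str, str]] = None


REQUIRED_FIELDS = ("effort_hours", "duration_days", "function_points",
                   "defects_90_days", "complexity")


def baseline_sample(projects: List[Project], start: date, end: date) -> List[Project]:
    """Select projects completed within [start, end], whenever they started,
    that have every data item the model needs."""
    sample = []
    for p in projects:
        if p.completed is None or not (start <= p.completed <= end):
            continue  # not completed inside the baseline period
        if any(getattr(p, name) is None for name in REQUIRED_FIELDS):
            continue  # incomplete data collection
        sample.append(p)
    return sample
```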

Project Performance Calculation (Step 3)

Project Performance is defined as a weighted sum of three components, the Technical Productivity Score (TP), the Schedule Productivity Score (SP), and the Product Reliability Score (PR), as follows:

Project Performance = A*TP + B*SP + C*PR, where A + B + C = 100

A, B, and C are weights representing the relative impact of the corresponding project characteristic on overall project performance. The exact values of the weights can be tailored to the contract context. The following set of weights is used in the model:

Project Type                               A (TP)  B (SP)  C (PR)
General/Improvement Project                33.3    33.3    33.3
Cost Driven Project                        50      25      25
Schedule (Time-to-Market) Driven Project   10      70      20
Reliability Critical Project               15      15      70
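A minimal sketch of the Step 3 calculation, using the weights from the table above. The dictionary keys are shortened labels chosen for this illustration, and the scores passed in are assumed to be the complexity-corrected TP, SP, and PR values defined in Step 5.

```python
from typing import Dict, Tuple

# (A, B, C) weights per project type, copied from the table above.
# Key names are illustrative; a contract may tailor the values.
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "general": (33.3, 33.3, 33.3),
    "cost": (50.0, 25.0, 25.0),
    "schedule": (10.0, 70.0, 20.0),
    "reliability": (15.0, 15.0, 70.0),
}


def project_performance(tp: float, sp: float, pr: float, project_type: str) -> float:
    """Project Performance = A*TP + B*SP + C*PR for the given project type."""
    a, b, c = WEIGHTS[project_type]
    return a * tp + b * sp + c * pr
```

For instance, a hypothetical cost-driven project scoring TP = 9.5, SP = 4.5, and PR = 3.0 gets 50*9.5 + 25*4.5 + 25*3.0 = 662.5.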

Project Complexity (Step 4)

Project technical complexity (e.g., project innovation, new technology, application complexity, reuse, project size) has an impact on Technical Productivity. Each project in the sample is assigned a Technical Complexity level of Easy, Average, or Difficult.

Project management complexity (multiple vendors, customer involvement, team complexity, requirements instability, etc.) has an impact on Schedule Productivity. To account for the complexity of the project environment, a Business Complexity level of Easy, Average, or Difficult is also assigned to each project.

Similarly, Testing, Data, and Implementation Complexity (multiple installations, data complexity, testing environment, application business criticality, etc.) have an impact on Product Reliability. Based on its Testing/Implementation Complexity, every project is classified as Easy, Average, or Difficult.

To account for the impact of Technical Complexity, Business Complexity, or Testing/Implementation Complexity on project performance, a multiplier called a Correction Factor (CF) is introduced as follows:

Complexity   CF
Easy         0.9
Average      1.0
Difficult    1.1

In other words, this multiplier applies a positive 10% performance credit to a project performance component of a Difficult project and a negative 10% adjustment to a project performance component of an Easy project. NOTE: The exact rules for assigning Technical Complexity, Business Complexity, and Testing/Implementation Complexity need to be defined and documented within the contract-specific context.
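The Correction Factor can be looked up from a project's three complexity ratings. This sketch assumes the values 0.9, 1.0, and 1.1 implied by the plus or minus 10% rule above (Average projects receive no adjustment); the function and dictionary names are hypothetical.

```python
from typing import Tuple

# CF per complexity level: -10% for Easy, no change for Average, +10% for Difficult.
CORRECTION_FACTOR = {"Easy": 0.9, "Average": 1.0, "Difficult": 1.1}


def correction_factors(technical: str, business: str,
                       testing: str) -> Tuple[float, float, float]:
    """Return the CFs applied to (TP, SP, PR).

    Technical Complexity adjusts TP, Business Complexity adjusts SP, and
    Testing/Implementation Complexity adjusts PR.
    """
    return (CORRECTION_FACTOR[technical],
            CORRECTION_FACTOR[business],
            CORRECTION_FACTOR[testing])
```

For a project rated Difficult, Average, and Easy, `correction_factors("Difficult", "Average", "Easy")` returns `(1.1, 1.0, 0.9)`, so a High TP score of 7.5 becomes 7.5*1.1 = 8.25.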

As a generic approach, the factors of the Constructive Cost Model II (COCOMO II) can be used to derive Technical, Business, and Testing/Implementation Complexity (Boehm et al., 2000). When COCOMO II factors are used to define project complexities, the data collection process must include filling out the COCOMO Survey.

Define Performance Scores of project attributes (Step 5)

Because cost, timeliness, and quality are expressed in different units (hours, days, number of defects), the three measures cannot be combined arithmetically. The Technical Productivity Score (TP), Schedule Productivity Score (SP), and Product Reliability Score (PR) introduced here place these components of project performance on a single common scale on which higher is better. All projects completed during the baseline period are used to derive the Performance Scores for Technical Productivity, Schedule Productivity, and Reliability as follows:

Technical Productivity Score (TP)

First, compute Technical Productivity for each project as hours per Function Point (FP), which equals Effort/FP. Then divide the baseline values into quartiles. The first quartile contains the lowest 25% of values, representing the projects with the best technical productivity, and is called the High Productivity Band. Similarly, the fourth quartile contains the 25% of projects with the lowest technical productivity and is called the Low Productivity Band. Projects in the second and third quartiles form the Average Productivity Band. We also introduce a Very High Productivity Band and a Very Low Productivity Band for future projects that perform outside the current Technical Productivity range, that is, below the baseline minimum or above the baseline maximum, respectively. The Technical Productivity Score (TP) is assigned according to Table 1 below.
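The quartile boundaries can be computed with Python's standard `statistics` module. This sketch uses `statistics.quantiles` with `method="inclusive"` (linear interpolation between sorted values); that choice of quartile rule is an assumption, although applied to the baseline Technical Productivity values of the example at the end of this paper it agrees with the listed 25- and 75-percentile values (15.12 and 38.19) up to rounding.

```python
import statistics
from typing import Sequence, Tuple


def baseline_stats(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (min, Q1, Q3, max) for one baseline measure.

    Q1 and Q3 use linear interpolation between sorted values
    (statistics.quantiles with method="inclusive"); needs at least 2 values.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q3, max(values)


# Baseline Technical Productivity (hours/FP) from the example's baseline year.
t_prod = [180.75, 23.25, 20.41, 45.04, 11.14, 24.78, 10.70,
          13.59, 30.85, 40.63, 4.05, 27.06, 91.46, 19.73]
print(baseline_stats(t_prod))  # min 4.05, Q1 ~15.125, Q3 ~38.185, max 180.75
```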

Schedule Productivity Score (SP)

Schedule Productivity for each project is measured as the number of days per Function Point, which equals Duration/FP. Repeating the process used for Technical Productivity yields the Schedule Productivity Score (SP), which is also assigned according to Table 1 below.

Quartile                  Score     Band
< Baseline Min            9.5*CF    Very High
1st Quartile              7.5*CF    High
2nd and 3rd Quartiles     4.5*CF    Average
4th Quartile              3*CF      Low
> Baseline Max            1.5*CF    Very Low

Table 1. Technical Productivity (TP) and Schedule Productivity (SP) Score Baseline Table
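Table 1 can be applied with a small scoring function. In this sketch, `stats` is a hypothetical (min, Q1, Q3, max) tuple of the baseline measure, and the handling of values that fall exactly on a boundary (a value equal to Q1 counts as Average; a value equal to the baseline minimum or maximum stays inside the baseline range) is an assumption that the contract should settle explicitly.

```python
from typing import Tuple


def productivity_score(value: float, stats: Tuple[float, float, float, float],
                       cf: float = 1.0) -> float:
    """TP or SP score from Table 1; lower raw values earn higher scores.

    stats is (min, Q1, Q3, max) of the baseline measure; cf is the
    project's Correction Factor for this component.
    """
    lo, q1, q3, hi = stats
    if value < lo:
        base = 9.5   # Very High: better than every baseline project
    elif value < q1:
        base = 7.5   # High: 1st quartile
    elif value <= q3:
        base = 4.5   # Average: 2nd and 3rd quartiles
    elif value <= hi:
        base = 3.0   # Low: 4th quartile
    else:
        base = 1.5   # Very Low: worse than every baseline project
    return base * cf
```

With the example's baseline Technical Productivity statistics (4.05, 15.12, 38.19, 180.75), a project at 11.14 hours/FP scores 7.5, one at 45.04 hours/FP scores 3.0, and a hypothetical Difficult project (CF = 1.1) at 200 hours/FP scores 1.5*1.1 = 1.65.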

Product Reliability Score (PR)

Product Reliability for each project is the number of defects (severity 1, 2, 3, and 4 defects found in the first 90 days after implementation) per Function Point. The Product Reliability Score (PR) is derived with the same process used for the two previous characteristics, with one difference: defect counts can contain so many zeros that both the first and second quartiles are zero. Because zero defects is the desired outcome (everybody would like to have zero defects in production all the time), the scoring system rewards it by allowing two possible scores in the Average Band (see Table 2 below).

Quartile                     Score                              Band
< Baseline Min               9.5*CF                             Very High
1st Quartile (L)             7.5*CF                             High
2nd and 3rd Quartiles (A)    4.5*CF if A > 0; 7.5*CF if A = 0   Average
4th Quartile (H)             3*CF                               Low
> Baseline Max               1.5*CF                             Very Low

Table 2. Product Reliability (PR) Score Baseline Table
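Table 2 differs from Table 1 only in the Average band. The sketch below reads "A" in Table 2 as the project's own defects-per-FP value, so a zero-defect project that lands in the Average band (which happens when the first quartile is zero and, as assumed here, a value equal to Q1 counts as Average) still scores 7.5*CF. Both that reading and the boundary handling are assumptions; the function name is hypothetical.

```python
from typing import Tuple


def reliability_score(value: float, stats: Tuple[float, float, float, float],
                      cf: float = 1.0) -> float:
    """PR score from Table 2 for a defects-per-FP value.

    stats is (min, Q1, Q3, max) of baseline reliability. A zero-defect project
    in the Average band scores 7.5 instead of 4.5 (assumed reading of "A").
    """
    lo, q1, q3, hi = stats
    if value < lo:
        base = 9.5                          # Very High
    elif value < q1:
        base = 7.5                          # High
    elif value <= q3:
        base = 7.5 if value == 0 else 4.5   # Average, with zero-defect credit
    elif value <= hi:
        base = 3.0                          # Low
    else:
        base = 1.5                          # Very Low
    return base * cf
```

With the example's baseline reliability statistics (0, 0, 0.04, 0.05), a zero-defect project scores 7.5, a project at 0.02 defects/FP scores 4.5, and a project at 0.05 defects/FP scores 3.0.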

For projects outside the baseline set, the components of Project Performance (TP, SP, and PR) are calculated with Table 1 and Table 2, using the baseline quartiles.

Organizational/Contract Performance Measure (Step 6)

Organizational/Contract performance is an aggregate measure of the performance of all projects and is calculated as follows:

Organization Performance Index = Average (Project Performance),
where the average is taken over all projects completed during the reporting period.

Baseline Organizational Performance Index = Average (Project Performance),
where the average is taken over all projects completed during the baseline period.
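Both indexes are plain averages of Project Performance. The sketch below (function name hypothetical) applies this to the Pr. Perfm columns of the example at the end of this paper; rounded, the results are the 1296 and 1534 reported there.

```python
from statistics import mean
from typing import Sequence


def performance_index(project_performance: Sequence[float]) -> float:
    """Organization (or Baseline) Performance Index: mean Project Performance."""
    if not project_performance:
        raise ValueError("no completed projects in the period")
    return mean(project_performance)


baseline = [674.33, 1312.5, 1031.3, 509.49, 832.5, 963.75, 2272.7,
            3168.8, 1320, 649.35, 2397.6, 1248.8, 486, 1275]
year_2 = [674.33, 1275, 1031.3, 509.49, 832.5, 963.75, 2272.7, 3168.8,
          1320, 649.35, 2397.6, 1248.8, 486, 3000, 3187.5]
print(round(performance_index(baseline)), round(performance_index(year_2)))  # 1296 1534
```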

Measure of performance improvement over the life of a contract (Step 7)

A two-sample t-test for the comparison of means (assuming equal but unknown variances) is recommended to detect a change in performance. If the t-test shows that new performance is not equal to baseline performance, the percentage change is used as the measure of performance change.

Performance Change = (New – Baseline)/ Baseline

A positive change indicates performance improvement, whereas a negative change may be a sign of performance degradation. However, a decrease in performance may also be caused by a change in the complexity of the work.
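A sketch of Step 7 using only the standard library: the pooled-variance (equal, unknown variances) two-sample t statistic and the percentage change. The function names are hypothetical, and choosing the significance level and looking up the critical t value are left to the analyst; if SciPy is available, `scipy.stats.ttest_ind(new, baseline, equal_var=True)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_test(baseline: Sequence[float], new: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic assuming equal, unknown variances.

    Returns (t, degrees of freedom); compare |t| with the two-sided critical
    value of Student's t distribution at the chosen significance level.
    """
    n1, n2 = len(baseline), len(new)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two projects")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(baseline) + (n2 - 1) * variance(new)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both samples are constant; t is undefined")
    return (mean(new) - mean(baseline)) / se, df


def performance_change(new_index: float, baseline_index: float) -> float:
    """Performance Change = (New - Baseline) / Baseline."""
    return (new_index - baseline_index) / baseline_index
```

For example, `performance_change(1534, 1296)` is about 0.184, the 18.4% improvement reported in the example at the end of this paper.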


The major benefit of the model is that it provides a standardized approach to measuring the performance of an Application Services Contract within an Outsourcing Agreement. It makes it possible to set a target performance for a project, or to estimate a new project's performance from the performance of similar projects, and then to track actual performance against that estimate. In addition, tracking changes in overall contract performance together with changes in client requirements (the distribution of projects by complexity) facilitates discussion with the client and helps set performance expectations. For example, if the current year contains more complex projects than the baseline year, then unchanged performance already represents an improvement.

The use of averages in the calculation of performance assumes that technical differences between projects are small. If this is not the case (for example, if the portfolio contains legacy applications in enhancement mode as well as new web-based applications under development or enhancement), it may be useful to establish separate performance measures for each project category. In the future, the model could be improved by introducing weighting factors based on value to the customer into the calculation of Overall Contract Performance.

The following example illustrates the use of the model. It is based on a real case, but the values have been modified and do not represent any actual contract.



Baseline Productivity and Reliability Quartiles

Statistic        T_Prod (hours/FP)  S_Prod (days/FP)  Reliab. (defects/FP)
Min              4.05               0.35              0
25th percentile  15.12              1.14              0
75th percentile  38.19              3.24              0.04
Max              180.75             36.50             0.05
Baseline Year

Pr. ID  T_Prod  S_Prod  Reliab.  A   B   C   CR1  CR2  CR3  TP   SP   PR   Pr. Perfm
1       180.75  36.50   0        33  33  33  1.1  1    1.1  3    3    7.5  674.33
2       23.25   3.07    0        50  25  25  1    1    1    4.5  4.5  7.5  1312.5
3       20.41   2.68    0        25  25  50  1    1    1.1  4.5  4.5  7.5  1031.3
4       45.04   3.30    0.05     33  33  33  1    1    1.1  3    3    3    509.49
5       11.14   3.49    0.05     10  70  20  1    1    1    7.5  3    3    832.5
6       24.78   3.35    0        25  25  50  1    1.1  1    4.5  3    7.5  963.75
7       10.70   2.34    0        33  33  33  1    1    1    7.5  4.5  7.5  2272.7
8       13.59   0.98    0        50  25  25  1    0.9  1    7.5  7.5  7.5  3168.8
9       30.85   2.35    0        50  25  25  1    0.9  1.1  4.5  4.5  7.5  1320
10      40.63   0.45    0.05     33  33  33  1    1    1    3    7.5  3    649.35
11      4.05    0.63    0        33  33  33  1    1.1  1    7.5  7.5  7.5  2397.6
12      27.06   1.63    0.04     50  25  25  1    1    1.1  4.5  4.5  4.5  1248.8
13      91.46   1.79    0.02     10  70  20  1    1    0.9  3    4.5  4.5  486
14      19.73   0.35    0.04     50  25  25  1    1    1    4.5  7.5  3    1275
Overall Baseline Performance 1296

Year 2

Pr. ID  T_Prod  S_Prod  Reliab.  A   B   C   CR1  CR2  CR3  TP   SP   PR   Pr. Perfm
1.1     51.64   10.43   0        50  25  25  1    0.9  1    3    3    7.5  674.33
1.2     36.17   4.78    0        10  70  20  1    0.9  1.1  4.5  3    7.5  1275
1.3     20.50   2.68    0        50  25  25  1    1    1    4.5  4.5  7.5  1031.3
1.4     45.04   3.30    0.05     33  33  33  1    1.1  1    3    3    3    509.49
1.5     11.11   3.49    0.05     50  25  25  1    1    1.1  7.5  3    3    832.5
1.6     24.78   3.35    0        25  25  50  1    1    0.9  4.5  3    7.5  963.75
1.7     10.65   2.34    0        33  33  33  1    1    1    7.5  4.5  7.5  2272.7
1.8     13.60   0.98    0        10  70  20  1.1  1    1.1  7.5  7.5  7.5  3168.8
1.9     30.85   2.35    0        25  25  50  1    1    1    4.5  4.5  7.5  1320
1.11    40.60   0.45    0.05     33  33  33  1    1    1.1  3    7.5  3    649.35
1.21    4.05    0.63    0        50  25  25  1    1    1.1  7.5  7.5  7.5  2397.6
1.31    26.80   1.63    0.04     50  25  25  1    1    1    4.5  4.5  4.5  1248.8
1.41    91.40   1.79    0.02     33  33  33  1    1.1  1    3    4.5  4.5  486
1.51    8.24    1.29    1.65     33  33  33  1    1    1    7.5  4.5  3    3000
1.61    4.05    0.61    0.00     50  25  25  1.1  1.1  1.1  7.5  7.5  7.5  3187.5
Overall Year 2 Performance 1534
Performance Improvement 18.4%

In this example, average project Technical Productivity, Schedule Productivity, and Reliability did not change much, but project complexity was much higher than in the baseline year.


References

Boehm, B. W., Horowitz, E., Madachy, R., Reifer, D., Clark, B., Steece, B., Brown, A. W., Chulani, S., & Abts, C. (2000). Software Cost Estimation with COCOMO II. Prentice Hall.