Measuring quality in 3-D
Quality is usually one of our triple constraints, but for software and Information Technology (IT) services, it is perhaps the least well understood. Much of A Guide to the Project Management Body of Knowledge (PMBOK® Guide) deals with schedule, budget and costs – we have Earned Value and scheduling tools, but how do you measure software quality? What are the units? How do you make project management decisions about schedule and cost that factor in measures of quality?
While there is plenty written on software metrics, I still see PMs struggling to find an effective measure of software quality. When they do measure quality, they either ignore quality measures when making schedule and budget decisions, or give quality priority over everything else – both are inappropriate. The PM needs a way to integrate quality metrics with information on costs and schedule, and manage all three.
You can't manage what you can't measure. This article explains why traditional methods of measuring software quality can fail to give the project manager and other stakeholders useful results, and it recommends a three-dimensional approach to quality metrics that the author has used with excellent results.
The technique will be useful to stakeholders who need a way to specify and evaluate the quality of products and services, and especially useful to the project manager who has to integrate these requirements and deliver acceptable quality with acceptable risk.
Gold Plating applies to quality as well as features – as with features, the project manager's job is to deliver quality that meets, not exceeds, requirements, and to maintain an appropriate balance between schedule, budget and quality. This article will show you how to specify, and then manage, software and IT service quality requirements.
Classic Problems with Quality Metrics
Here are some classic problems with Quality Metrics that the 3-D technique overcomes. I am sure you have encountered at least one of these on software projects:
- We want to measure software quality, but no one can agree on what that means.
- There's never enough time to do it right, but always time to do it over – we do the best we can on development and testing, but unless there are show stoppers, the release date is determined by the schedule.
- Our organization is committed to zero defects but we always end up with known bugs in our software. How can we release a product with bugs if we are committed to zero defects?
- Low-level bugs are accumulating over time, but we do not have time to fix them because of higher priority showstoppers. Is there a way we can reduce some of these low-level bugs? Our stakeholders say, “Yes, you fixed the show stoppers, but the low-level bugs are killing us!”
- In spite of dramatically reduced total defects, our customer says quality is still a problem. “I don't care what your numbers show, quality in the areas that really matter has not improved!”
History of this 3-D Approach
Evolution from 1-D and 2-D
I am surprised by how many projects just count the total number of bugs as a measure of quality – this is a 1-D approach. Some quality management systems use sophisticated statistical mathematics, like standard deviation, but are based on a simplistic counting of bugs per line of code, totally ignoring severity.
Using both number and severity of problems is better, but also falls short – you can have an overall reduction in problems but still have an unacceptable concentration of problems in a particular area of the system – stakeholders will have a very low tolerance for errors in some areas and a very high tolerance in others. Even when the number and severity of problems is within the 2-D targets, an unacceptable concentration of problems at any level of severity, in any one area, could render the system or services unacceptable.
There is a limit, at all levels of severity, as to the number of problems the users can tolerate – there is a cumulative effect. Exceeding that number, in any area, is often perceived by the stakeholders as a failure to meet quality objectives. See the section on Sample Quality Targets for examples of ‘area.’
Experience with 3-D
I first used the technique described here on a very large computer project in 1975, as a way of measuring the contribution of my system test group. We invented the 3-D quality measure primarily as a tool for setting our goals and demonstrating the value we added. It proved an effective tool for managing the quality of a series of upgrade projects from 1975 to 1979. Working on those successive projects gave me the opportunity to refine the tools and understand why they worked. It was not until 1986, when I started consulting, that I realized how powerful and unique these tools were.
In 1986, after holding several senior IT management positions, I started consulting. My first project was to conduct a review of a large government project where quality had fallen to unacceptable levels and the stakeholders were frustrated in their efforts to affect the quality of the software.
I recognized that the root of the problem was the lack of effective quality metrics and proceeded to implement the 3-D technique. As you might expect, in order to implement the technique, I also had to address deficiencies in requirements specifications, configuration management and testing – measuring a problem does not, of itself, fix the problem, but at least it lets you manage it! Fortunately, we were able to address all of these areas in parallel. The result was a significant improvement in the stakeholders' ability to specify and affect the quality of the system, and a measurable improvement in quality.
Definition of Quality
I (and others) define quality as the degree to which a product conforms to its requirements. Another way of looking at quality is the degree to which the product deviates from its requirements, an inverse measure of quality. The classic IT vehicle for reporting deviations from requirements is the Problem Report. The technique described in this paper focuses on the Problem Report as a way of communicating inverse measures of quality.
This technique requires two measures for each Problem Report: the problem's severity and the affected area. I give some examples of what I mean by ‘area’ in Sample Quality Targets. Note that the urgency for fixing a problem is not the same as severity – a fix might well be required ASAP in order to get it into the next release or to facilitate further testing, but the actual problem severity might be low, compared to other problems.
Now we have two of the dimensions, the severity and the affected area. The third, and most critical, dimension is the number of problems, at each level of severity, for each affected area. The maximum tolerable number for each combination of area and severity describes the minimum acceptable quality.
If you exceed the maximum tolerable number of problems, at any level of severity, in any area of the system, the result is unacceptable quality. Now we have the three dimensions, a 3-D quality measure!
A one-dimensional measure would be just the total number of problems. Two-dimensional measures give you the total number of problems at each level of severity – you set limits for the number of problems at each level of severity. The 3-D technique adds the dimension of ‘affected area’ – it sets separate limits for each area, for each level of severity.
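The difference between the three measures can be sketched in a few lines of Python. This is only an illustration, assuming each Problem Report has been reduced to an (area, severity) pair; the area names, severity levels and limits below are hypothetical and are not taken from any project described in this paper.

```python
from collections import Counter

# Hypothetical problem reports: (affected area, severity level).
reports = [
    ("Reports", 3), ("Reports", 3), ("Reports", 3),
    ("User Interface", 3), ("Database", 7),
]

# Hypothetical limits, for illustration only.
limit_1d = 10                      # 1-D: total number of problems
limits_2d = {3: 4, 7: 1}           # 2-D: maximum problems per severity level
limits_3d = {                      # 3-D: maximum per (area, severity) cell
    ("Reports", 3): 2,
    ("User Interface", 3): 2,
    ("Database", 7): 1,
}

def breaches_3d(reports, limits):
    """Return the (area, severity) cells whose problem count exceeds its limit.

    Assumption: a cell with no stated limit tolerates zero problems.
    """
    counts = Counter(reports)
    return {cell: n for cell, n in counts.items() if n > limits.get(cell, 0)}

total = len(reports)
by_severity = Counter(severity for _, severity in reports)

passes_1d = total <= limit_1d
passes_2d = all(n <= limits_2d.get(sev, 0) for sev, n in by_severity.items())
over = breaches_3d(reports, limits_3d)
```

With this sample data the 1-D and 2-D checks both pass, yet the 3-D check flags the concentration of three problems in the hypothetical Reports area, which is the situation the 2-D measure cannot see.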
Goals vs. objectives
Zero defects and zero downtime are goals, but are not necessarily objectives for each project. Stakeholders may have important concerns that require weighing cost and timeliness along with correctness - quality is just one of the triple constraints.
This technique of measuring quality and setting goals allows each project to make the tradeoffs between quality and getting the product out on time and within budget.
The Quality Review Board (QRB) should set objectively measurable targets in terms of number and severity of defects in each area. The QRB should refine the process by comparing the total number of defects actually encountered in production to the project's targets. In a project with multiple releases, the QRB should recommend appropriate action throughout the software development life cycle, to bring the product close to the targets. In a project with a single release, the QRB should look at the predicted and actual quality of previous projects to set appropriate quality targets.
The QRB can, for example, specify entry and exit criteria for testing phases in terms of the maximum number of problems known and predicted, in each area, and at each level of severity. If there are too many problems, known or predicted, the later stages of testing are likely to go very slowly, causing schedule and budget overruns. The QRB should recommend fixing problems that exceed thresholds, before proceeding with testing. You do not need to fix all problems, indeed, that would be another form of gold plating – you just need to fix enough to stay below thresholds – the QRB should recommend the problems to be fixed and assign them to specific releases. This process works best when there are fixed intervals for testing and the systems under test are stable, not undergoing change during the test interval.
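An entry or exit check of this kind might be sketched as follows. This is a minimal sketch under stated assumptions: the function names `fixes_needed` and `may_proceed` are hypothetical, a threshold is treated as the maximum number of known plus predicted problems allowed in a cell, and the sample numbers are invented for illustration.

```python
def fixes_needed(known, predicted, thresholds):
    """For each (area, severity) cell, return how many problems must be
    fixed so that known + predicted problems no longer exceed the threshold.

    Assumption: a cell with no stated threshold tolerates zero problems.
    """
    needed = {}
    for cell in set(known) | set(predicted):
        excess = (known.get(cell, 0) + predicted.get(cell, 0)
                  - thresholds.get(cell, 0))
        if excess > 0:
            needed[cell] = excess
    return needed

def may_proceed(known, predicted, thresholds):
    """True when no cell exceeds its threshold."""
    return not fixes_needed(known, predicted, thresholds)

# Hypothetical counts and thresholds, keyed by (area, severity).
known = {("Reports", 3): 4, ("Database", 7): 1}
predicted = {("Reports", 3): 2}
thresholds = {("Reports", 3): 5, ("Database", 7): 1}

needed = fixes_needed(known, predicted, thresholds)
```

Here only one problem in the hypothetical Reports area needs fixing before the next test stage; the sketch deliberately reports the excess rather than the full list of problems, since the aim is to get back within the thresholds and not to fix everything.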
One measure of the value added by testing is the accuracy of the prediction of how the product will perform in the next stage of testing; eventually, the next stage is User Acceptance, and finally performance in production.
You should apply the same kind of entry and exit criteria that you apply between stages of testing before you release the product to the users. Unless the testing is a perfect representation of the user community, the users will find some problems that testing did not reveal. Through experience in gauging the effectiveness of testing, the QRB will learn to adjust thresholds to allow for the discovery of some number of bugs by users, but this should be a small adjustment, and testing should improve by accounting for why those bugs ‘slipped the net.’ Conversely, and working to your advantage, some problems revealed through testing will never be encountered by actual users.
Sample Charter for the QRB
The Quality Review Board is composed of representatives of the stakeholders. The QRB should include developers, users, operations, sponsors, testing, and project control. In order to keep the membership down to a manageable level, the QRB should be limited to 15 people.
For each area, the board will review every problem report. They will agree on a level of severity relative to other problem reports for that area. This should take less than 5 minutes per problem report. In practice, I have found it averages less than 3 minutes, providing the QRB stays focused on assessing the severity and does not try to solve the problem.
Initially, the QRB has to accumulate a list of real examples for each level of severity, in each area. Some projects can establish this consensus by reviewing past projects; unique projects have to establish this initially at a more general level, but can quickly add examples as they classify emerging problems.
The QRB must be an advisory rather than an executive body. They represent the consensus of stakeholders but they do not control release dates nor whether or not fixes are included in a release. In my experience, upper management and project managers usually take the advice of the QRB and work with the QRB through its members. If a particular stakeholder needs representation on the QRB it can usually be arranged without going over a limit of 15 people. Sometimes members represent several stakeholders.
The QRB can also affect the number and severity of problems that developers put into a release by limiting the amount of change that they approve for the release. They do this based on their experience with the product or similar products. Initial builds of a product should be limited, starting with a small kernel and adding complexity. Having the resources to make a change does not necessarily mean you should make the change – you have to be able to make the change with a reasonable likelihood of acceptable quality.
Based on whether or not the amount of change required after each test interval is within the “budget for change”, the QRB predicts the risks to the schedule and quality. If the PM maintains cost and schedule, the quality might be at risk. If the PM takes the time and incurs the expense to fix and re-test, he may risk the budget and schedule – I have seen many projects accept change requests based solely on whether or not they have the resources to make the change, without regard for the effect on quality.
Sample Quality Targets
Here is an example of a 3-D Quality Target that states the maximum number of problems the QRB will target at the time of release, and the number they think the stakeholders can tolerate after 30 days.
[Table: Maximum allowable problems at release and after 30 days]
The Suites, A, B, C, and D, might correspond to areas like User Interface, Reports, Database and Business Rules. In practice, I use about a dozen areas for a software system. While the QRB will develop the ability to work with a table of numbers, I have found that a 3-D graphic representation of the table is an excellent management presentation tool.
Note that the number of tolerable problems at each level of severity differs between the suites. Each project needs to develop a Quality Target based on their unique requirements. The appropriate selection of areas is critical to making this work.
The number of problems known at release time includes all problems that the user reported from previous releases, plus those discovered only by testing. The number of problems after 30 days consists only of problems actually encountered – this is because users might never encounter some of the problems found in testing.
Not All Bugs are Created Equal
Risk Management is an ongoing concern for the PM and stakeholders. Bug fixes always carry risks of making the situation worse, costing more than estimated, and taking longer than planned – if they're on the critical path, they can delay the project. The Quality Targets can help you manage those risks.
The QRB, as your project gets close to the release date, should pass judgement on all bug fixes, recommending fixes in areas that have too many bugs at a particular level of severity, and specifically recommending against changes where the system is within the tolerance for error. Note that this is not at all equivalent to just fixing show stoppers. Note also that the QRB is recommending changes – it is up to management to make the final decision. In practice, the QRB reports its recommendations and management rarely overrides them.
Neither the PM nor the QRB should leave it up to the developers alone to decide which bugs to fix. Any bug fix carries the risk of introducing a more serious bug or taking more time and resources than estimated. The QRB and PM have to consider the total cost of fixing and testing in terms of dollars and days. The QRB should opt to fix the bugs with the least risk and impact to schedule and budget – the developers and testers certainly have input into that analysis.
Generally, the more code that changes, the more risky the change – given that two bugs in the same area have the same severity, fix the ones that carry less risk. My colleague, Robert Clinton, who has used the Quality Matrix on several projects, provides this appropriate aphorism:
There are no large changes, no small changes, only changes – even the “smallest” change has the potential for catastrophic consequences.
Note that the Quality Matrix approach suggests that you fix something to bring the total number of bugs down to tolerable levels at all levels of severity, in all areas. The PM and stakeholders might decide, however, that exceeding some of these limits is a manageable project risk – just because you have an effective way to measure and control quality does not mean you should give quality priority over the other constraints.
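The selection rule described in this section can be sketched as follows. This is an illustration only: the `risk` score is a hypothetical number (lower means a safer fix) standing in for the QRB's judgement of magnitude of change, developer understanding and potential side effects, and the bug identifiers and limits are invented.

```python
from collections import defaultdict

def recommend_fixes(bugs, limits):
    """Recommend the lowest-risk fixes in each (area, severity) cell that is
    over its limit, and only as many as needed to get back within the limit.
    Bugs in cells that are within tolerance are not recommended for fixing.

    Assumption: a cell with no stated limit tolerates zero problems.
    """
    by_cell = defaultdict(list)
    for bug in bugs:
        by_cell[(bug["area"], bug["severity"])].append(bug)

    recommended = []
    for cell, cell_bugs in by_cell.items():
        excess = len(cell_bugs) - limits.get(cell, 0)
        if excess > 0:
            safest_first = sorted(cell_bugs, key=lambda b: b["risk"])
            recommended.extend(b["id"] for b in safest_first[:excess])
    return sorted(recommended)

# Hypothetical bugs and limits.
bugs = [
    {"id": "B1", "area": "Reports", "severity": 3, "risk": 0.6},
    {"id": "B2", "area": "Reports", "severity": 3, "risk": 0.1},
    {"id": "B3", "area": "Reports", "severity": 3, "risk": 0.3},
    {"id": "B4", "area": "User Interface", "severity": 3, "risk": 0.05},
]
limits = {("Reports", 3): 1, ("User Interface", 3): 2}

to_fix = recommend_fixes(bugs, limits)
```

In this example the two safest Reports fixes are recommended, while B4 is left alone even though it is the lowest-risk fix of all, because its area is already within tolerance. The output is a recommendation for the QRB to pass on, not a decision; management may still accept a breach as a project risk.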
Another suggestion is to approve no bug fixes on a ‘time available’ basis – only approve fixes that the QRB assigns to a release. This makes it easier for the PM and test group to schedule resources to test the fixes, and makes sure that you perform a risk assessment for each bug fix.
I've seen, over and over, a ‘simple fix’ going wrong and costing a project weeks to put right, and in an area where the number of bugs before the fix was within the specified tolerance for error. The QRB should evaluate the risk of making any change, based on the magnitude of the change, the skill and understanding of the developer, and the potential for affecting other areas.
Gold Plating Quality
When I first proposed the use of the Quality Matrix, the 3-D technique, I invited the QRB to fill in a table of acceptable number of problems at each level of severity, for each area. Every single member responded with a matrix, completely filled out, by hand, with zeros. Moreover, they did not just write a zero and a line across the table, they all laboriously filled in the zeros in each cell – and this well before using PCs! They were emphatically stating that the number of acceptable bugs, at all levels of severity, in all areas of the system was zero, i.e., they were calling for zero defects!
Looking at our historical database for a particularly good release, I compiled a Quality Matrix – it had non-zero entries at all levels of severity, in practically all areas of the product. Yes, there were even Level 9 problems, and several at Level 8, and yet the user community saw it as a significant improvement in quality, one of our best releases.
I got the QRB to accept the example as a baseline and proposed a modest improvement, about 10%, across the matrix, as our objective for the next project. In this way, we approached zero defects, but within reasonable and achievable parameters for the project.
For anyone from the “it costs less to fix the bugs sooner rather than later” school, remember that we're also managing opportunity costs – the stakeholders might well tolerate higher overall costs for fixing, as long as the project delivers on schedule, so they can start realizing the ROI.
The 3-D Quality Matrix technique enables stakeholders to achieve precise control over software quality. This gives the PM a way to relate quality to schedule and costs.
Quality Review Boards play a critical role in achieving quality metrics that reflect stakeholder requirements for quality.
Pareto analysis of missed quality targets identifies requirements for changes during a project – use the analysis to establish best practice from project to project.
Use of 3-D Quality Measures provides information the PM can use to perform Risk Management and to avoid gold plating. The organization can make progress toward zero defects and zero down time while setting reasonable and attainable objectives for the project quality, schedule and cost.
For the classic problems, posed at the beginning of this paper, the PM now has a tool to:
- Agree on measures of software quality.
- Balance doing it right with doing it on time and within budget.
- Manage risk while managing quality.
- Represent stakeholder requirements in decisions that affect quality.
© 2004, Norm Goodkin
Originally published as part of 2004 PMI Global Congress Proceedings – Anaheim, California