
Metrics Categories: How to Roll Up Metrics When Every Team Measures Uniquely

Disciplined Agile (DA) teams choose their way of working (WoW) because Context Counts and Choice is Good. An implication of this strategy is that teams will choose what they measure so that their metrics provide meaningful insight for them. This is great for the team, but it has interesting implications for your Value Delivery Office (VDO) or governance team that needs to monitor and guide many teams.

In this article we explore three important issues:

  1. How do you aggregate, or "roll up", metrics from agile teams into a portfolio dashboard?
  2. How do you do this when the teams are working in different ways?
  3. How do you aggregate team metrics when you still have some traditional teams as well as some agile/lean teams?

How do you aggregate agile team metrics into a portfolio dashboard?

Pretty much the same way you aggregate metrics from traditional teams. There tend to be several potential challenges to doing this, challenges that non-agile teams also face:

  1. You can only aggregate metrics with similar units. You do need to be careful with some measures, such as team velocity, because the units vary across teams. It is possible to enforce a common way to measure velocity across teams, but this tends to be more effort than it's worth in practice. Sometimes there are metrics that you think you should be able to aggregate but you discover (hopefully) that you really shouldn't. One example is the number of defects by severity: you can roll this metric up when the severity of a defect is determined in a consistent manner across teams, but this isn't always the case in practice.
  2. Sometimes the math gets a bit complex. Aggregating metrics isn't always a matter of simple addition. Often you will need to weight metrics by time, size, financial impact, or some combination thereof. Interestingly, although some metrics can't be rolled up because they are measured in different units, you can often roll up the trends of those metrics. For example, acceleration is the change in a team's velocity. Given an appropriate weighting formula you can roll up an average acceleration figure across teams; a small sketch of such a weighted roll-up follows this list.
  3. Some people believe you can only aggregate the same metric. Basically, when a common metric is captured across teams (for example, Net Promoter Score (NPS) or cyclomatic complexity) you can easily (albeit with some "complex" math in some cases) roll it up to a program or portfolio level. In the Govern Team process goal we refer to this strategy as consistent metrics, an option of the Provide Transparency decision point. This strategy works well when teams collect the same metrics, but when teams choose their WoW this isn't always the case.
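
As a concrete illustration of the second point, here is a minimal sketch, in Python, of rolling up a trend metric (acceleration) into a single portfolio-level figure using a size-weighted average. The team names, sizes, and acceleration values are invented for illustration, and a size-weighted average is only one of the possible weighting schemes mentioned above.

```python
# Hypothetical example: roll up per-team acceleration (the percent change in
# velocity from one iteration to the next) into a single portfolio figure.
# Velocity itself can't be compared across teams because its units differ,
# but the trend (acceleration) can be, given a weighting scheme.

teams = [
    # (team name, team size, acceleration as a fraction, e.g. 0.05 = +5%)
    ("Data Warehouse", 8, 0.05),
    ("Mobile Development", 6, -0.02),
    ("Package Implementation", 12, 0.01),
]

def weighted_acceleration(teams):
    """Size-weighted average acceleration across teams."""
    total_size = sum(size for _, size, _ in teams)
    return sum(size * accel for _, size, accel in teams) / total_size

print(f"Portfolio acceleration: {weighted_acceleration(teams):+.1%}")  # +1.5%
```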

How do you aggregate agile team metrics into a portfolio dashboard when the teams choose their WoW, and it’s different for each team?

When a team is allowed to choose its way of working (WoW), or "own their own process," the team will often choose to measure itself in a manner that is appropriate to its WoW. This makes a lot of sense: to improve your WoW you will want to experiment with techniques, measure their effectiveness for your team within your current context, and then adopt the techniques that work best for you. Teams will need to have metrics in place that provide them with insight into how well they are working, and because each team is unique the set of metrics they collect will vary by team.

For example, in Figure 1 below we see that the Data Warehouse (DW) team has decided to collect a different set of metrics to measure stakeholder satisfaction than the Mobile Development team. The DW team needs to determine which reports are being run by their end users and, more importantly, identify new reports that provide valuable information to those users – this is why they have measures for Reports run (to measure usage) and NPS (to measure satisfaction). The Mobile team, on the other hand, needs to attract and retain users, so they measure things like session length and time in app to determine usage, and user retention and NPS to measure satisfaction.

Data warehouse
  Quality:
    • Production incidents
    • Automated test coverage
    • Ratio of data to errors
    • Number of empty values
    • Data transform error rates
  Time to Market:
    • Cycle time
    • Lead time
    • Data time to value
    • Ranged burn up chart
  Stakeholder Satisfaction:
    • Net promoter score (NPS)
    • Reports run
    • Time in warehouse

Mobile development
  Quality:
    • Production incidents
    • Automated test coverage
    • Cyclomatic complexity
  Time to Market:
    • Cycle time
    • Lead time
  Stakeholder Satisfaction:
    • Net promoter score (NPS)
    • Session length
    • User retention
    • Time in app
    • Lifetime value

Package implementation
  Quality:
    • Production incidents
    • Automated test coverage
    • UAT issues
  Time to Market:
    • Schedule variance
    • Burndown chart
  Stakeholder Satisfaction:
    • Net promoter score (NPS)
    • Production incidents
    • UAT issues
Figure 1: Metrics gathered by three different teams across a consistent set of categories.

Furthermore, the nature of the problem that a team faces will also motivate it to choose metrics that are appropriate for its situation. In Figure 1 we see that each team has a different set of quality metrics: the DW team measures data quality, the mobile team measures code quality, and the package implementation team measures user acceptance test (UAT) results. Although production incidents and automated test coverage are measured by all three teams, the remaining metrics are unique.

The point is that instead of following the consistent metrics practice across teams by insisting that each team collects the same set of metrics, it is better to ask for consistent metric categories across teams. Instead of saying "please collect metrics X, Y, and Z," we say "please collect metrics that explore Category A, Category B, and Category C." So, as you can see in Figure 1, each team is asked to collect quality metrics, time-to-market metrics, and stakeholder satisfaction metrics, but it is left up to each team which specific metrics it will collect in each category. The important point is that each team needs to collect sufficient metrics in each category to provide insight into how well it addresses that category. This enables teams to be flexible in their approach and collect metrics that are meaningful for them, while providing the governance people within your organization with the information that they need to guide the teams effectively.
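
One way to picture the consistent categories strategy is as data: a hypothetical configuration in which the governance body fixes the categories and each team declares whichever metrics it has chosen within them. The structure and names below are illustrative, not part of DA itself; the metric lists echo Figure 1.

```python
# Hypothetical configuration: the categories are mandated by the governance
# body, while the metrics within each category are chosen by each team.
REQUIRED_CATEGORIES = ["Quality", "Time to Market", "Stakeholder Satisfaction"]

TEAM_METRICS = {
    "Data Warehouse": {
        "Quality": ["Production incidents", "Automated test coverage",
                    "Ratio of data to errors", "Number of empty values",
                    "Data transform error rates"],
        "Time to Market": ["Cycle time", "Lead time", "Data time to value",
                           "Ranged burn up chart"],
        "Stakeholder Satisfaction": ["Net promoter score (NPS)", "Reports run",
                                     "Time in warehouse"],
    },
    "Mobile Development": {
        "Quality": ["Production incidents", "Automated test coverage",
                    "Cyclomatic complexity"],
        "Time to Market": ["Cycle time", "Lead time"],
        "Stakeholder Satisfaction": ["Net promoter score (NPS)", "Session length",
                                     "User retention", "Time in app",
                                     "Lifetime value"],
    },
}

def missing_categories(team):
    """Categories for which a team has not yet chosen any metrics."""
    chosen = TEAM_METRICS.get(team, {})
    return [c for c in REQUIRED_CATEGORIES if not chosen.get(c)]

print(missing_categories("Mobile Development"))  # [] -- all categories covered
```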

So how do you aggregate the metrics when they're not consistent across teams? Each team is responsible for taking the metrics that it collects in each category and calculating a score for that category. A team will likely need to work with the governance body to develop this calculation. For example, in Figure 2 we see that each team has a unique dashboard for its team metrics, yet at the portfolio level the metrics are rolled up into a stoplight status scorecard for each category (Green = Good, Yellow = Questionable, Red = Problem). Calculating a stoplight value is one approach; you could get more sophisticated and calculate a numerical score if you like. This is something the governance body would need to decide upon and then work with teams to implement.
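
The roll-up calculation itself will differ from team to team, but its general shape is the same: normalize each metric the team collects in a category onto a common scale, combine the normalized values, and translate the result into a stoplight status. The thresholds and metric values in this sketch are invented for illustration; in practice each team would agree on them with the governance body.

```python
# Hypothetical roll-up: score each metric in a category on a 0..1 scale against
# team-agreed thresholds, average the scores, and map the average onto a
# stoplight status for the portfolio dashboard.

def score(value, target, worst, higher_is_better=True):
    """Linear 0..1 score for a metric, given its target and worst-acceptable values."""
    if higher_is_better:
        raw = (value - worst) / (target - worst)
    else:
        raw = (worst - value) / (worst - target)
    return max(0.0, min(1.0, raw))

def stoplight(scores, green_at=0.8, yellow_at=0.5):
    """Map an average category score onto Green/Yellow/Red."""
    avg = sum(scores) / len(scores)
    if avg >= green_at:
        return "Green"
    return "Yellow" if avg >= yellow_at else "Red"

# Example: the Mobile Development team's Quality category (values are made up).
quality_scores = [
    score(3, target=0, worst=10, higher_is_better=False),   # production incidents
    score(0.72, target=0.90, worst=0.50),                    # automated test coverage
    score(14, target=8, worst=25, higher_is_better=False),   # avg cyclomatic complexity
]
print(stoplight(quality_scores))  # Yellow
```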

Figure 2: Rolling up metrics categories.

The portfolio dashboard in Figure 2 also includes a heat map indicating the overall status of each team (again using green, yellow, and red) and the size of the effort (indicated by the size of the circle). Anyone looking at the portfolio dashboard should be able to click on one of the circles or team stoplights and be taken to the dashboard for that specific team. The status value for the heat map would be calculated consistently for each team based on the category statuses for that team – this is a calculation that the governance body would need to develop and then implement. The size of the effort would likely come from a financial reporting system or perhaps your people management systems.
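
A team's overall heat-map status can then be derived from its category statuses. A common, conservative convention is "worst status wins"; the sketch below assumes that convention, reusing the Green/Yellow/Red values from the previous example, with effort size coming from elsewhere (for example, a financial system).

```python
# Hypothetical heat-map roll-up: a team's overall status is the worst of its
# category statuses ("worst status wins"). Effort size would come from a
# financial or people-management system and only drives the circle size.

SEVERITY = {"Green": 0, "Yellow": 1, "Red": 2}

def team_status(category_statuses):
    """Overall heat-map status for one team, given category -> stoplight value."""
    return max(category_statuses.values(), key=SEVERITY.get)

portfolio = {
    "Data Warehouse": {"Quality": "Green", "Time to Market": "Yellow",
                       "Stakeholder Satisfaction": "Green"},
    "Mobile Development": {"Quality": "Yellow", "Time to Market": "Green",
                           "Stakeholder Satisfaction": "Red"},
}

for team, categories in portfolio.items():
    print(team, "->", team_status(categories))
# Data Warehouse -> Yellow
# Mobile Development -> Red
```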

How do you aggregate team metrics when some teams are still traditional?

With consistent categories it doesn't really matter what paradigm the team is following. You simply allow the team to collect whatever metrics are appropriate for its situation within each category and require it to develop the calculation to roll those metrics up accordingly. If the team can't come up with a reasonable calculation, then the worst case is for the Team Lead (or Project Manager in the case of a traditional team) to manually enter the status value for each category.
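
That manual fallback is straightforward to accommodate in the roll-up: a hand-entered status simply takes precedence over a computed one. A minimal sketch, continuing the hypothetical stoplight() helper from the earlier example:

```python
def category_status(scores=None, manual_status=None):
    """Prefer a manually entered status; otherwise compute one from the scores."""
    if manual_status is not None:
        return manual_status      # e.g. entered by a Project Manager
    return stoplight(scores)      # computed roll-up, as sketched earlier

# A traditional team without an automated roll-up calculation:
print(category_status(manual_status="Yellow"))  # Yellow
```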

Parting Thoughts

Although we have talked in terms of a single portfolio in this article, the strategies we describe apply at the program or sub-portfolio level too, and those metrics can then be rolled up further to higher levels. For the consistent categories strategy to work, the governance people need to be able to look at the dashboard for a team, which will have a unique collection of widgets on it, and understand what that dashboard indicates. This requires some knowledge and sophistication from your governance people, which isn't unreasonable to ask for in our opinion. Effective leaders know that metrics only provide insight and that they shouldn't manage by the numbers. Instead, they should follow the lean concept of "gemba" and go see what is happening in the team, collaborating with team members to help the team understand and overcome any challenges it may face.

February 2022
