One reason that looking at the value stream is so important is that it gives us a better way to see the work being done than just watching people. As Don Reinertsen states in The Principles of Product Development Flow: Second Generation Lean Product Development (and as SAFe mirrors), “if you only measure one thing, measure cost of delay.” This reflects our aim of eliminating delays, not just in value realization but in anything that directly or indirectly causes delays in value realization:
- delays in workflow (typically due to waiting for people or due to handoffs)
- delays in feedback after a decision or activity
- delays in realizing value
- delays in knowledge transfer
All of these not only delay the value delivered, they literally increase the amount of work to be done. Reducing them is critical and can be done by:
- Managing queues
- Having small batches
- Creating visibility
- Automating testing
- Having a test-first attitude at the acceptance and unit levels
In his brilliant book, which underlies much of SAFe’s foundation, Mr. Reinertsen devotes an entire chapter to managing queues. Some of this is reflected in SAFe’s Principle #6: Visualize and limit WIP, reduce batch sizes, and manage queue lengths.
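One standard way to see why queue length matters is Little’s Law: in a stable system, the average time an item spends waiting equals the average number of items in the queue divided by the average throughput (W = L / λ). The sketch below is a minimal illustration using made-up numbers; the helper `average_wait` is hypothetical and assumes a steady throughput.

```python
def average_wait(avg_queue_length: float, throughput_per_week: float) -> float:
    """Little's Law rearranged: W = L / lambda.

    avg_queue_length:    average number of items waiting (L)
    throughput_per_week: average number of items completed per week (lambda)
    Returns the average number of weeks an item spends in the queue (W).
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return avg_queue_length / throughput_per_week


# Hypothetical team that finishes 5 items per week regardless of queue size.
for queue_length in (40, 20, 5):
    weeks = average_wait(queue_length, 5)
    print(f"{queue_length:>2} items queued -> about {weeks:.0f} week(s) of waiting")
```

With throughput held fixed, cutting the queue from 40 items to 5 cuts the average wait from 8 weeks to 1. Nobody worked faster; the items simply spent less time waiting.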
Why Looking at Delays Is So Important
One of the significant differences between software development and the physical world is that software, while being developed, is essentially invisible. But we can track its progress by looking at where it is in development. We can also get a good sense of this by looking at the queues before each step.
All of our work should add value to the software programs and services we produce; however, much of the work done in software organizations is created because of problems and delays in workflow. We call this “induced work,” the work we make for ourselves beyond what would have otherwise been needed to accomplish our goals. It happens at all levels and scales of an organization.
If we can identify such delays and remove them, we can “stop creating waste” (at least some of it). This reduces the overall time needed to finish our creative work, so we become more productive. And this in turn creates a virtuous cycle where many other benefits follow, including higher quality, avoiding or fixing errors quickly, and gaining a better understanding of features so even less work is wasted. We find delays by looking at where time is spent in the process, so time is key.
Delays reveal loss… and opportunity
Waste: Hiding in plain sight
[Figure: two men digging holes; the one in the background is shoveling his dirt into the other’s hole, and neither is aware of the other.]
The picture is meant to be comical but is, unfortunately, all too true. Yes, the ditch digger in the background is throwing dirt into the other person’s hole.
We sometimes hear the mantra “eliminate waste”; in this case, that would mean no longer throwing dirt from one hole into the other.
Unfortunately, as in this case, we often don’t realize we are creating the waste we need to eliminate. In this example, to the ditch digger in the foreground, there is just dirt in his hole that he has to remove. He doesn’t see “useful” dirt that was there to begin with and “waste” dirt thrown in by the other ditch digger. There is just dirt.
Note also that the other ditch digger isn’t aware of the extra work he is causing. In other words, if you told these folks to “eliminate waste,” they’d probably just shrug their shoulders, think “what waste?” and get on with what they are doing. Waste often can be seen only when one looks at the problem from outside. This is yet another example of why a holistic approach is required.
“Eliminate waste” or “Stop creating it”?
Rather than “eliminate waste,” I prefer to focus on “stop creating waste,” because all too often half of our work involves digging out dirt that another group has put into our hole. Lean suggests that the way forward is to focus on eliminating delays in the workflow rather than trying to do work faster.
What does “induced work” look like?
I suggest that much of our time is spent working on what I call induced work. This is work that is literally created from delays in your process and is self-inflicted (even though unintentionally). It can result in a significant amount of additional work you have to do that you wouldn’t have to do if you managed your delays more effectively. In this chapter we’ll take a look at this and why it occurs.
The following lists show work we intended to do and extra work that can be said to be self-inflicted by making mistakes and not having quick feedback to identify them.
Our intended work
- Getting requirements
Induced work
- Re-doing requirements
- Working from old requirements
- Building the wrong feature
- Building unneeded features
- “Fixing” bugs (most of the time is spent on finding them)
- Overbuilding frameworks
- “Integration errors” (they are really errors resulting from two groups being out of sync)
- Essentially duplicating components
Intended work makes progress on your organization’s mission. Induced work is work that was created by making a mistake or having a misunderstanding. I’m not suggesting that mistakes and misunderstandings can be avoided. However, the amount of induced work grows dramatically the longer it takes from making an error until detecting it. I suggest we can usually vastly reduce the cost of mistakes even when we can’t avoid making them. The common theme in doing this will be to minimize the time from making the mistake until detecting it. The notion that delays increase our waste can also be applied to most of the other items on the induced-work list. Let’s see.
Re-doing requirements or working from old requirements is caused by a delay between when you got the requirement and when you needed to use it. Building the wrong feature is usually due to a miscommunication between the customer (or their proxy) and the development team; the greater the delay between getting the initial requirement and actually building it, the more work is involved. Building unneeded features is so axiomatic in our industry that we think it unavoidable. However, if one builds features in stages, one can often learn that a feature isn’t needed by the time one gets ready to build it.
If we focus on building the most important features in small batches, we can use what we learn to decide whether we actually need the pieces we deferred. This is another tenet of Lean: work in small batches. This accelerates value delivery while shortening delays to feedback. All of this contributes to reducing induced work.
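A rough way to see why small batches shorten delays to feedback is to count how long finished work waits before anyone can react to it. This sketch assumes, hypothetically, that features are built one at a time at a fixed number of days each, that a batch is released only when all of its features are done, and that feedback arrives only on release; `feedback_delays` is an illustrative helper, not a standard formula.

```python
def feedback_delays(batch_size: int, days_per_feature: float) -> tuple[float, float]:
    """Return (days until first feedback, average days a finished feature waits for feedback).

    Hypothetical model: features are built one after another, a batch is released
    only when every feature in it is done, and feedback arrives only on release.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    first_feedback = batch_size * days_per_feature
    # Feature i (1-based) is finished on day i * d, then waits (batch_size - i) * d days.
    waits = [(batch_size - i) * days_per_feature for i in range(1, batch_size + 1)]
    return first_feedback, sum(waits) / batch_size


# Ten features at a hypothetical 2 days each, released in batches of different sizes.
for size in (10, 5, 1):
    first, avg_wait = feedback_delays(size, 2)
    print(f"batch of {size:>2}: first feedback on day {first:g}, "
          f"finished features wait {avg_wait:g} days on average")
```

Under those assumptions, going from a batch of ten to a batch of one moves the first feedback from day 20 to day 2, and the average time a finished feature sits waiting drops from 9 days to 0. Real feedback loops are messier, but the direction of the effect is the point.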
Let’s look at the other items on the list of induced work. You may have noticed that “fixing,” in “fixing bugs,” is in quotes. The reason is that developers don’t actually spend much time fixing bugs, even though it feels to them as if they do. Let me explain.
Consider this: imagine the worst bug you’ve ever had, or, if you’ve never been a developer, the worst bug you’ve seen a developer have. Think of the time they spent “fixing” it. Most likely, the first few hours went to investigating the problem, trying something, then setting things back after that didn’t work. Notice that, up to this point, no fixing has been done; only investigating and relearning have taken place. The fix itself typically takes very little time.
Some people protest that this is just semantics. I disagree, but even if it were, the distinction would be important. There are two activities taking place here. The first is discovering what we have to do (finding) and the second is doing it (fixing).
Let’s take a look at this another way. Imagine a developer writes a bug. As a small aside, I’ve noticed that developers talk about bugs as if they don’t write them, but rather that bugs just show up or testers put them in. Notice how they often say “I found a bug!” or “testing found a bug!” as if they had nothing to do with it. BTW: I noticed this by observing myself, so I’m not deriding anyone. Anyway, now imagine that he/she is told about the bug immediately. How long does it take to fix? Let’s say an hour. Now, imagine that they aren’t told about it for a couple of weeks, and further imagine that nothing else has changed. How long does fixing take now? A lot longer, maybe days longer. And it gets even worse if other work is going on and the code has been changed by others, or uses code others have modified, since the original code was written.
The additional time required to find and fix in the second case compared with the first is not semantics, and it is different in nature from fixing code. It is clearly additional relearning and discovery time. The reality is that we spend much more time finding our problems than fixing them, and the greater the delay from creating the error until detecting it, the more this time grows. Also notice that this is not task-switching time, to which it is often attributed: one might start working on the bug fix, concentrate on it alone, and this phenomenon would still occur.
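The find-versus-fix distinction can be made concrete with a toy model. It is a sketch under loudly hypothetical assumptions: the fix itself takes a constant 45 minutes, finding takes 15 minutes when the bug is reported immediately, and each day of delay adds one hour of relearning. None of these numbers are measurements, the linear growth is only an assumption chosen to make the shape visible, and `bug_cost_hours` is a hypothetical helper.

```python
def bug_cost_hours(delay_days: float,
                   fix_hours: float = 0.75,
                   base_find_hours: float = 0.25,
                   relearn_hours_per_day: float = 1.0) -> tuple[float, float]:
    """Return (finding hours, fixing hours) for a single bug.

    Hypothetical model: the fix itself takes a constant amount of time, while
    finding (investigating and relearning) grows linearly with the number of
    days between writing the bug and hearing about it.
    """
    finding = base_find_hours + relearn_hours_per_day * delay_days
    return finding, fix_hours


for delay in (0, 1, 14):
    finding, fixing = bug_cost_hours(delay)
    share = finding / (finding + fixing)
    print(f"told after {delay:>2} days: {finding + fixing:5.2f} h total, "
          f"{share:.0%} of it spent finding")
```

With these made-up parameters the immediate case totals one hour, matching the example above, while a two-week delay turns the same bug into about 15 hours of work even though the fix never changes.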
Continuing down our list, I would suggest that ‘overbuilding frameworks’ and ‘essentially duplicating components’ are due more to a lack of technical skills, which can be improved through the use of design patterns and emergent design. Duplication is also exacerbated by delays, since people sometimes forget what has already been done.
The last type of induced work on the list is “integration” errors. Again, note the quotes. I mark them that way since integration errors are exceedingly rare. An integration error would be an error in the integration itself. More than 99.9% of the things I’ve seen called integration errors are actually errors that occurred well before integration. That is, the teams needing to integrate did not stay in sync in their understanding or their code. The integrator integrated just fine; the error lay in the fact that the components, though integrated correctly, just don’t work together. Calling an error that occurred upstream an “integration” error is equivalent to calling a bug found in testing a “testing” error. Again we can see that the greater the delay between the error occurring and its detection at integration, the more work it takes to fix. Note that this is just another reason why continuous integration is good. Continuous integration isn’t about avoiding integration errors; it is about detecting miscommunications between groups working together as they occur.
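To make the continuous-integration point concrete, here is a toy calculation under hypothetical assumptions: two teams’ understandings drift apart on a few made-up days, and each mismatch is exposed only at the next scheduled integration. `detection_delays` is an illustrative helper, not part of any CI tool.

```python
import math


def detection_delays(error_days: list[float], integration_interval_days: float) -> list[float]:
    """Days from each mismatch between teams until an integration exposes it.

    Hypothetical model: integrations happen every `integration_interval_days`
    (day 0, one interval, two intervals, ...), and a mismatch is caught at the
    first integration at or after the moment it is introduced.
    """
    if integration_interval_days <= 0:
        raise ValueError("integration_interval_days must be positive")
    delays = []
    for t in error_days:
        next_integration = math.ceil(t / integration_interval_days) * integration_interval_days
        delays.append(next_integration - t)
    return delays


# Made-up days on which two teams' understandings drifted apart.
mismatches = [1.5, 3.2, 7.9, 12.1]
for interval in (14, 1):
    delays = detection_delays(mismatches, interval)
    print(f"integrate every {interval:>2} day(s): "
          f"average detection delay {sum(delays) / len(delays):.2f} days")
```

Integrating every 14 days leaves these mismatches undetected for roughly 7.8 days on average; integrating daily brings that down to roughly 0.6 days. By the argument above, a shorter detection delay means less induced work when the mismatch is finally fixed.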
It is easier to save time by not creating induced work
It is important to notice that, with the exception of automating testing, the intended work we find valuable will likely be difficult to speed up. Yet the induced work, which we don’t want to do, can mostly be eliminated by cutting out the delays in our workflow.
It is worth taking a few minutes to consider how much time your organization spends on intended work compared with induced work. Pause, and take a minute.
In my classes I ask this question, and the general consensus is that 30-70% is spent on intended work. I actually think this is a bit optimistic, but even so, it provides a lot of motivation to shorten delays and shrink the work we do that isn’t useful.
Much of the work we do is actually not making progress on our goals but is literally induced (created) by the delays in our workflow. Lean suggests that we look at the delays in our workflow in order to eliminate the waste they create. While we should also be looking at how to improve our work, our biggest initial returns are likely to come from attending to time.