Can We Measure Developer Productivity and Is That Even the Right Question to Ask? - Part 1

Recently, McKinsey & Company published an article entitled “Yes, you can measure software developer productivity.” The article generated a good deal of strong opinion across social media, including a notable two-part response from Kent Beck and Gergely Orosz, “Measuring developer productivity? A response to McKinsey” (Parts 1 and 2).

All of these articles are worth reading and raise interesting points, but they also warrant further scrutiny and thought. As we look at the article and the responses, keep one question in the back of your mind: is measuring developer productivity the most important focus if we are concerned with delivering value to customers and realizing sufficient profits and other benefits for our business?

Let’s start with the McKinsey article in Part 1 of this blog. Part 2 will focus on the response from Kent Beck and Gergely Orosz.

The McKinsey Paper

The McKinsey paper states that software development is perennially under-measured compared to functions like sales or customer operations. The authors argue that measuring developer productivity is difficult because the link between inputs and outputs is less clear than in other functions, and further note that because software development is collaborative, complex, and creative work, you must measure at the system, team, and individual levels.

So far, not so bad, although I would note that I have spent considerable time measuring software development in my career as a software development leader. Has what I measure improved as my career has progressed? Most definitely. I’m all for measuring system-level outcomes, like increased feature usage, and capabilities, like reductions in the marginal cost of deploying the next code change. Depending on how we define a team, I can see some team measurements being appropriate in certain contexts too. However, if we are asking our teams to work cross-functionally and collaboratively, it is going to be very difficult, and likely counterproductive, to try to measure individual contribution. To be frank, if you want to assess individual performance, just ask a person’s teammates and others who work closely with the team; you will get a good idea of that person’s value to the team.

Getting back to the paper, the authors put forth the following questions that leaders can begin to answer through the set of metrics they propose:

  • What are the impediments to the engineers working at their best level?

  • How much does culture and organization affect their ability to produce their best work?

  • How do we know if we’re using their time on activities that truly drive value?

  • How can we know if we have all the software engineering talent we need?

The questions themselves are perfectly good and reasonable to ask, but I am not convinced that measuring developer productivity is the best way to answer them. For the first question on impediments and the third on valuable activities, we certainly want to look at waste in the system: waiting, handoffs, gates, dependencies, and working on features that customers will never use. Tools and systems like Kanban and value stream mapping would seem to provide greater insight into this than looking at one specific role’s “productivity”; they make visible how long work sits idle between value-adding steps (a simple flow-efficiency calculation is sketched below). I would be more comfortable looking at the impediments to teams working at their best level.

Culture and organization are very important considerations that affect the performance of everyone in the organization, but I have no idea how measuring developer “productivity” will reveal insights about them; it is certainly not the most effective way of doing so.

The final question, about having all the software engineering talent we need, is important to be able to answer, but we first need to decide what our objectives are and then determine whether a focus on flow efficiency or resource efficiency is more likely to put us in a position to reach them. We also have to be honest about what we don’t know about what will drive customer value and business results. That uncertainty should influence how many resources we are willing to dedicate to building sets of features. We should be focused on developing the capability to deliver work within fast feedback cycles, promoting learning that reduces our uncertainty over time.
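To make the value stream point concrete, here is a minimal sketch of a flow-efficiency calculation. The stage names and hours are hypothetical, invented purely for illustration of the kind of data a value stream mapping exercise produces:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    active_hours: float   # time spent actively working the item
    waiting_hours: float  # time the item sat idle (queues, handoffs, gates)

# Hypothetical value stream for one feature, from idea to production.
# All names and numbers are invented for illustration.
stages = [
    Stage("analysis",    active_hours=4,  waiting_hours=40),
    Stage("development", active_hours=16, waiting_hours=24),
    Stage("code review", active_hours=2,  waiting_hours=30),
    Stage("testing",     active_hours=6,  waiting_hours=48),
    Stage("deployment",  active_hours=1,  waiting_hours=16),
]

active = sum(s.active_hours for s in stages)
waiting = sum(s.waiting_hours for s in stages)
lead_time = active + waiting

# Flow efficiency: the fraction of total lead time spent on value-adding work.
flow_efficiency = active / lead_time
print(f"Lead time: {lead_time} h, flow efficiency: {flow_efficiency:.0%}")
# -> Lead time: 187 h, flow efficiency: 16%
```

Even with made-up numbers, the shape of the result is typical: most of the lead time is waiting, which no amount of individual developer “productivity” will fix.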

The mix of metrics that the paper proposes combines DORA, SPACE, and metrics of the authors’ own design, the latter including a Developer Velocity Index benchmark and an Individual Contribution Analysis.

The use of DORA metrics is meaningful in that we get a view of our capability to deliver software to our customers regularly, quickly, and without significant disruption. The SPACE metrics are relatively new but also seem well thought out, giving us a view into employee satisfaction and engagement, communication, and flow. A few of the SPACE metrics give me pause, especially those around activity generated (even with the caveats noted in the paper) and highly subjective measures like meeting quality.

My biggest concern, though, is with a couple of the “opportunity-focused metrics” the authors suggest. The Developer Velocity Index is worrisome: how are we measuring velocity, and do we even have a good way of doing so? Story points, if used, are highly individualized to each team and can’t meaningfully be judged against a benchmark. The Individual Contribution Analysis is even more troubling: if we want our developers to work collaboratively and cross-functionally, how do we separate out their “contributions” to these collaborative efforts? Even worse, if we measure this, it is only a matter of time before people refuse to collaborate when doing so would hurt their “individual contribution” score.
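For contrast, the DORA metrics at least have crisp operational definitions. Here is a minimal sketch of computing three of them from a deployment log; the log format and all values are hypothetical, invented purely for illustration (mean time to restore would need incident timestamps, which this toy log omits):

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: (deployed_at, committed_at, caused_incident).
# All values are invented for illustration.
deployments = [
    (datetime(2023, 9, 1, 10), datetime(2023, 8, 31, 15), False),
    (datetime(2023, 9, 2, 14), datetime(2023, 9, 1, 9),   True),
    (datetime(2023, 9, 4, 11), datetime(2023, 9, 2, 16),  False),
    (datetime(2023, 9, 5, 9),  datetime(2023, 9, 4, 17),  False),
]
window_days = 7

# Deployment frequency: deployments per day over the observation window.
deployment_frequency = len(deployments) / window_days

# Lead time for changes: mean time from commit to running in production.
lead_times = [deployed - committed for deployed, committed, _ in deployments]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change failure rate: share of deployments that caused an incident.
change_failure_rate = sum(failed for *_, failed in deployments) / len(deployments)

print(f"Deployments/day: {deployment_frequency:.2f}")    # -> 0.57
print(f"Mean lead time: {mean_lead_time}")               # -> 1 day, 2:45:00
print(f"Change failure rate: {change_failure_rate:.0%}") # -> 25%
```

Note that all three are properties of the delivery system as a whole, not of any individual developer, which is precisely why they are useful.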

The big claim in the paper is that the authors implemented these measurements at nearly 20 tech, finance, and pharmaceutical companies and found significant improvements: a 20-30% reduction in customer-reported defects, a 20% improvement in employee experience scores, and a 60-percentage-point improvement in customer satisfaction ratings.

Let’s first note that this is a very small sample size; we would be wise to treat any causal relationship between tracking these metrics and these results as a very premature conclusion. To the extent that we measure the number and severity of customer-reported defects (I am a big supporter of doing so), I can see a focus on improving that metric driving that level of reduction in such defects. A 20% improvement in employee experience is also fairly believable if we address the underlying causes of interruptions and process delays.

The bridge too far for me is the reported 60-percentage-point improvement in customer satisfaction ratings. These companies would have to have had pretty dismal customer satisfaction to begin with to see such huge jumps, in which case almost any focus on better process should lead to significant improvements. It is also quite hard to believe that customer satisfaction jumps significantly based on delivery capability and process improvements alone. While delivery improvements certainly contribute to customer satisfaction, the hard truth is that up to 80% of software features delivered are rarely or never used, so delivery is unlikely to be the main driver of satisfaction. In my opinion, we can best drive order-of-magnitude improvements in customer satisfaction by getting better at continuous discovery, emergent planning, and dynamic roadmapping based on continuous learning, instead of strictly following a static plan or roadmap and optimizing its delivery. Once we are good at figuring out which features will add value to customers, then we can see the most benefit from optimizing the delivery process.

To sum up my take on the McKinsey paper: I do not believe that “Can we measure developer productivity?” is the most important question for us to ask as organizational leaders if we are trying to maximize business outcomes. The right question to focus on first is, “Can we improve our ability to predict which features will drive both customer value and desired business outcomes? And given how hard that is, can we build a fast feedback system that delivers small increments of value iteratively, so that we arrive as quickly as possible at what delivers value and minimize the waste of producing unused features?”

Part 2 of this blog will focus on the lessons we can learn from Kent Beck and Gergely Orosz’s response to the McKinsey article.

The cover image for this blog is from Simone Secci @simonesecci.
