Bugs and Beyond
A friend asked me the other day if I had any suggestions on bug classification systems. While we didn’t get into the specifics, what I gathered was that, with four development teams in different geographies (they have grown by acquisition), they are trying to get to a common categorization scheme. I inferred two likely reasons: they want to merge these different sources of information into one list of field priorities, and they are hoping to do this efficiently (if not automatically). They currently use a severity/priority system, and she wanted to know whether I had used anything simpler.
In short, I told her about the system I’ve used for years. I’ll break it down in case you find it useful, but I don’t think the classification scheme is novel; in fact, it’s the same severity/frequency system she was already using. For any defect, two things matter: how severe the problem is when it happens (does it crash the system or is it cosmetic, and is there a workaround available?) and how often it affects your customers (is it something all users do every day, or something off the beaten path used by only a small percentage of your customers?). We use this categorization today, and each week we even show the top issues plotted this way on a 2-D graph for the executive team.
So, now assume that you have 100 defects, each rated on both values on a scale of one to ten (with ten being the most severe or having the broadest impact). Let’s also assume you have a team of people allocated to work on these defects. Which ones do they fix first? The one that crashes the system once a week, or the one that forces all users to work around it 10 times a day? Is there a simple formula that maps the two-dimensional categorization scheme into a single priority list? What if I multiply the severity by pi and then multiply that by the square root of the frequency? Or should I simply add them up?
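The two formulas floated above can give opposite answers for the same pair of defects. The sketch below is hypothetical: the ratings (severity 10 and frequency 2 for the weekly crash, severity 3 and frequency 10 for the daily workaround) are invented for illustration, and the function names are my own.

```python
import math

# Hypothetical ratings on the one-to-ten scale; values invented for illustration.
defects = {
    "crashes the system once a week": {"severity": 10, "frequency": 2},
    "all users work around it 10 times a day": {"severity": 3, "frequency": 10},
}

def score_pi_sqrt(severity, frequency):
    """Severity times pi times the square root of frequency."""
    return severity * math.pi * math.sqrt(frequency)

def score_sum(severity, frequency):
    """Severity plus frequency."""
    return severity + frequency

def rank(score_fn):
    """Return defect names ordered from highest to lowest score."""
    return sorted(defects, key=lambda name: score_fn(**defects[name]), reverse=True)

for fn in (score_pi_sqrt, score_sum):
    print(fn.__name__, rank(fn))
```

With these made-up numbers, the pi-and-square-root formula puts the crash first (about 44.4 vs 29.8), while the simple sum puts the workaround first (13 vs 12). Neither result tells you which defect matters more to your customers; it only reflects which formula you happened to pick.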
Well, there is an answer, but you’re not going to like it (I don’t think she did either). The system is simple, but it is anything but automatic. The secret? A cross-functional team needs to meet on a regular basis for the sole purpose of updating the ordered list.
See, I told you that you would not find it particularly profound. What is profound is that if you want the system to work, you really don’t have a choice but to put the time into it. To explain further, let’s define “working” as: “the team that fixes the problems knows what they need to do, why it is important, when it needs to be done, and what success will look like in the eyes of the customer; and all stakeholders involved feel that they had the opportunity to advocate for their customers and that the outcome is fair and equitable, so they support the results rather than work to subvert the system.” If you have some other definition of “working”, then you can probably cheat the system, but I challenge you to figure out which part of my statement you can remove and still be happy with the solution.
I see this happen time and again, particularly in the technical arena. We “technical” people tend to think that efficiency is achieved by removing people from the process. In many realms this is true, but what we fail to see is that some challenges cannot be solved this way because they are inherently “political”. Oddly enough, since engineers tend to shy away from conflict in public forums, we even tend to think that our automated system is a great idea because it eliminates all that messy discussion and confrontation. We couldn’t be further from the truth.
So, even when all parties have had the chance to have their say, who decides? The leader of the team that fixes the problems? No way: the list would be based on which problems were easiest or most interesting to solve. The head of sales? Only if you want it to be based on what it will take to win the next deal. In my experience, the best answer is the lead of the support escalation team, coached to make these decisions in order of impact to the field. I would invest this power in that individual because I firmly believe that, in the long run, people make decisions in their own self-interest, so the key to getting what you want out of the system is to align their interests with those of the people who should really matter: your customers. While not perfect, the best proxy for field impact is the number of customer issues opened with your support team that would be prevented if the item were fixed. Said another way, anything you do that reduces the number of calls or emails you receive to report problems is good. Listen to the leadership within your support team, and good things will happen.
To summarize our system, all defects are given both severity and frequency ratings on a scale of one to ten, but all work in the development escalation team is driven by a single, ordered list that makes the priority of the work clear. We meet once a week to review and update the list, with representatives of support and PD present, but the decision on how to rank them is made by our Tier 3 lead.
Some things to watch out for – multiple lists, grouping, and action-ability.
“One list shall rule them all” needs to be your mantra. There will be lots of innocent reasons to consider breaking out sections of the list, for example to see how many bugs relate to each feature, but in the end, if there is more than one list, the team doing the work will be confused about how to decide between the 5 “number 1” priorities that come from the different lists. You’ll feel better because you are able to avoid conflict this way (just let each team keep its own list, right?), but less will get done, and the team will work on the wrong things. Keeping more than one list is just an invitation for the technical team doing the fixes to choose the work they really want to do instead of what is needed.
Grouping is the attempt to combine issues in order to influence their priority: I’ve got 15 different reporting issues, but I’ll just combine them into one that I call “Reporting Issues”. The impact is magnified because lots of customers are affected when I lump them together; by doing this, I took 15 separately minor issues and made them critical en masse. Why doesn’t this work? Because your fix team still has to fix them one at a time, and when they do, the criticality of the work will seem hollow. You won’t fool them; you will only undermine your authority and their motivation.
Finally, and with apologies to every English teacher I ever had, “action-ability”: is it clear what the technical team needs to do to fix the problem? All too often I have seen “boil the ocean” issues listed, such as “make system more reliable”. Again, the more vague you are, the less likely you are to get what you want. Someone needs to have the job of translating the source of pain (customers keep calling to report that the system is slow) into actionable work (it takes 3 minutes for the employee list page to come up if you use the page during the busy hours). This is usually the job of the Tier 3 escalation team working in conjunction with the lead from PD Escalation. It is THE critical role for making the system work, so make sure you get the best possible person into it.
CK



