The expected result was 42. Now what was the test?: Measuring with Stories

In my experience one of the most difficult tasks I have is to try and change the way we measure testing. When I first started out in the profession of testing there was very little thought given to how we should report the quality of the testing that has been carried out. The most common way was to record the number of test cases and report how many passed and failed.

Then the managers wanted more information such as:

The number of defects raised
The number of defects fixed
The severity of the defects.

Does this sound familiar to anyone?

Sadly, even today, when testing as a profession has started to mature, managers still measure the quality of testing by using these figures and metrics.

I read the following article the other day: Defect Detection By Developers

The article talks about how developers can discover roughly as many defects as a tester in the same time frame.

I always have problems when I see articles like this.

How is the statement:

'The number of defects detected by the developer is of the same order as detected by a test engineer in the same time frame'

Quantifiable or Qualitative?

There is no mention of the type of defect, or of the risk to the project had the defect not been found. What happens if, say, 90% of the defects found using this method were purely cosmetic? Would this indicate this method is better than using a skilled tester during the same time frame? The skilled tester may find fewer defects during the same timescale but they may (and normally do) find the difficult-to-detect defects. And what about having the tester and the developer work together during the development phase, using continuous build to check as they go?
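To make the point concrete, here is a minimal sketch in Python. The data is entirely made up for illustration (the severities, the counts and the names `developer_defects` and `tester_defects` are my assumptions, not figures from the article). It only shows how a raw defect count can hide what was actually found:

```python
from collections import Counter

# Hypothetical data: one entry per defect found in the same time frame.
# 90% of the developer's defects are cosmetic, as in the question above.
developer_defects = ["cosmetic"] * 9 + ["major"]
tester_defects = ["cosmetic"] * 2 + ["major"] * 3 + ["critical"] * 2


def summarize(defects):
    """Return the raw count and the breakdown by severity."""
    return len(defects), dict(Counter(defects))


dev_total, dev_breakdown = summarize(developer_defects)
tester_total, tester_breakdown = summarize(tester_defects)

# The raw counts favour the developer...
print("Developer:", dev_total, dev_breakdown)
# ...but the breakdown shows the tester found the defects that matter most.
print("Tester:   ", tester_total, tester_breakdown)
```

On the count alone the developer appears to have done better, yet none of the critical defects in this invented data set show up in the developer's list. Even the breakdown is only a starting point for the story, not a replacement for it.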

The approach discussed is one which many companies should already be using:

Code reviews
Peer reviews
Documentation reviews (what happens if the project is a prototype for which no documentation exists? Maybe BDD could cover this?)
Unit tests etc.

The article misses vital approaches such as continuous integration. How many defects are trapped by this method?

I think the article has some valid points. However, I strongly object to the statement that a developer can detect the same quality of defects as a skilled tester can in the same period of time.

I do not like 'measure by defects' to prove quality - quality is proven by the telling of a story to indicate that the product is of the right quality for its purpose.

This leads me nicely into the title of this blog.

Why does management insist on measuring the quality of testing using numbers?

Are managers measured using numbers?

In my role as a test manager I have never, in my experience, been appraised solely using numbers, nor, as far as I know, have any senior stakeholders been measured solely using numbers. When I talk to my superiors about my management goals I try to tell stories and avoid using numbers. For example, if I said to my line manager ‘I have met 70% of my targets’ and left it at that, what information would my line manager get? Would that be good or bad? Compare that to a tester reporting to their manager that 70% of their tests have passed. What information can you get from that?

I understand these are extremes but unfortunately these extremes are common when it comes to measuring the quality of testing.

So suppose I spoke to my line manager and said: over the last six months I have been involved in addressing x, y and z, which are the most important tasks. I still have problems with a and b, and I have not had the time to deal with T and S. I have enjoyed the challenge of x, y and z and feel I have achieved the best I can in these areas. Tasks a and b are ongoing, and I am sure that I can complete them soon as long as I get support for b. My estimates were too ambitious for me to even start T and S, but these were not too important so I gave them a low priority for completion.

Does this give you a better understanding of what has been happening? Do you notice there is no use of numbers in the above? I feel it gives far more information.

So if we now do the above again and convert it to a story about testing:

For the last six months I have been involved in testing areas x, y and z, which are the most important areas of the product. I still have problems with test areas a and b, which appear not to be working as expected, and I have raised defects for these issues which are reported below. I have not had the time to test areas T and S. Test areas x, y and z are of sufficient quality for release. Currently test areas a and b are not of sufficient quality, but once the defects in b are fixed I am confident that the quality will improve. Due to the time taken to address the issues with a and b, my estimates were too ambitious for me to even start test areas T and S, but since these were not high priority the product can work without them as long as this is documented.

How does this sound? If you, as a manager, read this, would you have a better understanding of the quality of testing?

Management is normally measured by the telling of a story, so why not measure the quality of testing by the telling of a good story?

I was on Twitter the other day and noticed an article by BJ Rollison on a similar topic, how we measure the quality of testing: Meaningful Measures

There was an interesting line he came up with:

'At one time I naively believed that there was a core set of metrics that all teams should be collecting all the time that we could put into a ‘dashboard’ and compare across teams. In retrospect that was really a bone-headed notion. Identifying these measures is not easy, and there is no cookie-cutter approach. Each project team needs to decide on their specific goals that may increase customer value or impact business costs. Testers should ask themselves, “why are we measuring this?” “What actions will be taken as a result of these measures?” And, “if there is no actionable objective associated with this measure, then why am I spending time measuring this?”'

These are some very valid points. Testers should be asking: why are we measuring this? Is it quantifiable? Will it help lay people understand the quality of the product?

When we measure the quality of testing it should be clear, concise and in a language that anyone can understand. Blinding people with numbers does not aid clarity. Trying to measure different teams on different projects with the same metrics (code coverage, defect counts, test case pass/fail) does not indicate that one team is better than the other. All it does is help management pretend that they understand the quality of testing that took place.

With this article I am not saying we should abandon all numerical metrics for measuring the quality of software testing, but we need to look more at the story behind the numbers, since the story can give you far more information on the quality of testing.

Some other useful articles on Software Metrics:
meaningful-metrics by Michael Bolton
Metrics by Kaner and Bond

3 comments:

Fake Tester... 24 March 2010 at 12:19
Rightly said... But again, the very people who were test managers and used the same measures that you talk about now are into senior management these days. They tend to ask for the same information with which they grew, and so, we are stuck with the same measure.
Michael Bolton http://www.developsense.com 24 March 2010 at 16:30
Thank you for the nod, John. In addition, I would like unhumbly to point people to these articles too:

Three Kinds of Measurement (And Two Ways to Use Them)
Issues About Metrics About Bugs

And if they don't believe me, how about Tom DeMarco's dramatic recantation of the idea of software development as an engineering discipline?

http://www2.computer.org/cms/Computer.org/ComputingNow/homepage/2009/0709/rW_SO_Viewpoints.pdf

Cheers,

---Michael B.
Gergely 9 April 2010 at 05:16
Hi Michael.

This is all fine when you are in a stand up and can talk.

But what if you are required to do some graphs and some analysis and send it to a manager on site who you cannot talk with?

You have to have some numbers by then. Show how many tests passed, show what the application is capable of.

And if you write a story in your letter, well, the managers are busy people. They won't care about a story; they just need to have some fast numbers and off they go to another meeting to tell a customer how well the product is taken care of.

Although I do understand and agree with your article, I think it can only be implemented in some cases and situations.

Gergely.

Wednesday, 24 March 2010
