Wednesday 24 March 2010

Measuring with Stories

In my experience one of the most difficult tasks I have is trying to change the way we measure testing. When I first started out in the profession of testing, very little thought was given to how we should report the quality of the testing that had been carried out. The most common way was to record the number of test cases and report how many passed and failed.

Then the managers wanted more information such as:
  • The number of defects raised
  • The number of defects fixed
  • The severity of the defects.
Does this sound familiar to anyone?

Sadly, even today, when testing as a profession has started to mature, managers still measure the quality of testing using these figures and metrics.

I read the following article the other day: Defect Detection By Developers

The article talks about how developers can discover more defects in the same time frame as a tester.

I always have problems when I see articles like this.

Is the statement:

'The number of defects detected by the developer is of the same order as detected by a test engineer in the same time frame'

quantifiable or qualitative?

There is no mention of the type of defect, or of the risk to the project if the defect were not found. What happens if, say, 90% of the defects found using this method were purely cosmetic? Would this indicate the method is better than using a skilled tester during the same time frame? The skilled tester may find fewer defects in the same timescale, but they may (and normally do) find the difficult-to-detect defects. Or what about having the tester and the developer work together during the development phase, using continuous builds to check as they go?

The approach discussed is one which many companies should already be using:

  • Code reviews
  • Peer reviews
  • Documentation reviews (What happens if the project is a prototype in which no documentation exists? Maybe BDD could cover this - see the sketch after this list)
  • Unit tests etc.
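As an aside on the BDD point in the list above, here is a minimal sketch of what I mean (my own illustration, not tied to any particular BDD framework): the checks are written so that their names and docstrings read as living documentation for a prototype that has none. The login function and its rules are entirely invented.

    def login(username, password):
        """Hypothetical prototype code with one hard-coded account."""
        return username == "demo" and password == "secret"

    def test_a_known_user_with_the_right_password_is_let_in():
        """Given a registered user, when they supply the correct password,
        then they are granted access."""
        assert login("demo", "secret") is True

    def test_a_known_user_with_the_wrong_password_is_turned_away():
        """Given a registered user, when they supply the wrong password,
        then access is refused."""
        assert login("demo", "guessed") is False

    if __name__ == "__main__":
        test_a_known_user_with_the_right_password_is_let_in()
        test_a_known_user_with_the_wrong_password_is_turned_away()
        print("executable specification passed")

Run with any test runner (or directly) and the passing checks double as the missing documentation of how the prototype behaves.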
The article also misses vital approaches such as continuous integration. How many defects are trapped by that practice?

I think the article has some valid points; however, I strongly object to the statement that a developer can detect the same quality of defects as a skilled tester can in the same period of time.

I do not like 'measure by defects' as proof of quality - quality is demonstrated by telling a story that indicates the product is of the right quality for its purpose.

This leads me nicely into the title of this blog.

Why does management insist on measuring the quality of testing using numbers?

Are managers measured using numbers?

In my role as a test manager I have never, in my experience, been appraised solely using numbers, nor, as far as I know, have any senior stakeholders been measured solely using numbers. When I talk to my superiors about my management goals I try to tell stories and avoid using numbers. For example, suppose I said to my line manager 'I have met 70% of my targets' and left it at that. What information would my line manager get? Would that be good or bad? Compare that to a tester reporting to their manager that 70% of their tests have passed. What information can you get from that?

I understand these are extremes but unfortunately these extremes are common when it comes to measuring the quality of testing.

So suppose instead I spoke to my line manager and said: over the last six months I have been involved in addressing x, y and z, which are the most important tasks. I still have problems with a and b, and I have not had the time to deal with T and S. I have enjoyed the challenge of x, y and z and feel I have achieved the best I can in these areas; a and b are ongoing and I am sure that I can complete them soon as long as I get support for b. My estimates were too ambitious for me to even start T and S, but these were not too important so I gave them a low priority for completion.

Does this give you a better understanding of what has been happening? Do you notice there is no use of numbers in the above? I feel it gives far more information.

So if we now do the above again and convert it to a story about testing:

For the last six months I have been involved in testing areas x, y and z, which are the most important areas of the product. I still have problems with test areas a and b, which appear not to be working as expected, and I have raised defects for these issues which are reported below. I have not had the time to test areas T and S. Test areas x, y and z are of sufficient quality for release. Currently test areas a and b are not of sufficient quality, but once the defects in b are fixed I am confident that the quality will improve. Due to the time taken to address the issues with a and b, my estimates were too ambitious for me to even start test areas T and S, but since these were not high priority the product can work without them as long as this is documented.

How does this sound? If you as a manager read this would you have a better understanding of the quality of testing?

Management is normally measured by the telling of a story, so why not measure the quality of testing by the telling of a good story?
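To make this a little more concrete, here is a rough sketch (my own illustration, not a tool I am recommending) of what a story-shaped test report could look like as data: each test area carries a narrative and its outstanding issues rather than a bare pass/fail count. The area names and defect ids are made up.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AreaStory:
        area: str             # the part of the product that was explored
        importance: str       # why this area mattered (or did not)
        what_happened: str    # the narrative of the testing itself
        open_issues: List[str] = field(default_factory=list)

    def tell_the_story(stories):
        """Print the report as prose rather than as a pass/fail percentage."""
        for s in stories:
            print(f"{s.area} ({s.importance}): {s.what_happened}")
            for issue in s.open_issues:
                print(f"  outstanding: {issue}")

    report = [
        AreaStory("x, y and z", "most important areas",
                  "tested and of sufficient quality for release"),
        AreaStory("a and b", "important",
                  "not behaving as expected; defects raised, b is blocked on a fix",
                  open_issues=["DEF-101", "DEF-102"]),
        AreaStory("T and S", "low priority",
                  "not started; estimates were too ambitious, recorded as a known gap"),
    ]

    tell_the_story(report)

The numbers (defect ids, counts) are still there if anyone wants them, but they hang off the story rather than replacing it.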

I was on Twitter the other day and noticed an article by BJ Rollison on a similar theme about how we measure the quality of testing: Meaningful Measures

There was an interesting line he came up with:

'At one time I naively believed that there was a core set of metrics that all teams should be collecting all the time that we could put into a ‘dashboard’ and compare across teams. In retrospect that was really a bone-headed notion. Identifying these measures is not easy, and there is no cookie-cutter approach. Each project team needs to decide on their specific goals that may increase customer value or impact business costs. Testers should ask themselves, “why are we measuring this?” “What actions will be taken as a result of these measures?” And, “if there is no actionable objective associated with this measure, then why am I spending time measuring this?'

Some very valid points. Testers should be asking: why are we measuring this? Is it quantifiable? Will it help lay people understand the quality of the product?

When we measure the quality of testing it should be clear, concise and in a language that anyone can understand. Blinding people with numbers does not aid clarity. Trying to measure different teams on different projects with the same metrics (code coverage, defect counts, test case pass/fail) does not indicate that one team is better than the other; all it does is help management pretend that they understand the quality of the testing that took place.

With this article I am not saying we should abandon all numerical metrics for measuring the quality of software testing, but we need to look more at the story behind the numbers, since that can give you far more information about the quality of testing.

Some other useful articles on Software Metrics:
Meaningful Metrics by Michael Bolton
Metrics by Kaner and Bond

Wednesday 3 March 2010

Does manual testing really lose its value as companies encourage more and more automation?


A colleague the other day asked me for my views on the automation vs. manual debate, posing the following question:


Does manual testing really lose its value as companies encourage more and more automation?

I thought this was a very interesting question and decided to blog my response.

I should start by saying I am not in either camp of the automation vs. manual testing debate. I can see the benefit of both depending on the circumstances you are in. If you have an old legacy project in which you are 100% sure nothing will change, then automation could be the answer. If you are involved in a system in which changes could be made and you want to CHECK that the functionality or the business rule that is in place is still giving the exact same result with no deviation, then automation could work.

My answer to the question I was asked (and the title of this article) is completely the opposite: manual testing does not lose its value. Automation requires no sapience or thinking to be executed, so once the automated checks have been written (which does require sapience) you can run and forget them.

The problem is that the world in which we work is all about changing, adapting and making things better (normally), especially in an agile environment where change is embraced.

I should state that my definition of manual testing is not following pre-scripted tests but being a test explorer: searching in all the nooks and crannies, trying to discover new and intriguing things about the software. If it is pre-scripted then automate it; do not waste good tester intelligence and skill on running a checklist - your testers deserve better.
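As a minimal sketch of that last point (the add function is a made-up stand-in for the product): a step-by-step scripted check such as 'enter 2 and 3, expect 5' can be turned into an automated check so that no tester ever has to walk through it by hand again.

    import unittest

    def add(a, b):
        """Stand-in for the feature a scripted manual test used to exercise."""
        return a + b

    class PreScriptedChecks(unittest.TestCase):
        """Once manual script steps; now they run unattended on every build."""

        def test_adding_two_positive_numbers(self):
            self.assertEqual(add(2, 3), 5)

        def test_adding_a_negative_number(self):
            self.assertEqual(add(2, -3), -1)

    if __name__ == "__main__":
        unittest.main()

The tester's time then goes on exploration, not on re-running the script.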

So I would ask the following questions of any company that wants to encourage more automation at the expense of manual testing:

  • Is it cost effective to write lots of automated checks compared to carrying out manual exploratory testing?
  • Which method would in the same time period give the most test coverage?
  • Which would be the easiest to adapt to major changes?
  • Which would uncover the most problems or issues?

I feel there is some value in using automation at the unit level and build level; continuous integration with acceptance checks is a useful tool for the software tester, since it lets them have an early look at changes with some confidence that what they have been given will at least have a chance of working. It sure beats the good old days of rejecting x releases in a day because of a typo in an install script or a missing dll from the build.
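As a rough illustration of that build-level safety net (the artefact names are invented, not from any real project): a tiny smoke check that rejects a build before it ever reaches a tester if an expected file is missing from it.

    import sys
    from pathlib import Path

    # Invented artefact names, for illustration only.
    REQUIRED_ARTEFACTS = ["app.exe", "engine.dll", "install.cfg"]

    def smoke_check(build_dir="."):
        """Return 1 (reject the build) if any expected artefact is missing."""
        missing = [name for name in REQUIRED_ARTEFACTS
                   if not (Path(build_dir) / name).exists()]
        if missing:
            print("build rejected, missing:", ", ".join(missing))
            return 1
        print("build accepted, handing over to exploratory testing")
        return 0

    if __name__ == "__main__":
        sys.exit(smoke_check(sys.argv[1] if len(sys.argv) > 1 else "."))

Hooked into a continuous integration job, a check like this catches the 'missing dll' class of problem within minutes of the build.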

Hopefully you can see that I am not against using automation; however, there appears to be a view in the testing world that automation can replace manual testing or make its value less. This view worries me, because if the corporate suits think they can get more value and better quality using automation, then the message about the art of software testing is not being broadcast well enough.

Manual testing and automation can co-exist very well together; however:

  • manual testing can exist without automation
  • but automation cannot exist without manual testing.

There is an interesting podcast in which Jon Bach and Michael Bolton give their viewpoints on the difference between checking and testing, available here:

http://www.quardev.com/blog/2010-02-02-1123487836