
Thursday, 17 September 2015

It meets the business requirements

Recently I came across an interesting software requirements issue whilst eating out at a UK restaurant chain.

They had one of those deals you commonly come across in similar chains.  You choose a main meal from a limited, but tasty, menu.  You get unlimited drinks of either tea/coffee or fizzy drinks (soda), and you can have unlimited access to the salad bar.  To end your meal you get an ice cream sundae.  All of this for the sum of £9.99!

I was with my wife and we both ordered a main meal and a drink.  Whilst our meal was being prepared we went and selected our bowls of salad from the salad bar.  We set about eating our salad; during this time the drinks arrived, followed by our main meals.  We only just managed to finish our main meals and were feeling rather full by then.  So when the waitress came over and asked if there was anything else, we both said no, asked if we could skip the ice cream sundae, and requested the bill.

The table was cleared away and the bill arrived......

Can you guess how much the bill came to?

If you are thinking the same as my wife and I did, you would expect the bill to be £19.98.  The actual bill came to £20.13.  I queried this with the waitress and she replied....

"Oh, that is because you did not have the ice cream sundae!"

Our reply was "WHAT? We have less but it will cost us more?"

The waitress was very confused and went to see her line manager, who came over and explained that what they would need to do was order us the ice cream sundaes and then let the kitchen know there was no need to make them.  They did say we could take them with us!  Ice cream, a car for hours and a warm day: not a good combination.

All of the orders were taken on a mobile phone app which fed back to the main system and the kitchens.  There appears to be no option to mark an order as one of the deals; each item is entered individually.  Once all four items are on the order, the system recognises they are part of the deal and works out the price.  Now, the question to those reading this..... "Is this a defect?"

From the business requirements perspective this is, I guess, what they wanted from the software.

Entering each item one at a time and then working out whether it is part of the deal lets them manage stock control and keep business costs down.  It is also one less step for the operator, compared with having to press a button to indicate it is a deal.

From the end user experience perspective it is flawed.  It appears they did not expect customers to decline something that was, in reality, free. The staff had to override the system and follow a manual process to get the correct price calculated. At the same time this will upset the stock control reporting, which now shows two fewer ice cream sundaes than they actually have.
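To make the behaviour concrete, here is a minimal sketch of how such item-by-item deal detection might work. Everything in it is hypothetical except the £9.99 deal price; the item names and individual prices are made up purely to show how declining a free item can raise the bill.

```python
# Hypothetical sketch of item-by-item deal pricing. Only the 9.99 deal
# price comes from the menu; the individual item prices are invented.
DEAL_ITEMS = {"main meal", "drink", "salad bar", "ice cream sundae"}
DEAL_PRICE = 9.99

ITEM_PRICES = {            # prices charged when the deal does not apply
    "main meal": 6.49,
    "drink": 2.29,
    "salad bar": 1.89,
    "ice cream sundae": 1.99,
}

def price_order(items):
    """Apply the deal price only when every deal item is on the order."""
    items = set(items)
    if DEAL_ITEMS <= items:                       # all four items present
        extras = items - DEAL_ITEMS
        return DEAL_PRICE + sum(ITEM_PRICES[i] for i in extras)
    return sum(ITEM_PRICES[i] for i in items)     # otherwise price each item

# With the sundae, the deal kicks in:
print(price_order(["main meal", "drink", "salad bar", "ice cream sundae"]))  # 9.99
# Decline the sundae and every item is priced individually, which costs more:
print(price_order(["main meal", "drink", "salad bar"]))  # 10.67
```

Against the business requirement (item-level stock control, no extra step for the operator) logic like this is correct; against the customer's expectation it clearly is not.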

When we look at software requirements and start testing them, it is important that we look at the implementation from all perspectives.  This is especially true for systems that businesses use to help deliver customer service.  Get this wrong and it can leave a bad impression on the business's customers.

Tuesday, 14 October 2014

Risk vs Uncertainty in Software Testing

Traditionally, software testing appears to be based upon risk, and many models and examples of this have been published; just search the internet for ‘risk based testing’.

The following are a few examples from a quick search:

"The objective of Risk Analysis is to identify potential problems that could affect the cost or outcome of the project." Ståle Amland, 1999, http://www.amland.no/WordDocuments/EuroSTAR99Paper.doc

"In simple terms – Risk is the probability of occurrence of an undesirable outcome." ISTQB Exam Certification, What is Risk Based Testing, 2014, http://istqbexamcertification.com/what-is-risk-based-testing/

"Risk = you don’t know what will happen but you do know the probabilities; Uncertainty = you don’t even know the probabilities." Hans Schaefer, Software Test Consulting, Norway, 2004, http://www.cs.tut.fi/tapahtumat/testaus04/schaefer.pdf

"Any uncertainty or possibility of loss may result in non conformance of any of these key factors." Alam and Khan, 2013, Risk-based Testing Techniques: A Perspective Study, http://www.academia.edu/3412788/Risk-based_Testing_Techniques_A_Perspective_Study

James Bach goes a little deeper and introduces risk heuristics

“Risk is a problem that might happen.” James Bach, 2003, Heuristics of Risk Based Testing, http://www.satisfice.com/articles/hrbt.pdf

And continues with the following statement in the 'Making it All Work' section:

"...don’t let risk-based testing be the only kind of testing you do. Spend at least a quarter of your effort on approaches that are not risk-focused..."

All of the examples above look at software testing and how to focus testing effort based upon risk; they make no mention of uncertainty. I have struggled to find any software testing models or articles on uncertainty, which I feel could have value to the business in software projects. There are a few misconceptions about risk and uncertainty, with people commonly mixing the two together and stating they are the same.

Some of the articles appear to follow the fallacy of mixing risk with uncertainty and attempt to measure uncertainty in the same way as risk.  The issue I find with these articles is this: how can you measure something which has no statistical distribution?

One type of uncertainty that people attempt to measure is the number of defects in a product, using complex formulas based upon lines of code or some other wonderful statistical model.  Since the number of defects in any one product is uncertain, I am unsure of the merits of such measures and their reliability.
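For illustration only, the kind of formula meant here is usually some variant of a defect-density estimate; the density figure below is invented, not taken from any real study.

```python
# A hypothetical defect-prediction formula of the kind criticised above:
# estimated defects = assumed defect density (per KLOC) x size in KLOC.
ASSUMED_DEFECTS_PER_KLOC = 15   # made-up figure, not a real industry number

def estimated_defects(lines_of_code: int) -> float:
    kloc = lines_of_code / 1000
    return ASSUMED_DEFECTS_PER_KLOC * kloc

print(estimated_defects(50_000))   # 750.0 "predicted" defects
# The output is only as good as the assumed density, which is the point:
# the true number of defects is uncertain, not a measurable risk.
```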

The concern here is how you define a defect.  Surely it is not based only upon the number of lines of code or the number of test cases defined, but upon the uniqueness of each and every user?  In other words, what some see as defects others will gladly ignore, saying it is OK, it is just the character of the program.

Let’s look at what we mean by risk and uncertainty:

  • Risk: We don’t know what is going to happen next, but we do know what the distribution looks like.
  • Uncertainty: We don’t know what is going to happen next, and we do not know what the possible distribution looks like.

Michael Mauboussin - http://www.michaelmauboussin.com/

What does this mean to the lay person?

Risk can be judged against statistical probability, for example the roll of a die.  We do not know what the outcome (roll) will be (if the die is fair), but we know the outcome will be a number between 1 and 6 (a 1 in 6 chance of each).

Uncertainty is where the outcome is not known and there is no statistical probability. An example of uncertainty: what does your best friend intend to eat next week on Thursday at 5pm? Can you create a probability model for that event?

Basically, risk is measurable; uncertainty is not.

“To preserve the distinction which has been drawn in the last chapter between the measurable uncertainty and an unmeasurable one we may use the term "risk" to designate the former and the term "uncertainty" for the latter.” Frank Knight, Risk, Uncertainty, and Profit, 1921, http://www.econlib.org/library/Knight/knRUP7.html
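The difference is easy to see in code: a known distribution can be sampled and measured, whereas uncertainty leaves nothing to sample. A small illustrative sketch:

```python
import random
from collections import Counter

# Risk: the distribution is known (a fair six-sided die), so we can
# estimate the probability of each outcome and compare it with 1/6.
rolls = Counter(random.randint(1, 6) for _ in range(60_000))
print({face: round(count / 60_000, 3) for face, count in sorted(rolls.items())})
# Each face comes out close to 0.167: measurable risk.

# Uncertainty: "what will my best friend eat next Thursday at 5pm?"
# There is no distribution to sample from, so there is nothing
# meaningful to simulate or measure.
```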

The problem is that many people see everything as a risk and ignore uncertainty.  This is not a deliberate action; it is how our brains work to deal with uncertainty. The following psychological experiment shows this effect.


The following example of the Ellsberg paradox is taken from this article: http://www.datagenetics.com/blog/december12013/index.html

_____________

Let’s play a different thought experiment. Imagine there are two urns.

  • Urn A contains 50 red marbles and 50 white marbles.
  • Urn B contains an unknown mixture of red and white marbles (in an unspecified ratio).


You can select either of the Urns, and then select from it a random (unseen) marble. If you pick a red marble, you win a prize. Which Urn do you pick from?

  • Urn A 
  • Urn B 


In theory, it should not matter which urn you select from. Urn A gives a 50:50 chance of selecting a red marble. Urn B also gives you the same 50:50 chance.

Even though we don’t know the distribution of marbles in the second urn, since it only contains red and white marbles, this ambiguity equates to the same 50:50 chance.

For various reasons, most people prefer to pick from Urn A. It seems that people prefer a known risk rather than ambiguity.

People prefer to know the risk when making a decision rather than base it on uncertainty.

Next experiment: This time there is only one urn. In this urn is a mixture of Red, White and Blue marbles.

There are 90 marbles in total. 30 are Red, and the other 60 are a mixture of White and Blue (in an unknown ratio). You are given a choice of two gambles:

  • Gamble 1: you win $100 if you pick a Red marble.
  • Gamble 2: you win $100 if you pick a White marble.


Which gamble do you take? Having read the section above, you will not be surprised that most people select Gamble 1. They prefer their risk to be unambiguous. A quick check of the expected value of both gambles shows they are equivalent (each with a ⅓ chance of winning), yet people go with the known quantity.

____________

The summary of this is that we tend to gravitate towards known risks rather than uncertainty.
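The equivalence claimed in the first experiment can be checked with a short simulation, under the assumption that the unknown ratio in Urn B is itself chosen at random with no bias towards either colour:

```python
import random

TRIALS = 100_000

def draw_urn_a():
    """Urn A: exactly 50 red and 50 white marbles."""
    return random.random() < 0.5          # True means a red marble was drawn

def draw_urn_b():
    """Urn B: 100 marbles in an unknown red/white ratio. The unknown ratio
    is modelled here as uniformly random, which is an assumption."""
    reds = random.randint(0, 100)
    return random.random() < reds / 100

wins_a = sum(draw_urn_a() for _ in range(TRIALS))
wins_b = sum(draw_urn_b() for _ in range(TRIALS))
print(f"Urn A red rate: {wins_a / TRIALS:.3f}")   # close to 0.5
print(f"Urn B red rate: {wins_b / TRIALS:.3f}")   # also close to 0.5
```

The expected chance of winning is the same for both urns; the discomfort people feel comes from the ambiguity, not from the odds.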

What has all of this to do with software testing?

The majority of our testing effort is spent on testing based upon risk, with outcomes that are statistically known.  This is an important task; however, does it have more value than testing against uncertainty?  Using automated tools it is possible to test against all the possible outcomes when we are using a risk-based testing approach, since risk is based upon known probabilities, which machines are good at calculating and working through.

Since it is difficult to predict the future of uncertain events, and we find it even more difficult to adjust our minds to looking for uncertainties, an exploratory testing approach may provide good value against uncertainty.  Tools can help here, such as random data generators and emulators, where the data used for testing is not based upon risk but is entirely random and can provoke unexpected results.
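As a purely illustrative sketch of the kind of tooling meant here, a test can throw completely random data at a function instead of the 'expected' risk-based cases. The function under test and the invariant checked are both hypothetical:

```python
import random
import string

def normalise_name(name: str) -> str:
    """The (imaginary) function under test: tidy up a customer name."""
    return " ".join(part.capitalize() for part in name.split())

def random_name(max_len: int = 30) -> str:
    """Generate names nobody anticipated: odd lengths, strange characters,
    leading and trailing junk."""
    alphabet = string.ascii_letters + string.digits + string.punctuation + "  "
    return "".join(random.choice(alphabet) for _ in range(random.randint(0, max_len)))

# Fire random inputs at the function and flag anything surprising.
for _ in range(1_000):
    name = random_name()
    result = normalise_name(name)
    # A loose invariant: normalising a second time should change nothing.
    assert normalise_name(result) == result, f"Unexpected result for {name!r}"
```

The point is not the specific invariant but that none of these inputs were chosen because of a known risk; any failure they provoke is, by definition, something we were uncertain about.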

The key message of this article is that we need to be careful not to confuse uncertainty with risk, and to ask ourselves whether we are testing based upon risk or upon uncertainty today.  Each has value; however, sometimes one has more value than the other.

Sunday, 29 December 2013

Is Defect Removal Efficiency a Fallacy?

I noticed a tweet by Lisa Crispin in which she said she had commented on this article: http://blog.btdconf.com/?p=43.  I may not agree with everything in the article, but it makes some interesting points that I may come back to at a later date (still not seen that comment from Lisa yet...).

However I did notice a comment by Capers Jones which I have reproduced below:

Capers Jones
December 29, 2013 at 10:01 am
This is a good general article but a few other points are significant too:

There are several metrics that violate standard economics and distort reality so much I regard them as professional malpractice:
1. Cost per defect penalizes quality, and the whole urban legend about costs going up more than 100 fold is false.
2. Lines of code penalize high level languages and make non coding work invisible.

The most useful metrics for actually showing software economic results are:
1. Function points for normalization.
2. Activity-based costs using at least 10 activities such as requirements, design, coding, inspections, testing, documentation, quality assurance, change control, deployment, and management.

The most useful quality metric is defect removal efficiency (DRE), or the percentage of bugs found before release, measured against user-reported bugs after 90 days. If a development team finds 90 bugs and users report 10 bugs, DRE is 90%. The average is just over 85%, but top projects using inspections, static analysis, and formal testing can hit 99%. Agile projects average about 92% DRE. Most forms of testing, such as unit test and function test, are only about 35% efficient and find one bug out of three. This is why testing alone is not sufficient to achieve high quality.

Some metrics have ISO standards, such as function points, and are pretty much the same, although there are way too many variants. Other metrics, such as story points, are not standardized and vary by over 400%.

A deeper problem than metrics is the fact that historical data based on design, code, and unit test (DCUT) is only 37% complete. Quality data that only starts with testing is less than 50% complete. Technical debt is only about 17% complete since it leaves out cancelled projects and also the costs of litigation for poor quality.

My problem is the use of this metric to measure the quality of the software; I feel it is useless because it seems far too easy to game.  Once people (or companies) are aware that they are being measured, their behaviour adjusts on a psychological level (intended or not) so that they look good against whatever they are being measured against.

So, using the example given by Capers:

Say a company wants their DRE to look good, so they reward their testing teams for finding defects, no matter how trivial, and they end up finding 1,000.  The customers still only find 10.

Using the above example, that means this company can record a DRE of 99.0099%. WOW, that looks good for the company.

Now let us say they really, really want to game the system and they report 1,000,000 defects against the customers' 10.  This now starts to become a worthless way to measure the quality of the software.
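The arithmetic is easy to check. A quick sketch using the DRE definition from the quoted comment and the scenario numbers above:

```python
def dre(found_before_release: int, found_by_users: int) -> float:
    """Defect Removal Efficiency: percentage of defects found before release."""
    total = found_before_release + found_by_users
    return 100 * found_before_release / total

print(f"Capers' example:     {dre(90, 10):.1f}%")        # 90.0%
print(f"Trivial-bug hunt:    {dre(1_000, 10):.4f}%")     # 99.0099%
print(f"Gamed to absurdity:  {dre(1_000_000, 10):.4f}%") # 99.9990%
```

The more trivial defects the team logs, the closer the metric creeps to 100%, without the delivered software getting any better.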

It does not take into account the defects that still exist and are never found.  How long do you wait before you can say your DRE is accurate?  What if the client finds another 100 six months later, or a year later?  What if the company's own testing finds another 100 after release to the client; how is that included in the DRE percentage?

As any tester would ask:
IS THERE A PROBLEM HERE?
This form of measurement does not take into account the technical debt of dealing with this ever growing list of defects.  Measuring using such a metric is flawed by design, never mind the other activities mentioned by Capers, which have their own hidden costs and will quickly spiral.  By the time you have adhered to all of this, given the current marketplace of rapid delivery (DevOps), your competitors have prototyped, shown it to the client, adapted it to meet what the customer wants and released it, implementing changes to the product as the customer desires rather than focusing on numbers that quickly become irrelevant.

At the same time I question numbers quoted by Capers such as:
  • Most forms of testing such as unit test and function test are only about 35% efficient and find one bug out of three.
  • Other metrics such as story points are not standardized and vary by over 400%.
  • Quality data that only starts with testing is less than 50% complete.
  • Plus others
Where are the sources for all these metrics?  Are they independently verified?

Maybe I am a little more cynical about numbers being quoted, especially after reading "The Leprechauns of Software Engineering" by Laurent Bossavit.  Without any attribution for the numbers, I do question their value.

To finish this little rant I would like to use a quote often attributed to Albert Einstein:
"Not everything that can be counted counts, and not everything that counts can be counted."