Wednesday 26 January 2011

What you believe might not be true. (Part 1)

When I started to look at how the human mind works, and the traps it continually falls into, I did not realise what a huge area of psychology this is. The subject of bias and the human mind is fascinating, and every tester should be aware that every decision we make when testing a product is subject to our cognitive biases.

I have previously touched upon how bias can affect our judgement when I wrote the blog post about confirmation bias and cognitive dissonance. We need to be aware that what we think could be wrong and subject to our own biases. There are ways to try to reduce cognitive biases, such as pairing and debriefing; however, that is not the subject of this post. The purpose of this post is to look at some of the common cognitive biases in relation to their effect on testing.

I shall start by defining the term cognitive bias:

A cognitive bias is a mistake in reasoning, evaluating, remembering, or other cognitive process, often occurring as a result of holding onto one's preferences and beliefs regardless of contrary information.


Within the psychology field of cognitive biases there are many different types of bias, some of which I have previously discussed. Within this article I will look at a few more which could have an effect on our testing. The whole area of cognitive bias is huge; I could write many more posts on different types of bias, and it is something I may return to at some point. Since it is such a large area I will not go into great detail about each type of bias, but I will give enough information for readers of this blog to be aware of the failings of our human minds.

One bias that intrigues me is called the conjunction effect.

A definition of this bias is described below:

When two events can occur separately or together, the conjunction, where they overlap, cannot be more likely than the likelihood of either of the two individual events. However, people forget this and ascribe a higher likelihood to combination events, erroneously associating quantity of events with quantity of probability.


An example of this can be seen in the well-known experiment that Amos Tversky and Daniel Kahneman carried out:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

A) Linda is a bank teller.
B) Linda is a bank teller and is active in the feminist movement.

In the experiment 86% of people answered B, even though it can be proven mathematically that A is more probable. This is the conjunction fallacy in action: your mind tricks you into believing something is more probable than it actually is.
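To see why A must be the more probable answer, note that the probability of "bank teller AND feminist" can never exceed the probability of "bank teller" alone, because every feminist bank teller is, by definition, also a bank teller. Here is a minimal sketch of that subset relationship in Python; the population and its proportions are invented purely for illustration:

    # Toy population: each person is a (is_bank_teller, is_feminist) pair.
    # The numbers are invented purely for illustration.
    population = [
        (True, True),     # feminist bank tellers
        (True, False),    # non-feminist bank tellers
        (False, True),    # feminists who are not bank tellers
        (False, False),   # everyone else
    ] * 25  # 100 people in total

    tellers = [p for p in population if p[0]]
    feminist_tellers = [p for p in population if p[0] and p[1]]

    p_teller = len(tellers) / len(population)                        # 0.5
    p_teller_and_feminist = len(feminist_tellers) / len(population)  # 0.25

    # The conjunction is a subset of the single event, so its
    # probability can never be higher, whatever the proportions.
    assert p_teller_and_feminist <= p_teller

However skewed you make the proportions, the assertion always holds; the only thing the Linda description changes is how representative option B feels, not how probable it is.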

In another experiment from Tversky and Kahneman, during 1983, two different experimental groups were asked to rate the probability of two different statements, each group seeing only one statement:

  • A complete suspension of diplomatic relations between the USA and the Soviet Union, sometime in 1983.
  • A Russian invasion of Poland, and a complete suspension of diplomatic relations between the USA and the Soviet Union, sometime in 1983.

Even though the probability of either event happening was low, the group shown the second statement rated it as significantly more likely than the group shown the first rated theirs.

The moral? Adding more detail or extra assumptions can make an event seem more plausible, even though the event necessarily becomes less probable.

How does this all relate to testing?

Within testing we have to look at statements and judge what is most probable to happen.

For example, suppose we see the following requirements:

Req1: If the user is a software engineer then screen ‘is engineer’ must be shown.

Req2: If the user is a software engineer and likes to listen to classical music then screen ‘music engineer’ must be shown.

Now let us look at the following user story:

David went to university to train as a classical violinist. After leaving university David retrained as a software engineer and started to write code for a major software house.

Which is more likely to be true?

Most people will see Req2 as the more likely to apply and ignore Req1. However, in the given user story, the probability of Req2 holding can never be greater than, and is probably far less than, that of Req1.

With these two requirements you can see a conjunction effect, in which one statement appears more likely than the other, and as testers we should be aware of this. Our minds might not notice that there is a conjunction; our bias takes over, so when we test we assume that only Req2 is valid and ignore Req1. However, the conjunction means Req2 is at most as probable as Req1, and we need to be aware of this bias and try to eliminate it. This is why context is important: we need to apply context to the situation and to the test to ensure it is valid and correct and that nothing is being missed.
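To make the conjunction in these requirements explicit, here is a minimal sketch; the user model and predicate names are my own invention for illustration, not taken from any real system. Every user who satisfies Req2 necessarily satisfies Req1, so testing only Req2 exercises a strict subset of the Req1 cases:

    from dataclasses import dataclass

    # Hypothetical user model, invented purely for illustration.
    @dataclass
    class User:
        is_software_engineer: bool
        likes_classical_music: bool

    def req1_applies(user: User) -> bool:
        # Req1: the user is a software engineer.
        return user.is_software_engineer

    def req2_applies(user: User) -> bool:
        # Req2: a software engineer AND likes classical music.
        return user.is_software_engineer and user.likes_classical_music

    # David satisfies both requirements, not just Req2.
    david = User(is_software_engineer=True, likes_classical_music=True)
    assert req2_applies(david) and req1_applies(david)

    # A tester who assumes only Req2 matters misses engineers who do
    # not like classical music: a Req1-only case that still needs the
    # 'is engineer' screen to be shown.
    other = User(is_software_engineer=True, likes_classical_music=False)
    assert req1_applies(other) and not req2_applies(other)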

The issue we face as testers is that some requirements, and the resulting test ideas we come up with, could be subject to the conjunction effect, where our minds tell us that the probability of event x happening is greater than the probability of event y, even when the mathematics shows the two probabilities are the same or that event x actually has the lower probability.

How can we prevent this bias?

Experiments that have tried to repeat the Tversky and Kahneman bank teller study noted that if, before the decision was made, people were allowed to discuss and communicate their thoughts with others, then the occurrence of the conjunction fallacy was significantly lower.


This indicates that it is possible to reduce the chance of the conjunction fallacy occurring simply by communicating and talking with other people.

When researching this area for this post I found that there are a lot of links to cognitive framing.

The problem is that the way something is worded (framed) can make people susceptible to the conjunction fallacy. Maybe this is something that architects and technical authors need to be aware of when creating design and requirement documents? How a requirement is framed could influence a developer to write code in a certain way and fall for the conjunction fallacy, giving more probability to some event or requirement than is actually warranted. Developers, please be aware of this.

What fascinates me in this particular area is the idea that the way things are worded (framing) and how our minds understand those words (the conjunction fallacy) could be the reason that a lot of bugs are created in code. I would love to gather data on this and see if there is some correlation.

So let us look at framing and how it can influence our thought process and cause bias. The problem with framing is that it can be subjective; if we frame a sentence in such a way as to force people to believe that a certain fact is true, there will still be some people on whom it has no influence. Framing is used a great deal within politics and advertising to encourage people to believe in a certain policy or product, and it can be a very powerful tool. Michael Bolton ran a workshop at EuroSTAR 2010 on test framing (add link), and this was very useful in helping the tester think about whether a test needs to be run or why it was not run. However, it was only after EuroSTAR that I realised framing can be used in the opposite direction and become dangerous as a tool: it could be used to force people to think that the wrong view is the correct view. An example of this is to have a list of thousands of test cases within a test plan, framed in such a way that, to the casual reader, every single test case must be run and is of high value, before the tester has actually touched or used the product to be tested. The framing is used to justify wasted effort and work.

For example look at the following statement:

I prefer to code using Java rather than C++ because it is easier with the development suite I have installed, and C++ does not work within the development environment I have set up.

The framing bias here is that the person thinks Java is superior to C++ because of how they have set up their environment. It may be that using C++ would make the project they are working on easier, but they are giving reasons to justify working in Java instead of C++.

There are many more examples in the world of software development: I prefer OS x to OS y because of z; machine type ‘a’ is far better than machine type ‘b’ because it does ‘xyz’. It does not matter that the person making the statement has never used OS y or machine type ‘b’; they are forming a biased viewpoint and using framing to justify it. IMO this is why the context-driven way of testing is so important. When looking at requirements and statements it is easier to ask ‘in which context?’ to verify whether the requirement is just and sound.

As testers we need to be aware of this, and when we report our findings during testing we should be aware of how we frame our words.

Part 2 of this article will look at belief projection and how it may hinder our testing efforts.

Tuesday 18 January 2011

Are testers ethnographic researchers?

People who follow me on Twitter or via this blog might be aware that I have a wide range of interests in areas outside my normal testing job. I like to research and learn different things, especially psychology, and see if they might benefit and improve my skills and approaches during my normal testing job. One area I have been looking at for a while is the social science of ethnography. The approaches used when carrying out ethnographic research appear to have many similarities to software testing, and I feel we could benefit, and maybe improve our testing skills, by examining ethnography.

IMO there are two areas in which we can learn from ethnography:

  • To improve our understanding of users and how they differ by using ethnographic methods
  • To use ethnographic methods to test software in an exploratory way.

I should start by explaining what my understanding of ethnography is:

Wikipedia attempts to define it here:

http://en.wikipedia.org/wiki/Ethnography

The free dictionary attempts to give a definition here:

http://www.thefreedictionary.com/ethnography

A better definition can be found here:

http://www.brianhoey.com/General%20Site/general_defn-ethnography.htm

The problem with trying to describe and define ethnography is that it has wide and varied meanings.

To me it is a branch of the study of humanity (anthropology) in which the researcher actively gets involved and participates with the study group rather than just sitting back and observing. The reporting is done using qualitative (words) measurements rather than relying on quantitative (numbers) measurements.

One of the key factors when approaching ethnographic research is awareness that participation, rather than just observation, is one of the keys to the approach. Does this not sound familiar to testing, especially exploratory testing? Actively using the software under test to find out about its characteristics and behaviour is similar to an ethnographic researcher living within a community and participating with that community to learn about its beliefs and characteristics. There appear to be very close parallels between ethnographic research and exploratory testing. Wikipedia states:

One of the most common methods for collecting data in an ethnographic study is direct, first-hand observation of daily participation.

How similar is that to testing software?

Another approach within ethnography is the use of grounded theory to explain the results of the participation. This is when the gathered data itself is used to generate theories. It differs from grand theory, in which the theory is defined without the use of real-life examples and therefore risks not fitting the actual data gathered afterwards. (Is this similar to scripted versus exploratory testing: grand theory versus grounded theory?)

Grounded theory is a constantly evolving set of conclusions that can continue indefinitely, based upon the changing data being obtained by the ethnographic researcher. One of the questions asked about ethnographic research is:

When does this process end?

One answer is: never! Clearly, the process described above could continue indefinitely. Grounded theory doesn't have a clearly demarcated point for ending a study. Essentially, the project ends when the researcher decides to quit. (http://www.socialresearchmethods.net/kb/qualapp.php)

How similar is this to testing?

When do we stop testing?

Many articles have been written on this subject; mainly, we stop when we can learn nothing new, or when we have run out of time or money. See this article by Michael Bolton for more information.

I feel that ethnographic research stops because of similar reasons.

One interesting section I saw within the Wikipedia article was about evaluating ethnographic research, in which, to aid the researcher, the work is split into areas and questions are asked of each:

  1. Substantive Contribution: "Does the piece contribute to our understanding of social-life?"
  2. Aesthetic Merit: "Does this piece succeed aesthetically?"
  3. Reflexivity: "How did the author come to write this text…Is there adequate self-awareness and self-exposure for the reader to make judgements about the point of view?"
  4. Impact: "Does this affect me? Emotionally? Intellectually?" Does it move me?
  5. Expresses a Reality: "Does it seem 'true'—a credible account of a cultural, social, individual, or communal sense of the 'real'?"

I thought about this and started to change the context to be about software testing:

  1. Substantive Contribution: "Does the testing carried out contribute to our understanding of the software?"
  2. Aesthetic Merit: "Does the software succeed aesthetically?" Is it suitable for the end user?
  3. Reflexivity: "How did the author come to write this test…Is there adequate self-awareness and self-exposure for the reader to make judgements about the point of view?"
  4. Impact: "Does this affect me? Emotionally? Intellectually?" Does it move me?
  5. Expresses a Reality: "Does it seem 'true'—a credible account of a requirement'?"

By doing this I found I suddenly had a set of heuristics against which to measure the software testing that had been carried out; yet again, more similarities between the two crafts.

Another area in which ethnographic research can be useful to software testing is when you need to test software that has a lot of UI interaction. Using the methods of ethnography, a tester could visit the users, observe and participate in their daily routine, and find out the common tasks carried out and what oddities are seen. The oddities are the things of greatest interest, since these are the things that would not normally be planned for and, without active participation with the users, would normally not be uncovered until it is too late.

There are many studies being carried out to determine whether ethnographic research should be used when designing software systems. However, my concern is that this appears to be stuck in the design-up-front way of working, which is not a flexible, iterative approach. In my view it is easier, quicker and cheaper to ensure that testers use ethnographic methods when testing, to ensure the design is suitable for users; or, even better, to get the users involved earlier and observe them earlier.

The more I have delved into the study of ethnography, the more I have seen patterns similar to software testing. This makes me aware that software testing is not solely a hard science but a craft that encompasses many disciplines outside of the typical number-crunching and algorithm-creating world of software development.

Within the testing profession we need to look outside of the box and find approaches, methods and structures that can improve the discipline. To ensure our craft grows, we need to ensure we do not narrow our field of vision or thought.

Friday 14 January 2011

Remember you’re a Tester

I want you to remember one word in the following list:

Bug
Insect
Ant
Dragon Fly
Ladybird
Crane Fly
Beetle
Bee
Wasp
Hornet
Cockroach
Earwig
Termite
Grasshopper
Flea
Mosquito

My previous post was about debrief and how important it is to testing.

The problem I have come across during debriefs has been trying to remember all the things that happened during the day or during the session(s). Maybe this is a "me getting older" thing and my memory is going.

Whilst reading recently about cognitive bias, I came across one that could be helpful both during testing and during debrief sessions. It is called the Von Restorff effect, and basically it describes how our brains remember things that stand out.

http://changingminds.org/explanations/memory/von_restorff.htm

The above link uses the example of a list of words in which one word is in a different colour; our brains are more likely to remember the word that is in a different colour and stands out.

You might be asking what connection this has to testing.

Michael Bolton, via Twitter, pointed me towards Adam White, who has an interest in the Von Restorff effect. In his blog Adam states the following:

I use the Von Restorff effect in testing all the time. I frequently notice what doesn’t belong and it tends to be what I remember the most. http://www.adamkwhite.com/2007/09/30/using-heuristics-to-cook/

Part of our skill as testers is noticing:

  • When something does not appear to fit – we notice
  • When something appears out of place – we notice
  • When something appears not quite right – we notice.

Could it be that testers have a strong Von Restorff cognitive bias? Maybe this is the missing ‘thing’ that people say testers have: you cannot describe it, but you just know it is a skill you have.

Going back to this article…

How can this help during debrief?

My thoughts on this are that, to remember something important for the debrief when we are working within SBTM (session-based test management), we should make a note of it and highlight it in a different way, making it stand out so that we remember it later. Maybe some people already do this (the use of the highlighter pen).

Maybe Rapid Reporter, the excellent session-recording tool by Shmuel Gershon, could be expanded. Could it have an option to highlight certain things to make them stand out? I know it can do RTF and bold, but that is not enough for me; I need highlighting and colouring, plus an option to do freehand doodles.

Why doodles you may ask?

One of the side effects of the Von Restorff effect is that we remember words better if they are associated with a picture. If I needed to remember that a URL is not working, I would doodle a chain with a link missing; for an interface that is failing to communicate, I could draw a face with a plaster over the mouth. Just little things that help me remember problems that occurred. By the way, I am rubbish at drawing, nowhere near the level of the Cartoon Tester.

To conclude this article: I think we all have a cognitive bias towards remembering things that stand out, but testers appear to notice these things a lot more, either via our continuing training or as a natural skill we possess. We need to ensure that the important things we must remember for debriefs are made to stand out during our testing sessions, so that we do not forget them.

Which word did you remember from the list at the beginning?

Was it termite?

Tuesday 11 January 2011

The Feedback Loop

One of the critical elements of following the session-based test management (http://www.satisfice.com/sbtm/) approach is the use of quick feedback. To achieve this it is suggested that a debrief be done at the end of each session/day. Jon Bach (http://www.satisfice.com/articles/sbtm.pdf) suggests the use of PROOF:

Past. What happened during the session?
Results. What was achieved during the session?
Obstacles. What got in the way of good testing?
Outlook. What still needs to be done?
Feelings. How does the tester feel about all this?

This approach is excellent for communicating what has happened during the testing session(s); however, I keep hearing that people are not doing the debrief. There are many reasons given for this: lack of time or resources, or seeing no benefit, are a few. This blog post is about why it is important to carry out these debriefs and to ensure they are done sooner rather than later.

I am looking at this from a psychology viewpoint to highlight the way our minds work and to keep reminding readers that software testing is a human, sapient process and not an automated ticking of boxes.

There are various studies indicating that the longer you take to act upon information, the less you are able to recall that same information at a later date. During EuroSTAR 2010, Graham Freebur stated that unless you act upon information you have digested at the conference, within 72 hours that information will start to be lost and fade. The crucial part of this is that as humans we are fallible, and lots of different psychological biases start to play with our minds; unless we talk about and pass on the information we have as soon as possible, it becomes more and more likely that the data we hold will become clouded.

It is important that we debrief to someone to ensure that any error in our interpretation of the system under test can be corrected. The reasoning behind this is that when we are testing a complex system, we make assumptions as we test, and the system may appear to confirm those assumptions, fuelling what could be incorrect interpretations of the system. A computer system will never be able to tell you whether your assumptions are right or wrong; at best it can indicate a bias one way or the other. The only way to repair errors in interpretation is to interact with a human being. This is why the debrief is very important: assumptions can be challenged and, if necessary, corrected.

As humans we are very good at adapting and changing our viewpoint and opinion when presented with new information, but to do this effectively it needs to happen in a conversational setting. We are very bad at dealing with delayed feedback, and the longer it is left, the more likely we are to keep our initial biases and interpretations.

The point of this rather short blog post is to explain why a debrief after a testing session is important and why it needs to happen as soon as possible. Delays and excuses only allow more assumptions and incorrect information to appear to be the correct answer.

Make the time to debrief; plan for it and use it. It is a crucial element of testing.

Wednesday 5 January 2011

Autistic Software

When we start to test a system, we normally begin with a lot of assumptions predicting how the software will act depending on certain inputs. However, how the software reacts will be interpreted differently depending on who is doing the testing. This is especially true when testing UIs or systems that depend on human interaction.

I recently read an article* by Danah Boyd in which they described a problem in the way software is developed without much thought being given to the user's needs or practices. The article was first published in 2004 and I still feel it is very relevant to software development today.
*(http://www.danah.org/papers/AutisticSocialSoftware.pdf)

More and more software is being developed which requires interaction with humans and, of course, as software testers we know humans are fallible and unpredictable. The easy part of developing software is the complex mathematical formula stuff, since that is what a computer, in reality a glorified calculator, is good at. As testers, finding problems in these logical areas should be simple. This is not to disrespect the skill of testing; but knowing the formula used, and without any UI involved, it is a fairly easy task for a skilled tester to determine if a problem exists. IMO the problems start to appear when we develop software that requires interaction with humans. I am not aware of any research done in this field of software development, but I wonder how much time is spent creating and fixing UI systems. If we look at the characteristics of a given piece of software, we can see that it is very good with numbers but has poor social interaction skills. This is very similar to the characteristics of autism:

• "Socialising doesn't come naturally - we have to learn it."

People with autism often have difficulty recognising or understanding other people's emotions and feelings, and expressing their own, which can make it more difficult for them to fit in socially. They may:

• not understand the unwritten social rules which most of us pick up without thinking: they may stand too close to another person for example, or start an inappropriate subject of conversation
• appear to be insensitive because they have not recognised how someone else is feeling
• prefer to spend time alone rather than seeking out the company of other people
• not seek comfort from other people
• appear to behave 'strangely' or inappropriately, as it is not always easy for them to express feelings, emotions or needs.
• Difficulties with social interaction can mean that people with autism find it hard to form friendships: some may want to interact with other people and make friends, but may be unsure how to go about this.
(http://www.autismsussex.org.uk/training/WhatIsAutism/characteristics_of_autism)

So as testers, what can we do to try to improve the poor human interaction skills of software? Are there any approaches that can help? Another area I have been researching is ethnography (http://en.wikipedia.org/wiki/Ethnography) and how similar it is to software testing. (More on this in a future post.)

One of the many things I liked in the article by Danah was the discussion of how software was being designed to encourage people to have multiple identities so that they could protect their identity. This is very much against human nature, and Danah expressed their concerns with the following:

Why on earth should we encourage people to perform a mental disorder in the digital world?

Quite a statement!!!

I think, from a testing perspective, that we need to ensure the software being developed meets the practices and needs of the user rather than the technology needs of the software. We really should be trying to make software more sociable, able to interact with a variety of different human types. We should look to understand users and what their needs are rather than forcing them to work in a particular way; we should look at what is most suitable for the user. If we can manage to do this then we can start to produce and release software that does not frustrate and annoy users. This may be a Utopian belief, but as testers I feel we are the ones who need to drive this way of thinking. So the next time you are testing a UI and it starts to frustrate you, ask yourself why. Do not put up with autistic software; teach it to become more socially aware.