I have had this idea in my mind and on my backlog for quite
a while. It was only after speaking at
MEWT in Nottingham that I felt I really should get around to writing it.
There are many debates in the software development world about ‘test automation’ and how we can ‘automate all of the testing’. In the context of this article I am ignoring the difference between testing and checking; more details of this discussion can be found here: Testing and Checking Refined. However, some of my ideas and concepts will touch on the difference between checking and testing.
Many have argued that we should automate what we know, and that if we have defined requirements up front then it should be possible to automate them. My counter to this is that many sports have well-defined upfront requirements (laws) for how the game should be regulated. For example, the laws of football (soccer to those outside of Europe) can be found online here: FIFA Laws of the Game. If these requirements are defined upfront, then why do we not have automated referees? I asked this question on Twitter, and some of the responses cited physical limitations, such as battery power or being unable to run. My line of thought is about how these requirements can be, and are, interpreted.
Looking deeper into the football laws of the game, it can be seen that there are many ambiguous statements which, given the current state of AI at the time of publication of this article, I feel are impossible to automate. For example, page 39 gives the following as a reason for a player to be cautioned:
“unsporting behavior”
What does this mean?
Page 125 attempts to define this with a list of what constitutes unsporting behavior. One of these I found particularly interesting, as it is based on the human tendency to try to con or cheat:
“attempts to deceive the referee by feigning injury or pretending to have been fouled (simulation)”
This, I feel, would be a common sense decision made by the referee. How could an automated system know if it is fake or not? Then again, how would the ref know? It is a common sense decision, made depending on a multitude of factors and contexts.
How about this one?
“acts in a manner which shows a lack of respect for the game”
What would count as a lack of respect? Consider a player who, in the last second of the game, lets in a goal that allows the opposition to win the title. The player shows human emotion and frustration. Where is the fine line between emotion and respect, or the lack of it?
My issue with this automation debate is that, at this time, it is not possible to automate common sense and the multiple contexts that a referee has to go through in their decision-making process.
For example, if a team is winning 20–0, a machine would continue to officiate the game in accordance with the strict letter of the law, whereas a human referee would allow some flexibility in the interpretation of the rules and apply some empathy to the game. Is it yet possible to automate empathy?
James Christie made a valid point on Twitter that the reason they are called laws and not rules in the majority of sports is that:
“rules are detailed and specific whilst laws can be based on vague principles, which require interpretation and judgment.”
This makes sense, since most countries have courts where lawyers debate how the laws of the land can or should be interpreted. A jury, judge or panel of judges then makes a decision based upon the arguments presented. This is another case where the requirements are listed and known but, given current AI limitations, would be impossible to fully automate. Even though we know that human beings are flawed in the judgements they make, would an automated judgement machine, if it were even possible to produce, be any less flawed?
Returning to the laws of sport and how ambiguous those laws are, we can look at the laws of Rugby Union. At the beginning of the laws, on page 21, there is guidance on how the laws should be applied:
“The Laws must be applied in such a way as to ensure that the Game is played according to the principles of play. The referee and touch judges can achieve this through fairness, consistency, sensitivity and, at the highest levels, management.”
How would you automate sensitivity in this context?
According to the Oxford English Dictionary, sensitivity in this context is defined as:
“A person’s feelings which might be easily offended or hurt”
Add “fairness” into that equation and we are now journeying down the automation rabbit hole.
Looking at the laws regarding fair play, the guidance the document provides on foul play (Law 10, section m) reads:
“Acts contrary to good sportsmanship. A player must not do anything that is against the spirit of good sportsmanship in the playing enclosure”
What constitutes “the spirit of good sportsmanship”? How do you distinguish between intentional and unintentional behavior? Again, I am uncertain whether this kind of decision could be automated.
If we look at the laws of Rugby League, we can see similar issues in how difficult the laws can be to interpret. Rugby League was one of the early adopters of video technology to assist the referee. This is what Michael and James in their article would define as tool-assisted testing: a video referee can review certain decisions using video technology.
Looking at the definition of a forward pass.
“is a throw towards the opponents’ dead ball line”
How do you define this in the context of a fast-moving game? Section 10, which offers some guidance, makes a distinction between deliberate and accidental forward passes. How do you distinguish between these two actions? Also, would an automated system be able to deal with factors such as the momentum of the player and the wind moving the ball? Yes, it could process information more quickly than a human could, but would it be right?
This is not to say that referees are infallible; there are many instances in sport of them making mistakes. However, people are aware of this and can accept it. Would people be as willing to accept a machine making similar mistakes, given our bias that machines are infallible?
Many sports are implementing some level of automated systems to aid the referees:
- Tennis has been using Hawkeye since 2002
- Football has started to implement goal-line technology
- Cricket uses the Umpire Decision Review System
It is interesting to note that each of these automated systems has attracted some controversy regarding its accuracy and success, especially the cricket system.
To conclude: when people discuss test automation and attempt to automate as much as possible, there is a need to step back and think critically. Automation in software development has its place and is a useful tool; however, it should not be thought of as an alternative to testing performed by a human being. Even when you think you have the requirements nailed down, they are words, and as such are open to a multitude of interpretations.
Using a mixture of automation, tool-assisted testing and human testing, in a ratio that adds value to the quality of the product being delivered, is a more thoughtful approach than the mantra that we can “automate all the testing effort.” Going forward, we need to be mindful of what machines can and cannot do. This may change as technology progresses, but as of the publication of this article there are big limitations in automation.
That was an interesting Twitter discussion and a good blog post you wrote here. I'd like to pick out one example you gave:
" “attempts to deceive the referee by feigning injury or pretending to have been fouled (simulation)”
This, I feel, would be a common sense decision made by the referee. How could an automated system know if it is fake or not? Then again, how would the ref know? It is a common sense decision, made depending on a multitude of factors and contexts."
You used one of my pet-hate words: "common sense". It can be common sense to throw people into a volcano to save yourself from the anger of the gods, a perfectly valid approach...
I had a good discussion with Duncan Nisbet the other day about the different levels of tacit knowledge, and we spoke about situations where it's problematic to make the tacit explicit. Your example is one of those.
What is the scenario when someone is feigning injury? First, the referee or the machine has to detect whether there is a feigned or real injury. How does that detection work? You'd have visual information (player rolling on the grass), audio (player screaming their heart out), cues in the facial expression, other stakeholders giving right or wrong information (other players, other referees, etc.) and quite a few more besides. That is a lot of information to process in the short time span in which the referee, automated or not, has to make a decision. It's one example where all the experience and accumulated tacit knowledge can't be made explicit, and therefore it can't be programmed to the level that would be required.
Nice, thanks for making me think.
You make some good points, Thomas.
I do love the phrase "common sense" since, in the industry in which we work, common sense is unfortunately not that common! I witness this in the continuous re-hashing of the 'testing is dead' or 'we can replace manual testers' discussions. When you step back and really think, like you have done above, it seems sensible to disagree with what they are saying. To me, your volcano example enters the world of beliefs and assumptions rather than applying common sense, which leads down another dark path of human behavior!
I'd argue that common sense IS a shared system of beliefs and assumptions.
"Eating the hearts of your enemies makes you stronger" is another one. OK, both are outdated, but that only means there's a time component.
My point is that "common" is highly subjective. How would you test for common sense, and how does it differ from shared understanding? Something to ponder.
Having been a basketball referee for 15 years and being a tester, I really like your post and have actually applied some of the ideas you mentioned. Be it refereeing or testing, tools can only support; they cannot take over the whole job. On the lighter side: for basketball there is not just a rule book, but an official book of interpretations. Automating the rules would in this context lead to wrong results, as they might have to be interpreted, often in strange ways.