Tuesday, August 12, 2008

Positive and Negative Testing

by Jeff Nyman. GlobalTester, TechQA, Copyright © 2002 All Rights Reserved.

The notion of something like "Integration Testing" or "System Testing" can (and should) be defined so that everyone knows what is meant by that activity within the same organization, but terms like "negative test" and "positive test" are more of a concept than a strict activity. In both instances you are dealing with an input, an action, and an output. The action acts upon the input to derive a certain output. So a test case (and thus a good test) is just one that deals with those three things. Both kinds of test case can produce errors and, in fact, some say that the success of a test case is based upon the probability of it finding new errors in an application.

What I want to do here, however, is state clearly one viewpoint of what the distinction between positive and negative testing is. Then I want to play Devil's Advocate and try to undermine that viewpoint by presenting an argument that others have put forth - an alternative viewpoint. The real point of this will be to show that sometimes trying to adhere too rigidly to conceptual terms like this can lead to a lot of stagnating action. Read this section as a sort of extended argument that I am having with myself as I come to grips with these terms.

So let us first state a simple hypothetical definition: positive testing is that testing which attempts to show that a given module of an application does what it is supposed to do. Negative testing is that testing which attempts to show that the module does not do anything that it is not supposed to do. So, by that logic, and to make a concrete example, an application delivering an error when it should is actually an example of a positive test. A negative test would be the program not delivering an error when it should or delivering an error when it should not. But this sounds like it is more based on what the application does during testing rather than how the tester is actually going about testing it. Well, sort of. The idea here is that neither test necessarily has to force an error condition, per se, at least by strict definition. But both concepts (negative and positive) are looking for different types of error conditions. Consider that one part of negative testing is often considered to be boundary analysis. In this case, you are not so much "forcing an error" because, of course, the application should handle boundary problems. What you are doing is seeing whether the boundary problem is, in fact, not handled. So if the program is supposed to give an error when someone types "101" into a field that should accept values between "1" and "100", then an error showing up is the valid behavior. If, however, the application does not give an error when the user types "101", then you have a problem. So really negative testing and positive testing are the same kinds of things when you boil it right down.
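
To make that boundary example a little more tangible, here is a minimal sketch in Python, assuming a hypothetical validate_quantity() function that is specified to accept only values from 1 to 100. The function and the checks are invented for illustration; they are not taken from any particular framework.

# Hypothetical field validation: the specification says only 1 through 100 is valid.
def validate_quantity(value: int) -> bool:
    """Return True if the value is within the specified range of 1 to 100."""
    return 1 <= value <= 100

def check_boundaries() -> None:
    # Values the field is supposed to accept.
    assert validate_quantity(1)
    assert validate_quantity(100)
    # Boundary violations the field is supposed to reject. Whether you call this
    # a positive test (the rejection appears as specified) or a negative test
    # (you are probing for the rejection not appearing) is exactly the
    # distinction at issue in this article.
    assert not validate_quantity(0)
    assert not validate_quantity(101)

check_boundaries()
print("Boundary checks behaved as specified.")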

Now, some would draw this distinction differently than I did. I said the following:

Positive testing is that testing which attempts to show that a given module of an application does what it is supposed to do.
Negative testing is that testing which attempts to show that the module does not do anything that it is not supposed to do.

Playing the Devil's Advocate, others would change this around and say the following is a better distinction:

Positive testing is that testing which attempts to show that a given module of an application does not do what it is supposed to do.
Negative testing is that testing which attempts to show that the module does something that it is not supposed to do.

Let us look at this slightly shifted point of view. By this logic, we would say that most syntax/input validation tests are positive tests. Even if you give an invalid input, you are expecting a positive result (e.g., an error message) in the hope of finding a situation where the module either gives the wrong error message or actually allows the invalid input. A negative test is, by this logic, more about trying to get the module to do something different from what it was designed to do. For example, if you are testing a state transition machine and the state transition sequence is: State 1 -> State 2 -> State 3 -> State 4, then trying to get the module to go from State 2 to State 4, skipping State 3, is a negative test. So, negative testing, in this case, is about thinking of how to disrupt the module and, by extension, positive testing is examining how well/badly the module does its task.
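
To make the state transition example concrete, here is a minimal sketch in Python, assuming a hypothetical workflow whose only legal transitions are State 1 -> State 2 -> State 3 -> State 4. The class and method names are invented for illustration.

# Hypothetical four-state workflow; only consecutive transitions are legal.
class Workflow:
    ALLOWED = {1: 2, 2: 3, 3: 4}

    def __init__(self):
        self.state = 1

    def transition_to(self, new_state: int) -> None:
        if self.ALLOWED.get(self.state) != new_state:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state

# The designed path: walk the sequence in order.
wf = Workflow()
wf.transition_to(2)
wf.transition_to(3)
wf.transition_to(4)

# The "disruptive" attempt (a negative test by the Devil's Advocate definition):
# try to go from State 2 straight to State 4, skipping State 3.
wf = Workflow()
wf.transition_to(2)
try:
    wf.transition_to(4)
    print("Module was disrupted: it skipped State 3.")
except ValueError as exc:
    print(f"Module rejected the disruption: {exc}")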

Now, in response to this, I would agree that most people can see the appeal of looking at it this way in terms of what the tester hopes to find. Testing pundits often tell testers to look for errors because if you look for success, you will often find success - even when there are errors. By proxy, if you do not find an error and you have reliable test cases (that latter point is crucial), then a positive test case will show that the application did not, in fact, manifest that error. However, the application showing an error when it should have done so is an example of a "positive test" by the strict definition of that term. So in other words:

Positive Testing = (Not showing error when not supposed to) + (Showing error when supposed to)

So if either of the situations in parentheses happens, you have a positive test in terms of its result - not what the test was hoping to find. The application did what it was supposed to do. By that logic:

Negative Testing = (Showing error when not supposed to) + (Not showing error when supposed to)

(Usually these situations crop up during boundary testing or cause-effect testing.) Here, if either of the situations in parentheses happens, you have a negative test in terms of its result - again, not what the test was hoping to find. The application did what it was not supposed to do.

However, in both cases, these were good results because they showed you what the application was doing and you were able to determine if it was working correctly or not. So, by my original definitions, the testing is all about errors and finding them. It is just how you are looking for those errors that makes the distinction. (Granted, how you are looking will often dictate what you are hoping to find but since that is the case, it hardly makes sense to make a grand distinction between them.) Now, regarding the point I made above, as a Devil's Advocate: "A negative test is more about trying to get the module to do something different from what it was designed to do." We have to realize, I think, that what we call "negative testing" is often about exercising boundary conditions - and those boundaries exist within the context of design. Granted, that can be trying to get a value into a field that it should not accept. However, a good application should have, during the requirements stage, had provisions for invalid input. Thus really what you are testing here is (a) whether the provisions for invalid input exist and (b) whether they are working correctly. And, again, that is why this distinction, for me (between positive and negative), is somewhat banal.

Your negative test can turn into a positive test just by shifting the emphasis of what you are looking for. To get the application to do something it is not designed to do could be looked at as accepting invalid input. If you find that the application does accept invalid input and does not, in fact, give a warning, I would agree that is a negative test if the requirements specified that the application should respond to invalid input. But now consider the case where the application did not respond and the requirements never actually said that it should. So, here, by strict requirements, did the application do what it was supposed to do? Technically, yes. If requirements did not specify differently, design was not put in place to handle the issue. Thus you are not testing something outside the scope of design. Rather, you are testing something that was not designed in the first place.

So, going back to one of the previous points, one thing we can probably all agree on: it entirely depends on how you view a test. But are we saying the result of the test determines whether it was a positive or negative test? If so, many would disagree with that, indicating that it is the thinking behind the test that should be positive or negative. In actuality, most experienced testers do not think in terms of positive or negative; they think in terms of "what can I do to establish the level of risk?" However, to this point, I would argue that if that is truly how the tester thinks of things then all concepts of positive/negative go right out of the window (as I think they mostly should anyway). Obviously you could classify the test design in terms of negative or positive, but to some extent that is irrelevant. However, without getting into that, I am not sure we are saying that the result of the test determines positivity or negativity. What I said earlier, relative to my example, was that "in both cases, these were good results because they showed you what the application was doing and you were able to determine if it was working correctly or not." Whether the application was behaving correctly or incorrectly, you still determined what the application was actually doing and, as such, those are good results. Thus the result tells you about the application and that is good (without recourse to terms like positive and negative). If the result tells you nothing about how the application is functioning that is, obviously, bad (and, again, this is without recourse to positive or negative).

We can apply the term "effective" to these types of test cases and we can say that all test cases, positive or negative, should be effective. But what about the idea of relying on the thinking behind the test? This kind of concept is just a little too vague for me because people's thinking can differ quite a bit, even on this issue, often depending on what they have been taught regarding these concepts. As I showed, you can transform a positive test mentality into a negative test mentality just by thinking about the results of the test differently. And if negative testing is just about "disrupting a module" (the Devil's Advocate position), even a positive test can do that if there is a fault. However, I am being a little flip because with the notion of the thinking behind the test, obviously someone here would be talking about intent. The intent is to disrupt the module so as to cause a fault and that would constitute a negative test (by the Devil's Advocate position) while a positive test would not be trying to disrupt the module - even though disruption might occur (again, by the Devil's Advocate position). The key differentiator is the intent. I could sort of buy that but, then again, boundary testing is an attempt to disrupt modules because you are seeing if the system can handle the boundary violation. This can also happen with results. As I said: "Your negative test can turn into a positive test just by shifting the emphasis of what you are looking for." That sort of speaks to the intention of what you are hoping to find but also how you view the problem. If the disruption you tried to cause in the module is, in fact, handled by the code then you will get a positive test result - an error message of some sort.

Now I want to keep on this point because, again, some people state that negative testing is about exercising boundary conditions. Others were taught that this is not negative testing; rather that this is testing invalid inputs, which are positive tests - so it depends how you were taught. And consider that a boundary condition, if not handled by the code logic, will potentially severely disrupt the module - which is the point of negative testing according to some views of it. However, that is not the intent here according to some. And yet while that was not the intent, that might be the result. That is why the distinction, for me, blurs. But here is where the crux of the point is for me: you can generally forget all about the intent of test case design for the moment and look at the distinction of what the result is in terms of a "positive result" (the application showed me an error when it should have) and a "negative result" (the application did not show me an error when it should have). The latter definitely has a more negative connotation than the former, regardless of the intent of the tester during design of the test case, and that is important to realize because sometimes our intentions for tests are changed by the reality of what exists and what happens as a result of running the tests. So, in the case of intent for the situation of the application not showing an error when it was supposed to, this is simply a matter of writing "negative test cases" (if we stick with the term for a moment) that will generate conditions that should, in turn, generate error messages.

But the point is that the intent of the test case is to see if the application does not, in fact, generate that error message. In other words, you are looking for a negative result. But, then again, we can say: "Okay, now I will check that the application does generate the error message that it should." Well, in that case, we are really just running the negative test case! Either way the result is that the error either will or will not show up and thus the result is, at least to some extent, determining the nature of the test case (in terms of negative or positive connotation). If the error does not show up, the invalid input might break the module. So is the breakdown this:

P: Not showing error when not supposed to
N: Not showing error when supposed to
P: Showing error when supposed to
N: Showing error when not supposed to
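
Just to pin that breakdown down in executable form, here is a minimal sketch in Python that labels a single observed result by whether an error was expected and whether one actually appeared. It is only a restatement of the four lines above, not a testing framework.

def classify_result(error_expected: bool, error_shown: bool) -> str:
    """Label an observed result using the P/N breakdown above."""
    if error_expected == error_shown:
        # Showing an error when supposed to, or not showing one when not
        # supposed to: the application did what it was supposed to do.
        return "P"
    # Showing an error when not supposed to, or failing to show one when
    # supposed to: the application did what it was not supposed to do.
    return "N"

print(classify_result(error_expected=True, error_shown=True))    # P
print(classify_result(error_expected=False, error_shown=False))  # P
print(classify_result(error_expected=True, error_shown=False))   # N
print(classify_result(error_expected=False, error_shown=True))   # N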

I think the one thing we have to consider is that this viewpoint hinges on the idea of "negative testing" being looked at as forcing the module to do something it was not designed to do. However, if the module was never designed to do the thing you are trying, then your testing is of an interesting sort because, after all, you know nothing exists to handle it. So the real question should not be: "What happens when I do this?" but rather "Why have we not designed this to handle this situation?" Let us say that something is designed to handle the "module disruption" you are proposing to test. In that case, you are actually positively testing the code that handles that situation. To a strict degree, forcing a module to do something it was not designed to do suggests that this is something your average user can do. In other words, your average user could potentially use the application in such a fashion that the negative test case you are putting forth could be emulated by the user. However, if that is the case, design should be in place to mitigate that problem. And, again, you are then positively testing.

Now, one can argue, "Well, it is possible that the user can try something that there simply is no way to design around." Okay. But then I ask: "Like what?" If there is no way you can design around it or even design something to watch for the event, or have the system account for it, how do you write a valid test case for that? I mean, you can write a test case that breaks the application by disrupting the module but -- you already knew that was going to happen. However, this is not as cut and dried as I am making it sound, as I am sure anyone reading this could point out. After all, in some cases maybe you are not sure that what you are writing as a test case will be disruptive. Ah, but that is the rub. We just defined "negative testing" as trying to disrupt the module. Whether we succeed or not is a different issue (and speaks to the result), but that was the intent. We are trying to do something that is outside the bounds of design and thus it is not so much a matter of testing for disruption as it is testing for the effects of that disruption. If the effects could be mitigated, there must be some sort of design that is mitigating them and then you are positively testing that mitigating influence.

As an example, a good test case for a word processor might be: "Turn off the computer to simulate a power failure when an unsaved document is present in the application." Now, the idea here is that you might have some document saving feature that automatically kicks in when the application suddenly terminates, say via a General Protection Fault (GPF). However, strictly speaking, powering down the computer is different than a GPF. So here you are testing to see what happens if the application is shut down via a power-off of the PC, which, let us say, the application was not strictly designed to handle. So my intent is to disrupt the module. However, in this case, since I can state the negative condition, I can state a possible design that could account for it. After all, we already know that the document will not be saved because nothing was designed to account for that. But the crucial point is that if nothing was designed into the system to account for the power-off of the PC, then what are you really testing? You are testing that the application does what the application does when a power-off occurs. But if nothing is designed to happen one way or the other, then testing for disruption really does you no good. After all, you know it is going to be disrupted. That is not in question. What is (or should be) in question is how you can handle that disruption and then test how that handling works. So let us take those alternate (Devil's Advocate) definitions:

Positive testing is that testing which attempts to show that a given module of an application does not do what it is supposed to do.

In the case of the power-down test case, we are not positive testing because we did not test that the application did not do what it was supposed to do. The application was not "supposed" to do anything because nothing was designed to handle the power-down.

Negative testing is that testing which attempts to show that the module does something that it is not supposed to do.

In the case of the power-down test case, we are also not negative testing by this definition because the application, in not saving the document or doing anything (since it was not designed to do anything in the first place), is not doing something that it is not supposed to do. Again, the application is not "supposed" to do anything since it was not designed to do anything in this situation.

Now consider my quasi-definition/equation for positive testing that I gave earlier:

Positive Testing = (Not showing error when not supposed to) + (Showing error when supposed to)

I would have to loosen my language a little but, basically, the application was not supposed to show an error and, in fact, did not do so in this case. But what if the application was, in fact, supposed to handle that situation of a power-down? Let us say the developers hooked into the API so that if a shut-down event was fired off, the application automatically issues an error/warning and then saves the document in a recovery mode format. Now let us say I test that and find that the application did not, in fact, save the file. Consider again my quasi-definition/equation for negative testing:

Negative Testing = (Showing error when not supposed to) + (Not showing error when supposed to)

In this case I have done negative testing because the application was supposed to issue an error/warning but did not. However, notice that the test case is the exact same test case. The intent of my testing was simply to test this aspect of the application. The result of the test relative to the stated design is what determines if the test was negative or positive by my definitions. Now, because I want to be challenged on this stuff, you could also say: "Yes, but forget the document in the word processor. What if the application gets corrupted because of the power-off?" Let us say that the corruption is just part of the Windows environment and there is nothing that can be done about it. Is this negative testing? By the Devil's Advocate definition, strictly it is not, because remember by that definition: "Negative testing is that testing which attempts to show that the module does something that it is not supposed to do." But, in this case, the module becoming corrupted is not really the module doing something it was not supposed to do. It simply happened as a by-product of a Windows event that cannot be handled. But we did, after all, try to disrupt the module, right? So is it a negative test or not by the definition of disruption? Incidentally, by my definition, it is not a negative test either. However, what is common in all of what I have said is that the power-down test case is an effective test case and this is the case regardless of whether you choose to connote it with a "positive" or "negative" qualifier. Since that can be the case, then, for me, the use of the qualifier is irrelevant.
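
Here is a minimal sketch in Python of the two power-down scenarios just discussed, assuming a hypothetical editor that may or may not have a shutdown handler designed in to save an unsaved document to a recovery file. Nothing here models a real word processor API; it only restates the argument in code.

class Editor:
    def __init__(self, has_recovery_design: bool):
        self.has_recovery_design = has_recovery_design
        self.unsaved_text = "draft in progress"
        self.recovery_file = None

    def on_shutdown_event(self) -> None:
        # Only does anything if a recovery design was actually put in place.
        if self.has_recovery_design:
            self.recovery_file = self.unsaved_text

def run_power_down_test(editor: Editor) -> str:
    editor.on_shutdown_event()  # simulate the power-off notification
    if editor.recovery_file == editor.unsaved_text:
        return "document recovered"
    return "document lost"

# The exact same test case run against two different design contexts. What the
# result "means" depends on what, if anything, was designed to happen.
print(run_power_down_test(Editor(has_recovery_design=False)))  # document lost
print(run_power_down_test(Editor(has_recovery_design=True)))   # document recovered, unless the design fails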

But now let us consider another viewpoint from the Devil's Advocate and one that I think is pretty good. Consider this example: an application takes mouse clicks as input. The requirement is for one mouse click to be processed at a time; if the user hits multiple mouse clicks, the application will discard anything but the first. Any tester will do the obvious and design a test to hit multiple mouse clicks. Now the application is designed to discard anything but the first, so the test could be classified (by my definition) as a negative one since the application is designed not to process multiple mouse clicks. The negative test is to try to force the application to process more than the first. BUT, I hear you say, this is an input validation test that tests that the application does discard multiple mouse clicks, therefore it is a positive test (again, by my definition), and I would then agree: it is a positive test. However, the tester might also design a test that overflows the input buffer with mouse clicks - is that a negative test? Note that this situation is not covered explicitly in the requirements - and that is crucial to what I would call negative testing, in that very often it is the tester's "what if" analysis that designs negative tests - so, yes, it is a negative test as you are forcing the application into a situation it may not have been designed and/or coded for - you may not know whether it had been or not. The actual result of the test may be that the application stops accepting any more clicks in its input buffer and raises an error message, or it may crash.
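
Here is a minimal sketch in Python of that mouse click example, assuming a hypothetical handler whose requirement is to process only the first click in a burst. The buffer size and the names are invented for illustration.

class ClickHandler:
    BUFFER_SIZE = 8  # arbitrary limit, standing in for whatever the code actually uses

    def __init__(self):
        self.buffer = []
        self.processed = []

    def receive_click(self, click_id: int) -> None:
        if len(self.buffer) >= self.BUFFER_SIZE:
            raise OverflowError("input buffer full")
        self.buffer.append(click_id)

    def process(self) -> None:
        if self.buffer:
            self.processed.append(self.buffer[0])  # discard all but the first
            self.buffer.clear()

# The obvious test: send several clicks and verify only the first is processed.
handler = ClickHandler()
for click in range(3):
    handler.receive_click(click)
handler.process()
assert handler.processed == [0]

# The "what if" test: flood the buffer well beyond anything the requirement mentions.
handler = ClickHandler()
try:
    for click in range(1000):
        handler.receive_click(click)
except OverflowError as exc:
    print(f"Application stopped accepting clicks: {exc}")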

Now, having said all this, it makes me realize how my point starts to coalesce with the Devil's Advocate. One way that might happen is via the use of the term "error" that has gotten tossed around a lot. My language seemed too restrictive in the sense that when I used the word "error" (as in "showing error when not supposed to") I did not make it clear enough that I was not necessarily talking about an error screen of some sort, but rather an error condition or a failure. With this, my negative testing definition really starts to coalesce with the Devil's Advocate's definition ("does something that it is not supposed to do"). I had said: "Negative Testing = (Showing error when not supposed to) + (Not showing error when supposed to)" and, broadening my language more, really what I am saying is that the application is showing (doing) something it is not supposed to (which matches the Devil's Advocate thought), but I was also saying that the application is not showing (doing) something that it was supposed to. And to the Devil's Advocate that latter is positive testing. Let me restate the two viewpoints somewhat:

Positive Testing (Jeff):
Not doing something it was not supposed to do.
Doing something it was supposed to do.

Positive Testing (Devil's Advocate):
Not doing what it is supposed to do.

Negative Testing (Jeff):
Doing something it was not supposed to do.
Not doing something it was supposed to do.

Negative Testing (Devil's Advocate):
Doing something that it is not supposed to do.

I think I was essentially saying the same thing in terms of negative testing as my hypothetical opponent, just not in terms of positive testing. If you notice, both of our "negative testings" really contain the same point. On the other hand, relative to the common Devil's Advocate position, I am having a hard time seeing the major distinction between positive and negative. The Devil's Advocate's original conception:

"Positive testing is that testing which attempts to show that a given module of an application does NOT do what it is supposed to do. Negative testing is that testing which attempts to show that the module does something that it is not supposed to do."

To me, "doing something you are not supposed to" (the Devil's Advocate negative test) and "not doing something you are supposed to" (the Devil's Advocate positive test) are really two sides of the same coin or maybe just two ways of saying the same thing. So let us say that our requirement is "do not process multiple mouse clicks". In that case, "not doing something you are supposed to" (Devil's Advocate positive test) means, in this case, "processing multiple mouse clicks". In other words, the application should not process multiple mouse clicks. If it does, it is doing something it is not supposed to. Likewise, "doing something you are not supposed to do" (Devil's Advocate negative test) means, in this case, "processing multiple mouse clicks". In other words, the application should not process multiple mouse clicks. Either way, it is saying the same thing. So what we are testing for is "application not processing multiple mouse clicks". It would seem that if the application does process multiple mouse clicks it is both not doing what it is supposed to do (not processing them) and doing something it is not supposed to do (processing them). The same statement, just made different ways. Now, let me see if that works with my definitions.

Again, the premise is "do not process multiple mouse clicks". If the application processes multiple clicks anyway, then it falls under the first clause of my negative test ("doing something it was not supposed to do"). If the application does not process them, it falls under the first clause of my positive test ("not doing something it was not supposed to do"). Even with the mouse click example we have two aspects:

Application designed not to process multiple mouse clicks
Application designed to process only one mouse click

Saying the same thing, and yet a subtle shift in emphasis if you want to go by the positive and negative distinctions. The difference, however, is also whether you are dealing with active design or passive design. In other words, does the application actively make sure that only one mouse click is handled (by closing the buffer) or does it simply process only one click, but allow the buffer to fill up anyway? I like the idea of tying this whole thing in with "mitigating design factors". I think that we can encapsulate "intent" and "result" (both of which are important to test casing) by looking more at the efficient and effective demarcations. We have to consider the result since that is part of how you do test case effectiveness metrics as well as proactive defect detection metrics. If a test case is a tautology test then it is not really efficient or effective - but that is based solely on the result, not the intent or anything else.
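
Here is a minimal sketch in Python of that active versus passive distinction. Both handlers satisfy "process only one mouse click", but the active one closes its buffer after the first click while the passive one lets the buffer fill and simply ignores everything past the first entry. The classes are invented for illustration.

class ActiveHandler:
    """Actively ensures only one click is ever kept."""
    def __init__(self):
        self.first_click = None

    def receive_click(self, click_id: int) -> None:
        if self.first_click is None:  # the "buffer" closes after one click
            self.first_click = click_id

class PassiveHandler:
    """Buffers everything but only ever looks at the first click."""
    def __init__(self):
        self.buffer = []

    def receive_click(self, click_id: int) -> None:
        self.buffer.append(click_id)  # the buffer fills up anyway

    @property
    def first_click(self):
        return self.buffer[0] if self.buffer else None

for handler in (ActiveHandler(), PassiveHandler()):
    for click in range(5):
        handler.receive_click(click)
    # The observable requirement is met either way; the difference is what is
    # happening underneath, which is where the "what if" tests live.
    print(type(handler).__name__, handler.first_click)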

And The Point Is ... ?

The point here was simply to show the kind of extended dialogue that I wish more testers would take it upon themselves to have. Regardless of whether you agree with some of the conclusions I have put forward here (most notably that "positive" versus "negative" is not necessarily a very meaningful distinction), at least you will hopefully see that it pays to sometimes think out these issues because it forces you to better consider what it is you are actually doing when you are testing and it also helps you think about how to explain it to others.
