Failure Improvement
Created and summarized by Brian Marick.
Here, I'm talking about what many people call "fault
isolation" or "defect isolation". I think failure
improvement is a better term. The process begins when you
observe a failure of the program. It ends with other observed
failures that enable you to write better bug reports. "Better"
can mean several things:
- The original observation might contain several
different failures with different underlying
causes. Failure improvement leads to several bug reports,
each of which exhibits only one of the failures.
Why bother with this? - Individual failures are easier
for everyone to handle. If you reported the intermingled
failures, two developers might have to work simultaneously on
the same bug report. When is the single bug report really
fixed? What if the development manager decides that only one
of the problems is worth fixing for this release? How do you
keep track of what's been fixed and what's still pending?
- The improved failure might be more reproducible.
That is, you started out unsure of exactly what to do to
cause the failure. You end by knowing.
Why bother with this? - Developers are quite likely to
discard your bug report if they can't reproduce the failure
on the first or second try. If they consistently have trouble
reproducing the failures you report, they will "flip the
bozo bit" on you - decide that everything you say is
poorly thought through. Unfair, perhaps, but that's the way
it works.
- The bug report might be simpler. It
might be stripped of unnecessary steps - it's better to
produce a failure in ten mouse clicks than 200. Or the
data that originally caused the problem might be boiled
down to a much smaller set - it's better to use a two-element
database than one with 30 gigabytes of data.
Why bother with this? - The developer spends less time
thinking about irrelevant steps or data, so she fixes the bug
faster. A developer's time is usually worth more than yours,
so it makes sense for you to spend time saving her some. It
is important, however, to avoid simplifying the problem
beyond the point where it saves the developer time.
Understanding what "simple enough" means is one of
the things you learn as you work on a product and with
particular developers.
- The boundaries of the failure might be better described.
You might discover several different ways to cause the
same failure, or you might discover cases that
surprisingly do not fail. From these, you may be able to
make more general statements about the inputs or
environment that contribute to the failure. Instead of
saying "this particular input fails", you can say "this
type of input fails". (Note: it is often wise to
report both your generalization and also the specific
instances from which you generalized. Your generalization
might be wrong.)
Why bother with this? - The more general your
description, the quicker the developer can find the problem.
Moreover, developers sometimes fall into the trap of fixing a
particular symptom instead of the underlying cause. They look
at the problem too narrowly. By generalizing, you help them
avoid that.
- You might find more serious consequences. Failures
are bad because they do damage. The first example of a
failure probably won't do the maximum amount of damage.
With a little thought, you might be able to make it do
more.
Why bother with this? - At some point in your project,
people will be sleep-deprived and desperate. Don't expect
people in "crunch mode" to imagine the potentially
world-shattering consequences of your seemingly innocuous
failure. Demonstrate them.
- You might find new failures. Tester folk wisdom
says that bugs cluster. If you spend your
time trying different inputs in an area where you've
already found one bug, you might find more.
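To make the "simpler bug report" idea concrete, here is a minimal sketch of boiling a failing input down by divide and conquer. It is an illustration only, not a technique taken from this page or from the readings. The function name `reduce_failing_input` and the `still_fails` callback are hypothetical. The callback stands in for whatever you do to replay a candidate list of steps or data rows and check whether the failure still shows up.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_failing_input(
    items: Sequence[T],
    still_fails: Callable[[List[T]], bool],
) -> List[T]:
    """Shrink `items` while `still_fails` keeps returning True.

    Hypothetical helper: `still_fails` replays the candidate input
    against the program and reports whether the failure reappears.
    """
    current = list(items)
    if not still_fails(current):
        raise ValueError("the starting input does not reproduce the failure")

    # Try to drop large chunks first, then progressively smaller ones.
    chunk = len(current) // 2
    while chunk >= 1:
        start = 0
        removed_any = False
        while start < len(current):
            # Candidate input with one chunk cut out.
            candidate = current[:start] + current[start + chunk:]
            if still_fails(candidate):
                # The chunk was irrelevant; keep the smaller input.
                # Do not advance: new items have slid into `start`.
                current = candidate
                removed_any = True
            else:
                # The chunk matters; keep it and move past it.
                start += chunk
        if not removed_any:
            chunk //= 2
    return current


if __name__ == "__main__":
    # Pretend the failure needs steps 17 and 142 out of 200 recorded steps.
    steps = list(range(1, 201))
    print(reduce_failing_input(steps, lambda s: 17 in s and 142 in s))
    # prints [17, 142]
```

The sketch assumes the failure is reproducible on demand. If it is intermittent, `still_fails` would need to retry each candidate several times before answering. As the text above warns, stop reducing once further shrinking no longer saves the developer time.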
Readings
- "Software Defect Isolation", by Prathibha Tammana and
Danny Faught, describes some techniques, including the
important divide-and-conquer approach.
- Testing Computer Software (second edition), by
Kaner, Falk, and Nguyen, has a stunning section on fault
isolation in chapter 5. It alone is worth the price of
the book.
- Bad Software: What to Do When Software Fails, by
Kaner and Pels, has a troubleshooting strategy in Chapter
3, "Preparing to make the call". The list of
information to have when calling a software publisher's
support hotline will also help you, the tester,
characterize a failure more precisely. The checklists are
most useful for mass market Windows software.
- "Does a Bug Make a Noise
When It Falls in the Forest?", by Noel Nyman,
discusses whether you should report a bug that you don't
believe a customer could see. (Software Testing
and Quality Engineering Magazine, Volume 1,
Number 4, July/August 1999.)