Re: [PATCH] Generated SBuf::find() test cases

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 26 Feb 2013 00:19:14 -0700

On 02/25/2013 07:31 PM, Amos Jeffries wrote:

> I like what this does already just because it finds bugs, and I'm happy
> with generating the actual strings randomly to see if we can fuzz up
> some failure somewhere as an on-top kind of check. But I don't see the
> need for nearly so many checks of the basic border conditions. I think
> we could auto-generate a smaller more deterministic/targeted set of
> checks for better results and clarity about what the coverage is exactly.

I am sure it is possible to generate fewer checks (and the code already
has parameters to control some of that). As for coverage clarity, I
would not be surprised if it is more a problem of documenting what is
covered than of the sheer number of test cases.

> 1) the weak argument:
> 700K '.' characters is a lot to output if we want to make each test a
> CPPUNIT_ASSERT*() - assuming it passes the tests. 700K failures would be
> a few GB or TB of log.
> Whereas a few hundred tests is only a few hundred bytes more build log.

I think this is an argument against making each generated test case a
CPPUNIT_ASSERT rather than an argument for fewer test cases :-). If
CPPUNIT cannot handle a large number of test cases, let's aggregate
(like the proposed patch does) instead of testing CPPUNIT limits.
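
For illustration, here is a minimal sketch of that aggregation style; it
is not the patch code, and CaseResult and the helper name are stand-ins
I made up. Run every generated check, count the failures, and hand
CPPUNIT a single assertion with a summary message:

  // a minimal sketch of the aggregation idea, not the patch code;
  // CaseResult and checkGeneratedCases() are hypothetical stand-ins
  #include <sstream>
  #include <string>
  #include <vector>
  #include <cppunit/TestAssert.h>

  struct CaseResult {
      std::string label;   // e.g. "find(middle placement, needle len 3)"
      bool passed;
  };

  void checkGeneratedCases(const std::vector<CaseResult> &results)
  {
      int failures = 0;
      std::ostringstream report;
      for (const CaseResult &r : results) {
          if (!r.passed) {
              ++failures;
              report << "failed: " << r.label << "\n";
          }
      }
      // one assertion (and one line of build log) no matter how many
      // cases were generated
      CPPUNIT_ASSERT_MESSAGE(report.str(), failures == 0);
  }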

> 2) the slightly stronger reason:
> Because the SBuf content can be broken down into "fields" of octet
> value ranges that should all have consistent behaviour relating to how
> they break string implementations: ASCII visible characters, extended
> 127+ octets, and ASCII control codes (\1 to \31 and \126), and nil octet
> \0. That is only 16 (4^2) sets of permutations needed to cover all the
> necessary byte code handling failures of whatever internal
> implementation is used (possibly less if we tailor the unit-tests to the
> specific implementation).

> ** I understand std::string cannot handle most of these octet value
> ranges,

Why not? AFAIK, std::string can handle any octet. It can even handle
wide characters if needed. The proposed patch uses digits and letters,
but that is because it makes output easier to read and understand.

The only special character I can think of in this context is \0, and
only if SBuf accidentally uses some function call that treats it
specially. Adding \0 support to generated test cases is a TODO (just
requires handling it properly when printing AFAIK).
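
To illustrate the point about octets, independent of SBuf: std::string
stores arbitrary bytes, including \0, as long as the length-aware
constructors and accessors are used rather than C-string conventions.
A small self-contained sketch (not patch code):

  // std::string stores and searches arbitrary octets, including '\0'
  #include <cassert>
  #include <string>

  int main()
  {
      const char raw[] = {'a', '\0', '\xff', 'b'};
      std::string hay(raw, sizeof(raw));     // length-aware ctor keeps the NUL
      assert(hay.size() == 4);
      const std::string needle("\0\xff", 2); // needle containing a NUL
      assert(hay.find(needle) == 1);         // search works across the embedded NUL
      return 0;
  }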

> so this test can only be performed using the printable
> characters. So for now char permutations: 1.
>
> eg. a search for "b" in "acd" is no better than searching for "b" in
> "efg". The important "unit" is that you are searching for a missing
> pattern of length 1.

Yes, that is how generated test cases are already structured. Search for
Placement in the patch to see what needle placements are supported.

Of course, one of the primary reasons we are generating test cases is
because we are worried that humans will miss bugs by grouping too many
slightly different things together. For example, the extension of the
above logic would be to declare searches for a missing string of any
length identical. In theory, they are. In practice, buggy code treats
too-long or too-short strings differently.

Yes, it is theoretically possible to think of all possible valuable
combinations and create a hundred or so corresponding test cases.
However, nobody has done that so far, and I would discourage folks from
trying because the number of valuable permutations is too large in this
case IMO.

While the following does not give you the absolute minimum of carefully
handcrafted possibilities, I think it is a useful estimate of the
problem complexity:

  SBuf methods *
     character groups *
     hay lengths * needle lengths *
     needle placements * number of needles *
     search position restrictions.

That's already 700+ test cases if each multiplier has just 3 values
(seven factors at three values each is 3^7 = 2187 combinations), and
most multipliers have more than 3 values even if we carefully limit
them to the essential ones.
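
As a rough illustration of that arithmetic (the factor list below mirrors
the estimate above; the sizes are made up, not the patch's): the case
count is simply the product of the per-factor value counts, so even
modest factors multiply out quickly.

  // hypothetical factor sizes; three values per factor already yields
  // 3^7 = 2187 combinations
  #include <iostream>

  int main()
  {
      const int factors[] = {
          3, // SBuf methods under test
          3, // character groups
          3, // hay lengths
          3, // needle lengths
          3, // needle placements
          3, // number of needles
          3  // search position restrictions
      };
      long cases = 1;
      for (const int f : factors)
          cases *= f;
      std::cout << cases << " generated cases\n"; // prints 2187
      return 0;
  }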

> It is only worthwhile testing what you label "placements" once for each
> permutation in other factors.

Yes, that is what the patch does if I understand you correctly:
Placements are tested once for each permutation in other factors. The
same is also true for any factor other than placement, of course, so I
am not sure I understand you correctly.

> To my knowledge the pattern units / placements we need to check for each
> of the above permutations is:
> - pattern out of range (off end of SBuf)
> - exact match
> - absent pattern
> - pattern too long
> - present pattern
> - present needle prefix
> - present needle suffix
> - present and fully out of range prefix
> - present ending in-range prefix
> - present and fully out of range suffix
> - present starting in-range suffix
>
> ** Placement pattern permutations: 11

I do not understand the difference between some of these (e.g., pattern
vs needle). As you probably know, the patch covers these placements:
needle not in hay, needle at the beginning of hay, needle in the middle
of hay, and needle at the end of hay. I think adding "needle is hay" and
"half-needle in ..." cases would be good (the former is actually tested
a little already).
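
To make the placement factor concrete, here is a hedged sketch of how a
hay string might be assembled for each placement; the enum and helper
names are mine, not the patch's actual identifiers.

  // hypothetical sketch of placement-driven hay construction
  #include <string>

  enum class Placement { NotInHay, AtStart, InMiddle, AtEnd, IsHay };

  // builds a hay of filler characters with the needle placed as requested;
  // the filler must not occur in the needle so only the intended match exists
  static std::string makeHay(const std::string &needle, const Placement p,
                             const std::string::size_type fillerLen = 8)
  {
      const std::string filler(fillerLen, '_');
      switch (p) {
      case Placement::NotInHay: return filler;
      case Placement::AtStart:  return needle + filler;
      case Placement::InMiddle: return filler + needle + filler;
      case Placement::AtEnd:    return filler + needle;
      case Placement::IsHay:    return needle;
      }
      return filler; // unreachable; keeps compilers quiet
  }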

> All of the above for a couple of different lengths from 0, 1, 2, ... 5
> should cover it.
>
> ** needle length permutations: 5
>
> 1 * 11 * 5 ==> 55 tests covering all permutations of input placement,
> needle size. Adding the 16 character set permutations only makes that
> 880 tests total.

And multiply by eight SBuf::find() functions: 7040 cases.

> For further find tuning the raw hay could be a set of '_' characters
> where needle(s) are placed. Ensuring that the needle does not contain
> '_' will prevent those TODO worries you mention about whether the test
> is checking multiple needles by accident already. (corruption of results
> is not a worry, just knowing whether it is being covered or not *is*)

There is no corruption when multiple needles happen to be in the hay,
and we know that we do not generate such cases intentionally, so they
should not be considered covered. Those TODOs are about making them
covered.

To summarize, I do agree that it is possible to manually write fewer
test cases and that fewer test cases is better with all other factors
being equal. However, I am skeptical that the final result

(a) will be significantly better overall (smaller, better coverage, easier
to maintain, easier to understand, etc.) than the proposed approach; AND

(b) will be available before the next StringNG merge request.

This is why we made these generated cases. I would be more than happy to
withdraw the proposed patch when (a) and (b) happen!

>> Bugs notwithstanding, all random input is 100% reproducible so there
>> should not be any intermittent behavior originating from the test cases
>> themselves (at least as far as the same execution environment is
>> concerned -- I do not know whether random() implementations yield
>> different sequences on different platforms).

> They are supposed to. But I would not bet on that either.
> More likely a unique sequence per kernel family, with a few exceptions
> in those designed for higher security paranoia like OBSD.
> The only thing we can be certain of is same kernel + same math
> libraries + same seed implies same sequence.

Sounds good to me (the seed is a parameter already, with a constant
default).
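
For clarity, the idea is simply this (a sketch, not the patch's actual
option handling; the names and the default value are hypothetical): seed
the PRNG from a parameter that defaults to a fixed constant, so runs are
reproducible unless the tester overrides the seed.

  // minimal sketch; the function name and default seed are hypothetical
  #include <stdlib.h> // POSIX srandom()/random()

  static const unsigned int DefaultFuzzSeed = 20130226; // fixed default keeps runs reproducible

  void seedGenerator(const unsigned int seed = DefaultFuzzSeed)
  {
      // same seed + same random() implementation => same generated test cases
      srandom(seed);
  }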

The current output mitigates problems created by different random
sequences in different environments: The developer working on a bug
report can tell which test cases have failed and can trivially reproduce
them manually if needed. The same developer can rely on 100%
reproducible automated behavior during her own development, on any given
platform.

Thank you,

Alex.