Re: [PATCH] Generated SBuf::find() test cases

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 26 Feb 2013 15:31:30 +1300

On 26/02/2013 1:07 p.m., Alex Rousskov wrote:
> On 02/25/2013 03:45 PM, Amos Jeffries wrote:
>> On 26/02/2013 11:06 a.m., Alex Rousskov wrote:
>>> The attached patch adds about 700'000 test cases for eight
>>> SBuf::find() methods. Needless to say, they are generated rather than
>>> manually written. They take a couple of seconds to run. Configuration
>>> options allow us to increase or decrease the number of test cases if
>>> needed.
>> Why do we need so many tests?
> I do not know how many tests are needed. Somebody can study that (if
> they feel that reducing the number of generated tests is important).
> More on that below.
>
>
>> Surely a selection of lengths from the middle, and specific scans of all
>> the edge-boundary tests would be enough?
> That is what the random tests are doing. One could massage them into
> using shorter strings (already a parameter), selecting fewer lengths
> from the middle (a simple code modification), and/or doing fewer
> edge-boundary tests (a simple code modification), of course.
>
>
>> That can't be more than a few hundred - which is few enough to run each
>> as a separate CppUnit case.
> I have not studied whether doing so will reduce the number of bugs
> found. I only studied that going in the opposite direction (more test
> cases) does not seem to help much. Please note that it is difficult to
> know for sure because while many test cases find bugs, it is difficult
> to isolate independent/different bugs without actually fixing SBuf.
>
> Why is "a few hundred" is OK as a separate CppUnit case but "a few
> hundred thousands" is not OK? I am _not_ trying to argue either way,
> just curious what the criteria for sizing CppUnit cases are, since I
> know so little about CppUnit...

I like what this does already, simply because it finds bugs, and I'm
happy with generating the actual strings randomly to see if we can fuzz
out some failure somewhere as an on-top kind of check. But I don't see
the need for nearly so many checks of the basic border conditions. I
think we could auto-generate a smaller, more deterministic/targeted set
of checks for better results and clarity about exactly what the
coverage is.

1) the weak argument:
  700K '.' characters is a lot to output if we want to make each test a
CPPUNIT_ASSERT*() - and that is assuming they all pass. 700K failures,
at a few hundred bytes of message each, would be hundreds of MB of log.
  Whereas a few hundred tests add only a few hundred bytes to the build log.

2) the slightly stronger reason:
  Because the SBuf content can be broken down into "fields" of octet
value ranges that should all behave consistently with respect to how
they break string implementations: ASCII visible characters, extended
127+ octets, ASCII control codes (\1 to \31, plus \127 DEL), and the
nil octet \0. That is only 16 (4^2) hay/needle class combinations
needed to cover all the necessary byte-value handling failures of
whatever internal implementation is used (possibly fewer if we tailor
the unit-tests to the specific implementation).

  ** I understand std::string cannot handle most of these octet value
ranges, so this test can only be performed using the printable
characters. So for now char permutations: 1.
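
For later, when the std::string limitation goes away, enumerating the
16 class pairs could look something like the sketch below. To be clear,
kClassReps and testClassPair() are made-up names for illustration, not
anything from the patch:

  // Illustrative only: one representative octet per value class.
  static const char kClassReps[4] = {
      'a',    // ASCII visible character
      '\xC3', // extended 127+ octet
      '\x01', // ASCII control code
      '\0'    // nil octet
  };

  void testClassPair(char hayChar, char needleChar); // hypothetical hook

  void testOctetClassPairs()
  {
      for (int h = 0; h < 4; ++h)
          for (int n = 0; n < 4; ++n)
              testClassPair(kClassReps[h], kClassReps[n]); // 4*4 = 16 pairs
  }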

e.g. a search for "b" in "acd" is no better than searching for "b" in
"efg". The important "unit" is that you are searching for a missing
pattern of length 1.

It is only worthwhile testing what you label "placements" once per
permutation of the other factors.
To my knowledge, the pattern units / placements we need to check for
each of the above permutations are:
  - pattern out of range (off end of SBuf)
  - exact match
  - absent pattern
  - pattern too long
  - present pattern
  - present needle prefix
  - present needle suffix
  - present and fully out of range prefix
  - present ending in-range prefix
  - present and fully out of range suffix
  - present starting in-range suffix

  ** Placement pattern permutations: 11

All of the above for several different needle lengths picked from 0, 1,
2, ... 5 should cover it.

  ** needle length permutations: 5

  1 * 11 * 5 ==> 55 tests covering all permutations of input placement
and needle size. Adding the 16 character-class permutations only makes
that 880 tests total. (A rough generator loop for the placement/length
part is sketched below.)
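
Purely as a sketch of shape, not a working patch: Placement and
buildCase() are made-up names, and I am assuming find() reports misses
via SBuf::npos, as in the patch under review. It would need the SBuf
header and <cppunit/TestAssert.h>, same as the existing tests:

  // 11 placement kinds, matching the list above.
  enum Placement {
      plOutOfRange, plExactMatch, plAbsent, plTooLong, plPresent,
      plNeedlePrefix, plNeedleSuffix, plPrefixOutOfRange,
      plPrefixEndingInRange, plSuffixOutOfRange, plSuffixStartingInRange,
      plEnd // == 11
  };

  // hypothetical builder: fills hay/needle and the expected find() result
  void buildCase(Placement, SBuf::size_type needleLen,
                 SBuf &hay, SBuf &needle, SBuf::size_type &expected);

  void testFindPlacements()
  {
      for (int p = 0; p < plEnd; ++p) {
          for (SBuf::size_type len = 1; len <= 5; ++len) {
              SBuf hay, needle;
              SBuf::size_type expected = SBuf::npos;
              buildCase(Placement(p), len, hay, needle, expected);
              CPPUNIT_ASSERT_EQUAL(expected, hay.find(needle)); // 11*5 = 55 asserts
          }
      }
  }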

For further fine tuning, the raw hay could be a run of '_' characters
into which the needle(s) are placed. Ensuring that the needle does not
contain '_' will prevent those TODO worries you mention about whether
the test is already checking multiple needles by accident. (Corruption
of results is not a worry; knowing whether that case is covered or not
*is*.) A rough sketch of that follows below.
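
Something like this, assuming SBuf has the usual
append(const char *, size_type) and find(char) overloads; makeHay is a
made-up helper name, not from the patch:

  #include <cassert>
  #include <string>
  // plus the SBuf header, wherever it lives in the patch

  // Hay is all '_' padding except for the one needle spliced in at a
  // known offset, so any match found must be the planted one.
  static SBuf makeHay(const SBuf &needle, SBuf::size_type offset,
                      SBuf::size_type hayLen)
  {
      assert(needle.find('_') == SBuf::npos); // '_' is reserved for padding
      assert(offset + needle.length() <= hayLen);
      const std::string pad(hayLen, '_');
      SBuf hay;
      hay.append(pad.c_str(), offset);                            // leading padding
      hay.append(needle);                                         // the planted needle
      hay.append(pad.c_str(), hayLen - offset - needle.length()); // trailing padding
      return hay;
  }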

> If more CppUnit cases are needed, I am sure it would be possible to
> parameterize the current code to generate more specific test cases.
> After repeating such "focused" generation inside each appropriate
> CppUnit case, one would end up with multiple CppUnit test cases covering
> approximately the same input space.
>
>
>> Also, using fuzzing with a random input leaves us open to the rare
>> occasions when a single byte octet value at a specific location is the
> cause of a bug. Think \0 at octet 5 when doing 32-bit chunk comparisons
>> in some buggy implementation checking against !*p. That type of failure
>> appears intermittently and inspires confusion rather than a fix.
> Bugs notwithstanding, all random input is 100% reproducible so there
> should not be any intermittent behavior originating from the test cases
> themselves (at least as far as the same execution environment is
> concerned -- I do not know whether random() implementations yield
> different sequences on different platforms).

They are supposed to. But I would not bet on that either.
More likely it is a unique sequence per kernel family, with a few
exceptions in those designed for higher security paranoia like OpenBSD.
The only thing we can be certain of is that the same kernel + same math
libraries + same seed implies the same sequence.
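
If cross-platform reproducibility of the sequences ever matters, one
way out (assuming we can rely on C++11 here) is to draw the bytes from
a standard-defined engine instead of random():

  #include <iostream>
  #include <random>

  int main()
  {
      // std::mt19937 is exactly specified by the C++11 standard, so a
      // fixed seed yields the same sequence on every platform -- unlike
      // libc random(), and unlike the <random> distribution classes,
      // whose algorithms are implementation-defined.
      std::mt19937 rng(42);
      const char c = static_cast<char>('!' + rng() % ('~' - '!' + 1));
      std::cout << c << '\n'; // prints the same character everywhere
      return 0;
  }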

Amos