Re: [PATCH] Tokenizer

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 07 Jan 2014 12:14:04 +1300

On 2014-01-07 10:23, Kinkie wrote:
> Hi,
> here's a merge proposal for Parser::Tokenizer, an implementation of
> the API suggested by Alex.
>
> The feature branch is available at lp:~squid/squid/sbuf-tokenizer

in src/parser/Tokenizer.h
* the documentation calls the second parameter of token() 'delimiters' but
the code calls it 'whitespace'. Please be consistent, keeping in mind that
there is no dependency on it being whitespace (eg "Foo:,,,blah,,,\n" would
have "," as the delimiters to be skipped).
  + the .cc needs to be updated to match the choice here.
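
  A minimal sketch of what the declaration might look like with the
consistent 'delimiters' naming (the exact signature and doc wording are
up to the branch):

    /// extracts the bytes found before the next 'delimiters' byte,
    /// skipping any leading delimiter bytes (sketch only)
    bool token(SBuf &returnedToken, const CharacterSet &delimiters);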

* comment "Use three prefix() calls instead" is inaccurate.
  The code may require non-3 numbers of prefix() calls. You can remove
"three" and possibly also "calls" from that statement.

* the "(a token)" suffix in the comments on all the skip() methods seems
unnecessary, and 'token' means different things at the different layers of
the parser we will be dealing with. Best not to confuse future developers.

* there is a double blank line after the class definition.

Regarding the semantics of prefix():
   It would be useful to be able to limit how far prefix() will search down
the buffer. A third SBuf::size_type N = npos parameter would allow us to do
things like search for up to 16 bytes of method name and error out quickly
on 32KB of random hex data.
  - limiting token() would be nice too, but as long as prefix() provides it
we can point to prefix() for all cases where length safety is needed.
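
  A rough sketch of what that extra parameter could look like (the
parameter name and default value are illustrative only):

    /// extracts bytes matching tokenChars, but never examines more than
    /// 'limit' bytes of the buffer (sketch only)
    bool prefix(SBuf &returnedToken, const CharacterSet &tokenChars,
                SBuf::size_type limit = SBuf::npos);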

in src/parser/testTokenizer.cc:
* no need for empty line after squid.h

* please use the standard character sets now defined in CharacterSet for
these test parses instead of defining incorrect sets with the same names
(ie "whitespace", "crlf").
  - New custom sets are of course okay, but they should be clearly custom
designs.
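
  Something like this is what I have in mind (assuming CharacterSet
provides WSP and CRLF definitions; adjust to whatever names it actually
defines):

    #include "base/CharacterSet.h"

    // reuse the shared definitions instead of redefining them in the test
    const CharacterSet &whitespace = CharacterSet::WSP;
    const CharacterSet &crlf = CharacterSet::CRLF;

    // a clearly-custom set is still fine where a test needs one
    // (constructor form here assumes the label + characters variant)
    const CharacterSet commaSet("test-commas", ",");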

* testTokenizer::testCharacterSet() does nothing.

* simple construct/destruct tests are missing.
   - how do we know the buffer bytes being parsed are actually the ones
given, before we test prefix()/token()/skip()?
   - how can we be sure the Tokenizer is not triggering a wasteful
data copy in its constructor?
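
   A sketch of the kind of construction test I mean (the test name is made
up, buf_ access relies on the "#define private public" trick described
further down, and the rawContent() pointer comparison is just one possible
way to detect a copy):

    void
    testTokenizer::testConstruction()
    {
        const SBuf input("GET http://example.com/ HTTP/1.1\r\n");
        Parser::Tokenizer t(input);

        // the bytes to be parsed are exactly the ones we gave it
        CPPUNIT_ASSERT(t.buf_ == input);

        // and no data copy happened: both SBuf still share storage
        CPPUNIT_ASSERT(t.buf_.rawContent() == input.rawContent());
    }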

* testTokenizer::testTokenizerPrefix() seems wrong in several ways.
  - Just provide random input with various edge cases you can think of
embedded (a fuzz test might be good there).
  - It currently (badly) replicates things that belong more properly in the
testHttpParser units, and implies that those calls might be how HTTP is
actually parsed (ALPHA-only method names etc).
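
  Roughly this shape, on generic data rather than pseudo-HTTP (set names
are only illustrative, and the prefix(SBuf &, const CharacterSet &) form
is assumed from the branch):

    void
    testTokenizer::testTokenizerPrefix()
    {
        Parser::Tokenizer t(SBuf("abc123,,,"));
        SBuf found;

        // no leading digits: prefix() must fail and consume nothing
        CPPUNIT_ASSERT(!t.prefix(found, CharacterSet::DIGIT));

        // leading alphas: prefix() extracts them and stops at the '1'
        CPPUNIT_ASSERT(t.prefix(found, CharacterSet::ALPHA));
        CPPUNIT_ASSERT(found == SBuf("abc"));
    }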

in src/parser/testTokenizer.h
* Please order the tests so that the simplest operations are tested
first.
  - with functions tested in dependency order (eg if token() calls
skip(), then test skip() first).
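
  CppUnit runs the cases in registration order, so the suite can simply
list them that way (test names below are placeholders):

    CPPUNIT_TEST_SUITE( testTokenizer );
    CPPUNIT_TEST( testTokenizerSkip );    // token() builds on skip()
    CPPUNIT_TEST( testTokenizerToken );
    CPPUNIT_TEST( testTokenizerPrefix );
    CPPUNIT_TEST_SUITE_END();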

* Please test the full API, including simple things like atEnd() and
reset().

* Please keep tests for simple APIs like this cleanly separated.
  - a test should set up a specific state, run some API call, then check
that the resulting state is correct.
  - testTokenizer::testTokenizerSkip() is making needless calls to
prefix(), which consumes buffer, while apparently verifying the results of
the previous skip().
   + instead, add "#define private public" in front of the Tokenizer.h
include and use SBuf operations on buf_ directly to verify its state
without consuming any of it. The buf_ checks could also be added
throughout the rest of the unit tests alongside the split-off token SBuf
content checks.
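
   Something like this sketch (the exact skip() overload and the
CharacterSet name used are assumptions):

    #define private public
    #include "parser/Tokenizer.h"
    #undef private

    void
    testTokenizer::testTokenizerSkip()
    {
        Parser::Tokenizer t(SBuf("  GET /index"));

        // skip the leading whitespace ...
        CPPUNIT_ASSERT(t.skip(CharacterSet::WSP));

        // ... and verify the remaining buffer via buf_ directly,
        // without consuming any of it with prefix()/token()
        CPPUNIT_ASSERT(t.buf_ == SBuf("GET /index"));
    }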

Amos