Re: [PATCH] Tokenizer

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 12 May 2014 22:14:55 +1200

Ping.

Amos

On 7/01/2014 12:14 p.m., Amos Jeffries wrote:
> On 2014-01-07 10:23, Kinkie wrote:
>> Hi,
>> here's a merge proposal for Parser::Tokenizer, an implementation of
>> the API suggested by Alex.
>>
>> The feature branch is available at lp:~squid/squid/sbuf-tokenizer
>
> in src/parser/Tokeniser.h
> * documentation call the second parameter of token() 'delimiters' but
> the code calls it 'whitespace'. Please be consistent, keeping in mind
> there is no dependency on it being whitespace (ie "Foo:,,,blah,,,\n"
> will have delims "," to be skipped)
> + the .cc needs to be updated to match the choice here.
>
> * comment "Use three prefix() calls instead" is inaccurate.
> The code may require non-3 numbers of prefix() calls. You can remove
> "three" and possibly also "calls" from that statement.
>
> * the comments suffix "(a token)" on all the skip() methods seem
> unnecessary, and 'token' means different things to the different layers
> of the parser we will be dealing with. Best not to confuse future dev.
>
> * double-empty line after class definition.
>
>
> Regarding the semantics of prefix():
> It would be useful to have the capacity for limiting how long the
> prefix will search down the buffer. A third SBuf::size_type N=npos
> parameter would allow us to do things like search for up to 16 bytes of
> method name and error quickly on 32KB of random hex data.
> - token() would be nice limiting too, but as long as prefix provides it
> we can refer to prefix() for all cases where length safety is needed.
>
>
> in src/parser/testTokenizer.cc:
> * no need for empty line after squid.h
>
> * please use the standard token types now defined in CharacterSet for
> these test parses instead of defining some wrong sets with same names
> (ie "whitespace", "crlf").
> - New custom sets are of course okay but should be clearly custom designs.
>
> * testTokenizer::testCharacterSet() does nothing.
>
> * simple construct/destruct tests are missing.
> - how do we know the buffer bytes parsed are actually the ones given
> before testing prefix()/token()/skip() ?
> - how can we be sure the Tokenizer is not triggering a wasteful
> data-copy in its constructor?
>
> * testTokenizer::testTokenizerPrefix() seems wrong in several ways.
> - Just provide random input with various edge cases you can think of
> embeded (a fuzz test might be good there).
> - It is currently (badly) replicating things more properly found in
> testHttpParser units and implying that those calls might be how the HTTP
> is actually parsed (ALPHA-only method name etc).
>
>
> in src/parser/testTokenizer.h
> * Please order the testing such that the simplest operations are tested
> first.
> - with any functions tested in dependency order (eg token() calls
> skip()? then test skip() first)
>
> * Please test the full API. Including simple things like atEnd(), reset().
>
> * Please keep tests for simple APIs like this cleanly separated.
> - a test should setup a specific state, run some API call, then test
> the resulting state is correct.
> - testTokenizer::testTokenizerSkip() is making needless tests of
> prefix() which consumes buffer while apparently verifying previous
> skip() state results.
> + instead add "#define private public" in front of the Tokenizer.h
> include and use SBuf operation on buf_ directly to verify its state
> without consuming any of it. The buf_ checks could also be added through
> the rest of the unit tests alongside the split-off token SBuf content
> checks.
>
> Amos
>
Received on Mon May 12 2014 - 10:15:20 MDT

This archive was generated by hypermail 2.2.0 : Mon May 12 2014 - 12:00:13 MDT