Re: [RFC] Tokenizer API

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 13 Dec 2013 18:24:24 +1300

For the record this is now being developed as part of the parser upgrade
branch:
 lp:~squid/squid/parser-ng

Kinkie; I've just done an interim check...

* the changes in CharacterSet introduced a infinite-loop bug in
operator+=().
 FYI: The uint8_t type value cannot represent the value 256, which is
why we use it as the loop terminator and an iterator of a size larger
than 8-bit with a down-cast to prevent the vector accidentally being
grown by an overflow.
  ==> I have fixed that now and pushed. You may want to re-check the
Tokenizer unit test where you have a comment about buggy +=.

* for set definitions please use those defined in the protocol RFCs:
 - http://tools.ietf.org/html/rfc5234#appendix-B.1
  ... defines ALPHA, CR, CTL, DIGIT, DQUOTE, HEXDIG, HTAB, LF, OCTET,
SP, VCHAR, CHAR, WSP ...
 - http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-25
 ... section 1.2 mentions how HTTP uses the above types from RFC 5234
 ... section 3.2.3 defines whitespace == 1*WSP
 ... section 3.2.6 defines tchar, special, qdtext, obs-text, ctext

There are some others such as base64, token68, etagc in odd
feature-specific places around the specification as well. But we can get
to those later when we need them.

* please use the full path below src/ when including headers.
src/parser/Tokenizer.h should have #include "parser/CharacterSet.h"

* please use camelCase instead of the underscores in Tokenizer methods.
 - you are also defining the rv local variable at a higher scope than
necessary. It can be in the for(SBuf::size_type rv = 0;...)

Amos
Received on Fri Dec 13 2013 - 05:24:35 MST

This archive was generated by hypermail 2.2.0 : Fri Dec 13 2013 - 12:00:10 MST