Re: [PATCH] Tokenizer

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 02 Jun 2014 10:43:15 -0600

On 05/30/2014 03:56 AM, Kinkie wrote:
> Hi,
> here's the patch for merging the sbuf-tokenizer branch. Includes the
> fixes to Parser::Tokenizer::int64 method (via reimplementation of
> strtoll to account for SBufs sharing MemBlobs).
>
> Please review.

> +Parser::Tokenizer::token(SBuf &returnedToken, const CharacterSet &delimiters)
> +{
> +    SBuf savebuf(buf_);
> +    SBuf retval;
> +    SBuf::size_type tokenLen = 0;

This code has changed since the patch was posted for review, but please
declare both retval and tokenLen as constant.

> +    SBuf::size_type prefixLen = buf_.substr(0,limit).findFirstNotOf(tokenChars);

> +    SBuf::size_type prefixLen = buf_.findFirstNotOf(tokenChars);

Please declare variables as constant whenever possible.

> +    //fixme: account for buf_.size()
> +    bool neg = false;
> +    const char *s = buf_.rawContent();
> +    const char *end = buf_.rawContent() + buf_.length();

Could you please explain what the above comment means? There is no
SBuf::size() AFAICT.

>
> + * At least one terminating delimiter is required. \0 may be passed
> + * as a delimiter to treat end of buffer content as the end of token.

> +    if (tokenLen == SBuf::npos && !delimiters['\0']) {

The above is from the actually committed code (not the code posted for
review). Please remove \0 as a special case. As far as I can tell, the
reason it was added is already covered well enough by the existing API.
The addition of this special case is likely to just confuse and break
things. For example, adding \0 to delimiters for the purpose of
disabling this special case will also tell Tokenizer to skip \0
characters in input, which is often wrong.

Thank you,

Alex.
Received on Mon Jun 02 2014 - 16:43:20 MDT
