Re: [PATCH] Tokenizer

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 02 Jun 2014 14:27:23 -0600

On 06/02/2014 01:33 PM, Kinkie wrote:
> Hi,
> adapting the current implementation to the "extra optional
> parameter" was rather trivial; please consider the attached patch,
> whose bulk is unit tests and (certainly improvable) documentation.

I think requireComplete is better than making \0 special, but since the
existing API already supports similar tests, we should discuss use cases
where the extra parameter is actually needed.

If the patched code stays:

> // no delimiter found, nor is NUL/EOS/npos acceptable as one

The above documentation becomes stale with this patch. You can fix this
by rewriting it as

  // required trailing delimiter not found

Thank you,

Alex.

> On Mon, Jun 2, 2014 at 8:07 PM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 06/02/2014 11:27 AM, Kinkie wrote:
>>>>> + //fixme: account for buf_.size()
>>>>> + bool neg = false;
>>>>> + const char *s = buf_.rawContent();
>>>>> + const char *end = buf_.rawContent() + buf_.length();
>>>>
>>>> Could you please explain what the above comment means? There is no
>>>> SBuf::size() AFAICT.
>>>
>>> length() is an alias of size(). Changed to size() to avoid confusion
>>
>> The above code already accounts for .length() or .size(), right? What is
>> there to fix? Is it a bug (XXX) or just a future improvement note (TODO)?
>>
>>
>>>>> + * At least one terminating delimiter is required. \0 may be passed
>>>>> + * as a delimiter to treat end of buffer content as the end of token.
>>>>
>>>>> + if (tokenLen == SBuf::npos && !delimiters['\0']) {
>>>>
>>>> The above is from the actually committed code (not the code posted for
>>>> review). Please remove \0 as a special case. As far as I can tell, the
>>>> reason it was added is already covered well enough by the existing API.
>>>> The addition of this special case is likely to just confuse and break
>>>> things. For example, adding \0 to delimiters for the purpose of
>>>> disabling this special case will also tell Tokenizer to skip \0
>>>> characters in input, which is often wrong.
>>>
>>> I'll let Amos argue for this as he's the one who has the clearest
>>> picture of the use case.
>>> Given your objection, my 2c suggestion to address the need is to
>>> either have two token() methods, one which requires a separator at the
>>> end of a token and one which doesn't, or to have a bool flag passed to
>>> the method.
>>
>> I hope neither is needed, but I will wait for the "use case" discussion.
>>
>> BTW, if possible, please work on your proposal to avoid this kind of
>> rushed commit of knowingly contested code.
>>
>>
>> Thank you,
>>
>> Alex.
>>
Received on Mon Jun 02 2014 - 20:27:27 MDT

This archive was generated by hypermail 2.2.0 : Tue Jun 03 2014 - 12:00:17 MDT