Re: [RFC] Tokenizer API

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 11 Dec 2013 11:34:27 -0700

On 12/11/2013 11:15 AM, Kinkie wrote:
>>> CharacterSet(const char * const c, size_t len)
>>
>> You do not want the len argument. These character sets will nearly
>> always be initialized once, from constant hard-coded c-strings.
>> For esoteric cases, I would add an add(const char c) method to add a
>> single character to the set (and use the merge operation to produce more
>> elaborate sets as needed, see below for a sketch).
>
> Yes, but without it, it's not possible to specify \0 as a valid
> character in the set.

Yes, but \0 can be specified using the add-single-character interface
which is useful for other reasons (see my earlier email with a sketch
adding a single \r character). I suspect \0 is going to be rarely used.
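A minimal C++ sketch of the interface described above, assuming a bitset-backed set. The class shape and the names add(), has(), operator+= and operator+ are illustrative assumptions, not Squid's actual CharacterSet API. It shows that a len-free C-string constructor never picks up the terminating \0, while \0 can still be added explicitly through the single-character interface, and sets can be merged:

```cpp
#include <bitset>
#include <cassert>
#include <climits>

// Hypothetical sketch: one membership bit per possible unsigned char value.
class CharacterSet
{
public:
    // Build the set from a NUL-terminated C-string. There is no length
    // argument, so the terminating '\0' is never included by accident.
    explicit CharacterSet(const char * const c)
    {
        for (const char *p = c; *p; ++p)
            add(*p);
    }

    // Add a single character; this is the only way to include '\0'.
    CharacterSet &add(const char c)
    {
        chars_.set(static_cast<unsigned char>(c));
        return *this;
    }

    // Merge another set into this one (set union).
    CharacterSet &operator+=(const CharacterSet &other)
    {
        chars_ |= other.chars_;
        return *this;
    }

    bool has(const char c) const
    {
        return chars_.test(static_cast<unsigned char>(c));
    }

private:
    std::bitset<(1 << CHAR_BIT)> chars_;
};

// Union of two sets, built on top of operator+=.
inline CharacterSet operator+(CharacterSet lhs, const CharacterSet &rhs)
{
    lhs += rhs;
    return lhs;
}

// Usage sketch: delimiters from a constant c-string, NUL added on request,
// then merged with a digit set.
inline void exampleUsage()
{
    CharacterSet delims(" \t\r\n");
    assert(!delims.has('\0'));   // the c-string constructor never adds NUL

    delims.add('\0');            // opt in to NUL explicitly
    assert(delims.has('\0'));

    const CharacterSet digits("0123456789");
    const CharacterSet both = delims + digits;
    assert(both.has('7') && both.has('\n') && !both.has('x'));
}
```

Because adding \0 requires a separate, visible add('\0') call, the rare case stays explicit while the common constructor remains a single c-string argument.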

> Possible solution: make len optional. If 0, default to strlen().
> That would allow us to cover one possible esoteric case without impacting
> the common case. What do you think?
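For comparison, a hypothetical C++ sketch of the optional-len proposal (CharacterSetWithLen is an illustrative name, not a real Squid class). With len defaulting to strlen(), \0 can be included by passing an explicit length, but an explicit length also makes silent mistakes possible. The sizeof() case below is only one plausible way to get the length wrong; the thread does not say which mistakes were actually made:

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>

// Hypothetical sketch of the proposal: len == 0 means "use strlen()",
// otherwise exactly len bytes of c are added to the set.
class CharacterSetWithLen
{
public:
    explicit CharacterSetWithLen(const char * const c, std::size_t len = 0)
    {
        if (len == 0)
            len = std::strlen(c);
        for (std::size_t i = 0; i < len; ++i)
            chars_.set(static_cast<unsigned char>(c[i]));
    }

    bool has(const char c) const
    {
        return chars_.test(static_cast<unsigned char>(c));
    }

private:
    std::bitset<(1 << CHAR_BIT)> chars_;
};

inline void lengthExamples()
{
    // Common case: no length given, behaves like a plain c-string constructor.
    const CharacterSetWithLen ws(" \t");
    assert(!ws.has('\0'));

    // Intended esoteric case: an explicit length of 3 includes the NUL at c[2].
    const CharacterSetWithLen withNul("ab\0", 3);
    assert(withNul.has('\0'));

    // Easy mistake: sizeof() counts the literal's terminating NUL,
    // so this set silently contains '\0' as well.
    const CharacterSetWithLen oops(" \t", sizeof(" \t"));
    assert(oops.has('\0'));
}
```

The last case compiles and runs without complaint, which is the kind of silent error a length parameter invites; without a len argument, including \0 takes an explicit single-character add, as proposed in the reply.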

Given the number of cases where folks wrote the _wrong_ length already,
I suggest the following plan:

1. Do not support the len argument. Use the add-single-character
interface to add \0 when/if needed.

2. If we find ourselves adding \0 (and nothing else) too much, revisit
this issue.

HTH,

Alex.
Received on Wed Dec 11 2013 - 18:34:55 MST