Re: pseudo-specs for a String class: tokenization

From: Kinkie <gkinkie_at_gmail.com>
Date: Mon, 8 Sep 2008 14:57:27 +0200

On Mon, Sep 8, 2008 at 1:52 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> Alex Rousskov wrote:
>>
>> On Fri, 2008-09-05 at 17:47 +0200, Kinkie wrote:
>>>
>>> On Fri, Sep 5, 2008 at 5:02 PM, Alex Rousskov
>>> <rousskov_at_measurement-factory.com> wrote:
>>>>
>>>> On Fri, 2008-09-05 at 10:19 +0200, Kinkie wrote:
>>>>>
>>>>> On Fri, Sep 5, 2008 at 4:43 AM, Alex Rousskov
>>>>> <rousskov_at_measurement-factory.com> wrote:
>>>>>>
>>>>>> Just like String, the iterator interface is pretty standard. For our
>>>>>> Tokenizer, we can simplify it a little unless others think that
>>>>>> compatibility with standard library algorithms is worth the trouble.
>>>>>> Here is a sketch:
>>>>>>
>>>>>> class Tokenizer {
>>>>>> public:
>>>>>> Tokenizer(); // immediately atEnd
>>>>>
>>>>> I'd avoid the default constructor entirely.
>>>>
>>>> Bad idea. The default constructor does not hurt in this case. It does
>>>> help when you want another method to initialize the tokenizer or when
>>>> you want to reset the already initialized tokenizer.
>>>
>>> A tokenizer only has meaning when attached to a KBuf (String,
>>> whatever); that's what I meant by not having a constructor without an
>>> attached KBuf.
>>
>> From practical point of view, you may not have the right string to
>> "attach" to at the time of construction and attaching to the wrong
>> string is worse than meaningless.
>>
>> From design point of view, a basic tokenizer that is atEnd() with or
>> without the attached buffer is perfectly fine and meaningful because you
>> cannot do much with atEnd tokenizer.
>>
>> As we add more bells and whistles to the Tokenizer class, the meaning of
>> some methods may indeed become vague for unattached tokenizer. For
>> example, what should the originalString() or source() method return if
>> we have one? For simplicity's sake, we can solve that problem by declaring
>> that the default constructor has the same visible effect as the
>> Tokenizer(String(), String()) constructor.
>>
>>>>> I'd rather add a version which takes the String but not the delimiters.
>>>>
>>>> I would recommend avoiding implicit conversions from String to anything
>>>> and I doubt there is a reasonable set of default delimiters.
>>>
>>> Why would there be an implicit conversion?
>>
>> Ask Amos -- he has suffered enough from it to give an entertaining
>> answer :-). Or see the attached source file.
>
> All my failed attempts were broken by the String/MemBuf size separation. I was
> attempting to expand String with implicit conversion as-is but got hamstrung
> when memory buffers were cast to char* and the MemBuf data pointers were
> silently converted to Strings and the copy-allocator size asserts kicked in :-)
>
> This new method of attack should not encounter that, due to two differences:
> firstly, the lack of a buffer size assert :-) and secondly, no need for an
> implicit conversion between the types.
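
For concreteness, here is a minimal sketch of the Tokenizer points quoted
above: a default constructor with the same visible effect as
Tokenizer(String(), String()), and no implicit conversion from String.
Everything here is an assumption for illustration: std::string stands in
for String/KBuf, and next(), source() and the private members are
hypothetical names, not the interface from Alex's sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for the String/KBuf type under discussion (assumption).
using String = std::string;

class Tokenizer {
public:
    // Same visible effect as Tokenizer(String(), String()): immediately atEnd().
    Tokenizer() = default;

    // explicit: a String never silently turns into a Tokenizer.
    explicit Tokenizer(String source, String delimiters)
        : source_(std::move(source)), delimiters_(std::move(delimiters)) {
        skipDelimiters();
    }

    bool atEnd() const { return pos_ >= source_.size(); }

    // Returns the next token; must not be called when atEnd().
    String next() {
        assert(!atEnd());
        std::size_t end = source_.find_first_of(delimiters_, pos_);
        if (end == String::npos)
            end = source_.size();
        String token = source_.substr(pos_, end - pos_);
        pos_ = end;
        skipDelimiters();
        return token;
    }

    // Well defined even for an unattached tokenizer: returns an empty String.
    const String &source() const { return source_; }

private:
    // Advance pos_ past any run of delimiter characters.
    void skipDelimiters() {
        pos_ = source_.find_first_not_of(delimiters_, pos_);
        if (pos_ == String::npos)
            pos_ = source_.size();
    }

    String source_;
    String delimiters_;
    std::size_t pos_ = 0;
};
```

With this shape, a default-constructed tokenizer can later be reset by plain
assignment, e.g. t = Tokenizer(line, " \t"), which is the use case the
default constructor was defended with.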

As you noticed, exporting the buffers is not implicit and comes with a
big warning sign (and no, I'm not planning to change that method's
name).
I'd rather proxy all low-level calls, effectively trying to "trap"
users within the KBuf class.
Regarding the by-design size limitation, I honestly fail to see the
reason for it. If we end up doing something like that with KBuf, in my
opinion:
- it's a class-wide thing (saves a few bytes off each handle)
- it's tuned way higher - 4 megs or something like that.
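
To illustrate, a rough sketch of those three ideas together: proxied
low-level calls, an explicit export with a loud name, and a class-wide size
limit. This is not the actual KBuf code. The method names, the std::string
backing store and the exact 4 MB figure are all assumptions made up for the
example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

class KBuf {
public:
    // Class-wide limit (hypothetical value): costs nothing per handle.
    static constexpr std::size_t maxSize = 4 * 1024 * 1024;
    static constexpr std::size_t npos = std::string::npos;

    KBuf() = default;

    // explicit: a char* never silently becomes a KBuf.
    explicit KBuf(const char *s) {
        assert(s != nullptr);
        append(s, std::char_traits<char>::length(s));
    }

    // Appends len bytes; refuses to grow past the class-wide limit.
    void append(const char *data, std::size_t len) {
        if (len > maxSize - store_.size())
            throw std::length_error("KBuf: class-wide size limit exceeded");
        store_.append(data, len);
    }

    // Proxied low-level calls: users stay inside the class.
    std::size_t size() const { return store_.size(); }
    std::size_t find(char c) const { return store_.find(c); }

    // The only way out, and it is never taken implicitly.
    // Hypothetical name; the pointer is not guaranteed to be 0-terminated
    // and is invalidated by any later append().
    const char *exportRawContentDangerous() const { return store_.data(); }

private:
    std::string store_; // stand-in for the real shared memory blob
};
```

Since there is no conversion operator and the constructor is explicit,
neither direction of the char* <-> KBuf conversion can happen behind the
caller's back.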

Also, I expect KBufList to come to the rescue there, for writev()
stuff (can anyone else hear the words cache_mem being whispered here?)

-- 
 /kinkie
Received on Mon Sep 08 2008 - 12:57:36 MDT