Re: pseudo-specs for a String class: tokenization

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 04 Sep 2008 09:24:40 -0600

On Thu, 2008-09-04 at 11:29 +0200, Kinkie wrote:

> /**
> * strtok()-equivalent, without (some of) the brain-damage.
> * Will extract a token from the beginning of the KBuf, containing
> * all chars up to the first occurrence of any of the chars in
> * <i>delim</i> or end-of-KBuf if no delimiter can be found.

What is returned when the string contains nothing but two delimiters?
Two empty tokens? One empty token? end-of-KBuf? The above needs to be
polished to make this clear.

> * The tokenized KBuf is modified, by removing off its head

Do you modify (and hence, usually copy) the underlying shared memory
buffer or just the string offset? Since your design morphs the two
classes together, it is difficult to say what "modifying KBuf" actually
means...

> * the returned token AND the delimiter.

Do not return the delimiter, at least not by default. A typical user
wants a stream of tokens, not the stuff that separates them. Also, there
is often no delimiter at the end of the string...

BTW, this is yet another case where a Tokenizer class would be better
than let-String-do-everything approach because a tokenizer object can at
any time return the current token, the current delimiter, and/or both,
without performance overhead or design complications.

> This is by explicit design choice, because squid is using C string
> manipulation functions. Removing some of the brain-damage is good,
> especially if doing so also buys performance, but rewriting clients
> from scratch something I'd reserve as a second step.

No String APIs discussed so far require "rewriting clients from
scratch". All APIs will require a lot of mundane client changes. I would
rather suffer through all those changes once (with a good API) than do
them twice, first with a partially brain-damaged API and then with a
damage-free API.

> Also it is (at least to me) a very natural way of seeing things,
> although it's probably not extremely clean. I took the pragmatic
> approach over the "clean design" approach.

Clean design can be pragmatic. It is better to avoid such arguments for
the sake of a constructive debate. It is almost guaranteed that others
will feel that their approach is more pragmatic and natural than yours.

You can say things like "my design models strtok() API" or "my design
models standard or boost library APIs" or "my design is a new wheel" but
do not call your approach "pragmatic" implying that others are not.

> > This is a good example why poor design leads to confusing code (and
> > other problems). We have lots of other examples in Squid code where an
> > API can be used five different ways, only one of them being correct, but
> > three can be found spreading through the actual code.
>
> 252 lines of the source out of 917 are documentation. If there's
> brain-damage, I hope it's well-documented and at least consistently
> applied.

If we are to suffer through String/char*/MemBuf replacement, I want to
suffer for the best String API we can get, an API that does not need to
be replaced for the foreseeable future. I do not care much if other
replacement APIs are well-documented and internally consistent.

> In the meantime I've implemented the storage change you asked, to use
> (offset,len) in place of (char*, len).
> The impact on code complexity was unexpectedy low, and so I agree to
> it and I'm not going back.

Glad we are making progress, at least with the little internal things.

I must say that this is not a normal review process but more like "let's
build a String API together" project, where developers design
preferences differ significantly.

I would recommend freezing implementation until the API is fixed and
accepted. Otherwise, you are going to waste a lot of time on
[re]documenting and [re]implementing the throw-away code.

Initially, I was under impression that you want us to review the API.
The "pseudo-specs" subject of the thread confused me, I guess. Now it
sure feels like you place a lot of value (and working hard) on the
internal code, which worries me because I do not want you to waste time
on redoing the implementation.

HTH,

Alex.
Received on Thu Sep 04 2008 - 15:25:09 MDT

This archive was generated by hypermail 2.2.0 : Thu Sep 04 2008 - 12:00:04 MDT