Re: pseudo-specs for a String class: tokenization

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 03 Sep 2008 22:59:18 -0600

On Thu, 2008-09-04 at 13:47 +1200, Amos Jeffries wrote:

> >> On an unrelated issue, since it was of interest to some of us, here's
> >> a sample of the caller code for tokenization functions (actual live
> >> code):
> >>
> >> KBuf s1;
> >> cout << "tokenization: \n";
> >> {
> >> s1="The quick brown fox jumped over the lazy dog";
> >> const char *needle=" ";
> >> KBuf cs1(needle);
> >> while (!s1.isNull()) {
> >> cout << "token: " << s1.nextToken(cs1) << endl;
> >> }
> >> }
> >> cout << endl;
> >
> > FWIW, I still think that tokenization should be external to the buffer
> > or string and should not modify them. Please see my earlier posts for
> > details.

> Alex, the basic buffer is not altered, only where the s1 offset is
> pointing at.

> From what he mentioned on IRC last night....
>
> Making s1 a duplicate reference to another KBuf (i.e. the actual input
> buffer) should show that the base KBuf is unchanged, but the parsing with
> nextToken() will only spew off a child sub-string and increment the s1
> start offset one token down the string.
>
> I'm in favor, it can be tuned for very efficient parsing. And
> inefficient usage of it can be fixed easily.

I see what you mean.

The above example does not illustrate the usage you propose: It
obviously does modify s1 and there is no way to reliably recover the
original buffer after the iteration.
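
To make the point concrete, here is a minimal sketch of a consuming
tokenizer. ConsumingView and drain() are hypothetical stand-ins written
with std::string_view; they only mimic the semantics discussed in this
thread and are not the real KBuf code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a KBuf-like view whose nextToken() consumes
// from the front. The semantics are an assumption based on this thread,
// not the real KBuf implementation.
struct ConsumingView {
    std::string_view rest;

    bool isNull() const { return rest.empty(); }

    // Returns the leading token and advances past it and one delimiter.
    std::string_view nextToken(std::string_view delims) {
        const std::size_t end = rest.find_first_of(delims);
        const std::string_view token = rest.substr(0, end);
        rest = (end == std::string_view::npos) ? std::string_view()
                                               : rest.substr(end + 1);
        return token;
    }
};

// Consumes every token and returns how many there were.
// On return, `view` is empty: its start offset has run off the end.
std::size_t drain(ConsumingView &view, std::string_view delims) {
    assert(!delims.empty());
    std::size_t count = 0;
    while (!view.isNull()) {
        view.nextToken(delims);
        ++count;
    }
    return count;
}
```

In this sketch the underlying characters are untouched, but the object
the caller iterated over is empty afterwards. A caller who still needs
the contents has to remember to keep a second reference before the
loop starts.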

This is a good example of why poor design leads to confusing code (and
other problems). We have lots of other examples in Squid code where an
API can be used five different ways, only one of them being correct, but
three of them can be found spread through the actual code.

I still think an external class that stores the delimiter and current
position would be a better tokenization API that would be easier/safer
to use now and will eventually allow for more optimizations, non-trivial
or varying delimiters, backing up, etc. And I do not think it introduces
any performance overheads compared to your API.
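
A rough sketch of such an external class, under stated assumptions:
std::string_view stands in for KBuf, and the Tokenizer name, its
methods, and the tokenize() helper are all hypothetical rather than an
existing Squid API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical external tokenizer; std::string_view stands in for KBuf.
// It stores the delimiter set and the current position, and never
// modifies the buffer it reads from.
class Tokenizer {
public:
    Tokenizer(std::string_view input, std::string_view delimiters)
        : input_(input), delims_(delimiters), pos_(0) {
        assert(!delimiters.empty());
    }

    // Stores the next token in `token` and returns true, or returns
    // false when no tokens remain. The token is a view into the input;
    // runs of consecutive delimiters are skipped.
    bool next(std::string_view &token) {
        const std::size_t start = input_.find_first_not_of(delims_, pos_);
        if (start == std::string_view::npos) {
            pos_ = input_.size();
            return false;
        }
        std::size_t end = input_.find_first_of(delims_, start);
        if (end == std::string_view::npos)
            end = input_.size();
        token = input_.substr(start, end - start);
        pos_ = end;
        return true;
    }

    // Backs up to the start of the input so it can be scanned again.
    void reset() { pos_ = 0; }

    // Switches to a different delimiter set in the middle of a scan.
    void setDelimiters(std::string_view delimiters) {
        assert(!delimiters.empty());
        delims_ = delimiters;
    }

private:
    std::string_view input_;
    std::string_view delims_;
    std::size_t pos_;
};

// Collects all tokens as copies; the caller's string is only read.
std::vector<std::string> tokenize(const std::string &s,
                                  std::string_view delims) {
    std::vector<std::string> out;
    Tokenizer t(s, delims);
    std::string_view tok;
    while (t.next(tok))
        out.emplace_back(tok);
    return out;
}
```

The iteration state lives in the Tokenizer, so the caller's buffer
object is the same before and after the loop, and the per-token work in
this sketch is still just two searches and an offset update.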

Cheers,

Alex.
Received on Thu Sep 04 2008 - 04:59:40 MDT
