Re: pseudo-specs for a String class: tokenization

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 04 Sep 2008 11:30:35 -0600

On Thu, 2008-09-04 at 18:00 +0200, Kinkie wrote:

> > BTW, this is yet another case where a Tokenizer class would be better
> > than let-String-do-everything approach because a tokenizer object can at
> > any time return the current token, the current delimiter, and/or both,
> > without performance overhead or design complications.
>
> There is no such thing as "current delimiter"; it's supplied by the
> caller each time.

According to your documentation the caller supplies a _set_ of delimiter
characters. Thus, the current or actual delimiter (i.e., the actual
character at the end of the returned token, if any) is unknown to the
caller if the caller used a multi-character delimiter set:

> all chars up to the first occurrence of any of the chars in
> delim

If you did not mean to support a set of delimiter characters, then the
documentation needs to be fixed. If you meant to support it, the API
lacks a way to determine the current or actual delimiter.
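
To make the point concrete, here is a rough sketch of what I have in
mind. Everything in it (the Tokenizer class name, next(), token(),
hasDelimiter(), delimiter()) is made up for illustration and is not an
existing or proposed Squid API. It only shows that an object holding
the parsing state can report the actual delimiter at no extra cost:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not an existing API: a tokenizer object that
// remembers which character from the delimiter set ended the last token.
class Tokenizer {
public:
    explicit Tokenizer(const std::string &input)
        : buf_(input), pos_(0), delim_('\0'), hasDelim_(false) {}

    // Extracts all chars up to the first occurrence of any char in delimSet.
    // Returns false once the input is exhausted.
    bool next(const std::string &delimSet) {
        if (pos_ >= buf_.size())
            return false;
        const std::string::size_type end = buf_.find_first_of(delimSet, pos_);
        if (end == std::string::npos) {
            // No delimiter left: the rest of the input is the last token.
            token_ = buf_.substr(pos_);
            delim_ = '\0';
            hasDelim_ = false;
            pos_ = buf_.size();
        } else {
            token_ = buf_.substr(pos_, end - pos_);
            delim_ = buf_[end]; // the actual delimiter, known only here
            hasDelim_ = true;
            pos_ = end + 1;
        }
        return true;
    }

    const std::string &token() const { return token_; }
    bool hasDelimiter() const { return hasDelim_; }
    char delimiter() const { return delim_; }

private:
    std::string buf_;
    std::string token_;
    std::string::size_type pos_;
    char delim_;
    bool hasDelim_;
};

// Usage: the caller passes a set, then asks which member actually matched.
void tokenizerExample() {
    Tokenizer tok("name=value;next");
    assert(tok.next("=;"));
    assert(tok.token() == "name" && tok.delimiter() == '=');
    assert(tok.next("=;"));
    assert(tok.token() == "value" && tok.delimiter() == ';');
}
```

Note that this sketch copies the token for simplicity and does not
report a trailing empty token; a real implementation would make its own
choices there.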

Eventually, the delimiter may be defined by a regular expression or even
arbitrary code so that we can easily tokenize based on things like HTTP
whitespace, header continuation, or header termination (which includes
things like LF, CRLF, and CRCRLF).
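
For example, a delimiter defined by code could look roughly like the
sketch below. The names (terminatorLength, Line, nextLine) are
hypothetical, and the set of terminators is limited to the three
mentioned above; real HTTP parsing has more cases to consider. The
point is only that the matched delimiter may be longer than one
character, so the caller cannot guess it from a character set:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of a line terminator starting
// at s[pos] (LF = 1, CRLF = 2, CRCRLF = 3), or 0 if there is none.
std::size_t terminatorLength(const std::string &s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && s[i] == '\r' && i - pos < 2)
        ++i; // accept at most two CRs before the LF
    if (i < s.size() && s[i] == '\n')
        return i - pos + 1;
    return 0; // a bare CR (or anything else) is not a terminator here
}

struct Line {
    std::string text;       // the token
    std::string terminator; // the actual delimiter that ended it
};

// Extracts the next terminated line starting at pos and advances pos past
// the terminator. Returns false if no complete line remains.
bool nextLine(const std::string &s, std::size_t &pos, Line &out) {
    for (std::size_t i = pos; i < s.size(); ++i) {
        const std::size_t n = terminatorLength(s, i);
        if (n > 0) {
            out.text = s.substr(pos, i - pos);
            out.terminator = s.substr(i, n);
            pos = i + n;
            return true;
        }
    }
    return false;
}

// Usage: each line comes back with whichever terminator ended it.
void nextLineExample() {
    const std::string headers = "Host: a\r\nX: b\n";
    std::size_t pos = 0;
    Line line;
    assert(nextLine(headers, pos, line) && line.terminator == "\r\n");
    assert(nextLine(headers, pos, line) && line.terminator == "\n");
}
```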

With a Tokenizer class, you can also pass the iterator to an algorithm
that does not know (and does not want to know) what the delimiter definition
is. This is good for "convert this string of tokens into a list of
tokens" or "for each token in this string, call that method" kind of
code, commonly used in parsing.
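
A minimal sketch of that separation, with made-up names (TokenSource,
CharSetTokenSource, collectTokens) that do not correspond to any
existing code: the algorithm sees only an abstract source of tokens,
while the delimiter definition lives entirely in the concrete class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface: all the algorithm knows is "give me the next token".
class TokenSource {
public:
    virtual ~TokenSource() {}
    virtual bool next(std::string &token) = 0;
};

// One possible delimiter definition: any char from a set. Runs of
// delimiters are skipped, so empty tokens are never reported.
class CharSetTokenSource : public TokenSource {
public:
    CharSetTokenSource(const std::string &input, const std::string &delims)
        : buf_(input), delims_(delims), pos_(0) {}

    bool next(std::string &token) override {
        const std::string::size_type start =
            buf_.find_first_not_of(delims_, pos_);
        if (start == std::string::npos) {
            pos_ = buf_.size();
            return false;
        }
        std::string::size_type end = buf_.find_first_of(delims_, start);
        if (end == std::string::npos)
            end = buf_.size();
        token.assign(buf_, start, end - start);
        pos_ = end;
        return true;
    }

private:
    std::string buf_;
    std::string delims_;
    std::string::size_type pos_;
};

// "Convert this string of tokens into a list of tokens", with no
// knowledge of how tokens are delimited.
std::vector<std::string> collectTokens(TokenSource &source) {
    std::vector<std::string> tokens;
    std::string token;
    while (source.next(token))
        tokens.push_back(token);
    return tokens;
}

// Usage: only the constructor call mentions delimiters.
void collectTokensExample() {
    CharSetTokenSource source("gzip, deflate", ", ");
    const std::vector<std::string> tokens = collectTokens(source);
    assert(tokens.size() == 2);
}
```

Swapping in a different TokenSource (say, one driven by a regular
expression) would not require touching collectTokens() at all.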

HTH,

Alex.
Received on Thu Sep 04 2008 - 17:31:10 MDT