Re: Question on String design

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 13 Dec 2008 22:28:28 -0700

On Sat, 2008-12-13 at 14:10 +0100, kinkie wrote:

> The issue is the interaction between buffers, strings and encodings.
> There are two possibilities:
> 1- a StringNg holds a Buf and references an Encoding. It's thus blob-ish
> and transcoding is done on demand.
> Advantages: transcoding is done lazily
> Disadvantages: certain common operations are highly expensive when
> variable-length encodings are used (e.g. UTF-8), since they require
> parsing the whole String from the beginning
> 2- StringNg are always encoded in a fixed-length encoding (e.g. UCS-2)
> and only reference a SBuf. Transcoding is done on creation and export.
> Advantages: StringNg manipulation is easy
> Disadvantages: this approach basically nullifies one of the advantages
> of SBufs for Strings, which is their ability to share storage.
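To make the cost named under option 1 concrete, here is a minimal sketch of
locating the n-th character in UTF-8 text. The function name utf8OffsetOf is
hypothetical, std::string merely stands in for the buffer, and the input is
assumed to be well-formed UTF-8. There is no way to jump to the answer; the
octets have to be walked from the start:

```cpp
#include <cstddef>
#include <string>

// Returns the octet offset of the n-th code point (0-based) in UTF-8 text,
// or s.size() if the text holds fewer than n+1 code points.
// Assumes well-formed UTF-8; no validation is attempted.
std::size_t utf8OffsetOf(const std::string &s, std::size_t n)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) == 0x80)
            continue;           // continuation octet (10xxxxxx), not a new character
        if (n == 0)
            return i;           // reached the lead octet of the wanted code point
        --n;
    }
    return s.size();            // ran out of text
}
```

With a fixed-length or one-octet-per-character interpretation the same lookup
is plain index arithmetic, which is the trade-off the two options describe.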

I would not do either at this stage. Let's have a basic String with
ASCII length/compare/search operations first.

When the code settles and we want to add support for non-ASCII encodings
and locale, we will take the next step. The API is unlikely to change
much, so the vast majority of String users will not be affected by the
increased internal complexity and special operations.

For now, I would just do a basic String that references a Buffer, with
offset and length members, and a "one octet = one character" ASCII
interpretation of the content (where interpretation matters).
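
A rough sketch of what I mean, not a proposed API. Everything named here is
an assumption made for illustration: Buffer is just a std::shared_ptr to a
std::string standing in for the real buffer class, and the method names and
kNotFound are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the real shared buffer class.
using Buffer = std::shared_ptr<const std::string>;

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

class String
{
public:
    explicit String(const char *s):
        buf_(std::make_shared<const std::string>(s)), off_(0), len_(buf_->size()) {}

    // One octet = one character, so length is just the stored member.
    std::size_t length() const { return len_; }

    // Returns a view into the same Buffer; no octets are copied.
    // Out-of-range arguments are clamped to the end of the string.
    String substr(std::size_t pos, std::size_t n) const {
        pos = std::min(pos, len_);
        n = std::min(n, len_ - pos);
        return String(buf_, off_ + pos, n);
    }

    // Octet-wise comparison; <0, 0, >0 like memcmp, shorter string sorts first on a tie.
    int compare(const String &o) const {
        const std::size_t n = std::min(len_, o.len_);
        const int r = n ? std::memcmp(data(), o.data(), n) : 0;
        if (r != 0)
            return r;
        return len_ < o.len_ ? -1 : (len_ > o.len_ ? 1 : 0);
    }

    // Offset of the first occurrence of c, or kNotFound.
    std::size_t find(char c) const {
        const void *p = len_ ? std::memchr(data(), c, len_) : nullptr;
        return p ? static_cast<std::size_t>(static_cast<const char *>(p) - data()) : kNotFound;
    }

    bool sharesStorageWith(const String &o) const { return buf_ == o.buf_; }

private:
    String(Buffer b, std::size_t off, std::size_t len):
        buf_(std::move(b)), off_(off), len_(len) {}

    const char *data() const { return buf_->data() + off_; }

    Buffer buf_;        // shared storage
    std::size_t off_;   // where this String starts inside the Buffer
    std::size_t len_;   // number of octets (== characters, under ASCII)
};
```

For example, String("Content-Length: 42").substr(0, 14) yields a 14-character
String that points into the same Buffer as the original. Encoding support
could later be added behind the same length/compare/find calls.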

HTH,

Alex.

> While I've set off implementing option 1, I'm changing my mind
> and am now considering aiming instead for a variant of option 2,
> consisting of three classes and one class system:
> - SBuf (memory management, and blobs)
> - AsciiString (it's a special case of a StringNg, could inherit from
> SBuf, using the ASCII encoding, to be able to exploit SBuf's
> optimizations for the most common case).
> - a system of StringEncodings (with an accompanying EncodingRegistry)
> - UcsString, a general-purpose UCS-2 encoded string. Encodings help
> importing and exporting to it.
>
> There are some variants possible on this main scheme:
> - doing without AsciiString
> - a pure virtual StringNg class, implemented by AsciiString and UcsString
>
> Thoughts? Opinions?
>
> Kinkie
Received on Sun Dec 14 2008 - 05:28:40 MST