Question on String design

From: kinkie <gkinkie_at_gmail.com>
Date: Sat, 13 Dec 2008 14:10:59 +0100

Hi all,
        during the past few days I've started actually implementing the
StringNg class.

I've come across a fundamental design choice, which I'd like to share
with you before I take the completely wrong direction.

The issue is the interaction between buffers, strings and encodings.
There are two possibilities:
1- a StringNg holds a Buf and references an Encoding. It's thus blob-ish
and transcoding is done on demand.
  Advantages: transcoding is done lazily
  Disadvantages: certain common operations are highly expensive when
variable-length encodings are used (e.g. UTF-8), since they require
parsing the whole String from the beginning
2- StringNgs are always encoded in a fixed-length encoding (e.g. UCS-2)
and only reference an SBuf. Transcoding is done on creation and export.
  Advantages: StringNg manipulation is easy
  Disadvantages: this approach basically nullifies one of the advantages
of SBufs for Strings, which is their ability to share storage.
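
To make the cost difference between the two options concrete, here is a
minimal sketch of locating the n-th character in each representation. The
function names utf8Offset and ucs2Offset are hypothetical and are not part
of any existing Squid code; the UTF-8 walker assumes well-formed input and
does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the n-th code point in a UTF-8 string.
// Assumes well-formed UTF-8. It has to walk from the start, so it is O(n).
std::size_t utf8Offset(const std::string &s, std::size_t n)
{
    std::size_t i = 0;
    while (n > 0 && i < s.size()) {
        ++i; // step over the lead byte
        // step over continuation bytes, which all look like 10xxxxxx
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            ++i;
        --n;
    }
    return i;
}

// Hypothetical helper: with a fixed-width encoding such as UCS-2 the same
// lookup is a single multiplication, so it is O(1).
std::size_t ucs2Offset(std::size_t n)
{
    return n * sizeof(char16_t);
}
```

For the three-character string "a", "é", "z" stored as the UTF-8 bytes
"a\xC3\xA9z", utf8Offset(s, 2) has to scan past the two-byte "é" before it
can return 3, while ucs2Offset(2) is simply 2 * 2 = 4.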

While I set off implementing option 1, I'm changing my mind
and am now considering a variant of option 2 instead,
consisting of three classes and one class system:
- SBuf (memory management, and blobs)
- AsciiString (it's a special case of a StringNg, could inherit from
SBuf, using the ASCII encoding, to be able to exploit SBuf's
optimizations for the most common case).
- a system of StringEncodings (with an accompanying EncodingRegistry)
- UcsString, a general-purpose UCS-2 encoded string. Encodings help
importing and exporting to it.
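
A rough sketch of how these pieces might fit together follows. Only the
class names StringEncoding, EncodingRegistry and UcsString come from the
list above; every method name (decode, encode, add, find, exportAs), the
Bytes stand-in for SBuf, and the Latin1Encoding example are assumptions
made up for illustration, not a settled interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for SBuf (assumption): plain bytes, no shared storage here.
using Bytes = std::string;

// A StringEncoding converts between external bytes and UCS-2 code units.
class StringEncoding {
public:
    virtual ~StringEncoding() = default;
    virtual std::u16string decode(const Bytes &in) const = 0;
    virtual Bytes encode(const std::u16string &in) const = 0;
};

// Example encoding (assumption): Latin-1, where one byte is one code unit.
class Latin1Encoding : public StringEncoding {
public:
    std::u16string decode(const Bytes &in) const override {
        std::u16string out;
        for (unsigned char c : in)
            out.push_back(static_cast<char16_t>(c));
        return out;
    }
    Bytes encode(const std::u16string &in) const override {
        Bytes out;
        for (char16_t c : in) // code units outside Latin-1 become '?'
            out.push_back(c < 0x100 ? static_cast<char>(static_cast<unsigned char>(c)) : '?');
        return out;
    }
};

// Looks up encodings by name; returns nullptr for unknown names.
class EncodingRegistry {
public:
    void add(const std::string &name, std::shared_ptr<const StringEncoding> e) {
        encodings_[name] = std::move(e);
    }
    const StringEncoding *find(const std::string &name) const {
        auto it = encodings_.find(name);
        return it == encodings_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::shared_ptr<const StringEncoding>> encodings_;
};

// Always UCS-2 internally: transcoding happens on creation and on export.
class UcsString {
public:
    UcsString(const Bytes &raw, const StringEncoding &enc) : units_(enc.decode(raw)) {}
    std::size_t length() const { return units_.size(); }
    char16_t at(std::size_t i) const { return units_.at(i); }
    Bytes exportAs(const StringEncoding &enc) const { return enc.encode(units_); }
private:
    std::u16string units_;
};
```

In this sketch a caller would register an encoding once, look it up by
name, build a UcsString from incoming bytes, and call exportAs() when the
text leaves again. Note that UcsString owns a private copy of its data,
which is exactly the loss of shared storage listed as the disadvantage of
option 2.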

There are some variants possible on this main scheme:
- doing without AsciiString
- a pure virtual StringNg class, implemented by AsciiString and UcsString

Thoughts? Opinions?

        Kinkie
Received on Sat Dec 13 2008 - 13:12:25 MST
