Re: pseudo-specs for a String class

From: Adrian Chadd <adrian_at_freebsd.org>
Date: Wed, 27 Aug 2008 08:42:17 +0800

(My pre-breakfast 2c, so forgive me if I'm less clear than normal.)

2008/8/27 Kinkie <gkinkie_at_gmail.com>:

> My thoughts: \0 is special, and would only be significant when strings
> need to be exported from the memory-managed code onto nonmanaged code.
> Generally speaking, the safest way to do so is by copy rather than by
> reference, but I'd rather also keep the ability to export by reference
> - hoping the caller knows what they're doing. In that case the \0 is a
> must-have safeguard, and in some cases it might require copying. Unfortunate
> but unavoidable.
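Here is a minimal C sketch that makes the copy-versus-reference distinction concrete. The region_t type and both function names are hypothetical and are not Squid's API. `avail` is assumed to be the number of readable bytes from `base` to the end of the underlying buffer. Export by copy always works. Export by reference is only offered when the byte right after the region already happens to be a NUL, because writing one into a shared buffer could clobber a neighbouring string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of a string: a window into a larger, shared buffer. */
typedef struct {
    const char *base;   /* first byte of the region */
    size_t length;      /* bytes in the region, not counting any NUL */
    size_t avail;       /* bytes readable from base (length <= avail) */
} region_t;

/* Export by copy: always safe; the caller owns and must free() the result. */
char *region_export_copy(const region_t *r)
{
    char *out = malloc(r->length + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, r->base, r->length);
    out[r->length] = '\0';
    return out;
}

/* Export by reference: returns a pointer into the shared buffer only if the
 * byte right after the region is already a NUL. Otherwise returns NULL and
 * the caller must fall back to region_export_copy(). The result must not be
 * freed and is only valid for as long as the underlying buffer is. */
const char *region_export_ref(const region_t *r)
{
    assert(r->length <= r->avail);
    if (r->length < r->avail && r->base[r->length] == '\0')
        return r->base;
    return NULL;
}
```

The fallback to copying is the "in some cases might require copying" part: a region in the middle of a buffer has no terminator of its own.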

Although plenty of current code assumes a NUL-terminated string, it relies
on that assumption primarily for two things:

* debug(), which can instead use the %.*s format specifier: it takes the
length as an int argument just before the string buffer pointer;
* iterating/parsing, which can use the length parameter to bound the
pointer arithmetic instead of stopping at a NUL (in about 99% of cases you
can drop the pointer arithmetic entirely, too; the parser is roughly the
only place where its possible speed gains would even matter).
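Here is a sketch of both replacements in C. It assumes only the standard library; region_format() and region_find() are hypothetical helper names. The same "%.*s" format with an (int) length argument works in any printf-style debug call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a region that need not be NUL-terminated. For %.*s the precision
 * is passed as an int before the pointer, and the printf family reads at
 * most that many bytes from buf. */
int region_format(char *out, size_t outsz, const char *buf, size_t len)
{
    assert(len <= (size_t)INT_MAX);
    return snprintf(out, outsz, "%.*s", (int)len, buf);
}

/* Length-bounded search: memchr() examines exactly len bytes and, unlike
 * strchr(), neither stops at nor depends on a NUL. Returns the offset of c,
 * or -1 when c is absent from the region. */
ptrdiff_t region_find(const char *buf, size_t len, char c)
{
    const char *p = memchr(buf, (unsigned char)c, len);
    return p != NULL ? p - buf : -1;
}
```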

Both of these can be eliminated without too much trouble. In fact, during
the transition work I kept NUL-terminated strings as a special flagged case
so that existing code assuming NULs could still work whilst I converted
things over.

> Well, tokenising should be replaced by substringing really.. it could
> mean having to drop strtok().

... and in practice, writing replacement str*() routines for your String
class instead of using the C string.h functions makes everything much
easier, including the above.
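To illustrate tokenising by substringing, here is a hypothetical strtok() replacement in C. It returns each token as a { pointer, length } view into the original buffer. The substr_t type and the function names are assumptions for this sketch. Unlike strtok(), it never writes NULs into the buffer and keeps no hidden static state, so it is reentrant and safe on shared or read-only data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A non-owning view: length bytes starting at base, no NUL required. */
typedef struct {
    const char *base;
    size_t length;
} substr_t;

/* True if c is one of the delimiter characters; a NUL byte is never one. */
static int is_delim(char c, const char *delims)
{
    return c != '\0' && strchr(delims, c) != NULL;
}

/* *rest is the not-yet-scanned region. Skips leading delimiters, stores the
 * next token in *tok and advances *rest past it. Returns 1 if a token was
 * found, 0 when only delimiters (or nothing) remain. */
int substr_next_token(substr_t *rest, const char *delims, substr_t *tok)
{
    size_t i = 0;
    while (i < rest->length && is_delim(rest->base[i], delims))
        i++;
    size_t start = i;
    while (i < rest->length && !is_delim(rest->base[i], delims))
        i++;
    if (i == start) {                 /* nothing but delimiters was left */
        rest->base += rest->length;
        rest->length = 0;
        return 0;
    }
    tok->base = rest->base + start;
    tok->length = i - start;
    rest->base += i;
    rest->length -= i;
    return 1;
}
```

The original buffer is left untouched, so other Strings referencing the same bytes are unaffected by tokenising.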

Kinkie, s27_adri has a whole lot of additional String.c functions for
manipulating strings.

>> Append operation on String/MemoryRegion objects is easy in this model,
>> but if the region is not at the end of the MemoryBlob or if the result
>> gets too large then it will need to trigger a copy to a new MemoryBlob of
>> sufficient size.
>
> Yes.

That copy won't be needed in more than 99% of cases.
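Here is a minimal sketch of that append rule in C. The hypothetical blob_t and region_t types stand in for MemoryBlob and MemoryRegion. The sketch assumes an invariant: regions only ever reference bytes below `used`. Anything past `used` therefore belongs to nobody and can be written in place, even when the blob is shared. Only a region that doesn't end at `used`, or a blob without spare capacity, forces the copy. The doubling growth factor is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a sketch of the model, not Squid code. */
typedef struct {
    char *data;
    size_t capacity;   /* bytes allocated */
    size_t used;       /* bytes handed out to regions so far */
    int refcount;      /* number of regions referencing this blob */
} blob_t;

typedef struct {
    blob_t *blob;
    size_t offset;
    size_t length;
} region_t;

static blob_t *blob_new(size_t capacity)
{
    blob_t *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(capacity ? capacity : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->capacity = capacity;
    b->used = 0;
    b->refcount = 1;
    return b;
}

static void blob_unref(blob_t *b)
{
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

/* Append len bytes to r. Fast path: r ends exactly at b->used and there is
 * room, so the bytes go into unreferenced space in place. Slow path: copy
 * r plus the new bytes into a fresh, larger blob and drop r's old one. */
int region_append(region_t *r, const char *src, size_t len)
{
    blob_t *b = r->blob;
    assert(b->used <= b->capacity);
    if (r->offset + r->length == b->used && b->capacity - b->used >= len) {
        memcpy(b->data + b->used, src, len);
        b->used += len;
        r->length += len;
        return 0;
    }
    blob_t *nb = blob_new((r->length + len) * 2);
    if (nb == NULL)
        return -1;
    memcpy(nb->data, b->data + r->offset, r->length);
    memcpy(nb->data + r->length, src, len);
    nb->used = r->length + len;
    blob_unref(b);
    r->blob = nb;
    r->offset = 0;
    r->length += len;
    return 0;
}
```

In the common case of a String that ends at the blob's used space, the append costs one memcpy of the new bytes and nothing else.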

> It depends: I expect a rather common case to be when only one String
> owns a Buf/MemoryBlob. In that case modifications are cheap.

Actually, once you've fully reworked the whole environment to use this
model, the most common case in Squid is "lots of Strings referencing a
large buffer" (i.e., the request and reply socket buffers, and the URL
strings once those are converted over). Almost all of the strings in
play are the HTTP header entry strings, and most of -those- are never
modified.

Most of the -rest- are one String referencing an entire buffer.
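To illustrate "lots of Strings referencing a large buffer", here is a hypothetical C header splitter. It records each header name and value as an { offset, length } span into the socket buffer instead of copying them. The types and function are illustrative assumptions, and the parsing is deliberately simplified, with no line folding and no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { size_t offset, length; } span_t;
typedef struct { span_t name, value; } header_entry_t;

/* Split "Name: value\r\n" lines in buf[0..len) into entries that point back
 * into buf; no bytes are copied or modified. Stops at the blank line that
 * ends the header block, at an incomplete line, or when max entries are
 * filled. Returns the number of entries found. */
size_t parse_headers(const char *buf, size_t len, header_entry_t *out, size_t max)
{
    size_t n = 0, pos = 0;
    while (pos < len && n < max) {
        const char *eol = memchr(buf + pos, '\n', len - pos);
        if (eol == NULL)
            break;                              /* incomplete line */
        size_t end = (size_t)(eol - buf);       /* index of the '\n' */
        size_t line_end = (end > pos && buf[end - 1] == '\r') ? end - 1 : end;
        if (line_end == pos)
            break;                              /* blank line: end of headers */
        const char *colon = memchr(buf + pos, ':', line_end - pos);
        if (colon != NULL) {
            size_t c = (size_t)(colon - buf);
            size_t v = c + 1;
            while (v < line_end && (buf[v] == ' ' || buf[v] == '\t'))
                v++;                            /* skip optional whitespace */
            out[n].name = (span_t){ pos, c - pos };
            out[n].value = (span_t){ v, line_end - v };
            n++;
        }
        pos = end + 1;
    }
    return n;
}
```

Every entry stays valid as long as the socket buffer does, which is exactly why most of these strings never need their own storage.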

In any case, I agree with the general model of:

* Memory: some chunk of contiguous memory somewhere;
* MemoryRegion: some reference to { Memory, offset, length }
* String: a MemoryRegion and some routines to manipulate it
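The three layers map naturally onto C structs. This is a hypothetical sketch and not the actual String.c API. Memory is refcounted, a MemoryRegion is { Memory, offset, length }, and a String wraps a region. Taking a substring bumps the refcount and shares the bytes instead of copying them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C rendering of the three layers; names are illustrative. */
typedef struct {            /* Memory: a contiguous chunk, refcounted */
    char *data;
    size_t size;
    unsigned refs;          /* number of Strings referencing this chunk */
} Memory;

typedef struct {            /* MemoryRegion: { Memory, offset, length } */
    Memory *mem;
    size_t offset;
    size_t length;
} MemoryRegion;

typedef struct {            /* String: a region plus the routines below */
    MemoryRegion r;
} String;

static Memory *memory_from(const char *src, size_t n)
{
    Memory *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->data = malloc(n ? n : 1);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    memcpy(m->data, src, n);
    m->size = n;
    m->refs = 0;
    return m;
}

/* A String covering the whole chunk. */
String string_wrap(Memory *m)
{
    m->refs++;
    return (String){ { m, 0, m->size } };
}

/* Substring: shares the same Memory, so no bytes are copied. */
String string_sub(const String *s, size_t off, size_t len)
{
    assert(off + len <= s->r.length);
    s->r.mem->refs++;
    return (String){ { s->r.mem, s->r.offset + off, len } };
}

/* Drop this String's reference; free the chunk when the last one goes. */
void string_release(String *s)
{
    Memory *m = s->r.mem;
    if (m != NULL && --m->refs == 0) {
        free(m->data);
        free(m);
    }
    s->r.mem = NULL;
}

/* Length-aware comparison against a NUL-terminated C string. */
int string_eq(const String *s, const char *lit)
{
    size_t n = strlen(lit);
    return n == s->r.length &&
           memcmp(s->r.mem->data + s->r.offset, lit, n) == 0;
}
```

Releasing the parent String does not invalidate its substrings; the Memory is freed only when the last String referencing it is released.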

Adrian
Received on Wed Aug 27 2008 - 00:43:05 MDT
