Re: pseudo-specs for a String class

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 28 Aug 2008 01:25:18 +1200

Adrian Chadd wrote:
> 2008/8/27 Amos Jeffries <squid3_at_treenet.co.nz>:
>
>
>> No. MemBuf would be good, but that's already used. If the assumptions Squid
>> makes of MemBuf allow it to be replaced entirely by your buffer object,
>> great, re-use the name too.
>
> Well in a completely amusing twist of fate, a large part of the MemBuf use gets
> thrown away when you start providing a better string interface. Crazy but true.

Yes, that is my point, and it is what I found while thrashing out v1 and v2
of the String update ideas.

>
>> I disagree with Adrian on this one. A String is whatever we define a String
>> to be, printable characters or not. After all, an ASCII string can contain
>> whitespace and NUL, and Unicode text is 100% binary-encoded blobs at the
>> byte level.
>
> How's that disagreeing with me? I said a String is a region of memory
> with "more" than just a region of memory. It'll have some kind of
> behaviour and more manipulation functions around whatever we define
> String as being.

I read your definitions of memory buffer (simple case) and string
(complex case) as defining two separate, non-interchangeable things:

" A memory region can be manipulated (passed into vector IO, modified
with COW or not semantics, etc). It's just an array of bytes."

" A string includes things like potentially caring about character
encoding in things like length calculations, comparisons, etc. A
memory region doesn't. A string is generally a representation of
printable data; a memory region isn't."

... particularly that last sub-sentence.
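
As a rough illustration of the two definitions quoted above, here is a
minimal C++ sketch in which the "string" is nothing more than a ref-counted
memory region with some behaviour layered on top. MemRegion, RefString,
caseEquals and makeString are hypothetical names for this sketch only, not
existing Squid classes, and std::shared_ptr stands in for whatever
ref-counting the real code would use.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical names for illustration; not Squid's actual classes.

// "Memory region": shared (ref-counted) bytes plus a window into them.
struct MemRegion {
    std::shared_ptr<const std::vector<char>> bytes;
    std::size_t offset;
    std::size_t length;

    const char *data() const { return bytes->data() + offset; }
};

// "String": the same region with manipulation functions added.
class RefString {
public:
    explicit RefString(MemRegion r) : region_(std::move(r)) {}

    std::size_t size() const { return region_.length; }

    // Sub-string sharing the same storage; no bytes are copied.
    RefString substr(std::size_t pos, std::size_t len) const {
        assert(pos <= size() && len <= size() - pos);
        MemRegion r = region_;
        r.offset += pos;
        r.length = len;
        return RefString(r);
    }

    // ASCII case-insensitive equality, the strcasecmp()-style behaviour
    // that only makes sense once the bytes are treated as text.
    bool caseEquals(const RefString &other) const {
        if (size() != other.size())
            return false;
        const char *a = region_.data();
        const char *b = other.region_.data();
        for (std::size_t i = 0; i < size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i])))
                return false;
        }
        return true;
    }

    // Number of RefString objects currently sharing the storage.
    long useCount() const { return region_.bytes.use_count(); }

private:
    MemRegion region_;
};

// Copies a C string into fresh shared storage and wraps it.
inline RefString makeString(const char *text) {
    auto storage =
        std::make_shared<std::vector<char>>(text, text + std::strlen(text));
    return RefString(MemRegion{storage, 0, storage->size()});
}
```

With this shape, makeString("Host: example").substr(0, 4) yields a second
handle on the same bytes, and comparing it against makeString("HOST") with
caseEquals() succeeds. Nothing in the sketch requires the bytes to be
printable; that is a policy choice left to whoever defines the String.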

Um, and per your:
" Would you use a "String" as the reference counted type for say, the
memory store? "

Yes, I would. I really don't like Java, but its object serialization
concept can be made very efficient for specific cases like HTTP header
storage, completely removing any duplicate parsing (speed!) when loading
object bytes from disk, etc.
The minimum size cost is disk space of ((2*INT + PTR)*N + INT) though, where
N is the number of tokens in the array of Strings, plus (PTR*H + INT) if
it's done as a full tree (where H is the header count).
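
To put numbers on that size estimate, here is a small C++ sketch that simply
evaluates the two formulas. The function names are hypothetical, and the
reading of the terms (two ints and one pointer per token, one int for the
token count, one pointer per header plus one int for the header count) is an
assumption; the formula itself does not say what each field holds.

```cpp
#include <cstddef>

// Hypothetical helpers; the per-record layout is assumed, not specified.

// Flat index: (2*INT + PTR) per token, plus one INT for the whole array.
constexpr std::size_t flatIndexBytes(std::size_t tokens) {
    return (2 * sizeof(int) + sizeof(void *)) * tokens + sizeof(int);
}

// Full-tree variant: the flat cost plus PTR per header and one more INT.
constexpr std::size_t treeIndexBytes(std::size_t tokens, std::size_t headers) {
    return flatIndexBytes(tokens) + sizeof(void *) * headers + sizeof(int);
}
```

Assuming 4-byte ints and 8-byte pointers, 40 tokens spread over 10 headers
would cost (8 + 8)*40 + 4 = 644 bytes as a flat array, and a further
8*10 + 4 = 84 bytes (728 in total) as a full tree.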

>
> Just don't define String as being "UnicodeString" off the bat, or
> things will get slightly complicated, and don't define String as
> "array of bytes of memory" as then certain things like "strcasecmp()"
> have little meaning.
>
>> As long as its contents are known to be contiguous in meaning and
>> information content it fits the description of String to me. Also, we are
>> mostly using them to represent pieces of HTTP Headers, which is a protocol
>> built of classical Strings.
>>
>> If you are implementing the BetterStringBuffer (next generation) objects,
>> I'd go with RefString or similar, since it's ref-counted.
>>
>> If you want to be pedantic about the printable char issue, DataBuffer makes
>> more descriptive sense.
>
> Hm, Alex/Duane/Robert/Henrik, what did Whale call all of this? I
> thought this was one area that was "done" and gels pretty well with
> what we've since learnt..
>
> Adrian

Amos

-- 
Please use Squid 2.7.STABLE4 or 3.0.STABLE8
Received on Wed Aug 27 2008 - 13:25:18 MDT
