Re: pseudo-specs for a String class: char *buf

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 4 Sep 2008 13:47:19 +1200 (NZST)

> On Wed, 2008-09-03 at 16:53 +0200, Kinkie wrote:
>> On Wed, Sep 3, 2008 at 3:59 PM, Alex Rousskov
>> <rousskov_at_measurement-factory.com> wrote:
>> >
>> > I looked at your StringNg wiki page and noticed that your string has
>> > a "char *buf" pointer into the memory buffer (in addition to the
>> > buffer pointer itself). I think it would be better to use an offset
>> > instead of the pointer into internal buffer area:
>>
>> Yes, I had a discussion with Adrian about the same issue earlier on
>> IRC.
>> You make excellent points - as Adrian did :)
>> Here's my take
>>
>> > - cleaner design: no peeking into other object's privates
>> Yes. At the same time the KBuf::Buf class (the "other object") is a
>> private member class of the KBuf class;
>> it's actually little more than a glorified struct, and shouldn't be
>> thought of as a first-class citizen on its own.
>
> Private or not, it is still another object.
>
> And, FWIW, I doubt the memory buffer class will remain inside the string
> class.
>
>> > - easier to change memory buffer internals
>> If you mean "change the buffer contents", that's an operation which
>> should be quite rare.
>> If you mean "change the code" that should be even rarer.
>
> I meant "change the code". I do expect those changes in the foreseeable
> future.
>
>> > - easier to support several buffer types with different internals
>> I didn't really think of different buffer types. Do you have in mind
>> any scenario where it would be useful?
>
> Yes, I do (e.g., small versus large, thread-safe versus not, and
> contiguous versus chunked).
>
> In fact, you kind of documented different buffer implementations
> yourself: "small Bufs (<8Kb) should be managed by MemPools. - Bufs
> bigger than 8Kb should be allocated in sizes compatible with the system
> page size"
>
>> > - easier to support re-allocation of buffer memory
>> > - easier to provide a thread-safe implementation.
>>
>> On the other hand, char* pointers are significantly more efficient for
>> common operations, consistent with the design goals.
>
> I do not think an offset would be significantly less efficient in this
> context. I bet 90+% of operations that require raw data access are far
> more expensive than adding an offset to a pointer.
>
>> I'm not saying that I won't change them, I'd just like to be shown
>> scenarios where it makes a difference.
>
> I believe I provided more than enough reasons and you agreed with at
> least some of them. You have provided one so far ("significantly more
> efficient for common operations"). I think the burden of proof should be
> on you in this case.
>
>> On an unrelated issue, since it was of interest to some of us, here's
>> a sample of the caller code for tokenization functions (actual live
>> code):
>>
>> KBuf s1;
>> cout << "tokenization: \n";
>> {
>>     s1 = "The quick brown fox jumped over the lazy dog";
>>     const char *needle = " ";  // string literals are const
>>     KBuf cs1(needle);          // delimiter set
>>     while (!s1.isNull()) {
>>         cout << "token: " << s1.nextToken(cs1) << endl;
>>     }
>> }
>> cout << endl;
>
> FWIW, I still think that tokenization should be external to the buffer
> or string and should not modify them. Please see my earlier posts for
> details.

Kinkie, while I like the single-object API design, I think you could get
around all these arguments and confusion by adding a sub-class of KBuf
called KBufTokeniser, which just provides the nextToken API on top of the
String API.
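
A rough sketch of that split, just to make the idea concrete. The real KBuf
API is not shown here, so std::string stands in for it and the tokeniser
wraps the string rather than sub-classing it. Everything except the
nextToken name (the Tokeniser class, atEnd(), the tokenise() helper) is made
up for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: std::string stands in for KBuf.
class Tokeniser {
public:
    // Keeps its own copy here; a refcounted KBuf would presumably share
    // the underlying buffer instead of copying it.
    explicit Tokeniser(const std::string &input) : input_(input), pos_(0) {}

    // True once the scan position has run off the end of the input.
    bool atEnd() const { return pos_ == std::string::npos; }

    // Returns the next token delimited by any character in delims, or an
    // empty string if none is left. Only pos_ moves; input_ is untouched.
    std::string nextToken(const std::string &delims) {
        if (atEnd()) return std::string();
        const std::string::size_type start =
            input_.find_first_not_of(delims, pos_);
        if (start == std::string::npos) {
            pos_ = std::string::npos;
            return std::string();
        }
        const std::string::size_type end = input_.find_first_of(delims, start);
        pos_ = end;  // npos when this was the last token
        return input_.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
    }

private:
    const std::string input_;
    std::string::size_type pos_;
};

// Convenience helper (also hypothetical): collect all non-empty tokens.
std::vector<std::string> tokenise(const std::string &input,
                                  const std::string &delims) {
    Tokeniser tok(input);
    std::vector<std::string> out;
    while (!tok.atEnd()) {
        const std::string t = tok.nextToken(delims);
        if (!t.empty()) out.push_back(t);
    }
    return out;
}
```

The string class itself carries no parsing state this way; the scan
position lives in the tokeniser only.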

Alex, the basic buffer is not altered, only the position the s1 offset is
pointing at. Kinkie is just not very good at describing that code loop yet
:-( .

From what he mentioned on IRC last night....

Making s1 a duplicate reference to another KBuf (i.e. the actual input
buffer) should show that the base KBuf is unchanged: parsing with
nextToken() only hands back a child sub-string and advances the s1
start offset one token down the string.
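
To illustrate the behaviour described above, here is a minimal sketch under
my own assumptions. It is not the KBuf code: SubStr, sharers() and str() are
invented names, std::shared_ptr stands in for whatever refcounting KBuf
uses, and the delimiter is a single char rather than a KBuf. The window into
the buffer is kept as an offset plus length, not a char pointer:

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for the string class: a shared, read-only buffer
// plus an offset/length window onto it.
class SubStr {
public:
    explicit SubStr(const std::string &text)
        : buf_(std::make_shared<const std::string>(text)),
          off_(0), len_(text.size()) {}

    bool isNull() const { return len_ == 0; }
    std::string str() const { return buf_->substr(off_, len_); }
    long sharers() const { return buf_.use_count(); }

    // Returns the leading token as a child window on the same buffer and
    // moves this window's start past it. No buffer bytes are written.
    SubStr nextToken(char delim) {
        const std::size_t stop = off_ + len_;
        std::size_t start = off_;
        while (start < stop && (*buf_)[start] == delim) ++start;  // leading delimiters
        std::size_t end = start;
        while (end < stop && (*buf_)[end] != delim) ++end;        // token body
        SubStr token(buf_, start, end - start);
        while (end < stop && (*buf_)[end] == delim) ++end;        // trailing delimiters
        off_ = end;
        len_ = stop - end;
        return token;
    }

private:
    SubStr(std::shared_ptr<const std::string> buf, std::size_t off,
           std::size_t len)
        : buf_(buf), off_(off), len_(len) {}

    std::shared_ptr<const std::string> buf_;
    std::size_t off_;
    std::size_t len_;
};
```

Copying a SubStr gives the "duplicate reference": both copies share one
buffer, and calling nextToken() on the copy only moves that copy's offset.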

I'm in favor: it can be tuned for very efficient parsing, and any
inefficient usage of it can be fixed easily.

Amos
Received on Thu Sep 04 2008 - 01:47:23 MDT
