Re: Buffer/String split, take2 from Kinkie on 2009-01-21 (squid-dev)

From: Kinkie <gkinkie_at_gmail.com>
Date: Wed, 21 Jan 2009 10:47:38 +0100

On Tue, Jan 20, 2009 at 10:24 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> Hello,
>
> Kinkie has finished another round of his String NG project. The code
> is available at https://code.launchpad.net/~kinkie/squid/stringng
> During code review and subsequent IRC discussion archived at
> http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became
> apparent that the current design makes all participating developers
> unhappy (for different reasons).
>
> We have to revisit the discussion we had in the beginning of this
> project[1] and put this issue to rest, at last.
> [1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188
>
>
> There was not enough developers on the IRC to come to a consensus
> regarding the best direction, but it was clear that the current design
> is the worst one considered as it tries to mix at least two incompatible
> designs together.
>
> This email summarizes a few design options we can chose from (none of
> them matches the current code for the above mentioned reasons).
>
> Please voice your opinion: which design would be best for Squid 3.2 and
> the foreseeable future.
>
>
> * Universal Buffer:
>
> Blob = low-level raw chunk of RAM invisible to general code
> allocates, holds, frees raw RAM buffer
> can grow the buffer and write to the buffer
> the memory allocation strategy can change w/o affecting others
> does not have a notion of "content", just allocated RAM
>
> Buffer = all-purpose user-level buffer
> allows users to safely share a Blob instance via COW
> search, compare, consume, append, truncate, import, export, etc.
> has (offset, length) to maintain an area of Blob used by this Buffer
>
> This design is very similar to std::string. The code gets a "universal
> buffer" that can do "everything". This is probably the simplest design
> possible.
>
> The primary drawback here is that it would be difficult and messy to
> optimize different buffering needs in a single Buffer class.
>
> For example, I/O buffers usually need to track appended/consumed size
> and want to optimize (or eliminate) coping when it is time to do the
> next I/O while some strings are pointing to the old buffer content.
> Adding that tracking logic and optimizations to generic Buffer would be
> "wrong" because it will pollute Buffers used "like strings".
>
> Similarly, general strings may want to keep encoding information or
> perform heavy search optimizations. Adding those to generic Buffer would
> be "wrong" because it will pollute "I/O buffers" code.
>
> Another example is adding simple but efficient vector I/O support. With
> a single Buffer, it would be difficult to support vectors because it
> will clash with string-like usage needs.
>
>
>
> * Divide and Conquer (D&C):
>
> Blob = low-level raw chunk of RAM invisible to general code
> same as Blob in the Universal Buffer approach
>
> Buffer = shareable Blob
> allows users to safely share a Blob instance via COW
> works with Blob as a whole: not areas (see note below)
> used as exchange interface between specialized buffers
>
> IoBuffer = buffer optimized for I/O needs
> perhaps should be called IoStream
> uses Buffer
> has (appended, consumed) to track I/O progress and area
> exports available data as a Buffer instance
> may eventually support vector I/O by using multiple Buffers
>
> String = buffer optimized for content manipulation
> uses Buffer
> has (offset, length) to maintain a Buffer "content area"
> search, compare, replace, append, truncate, import, export, etc.
> may eventually store content encoding information
>
> The killer idea here is that the interpretation of a piece of allocated
> and shareable RAM (i.e, Buffer) is left to classes that specialize in
> certain memory manipulations (e.g., I/O or string search). Optimizing or
> changing one class does not have to affect the other.
>
> More specialized classes can be added as needed. Buffer is used to share
> info between classes. Conversions are explicit and easier to track. We
> could also add an Area class that makes it possible to store "content"
> offset and length when importing or exporting a Buffer.
>
> (note) A possible variation of the same design would be to move area
> manipulation to Buffer. This will free String from "area code" but force
> IoBuffers and others to use the same area model instead of
> appended/consumed counters or whatever they need. This will probably
> make migration to vectored I/O more complex, but we can deal with it. If
> D&C approach is chosen, we will decide where to put area manipulation:
> Buffer, String, or perhaps a separate Area class.
>
>
> * Other
>
> There are probably other options.
>
>
> I still think we should implement one good design, commit it, and work
> on converting the code to use it rather than starting with massaging the
> old code to be easier to convert to something in the future. If you
> would like to discuss the choice between those two strategies, please
> start your own thread :-)!
>
>
> So far, my _personal_ interpretation of votes based on the recent IRC
> discussions and that earlier squid-dev thread[1] is:
>
> Universal String: Kinkie, Amos
> Divide and Conquer: Adrian, Henrik, Alex
>
> Do you prefer a Universal Buffer or a Divide and Conquer design?
>
>
> Thank you,
>
> Alex.
> P.S. I am focusing on the overall design and ignoring all the secondary
> bugs present in the current stringng lp branch. I have sent a partial
> list to Kinkie, but it may not make sense to work on those bugs until
> the above issue is resolved, at last.

What I fear from the D&C approach is that we'll end up with lots of
duplicate code between the 'buffer' classes, to gain a tiny little bit
of efficiency and semantic clarity. If that approach has to be taken,
then I'd rather take the variant of the note - in fact that's quite in
line with what the current (agreeably ugly) code does.

In my opinion the 'universal buffer' model can be adapted quite easily
to address different uses by extending its allocation strategy - it's
a self-contained function of code exactly for this purpose, and it
could be extended again by using Strategy patterns to do whatever the
caller wishes. It would be trivial for instance for users to request
that the underlying memory be allocated by the pageful, or to request
preallocation of a certain amount of memory if they know they'll be
using, etc.
Having a wide interface is a drawback of the Universal approach,

Regarding vector i/o, it's almost a no-brainer at a first glance:
given UniversalBuffer, implement UniversalBufferList and make MemBuf
use the latter to implement producer-consumer semantics. Then use this
for writev(). produce and consume become then extremely lightweight
calls. Let me remind you that currently MemBuf happily memmoves
contents at each consume, and other producer-consumer classes I could
find (BodyPipe and StoreEntry) are entirely different beasts, which
would benefit from having their interfaces changed to use
UniversalBuffers, but probably not their innards.

Regarding Adrian's proposal, he and I discussed the issue extensively.
I don't agree with him that the current String will give us the best
long-term benefits. My expectation is (but we can only know after we
have at least some extensive use of it) that the cheap substringing
features of the current UniversalBuffer implementation will give us
substantial benefits in the long term.
I agree with him that fixing the most broken parts of the String
interface is a sensible strategy for merging whatever String
implementation we end up choosing.

I fear that if we focus too much on the long-term, we may end up
losing sight of the medium-term, and thus we risk reaching neither
because short-term noone does anything. EVERYONE keeps on asserting
that squid (2 and 3) has low-level issues to be fixed, yet at the same
time only Adrian does something in squid-2, and I feel I'm the only
one trying to do something in squid-3 - PLEASE correct me and prove me
wrong.

There's another issue which worries me: the current implementation has
been in the works for 5 months; there have been two extensive reviews,
two half-rewrites and endless discussions. Now the issue crops up that
the basic design - whose blueprint has also been available for 5
months in the wiki - is not good, and that we may end up having to
basically start from scratch. How can we as a project prevent for the
future such a waste of resources (in this case, mine) is in my very
humble opinion an issue key to the very survival of squid as a
project.

As a side note, if we end up starting from scratch, it will have to be
someone else doing it; I would be obviously biased towards the 'wrong'
design, and I would prefer to concentrate on other things to be done
which are characterized by a lesser risk of merge rejection.

-- 
    /kinkie

Received on Wed Jan 21 2009 - 09:47:58 MST

This archive was generated by hypermail 2.2.0 : Wed Jan 21 2009 - 12:00:26 MST