Re: Buffer/String split, take2 from Adrian Chadd on 2009-01-20 (squid-dev)

From: Adrian Chadd <adrian_at_squid-cache.org>
Date: Tue, 20 Jan 2009 20:13:19 -0500

2009/1/20 Alex Rousskov <rousskov_at_measurement-factory.com>:

> Please voice your opinion: which design would be best for Squid 3.2 and
> the foreseeable future.

[snip]

I'm about 2/3rds of the way along the actual implementation path of
this in Cacheboy so I can provide an opinion based on increasing
amounts of experience. :)

[Warning: long, somewhat rambly post follows, from said experience.]

The thing I'm looking at right now is what buffer design is required
to adequately handle the problem set. There's a few things which we
currently do very stupidly in any Squid related codebase:

* storeClientCopy - which Squid-2.HEAD and Cacheboy avoid the copy on,
but it exposes issues (see below);
* storeAppend - the majority of data coming -into- the cache (ie,
anything from an upstream server, very applicable today for forward
proxies, not as applicable for high-hit-rate reverse proxies) is still
memcpy()'ed, and this can use up a whole lot of bus time;
* creating strings - most strings are created during parsing; few are
generated themselves, and those which are, are at least half static
data which shouldn't be re-generated over and over and over again;
* duplicating strings - httpHeaderClone() and friends - dup'ing
happens quite often, and making it cheap for the read only copies
which are made would be fantastic
* later on, being able to use it for disk buffers, see below
* later on, being able to properly use it for the memory cache, again see below

The biggest problems I've hit thus far stem from the data pipeline
from server -> memstore -> store client -> client side. At the moment,
the storeClientCopy() call aggregates data across the 4k stmem page
size (at least in squid-2/cacheboy, I think its still 4k in squid-3)
and thus if your last access gave you half a page, your next access
can get data from both the other half of the page and whatever is in
the next buffer. Just referencing the stmem pages in 2.HEAD/Cacheboy
means that you can (and do) end up with a large number of small reads
from the memory store. You save on the referencing, but fail on the
"work chunk size." You end up having to have a sensible reference
counted buffer design -and- a vector list to operate on it with.

The string type right now makes sense if it references a contiguous,
linear block of memory (ie, a sub-region of a contig buffer). This is
how its treated today. For almost all of the lifting inside Squid
proper, that may be enough. There may however be a need later on for
string-like and buffer-like operations on buffer -vectors- - for
example, if you're doing some kind of content scanning over incoming
data, you may wish to buffer your incoming data until you have enough
data to match that string which is straddling two buffers - and the
current APIs don't support it. Well, nothing in Squid supports it
currently, but I think its worth thinking about for the longer term.

Certainly though, I think that picking a sensible string API with
absolutely no direct buffer access out of a few controlled areas (eg,
translating a list of strings or list of buffers into an iovec for
writev(), for example) is the way to go. That will equip Squid with a
decent enough set of tools to start converting everything else which
currently uses C strings over to using Squid Strings and eventually
reap the benefits of the zero-cost string duplication.

Ok, to summarise, and this may not exactly be liked by the majority of
fellow developers:

I think the benefits that augmenting/fixing the current SquidString
API and tidying up all the bad places where its used right now is
going to give you the maximum long-term benefit. There's a lot of
legacy code right now which absolutely needs to be trashed and
rewritten. I think the smartest path forward is to ignore 95% of the
decision about deciding which buffering method to use for now, fix the
current String API and all the code which uses it so its sensible (and
fixing it so its "sensible" won't take long; fixing the code which
uses it will take longer) and at that point the codebase will be in
much better shape to decide which will be the better path forward.

Now, just so people don't think I'm stirring trouble, I've gone
through this myself in both a squid-2 branch and Cacheboy, and here's
what I found:

* there's a lot of code which uses C strings created from Strings;
* there's a lot of code which init'ed strings from C strings, where
the length was already known and thrown out;
* there's a lot of code which init'ed strings from C strings which
were once Strings;
* there's even code which init's strings -from- a string, but only by
using strBuf(s) (I'm pointing at the http header related code here,
ugh)
* all the stuff which directly accesses the string buffer code can and
should be tossed, immediately - unfortunately there's a lot of it, the
majority being in what I gather is very long-lived code in
src/client_side.c (and what it became in squid-3)

So what I'm sort of doing now in Cacheboy-head, combined with tidying
up some of the aforementioned crappy code so I can actually -get- to
the point where referenced buffers are feasible and I can properly
code up, test and evaluate the possibilities:

* Since the current code which uses the buffer directly, either by
taking a -copy- of the buffer, or just a reference to the C buffer
(very bloody naughty) and praying the String isn't cleared until its
finished with it (eg, the auth code, ARGH), changing String to use a
refcounted buffer preserves that expectation.
* .. so, one could get a cheap string Dup operation right -now- (and
I've done it, and it works fine!) with minimal screwing around, but by
adding some more memory allocator calls (for the structure managing
the refcounted buffer.)
* .. which ends up not being that bad, because even though you double
the memory allocator calls for a new string (one for the refcounted
buffer mgmt struct, one for the buffer itself), you save on about 40%
of them because the majority of the string creation is still done in
stringDup() :)
* And as I said before, that then allows coders to run around the
codebase and start replacing all the uses of C strings with actual
Strings, eliminating a lot of the strdup() calls and such, and
generally forcing the replacement code to turn out tidier.

Once all of this is done and tidied up, figuring out what the
replacement buffer code could and should look like will be much
easier. You'll have limited using the replacement reference counted
buffer type stuff just for strings, not used it anywhere else for now,
and we can all sit back and decide what to do next.

Adrian
Received on Wed Jan 21 2009 - 01:13:28 MST

This archive was generated by hypermail 2.2.0 : Wed Jan 21 2009 - 12:00:26 MST