Re: squid3 comments

From: Adrian Chadd <adrian@dont-contact.us>
Date: Tue, 27 Feb 2007 16:12:45 +0800

On Tue, Feb 27, 2007, Tsantilas Christos wrote:
> Hi all,
>
> I was looking at the squid3 code over the last few days. I read Adrian's
> mails again, in which he complained about squid3's speed and CPU usage.
>
> Looking at the code, there are a number of places which can be
> improved. An example is the call to headersEnd (Francisco Gimeno noted
> this problem for squid26 too), which is computed again and again in
> squid3. Look at the HttpStateData::processReplyHeader method (http.cc):
> headersEnd is called inside the HttpMsg::parse call:
> const bool parsed = newrep->parse(readBuf, eof, &error);
> and then called again a few lines later:
> header_bytes_read = headersEnd(readBuf->content(), readBuf->contentSize());
> These days HTTP headers can easily be 2k, 4k or more (big URLs,
> cookies, etc), so such repeated calls are really costly.
>
> I also rewrote parseHttpRequest (client_side.cc) so that it does not
> xmalloc space for URL parsing, and I modified the prepareTransparentURL
> and prepareAcceleratedURL functions to take the URL length as an extra
> argument and to not require the URL as a null-terminated string. OK, it
> was not difficult, but I think the problem here is not only about
> writing something faster.
> For example, the struct HttpParser would be better as a C++ class
> (maybe derived from HttpMsg?) with some functions implemented as methods.
>
> Do you think that such changes in squid3 code make sense?

They make sense, but you'd be repeating the work I've done in the squid-2-HEAD
branch.

What's really needed is some more thought about the structure of the code.
Adding lengths to character buffers is a good thing to save on strlen() calls,
but they really should be using some reference-counted buffer setup. Heck, in C++
it should at the -outside- be using String as an intermediate (since you can
then later on present an implementation of the String API sitting on top of
said reference-counted buffer data type..)
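
Roughly the kind of thing I mean (a sketch only; RefBuf and BufExtent are
made-up names, not existing Squid types):

    #include <cstddef>

    // One reference-counted backing buffer...
    struct RefBuf {
        char  *data;
        size_t len;
        int    refs;
    };

    static RefBuf *refbuf_hold(RefBuf *b) { ++b->refs; return b; }
    static void refbuf_release(RefBuf *b) {
        if (--b->refs == 0) { delete[] b->data; delete b; }
    }

    // ...and a "string" that is just an extent of that buffer: no copy,
    // no strlen(), no NUL terminator needed, and the data stays alive as
    // long as anyone holds an extent.
    class BufExtent {
    public:
        BufExtent(RefBuf *b, size_t off, size_t n)
            : buf(refbuf_hold(b)), offset(off), length(n) {}
        BufExtent(const BufExtent &o)
            : buf(refbuf_hold(o.buf)), offset(o.offset), length(o.length) {}
        ~BufExtent() { refbuf_release(buf); }
        const char *rawContent() const { return buf->data + offset; }
        size_t size() const { return length; }
    private:
        RefBuf *buf;       // keeps the underlying buffer alive
        size_t  offset;
        size_t  length;
    };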

Want to know what else is silly in the client-side code when you go digging?

Here's a snippet just from handling the incoming request:

* The order in which things are parsed. We call headersEnd() to make sure we've
  got an entire request, then build a clientHttpRequest, then parse the URL to
  build a request_t, then fill out the headers. This, to be honest, is horrible.
* Then we take the URL that we've already delineated the start/end of and we
  parse it again to delineate the beginning/end of each chunk - we already have
  the method so that's OK, but protocol, user/password, host, port, uripath.
  We've already parsed it once; we should have recorded what we saw as we did
  it (see the sketch after this list).
* Then we copy all the headers into separately allocated strings, when the buffer
  already has the content. We should just reference-count the buffer and build
  a string type that can reference an extent of a buffer (as in the sketch
  above). This has been done more than a thousand times in the last 20 years;
  it's nothing new.
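
And for the URL case, this is what I mean by recording what we saw as we parsed
it (a sketch only; UrlParts and parse_url_once are made-up names, and there's
no validation or user/password handling here):

    #include <cstddef>

    // One forward pass over the URL, recording the (offset, length) of each
    // component into the parsed structure -- no second delineation pass.
    struct UrlParts {
        size_t scheme_off, scheme_len;
        size_t host_off,   host_len;
        size_t port_off,   port_len;
        size_t path_off,   path_len;
    };

    static bool parse_url_once(const char *url, size_t len, UrlParts &p) {
        size_t i = 0;
        p.scheme_off = i;                       // scheme, up to "://"
        while (i < len && url[i] != ':') i++;
        if (i + 2 >= len || url[i+1] != '/' || url[i+2] != '/') return false;
        p.scheme_len = i - p.scheme_off;
        i += 3;                                 // skip "://"
        p.host_off = i;                         // host, up to ':' or '/'
        while (i < len && url[i] != ':' && url[i] != '/') i++;
        p.host_len = i - p.host_off;
        p.port_off = p.port_len = 0;            // optional port
        if (i < len && url[i] == ':') {
            p.port_off = ++i;
            while (i < len && url[i] != '/') i++;
            p.port_len = i - p.port_off;
        }
        p.path_off = i;                         // the rest is the path
        p.path_len = len - i;
        return true;
    }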

What about building the reply, once we've negotiated the myriad layers of the
store manager?

* At the moment we copy the HttpReply from the server side (which we used to copy
  and then parse, when we'd already have a parsed copy in the MemObject), then go
  through the headers and extract all the information we need. Then we walk the
  list again IIRC and pull out headers that shouldn't be there. It's a pretty big
  mess.
* Once we've got a reply we pack it all up in a MemBuf (another whole load of
  copying) and schedule that to write. At least the code used to pack whatever
  data was left over from header parsing into the write too, so objects that fit
  entirely in the 4k copy buffer were written with one write(); but still, all
  that copying is done rather than using writev() (see the sketch after this
  list).
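
The writev() point, concretely (a sketch only, not the actual Squid code):

    #include <sys/types.h>
    #include <sys/uio.h>

    // Gather the already-built header block and the first chunk of body in
    // one syscall, instead of copying both into a 4k MemBuf first.
    static ssize_t write_reply(int fd, const char *hdr, size_t hdr_len,
                               const char *body, size_t body_len)
    {
        struct iovec iov[2];
        iov[0].iov_base = const_cast<char *>(hdr);
        iov[0].iov_len  = hdr_len;
        iov[1].iov_base = const_cast<char *>(body);
        iov[1].iov_len  = body_len;
        return writev(fd, iov, 2);              // one syscall, no extra copy
    }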

There's also a -lot- of code involved in the data exchange path (store client,
appending) between client and server, but that's a whole other story.

We could spend a few years slowly picking away, optimising a few percent here and
there, or .. we could not. :) Help me sort out the basic store API changes in the
storework branch so I can move on to the next area that really needs all the
rework - the client-side request/reply code.

Yes, I'd like to do all of what you've suggested above, but I'm going to do it by
junking most of the client-side request/reply handling routines and replacing
them with stuff written from the ground up to use a sane API. Buffers won't be
copied; stuff will be parsed once and parsed quickly; the whole of the URL
rewrite/XFF/Acceleration/ACL lookups/etc will be implemented as a state engine
with events rather than being callback-driven like it is today. It's also doable
in a -lot- less code than is traversed today.
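
To give a flavour of the state-engine idea (a sketch only; the states and names
are illustrative, not a design):

    // The request path as explicit states driven by events, instead of a
    // chain of callbacks.  An async completion (redirector reply, ACL check
    // done, DNS answer, ...) just calls resume(); the switch decides what
    // happens next, so the control flow lives in one place instead of being
    // scattered across callbacks.
    enum RequestState {
        RS_PARSE,          // parse the request once, record extents
        RS_URL_REWRITE,    // redirector / url_rewrite lookup
        RS_XFF,            // follow X-Forwarded-For
        RS_ACCEL,          // acceleration / host rewriting
        RS_ACL,            // access checks
        RS_FORWARD,        // hand off to the store / server side
        RS_DONE
    };

    struct RequestStateMachine {
        RequestState state;

        void resume() {
            switch (state) {
            case RS_PARSE:       /* ... */ state = RS_URL_REWRITE; break;
            case RS_URL_REWRITE: /* ... */ state = RS_XFF;         break;
            case RS_XFF:         /* ... */ state = RS_ACCEL;       break;
            case RS_ACCEL:       /* ... */ state = RS_ACL;         break;
            case RS_ACL:         /* ... */ state = RS_FORWARD;     break;
            case RS_FORWARD:     /* ... */ state = RS_DONE;        break;
            case RS_DONE:        break;
            }
        }
    };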

I believe the only way we're going to get long-term benefits out of any
improvement work is to pick some medium-term goals (mostly structured around
knowing what we'd like the code to look like, and picking incremental API and
code decisions which take us down that path) and work on them piece by piece.

So c'mon, help me out. :) I have a big list of things that we need to do to move
forward and only Henrik and I to do em.

Adrian