squid 2.6 parser work

From: Adrian Chadd <adrian@dont-contact.us>
Date: Tue, 29 Aug 2006 17:16:38 +0800

Hiya,

I've been doing some work on Squid-2.6 to optimise the parser code.
I've been working with Mark Nottingham from Yahoo!, who has been running various
throughput tests on Squid-2.6.

I've been concentrating on the client-side stuff - request parsing, reply
parsing/building. There are plenty of other areas of the code which could
do with some optimising but I'm focusing on the client side for now.

The summary is:

>1k responses: from 4,900 -> ~6,000-6,700
>4k responses: from 6,500 -> ~8,000-9,000

IIRC, this is measured using a local version of httperf. My changes have been:

* Don't call headersEnd() so often if it can be avoided!
* Quite a large rework of the request line parser (it's unfinished, but
  I believe it parses RFC-compliant HTTP/1.0 & 1.1 requests fine; it doesn't
  parse HTTP/0.9 requests for now)
* Some refactoring of clientReadRequest
* Code modifications to not triple-copy the request buffer whilst parsing
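
As a rough illustration of the request line rework and the copy avoidance
listed above: a minimal sketch of a parser that works on a (pointer, length)
pair and records offsets into the caller's buffer instead of copying or
requiring a NUL terminator. This is not the code from the parserwork tree;
the names (struct span, struct request_line, parse_request_line) are made up
for the example, and like the reworked parser it only accepts HTTP/1.0-style
and 1.1-style request lines, not HTTP/0.9.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not Squid's actual structures. */
struct span {
    const char *p;      /* points into the caller's buffer; not NUL-terminated */
    size_t len;
};

struct request_line {
    struct span method;
    struct span uri;
    int ver_major;
    int ver_minor;
};

/*
 * Parse "METHOD SP URI SP HTTP/x.y CRLF" from buf[0..len).
 * Returns the number of bytes consumed (including the line terminator),
 * 0 if the line is not complete yet, or -1 if it is malformed.
 * Requests without a version token (HTTP/0.9) are rejected.
 */
static long parse_request_line(const char *buf, size_t len,
                               struct request_line *out)
{
    const char *eol = memchr(buf, '\n', len);
    if (eol == NULL)
        return 0;                       /* need more data */

    const char *end = eol;
    if (end > buf && end[-1] == '\r')
        end--;                          /* accept bare LF as well as CRLF */

    const char *sp1 = memchr(buf, ' ', (size_t)(end - buf));
    if (sp1 == NULL || sp1 == buf)
        return -1;                      /* no method */

    const char *uri = sp1 + 1;
    const char *sp2 = memchr(uri, ' ', (size_t)(end - uri));
    if (sp2 == NULL || sp2 == uri)
        return -1;                      /* no URI or no version token */

    /* Expect exactly "HTTP/d.d" with single digits (enough for 1.0 and 1.1). */
    const char *ver = sp2 + 1;
    if (end - ver != 8 || memcmp(ver, "HTTP/", 5) != 0 || ver[6] != '.')
        return -1;
    if (ver[5] < '0' || ver[5] > '9' || ver[7] < '0' || ver[7] > '9')
        return -1;

    out->method.p = buf;
    out->method.len = (size_t)(sp1 - buf);
    out->uri.p = uri;
    out->uri.len = (size_t)(sp2 - uri);
    out->ver_major = ver[5] - '0';
    out->ver_minor = ver[7] - '0';
    return (long)(eol - buf) + 1;
}
```

The caller keeps ownership of the buffer, so the spans stay valid only as
long as the buffer does; that lifetime question is the price of not copying.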

Stuff I've seen which should also give a noticeable performance boost in this
particular micro-benchmark:

* An overhaul of HttpReply so it doesn't double-copy the reply buffer during parsing
  (which will probably require rewriting the status line parser to not expect
  a NUL-terminated string, as the request line parser used to)

* Rethink the Http Header stuff - a lot of the time spent in request parsing/reply
  building is the memory allocations and array manipulation needed to support
  HttpHeader.

* See if there's a nice way to combine the initial header write and data buffer
  into a single write(). More likely, come up with some simple way of reference
  counting the buffers so they can be built into iovecs and fed to writev().

* Hint to memPoolAlloc/memPoolFree that they shouldn't xfree() certain buffers,
  such as the buffers being allocated to strings and stmem buffers.
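
For the writev() idea in the list above, a minimal sketch, assuming the
header block and the first chunk of body data already sit in two separate
buffers. The function name send_header_and_body is made up for the example
and nothing here is Squid code. The reference counting part is deliberately
left out: it only becomes necessary once the write is deferred and the
buffers have to outlive the caller.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: hand the reply headers and the first body chunk to
 * the kernel in one writev() call instead of two write() calls.
 * Returns whatever writev() returns: the number of bytes written, or -1
 * with errno set. A short write is possible on a non-blocking socket and
 * has to be handled by the caller.
 */
static ssize_t send_header_and_body(int fd,
                                    const char *hdr, size_t hdr_len,
                                    const char *body, size_t body_len)
{
    struct iovec iov[2];

    /* iov_base is a non-const pointer; writev() does not modify the data. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = body_len;

    return writev(fd, iov, 2);
}
```

Neither buffer is copied in user space, which is the point: the header and
data stay where they are and only the iovec array is built per call.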

I've been profiling using gprof and perfsuite. Both are statistical; both give
different results. I've been using the gprof call graphs as well.

I'm using apachebench to do local testing. Here's what I use:

adrian@jacinta:~$ ab -c 10 -n 100000 http://192.168.3.1:3128/squid-internal-static/icons/test.4k

Squid compiled with:

adrian@kandy:~/work/squid/sf/parserwork$ env CFLAGS="-O2 -g -pg -ggdb -fno-inline-functions \
  -fno-inline-functions-called-once --no-inline" ./configure --prefix="/home/adrian/work/squid/run" \
  --enable-storeio="ufs null" --disable-unlinkd --quiet

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  3.57      0.67     0.67  4700715     0.00     0.00  memPoolFree
  3.57      1.33     0.67   200018     0.00     0.00  headersEnd
  2.73      1.84     0.51   100009     0.01     0.04  httpRequestFree
  2.20      2.25     0.41   100009     0.00     0.02  parseHttpRequest
  2.17      2.65     0.41  1000090     0.00     0.00  httpHeaderIdByName
  2.06      3.04     0.39  5901105     0.00     0.00  arrayAppend
  1.80      3.38     0.34  4701505     0.00     0.00  memPoolAlloc
  1.77      3.71     0.33  1301128     0.00     0.00  xstrncpy
  1.74      4.03     0.33   200018     0.00     0.02  clientWriteComplete
  1.69      4.34     0.32  1300117     0.00     0.00  httpHeaderEntryDestroy
  1.61      4.64     0.30  1700186     0.00     0.00  memFreeString
  1.55      4.93     0.29   320996     0.00     0.06  comm_call_handlers
  1.50      5.21     0.28   601188     0.00     0.00  xstrdup
  1.50      5.50     0.28   500056     0.00     0.00  dlinkDelete

Per request, that's roughly 47 memory pool allocations and as many frees,
59 array appends, 13 header entry destroys, etc.
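
Those per-request figures are just the gprof call counts divided by the
number of requests. A small sketch of the arithmetic, assuming the request
count is 100009 (the number of httpRequestFree calls in the flat profile);
the helper name per_request is made up for the example.

```c
/*
 * Hypothetical helper: average number of calls per request, rounded to the
 * nearest whole number. "calls" comes from the gprof "calls" column and
 * "requests" is assumed to equal the httpRequestFree call count.
 */
static unsigned long per_request(unsigned long calls, unsigned long requests)
{
    return (calls + requests / 2) / requests;
}
```

With requests = 100009, memPoolFree (4700715 calls) and memPoolAlloc
(4701505 calls) both come out at 47, arrayAppend (5901105 calls) at 59 and
httpHeaderEntryDestroy (1300117 calls) at 13.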

The same test run, compiled without -pg, run under perfmon/perfsuite:

File Summary
--------------------------------------------------------------------------------
Samples  Self %  Total %  File

    365  14.43%   14.43%  /home/adrian/work/squid/sf/parserwork/src/client_side.c
    353  13.96%   28.39%  /home/adrian/work/squid/sf/parserwork/src/HttpHeader.c
    180   7.12%   35.51%  /home/adrian/work/squid/sf/parserwork/src/MemPool.c
    114   4.51%   40.02%  /home/adrian/work/squid/sf/parserwork/src/comm.c
     98   3.88%   43.89%  /home/adrian/work/squid/sf/parserwork/src/mem.c
     97   3.84%   47.73%  /home/adrian/work/squid/sf/parserwork/lib/Array.c
     89   3.52%   51.25%  /home/adrian/work/squid/sf/parserwork/src/cbdata.c
     86   3.40%   54.65%  /home/adrian/work/squid/sf/parserwork/lib/util.c
     83   3.28%   57.93%  /home/adrian/work/squid/sf/parserwork/src/tools.c
     79   3.12%   61.05%  /home/adrian/work/squid/sf/parserwork/src/String.c
     68   2.69%   63.74%  /home/adrian/work/squid/sf/parserwork/src/store_client.c
     57   2.25%   65.99%  /home/adrian/work/squid/sf/parserwork/src/acl.c

Function Summary
--------------------------------------------------------------------------------
Samples  Self %  Total %  Function

    126   4.98%    4.98%  memPoolFree
     77   3.04%    8.03%  httpHeaderGetEntry
     74   2.93%   10.95%  arrayAppend
     74   2.93%   13.88%  httpHeaderClean
     59   2.33%   16.21%  httpHeaderEntryDestroy
     56   2.21%   18.43%  headersEnd
     54   2.14%   20.56%  memPoolAlloc
     47   1.86%   22.42%  clientWriteComplete
     47   1.86%   24.28%  httpRequestFree
     46   1.82%   26.10%  memFreeString
     41   1.62%   27.72%  comm_call_handlers
     40   1.58%   29.30%  stringClean
     35   1.38%   30.68%  clientSendMoreData
     35   1.38%   32.07%  xstrncpy
     32   1.27%   33.33%  connStateFree
     30   1.19%   34.52%  dlinkDelete

Function:File:Line Summary
--------------------------------------------------------------------------------
Samples  Self %  Total %  Function:File:Line

     38   1.50%    1.50%  httpHeaderClean:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:354
     36   1.42%    2.93%  httpHeaderEntryDestroy:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:1193
     35   1.38%    4.31%  httpHeaderGetEntry:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:555
     28   1.11%    5.42%  ??:??:0
     26   1.03%    6.45%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:95
     25   0.99%    7.43%  xstrdup:/home/adrian/work/squid/sf/parserwork/lib/util.c:600
     22   0.87%    8.30%  xstrncpy:/home/adrian/work/squid/sf/parserwork/lib/util.c:680
     20   0.79%    9.09%  httpHeaderGetEntry:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:551
     19   0.75%    9.85%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:326
     17   0.67%   10.52%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:93
     16   0.63%   11.15%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:317
     16   0.63%   11.78%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:91
     15   0.59%   12.38%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:303
     15   0.59%   12.97%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:319
     14   0.55%   13.52%  headersEnd:/home/adrian/work/squid/sf/parserwork/src/mime.c:147

Each gives slightly different results but they're all centred around the same functions -
memory allocation/free, header creation/deallocation.

I'm going to stop speeding things up, complete the request/reply parser modifications and
concentrate on fixing any bugs that pop up. I'm not going to try writing an incremental
HTTP parser for now; I'll leave that for Squid-3. I'm mainly doing this to wrap my head
around what bits of the code are fast, what bits are slow, and why.

Adrian
Received on Tue Aug 29 2006 - 03:16:41 MDT
