Re: Cachemgr.cgi from Julian Anderson on 1997-07-01 (squid-users)

From: Julian Anderson <Julian.Anderson@dont-contact.us>
Date: Wed, 02 Jul 1997 12:09:49 +1200

>We are fairly new to Squid and are seeing the benefits.
>Could someone explain to me the following output from cachemgr.cgi:
>Select loop called: 989837 times, 356.304 ms avg
>Page faults with physical i/o: 36660
>dnsserver avg service time: 2570 msec
>
>Select loop seems high, is it?
>I also think that dnsserver avg is way high unless I am misunderstanding
>what the line means.
>We run Squid 1.NOVM.11 on a Pentium 133 w/96meg RAM/4gig Drive space.

I suspect you need to do a little tuning; those page faults will be
slowing down your cache substantially. The select loop is the core of
the cache; it acts as kind of dispatcher for all the IO (pages being
fetched, pages being delivered, pages coming off disk, DNS servers,
redirectors, etc) so it's likely to be busy. I don't know exactly
what that avg figure is, does the cache seem very slow?

You (and squid-users members in general) may be interested in some of
the stats that we've pulled out of our squid system. We recently
installed Digital's Continuous Profiling Infrastructure on our squid
machine to chase down some performance problems we were having. We
got some interesting profiling data.

The system is a Digital AlphaServer 1000A, 512Mb RAM, 8Gb of cache
disk. We're running squid-1.1.10, and we typically process around
200 000 requests / 1Gb data per day. The squid process usually takes
around 55% of the CPU.

The stats below come from a squid that has been running for 7 days.
The DCPI system samples the running process frequently and records
which procedure the application is executing. Note that it's not
telling you how long each procedure takes, just what proportion of the
time the process spends in each procedure.

The imiss field lists misses on the instruction cache, so is
irrelevant for our purposes.

Total samples for event type cycles = 3894751
Total samples for event type imiss = 426861

The counts given below are the number of samples for each
listed event type.
==========================================================
    cycles % cum% imiss % procedure
    470803 12.09% 12.09% 6623 1.55% memCopy
    368460 9.46% 21.55% 37153 8.70% _db_print
    359002 9.22% 30.77% 29513 6.91% comm_select
    312878 8.03% 38.80% 1294 0.30% ipcache_compareLastRef
    145653 3.74% 42.54% 38845 9.10% comm_select_incoming
    124762 3.20% 45.74% 5559 1.30% hash_lookup
    114227 2.93% 48.68% 8309 1.95% aclMatchIp
    105310 2.70% 51.38% 3862 0.90% xstrncpy
     80094 2.06% 53.44% 10081 2.36% hash4
     79278 2.04% 55.47% 843 0.20% ipcacheExpiredEntry
     64442 1.65% 57.13% 11494 2.69% xcalloc
     43040 1.11% 58.23% 12906 3.02% commSetSelect
     39783 1.02% 59.25% 7241 1.70% httpReadReply
     36187 0.93% 60.18% 3890 0.91% mime_get_header
     35424 0.91% 61.09% 2349 0.55% urlcmp

So, this indicates some ways to edge a little more performance out of
squid. That _db_print is what is behind the debug() call, which is
invoked all through the code and it shows. Perhaps a compile-time
directive to disable all debugging output?

Aside from that, the cache seems to copy memory (memCopy, xstrncpy),
compare things (ipcache_compareLastRef, hash_lookup, aclMatchIp,
urlcmp) and sit in that busy, busy select loop. Looking at the code
for most of these procedures, I can't see too much scope for
optimisation. I'm surprised at the figure for
ipcache_compareLastRef(), which is the comparison function for qsort
for determining which entries are to be purged. Is this being called
too frequently? Would a promote-to-first list be more efficient?

I'm not using either SPLAY_TREE or BIN_TREE, I will recompile with
one of those and see if that affects aclMatchIp et al.

Jules
Received on Tue Jul 01 1997 - 17:12:16 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:35:39 MST