[squid-users] 2.6-stable3 performance weirdness with polymix-4 testing

From: Pranav Desai <pranavadesai@dont-contact.us>
Date: Fri, 25 Aug 2006 15:59:14 -0700

Hello All,

Resending this; the attachments seem to have caused a problem.

I am seeing a weird problem with 2.6-stable3 when testing with
polymix-4. During the first phase of testing the cache performs very
well, but during the second phase everything seems to break down
drastically. I have attached the results. I can provide further data
if needed.

setup:
--------
* polymix-4
* 1000 req/s
* 2 x 6-hour phases.
* 2 clt/srv pairs (regular ... Intel Xeon 2.8GHz with 1GB RAM).

version:
Squid Cache: Version 2.6.STABLE3
configure options: '--prefix=/usr/squid' '--exec-prefix=/usr/squid'
'--sysconfdir=/usr/squid/etc' '--enable-snmp'
'--enable-err-languages=English' '--enable-linux-netfilter'
'--enable-dlmalloc' '--enable-async-io=24'
'--enable-storeio=ufs,aufs,null' '--enable-linux-tproxy'
'--enable-gnuregex' '--enable-internal-dns' '--enable-epoll'
'--with-maxfd=32768'

cache server:
* 2x Dual Core AMD Opteron(tm) Processor 270 HE
* 16GB RAM
* 1x SATA 45 GB drive.

A few observations:
---------------------------
* It looks like the cache just breaks completely after the idle phase.
* CPU utilization goes to 100% (even with epoll); during the first
phase it seems to be about 70%.
* cache.log doesn't give any indication of what is happening, except
for messages like this one:

2006/08/25 15:32:45| squidaio_queue_request: WARNING - Queue congestion.

But these messages showed up during the first phase as well, and also
in the 2.5-S9 testing.

* 2.5 Stable-9 performs poorly at 1000 req/s (mean response time for
all [hit+miss] requests = 4 sec; normally it's about 1.5 sec), but it
doesn't break the way 2.6-S3 did.
* I have run the same test a few times, and every time the problem
starts after the idle phase. In fact, I have reduced the first phase
to 3 hours and still the same thing happens.
* vmstat shows very little free memory, but again that happens very
early in the first phase itself.

* The logs for the polyclts and polysrvs don't indicate anything
wrong there, except that they complain about connection resets, as
shown below:

268.32| Connection.cc:485: error: 1204/1830647 (s104) Connection reset by peer
268.32| error: raw read failed
268.32| connection to 10.51.6.102:8080 failed after 1 reads, 1 writes, 1 xacts
268.35| i-top2 5015237 393.54 5733 2.36 40 4985

So I am not sure what happens during the idle phase that makes
everything break so drastically.
Is it not able to recover, or does it break down trying to recover?

I would appreciate any kind of help, including suggestions or metrics
I should look at to figure out the problem.
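
One metric that could be pulled from cache.log is how often the queue
congestion warning shows up per hour, so the first and second phases
can be compared side by side. Below is a minimal Python sketch of that
idea. It assumes the timestamp format of the cache.log line quoted
above; the function name congestion_per_hour is hypothetical, and the
sample lines are made up for illustration, not taken from a real run.

```python
import re
from collections import Counter

# Matches cache.log lines of the form quoted above, e.g.:
#   2006/08/25 15:32:45| squidaio_queue_request: WARNING - Queue congestion.
# Group 1 is the date, group 2 is the hour.
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}) (\d{2}):\d{2}:\d{2}\|\s*"
    r"squidaio_queue_request: WARNING - Queue congestion"
)


def congestion_per_hour(lines):
    """Count queue congestion warnings, bucketed by date and hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            bucket = f"{match.group(1)} {match.group(2)}:00"
            counts[bucket] += 1
    return counts


# Illustrative input only (not real log data from the test).
sample_lines = [
    "2006/08/25 15:32:45| squidaio_queue_request: WARNING - Queue congestion.",
    "2006/08/25 15:40:01| squidaio_queue_request: WARNING - Queue congestion.",
    "2006/08/25 16:05:00| squidaio_queue_request: WARNING - Queue congestion.",
]

for bucket, count in sorted(congestion_per_hour(sample_lines).items()):
    print(bucket, count)
```

To run it against a real log, pass an open file object instead of the
sample list, e.g. congestion_per_hour(open(path)), where path points
at wherever cache.log lives on the cache server.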

Thanks for your time.

-- Pranav

------------------------------
http://pd.dnsalias.org
Received on Fri Aug 25 2006 - 16:59:16 MDT
