Re: commloops cache digest code works

From: Joe Cooper <joe@dont-contact.us>
Date: Sun, 24 Feb 2002 17:19:39 -0600

The performance change may all be in my head. As I said, the box has
evolved since I last ran any serious benchmarks on it. It has twice as
much RAM, but also twice as much crap running on it, and I figured the
lack of a second disk would have a negative impact as well. We might
just be seeing the improvement that 256MB of extra RAM can give, plus
load shedding capability that 2.2STABLE5 didn't have--back then it was
keep up or crash and burn, whereas these days those little spikes up to
3 sec average response time don't mean a death spiral the way they
always did with 2.2STABLE5+hno, so it was impossible to push it this
hard before.

Stability is the primary purpose of this testing, and on that count it
is definitely very solid. I forgot that recent Polygraph versions don't
play nice with the datacomm-1 workload and never shut down--and you may
recall that the working set in datacomm-1 grows without bound throughout
the test. So at 11.92 hours into the test, the box is still holding
together very nicely--14.28GB is the current fill size (and we've got a
single 4GB cache_dir). Hits at this point are about what one would
expect from a cache_dir that small for this test (roughly 35%). The
error count is a lovely '2'. Yep, two errors in twelve hours of too much
load.
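
For context, the store here is just a single small UFS cache_dir. The
path and L1/L2 values below are illustrative placeholders rather than my
actual squid.conf, but the shape of it is:

    # illustrative squid.conf excerpt: one 4GB UFS cache_dir
    # (path and L1/L2 directory counts are placeholders)
    cache_dir ufs /cache0 4096 16 256

With a ~14GB fill pushed at a 4GB store, the cache is constantly
replacing objects, so a hit ratio well below the workload's offered hit
ratio is exactly what one would expect.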

So, it looks very good so far. We're still running after nearly twelve
hours, response is still good, and the underlying OS is still usable
(not usable for much--CPU is at 2-4% idle). I've hit no weirdness on
the client side either.

I guess I should be poking at things that tickle range requests to
really know if something got broken, correct? (I know range handling
has gone away temporarily, correct? So do we always fetch the whole
object, or do we not cache range requests at all and just pass them
through?)
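
One cheap way to poke at that from the client side (assuming curl and a
proxy on the usual port--the host, port, and URL below are placeholders,
not my actual setup) is to push a ranged request through the cache and
watch whether a 206 or a full 200 comes back, and what X-Cache says on
a repeat:

    # first pass: ask for the first 1KB of an object via the proxy
    curl -x localhost:3128 -r 0-1023 -D - -o /dev/null http://example.com/object

    # second pass: same request again; "X-Cache: HIT" vs "MISS" shows
    # whether the (whole or partial) object ended up cached
    curl -x localhost:3128 -r 0-1023 -D - -o /dev/null http://example.com/object

That should at least tell me which of the two behaviors we have.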

I'm going to see about turning on the checksums in Polygraph to ensure
that data is coming through uncorrupted. My measly three or four
browsing requests per minute won't necessarily turn up any bugs on that
count, but maybe a few thousand every hour will.

Adrian Chadd wrote:

> On Sun, Feb 24, 2002, Joe Cooper wrote:
>
>
>>>Anyway, we're half an hour into a four hour sprint. So far, so good
>>>(hits are fine too, at this point).
>>>
>>2.5 hours+ at 100. No problems, no serious complaints. Hits drop off
>>for a few seconds every once in a while, response times jump a lot (from
>>1.5 secs to ~3 secs--not like minutes or anything) every once in a
>>while, but overall very very solid. And it's probably measurably faster
>>than even 2.2STABLE5+hno. I'll have to run it on a box for which I have
>>some specific benchmark data, but I don't think I've seen a single disk
>>450MHz box (with a bunch of crap running on it) maintain 100 reqs/sec for
>>over two hours with no hit rate degradation or horrible response times.
>>I think you've got a winner, Adrian. CPU usage might be lower also,
>>it seems like it is, in general--I'll have to try it with two disks to
>>see what happens (I've never been able to push a box hard enough with a
>>single disk to saturate the CPU, so I don't really know yet).
>>
>>I'll keep using it as my local proxy for the time being, watching for
>>any smelly bits. I'm off to sleep for a few hours, but when I wake I'll
>>fire up another run at a higher rate with two disks for Squid.
>>
>
> Ok, cute. It shouldn't be performing quicker - although it's slightly
> more efficient in its copies, I didn't think it would be very measurably
> faster until a few more things were killed.
>
> Thanks for the testing, Joe. And thanks for the testing before, Kinkie. :-)
>
> Adrian

-- 
Joe Cooper <joe@swelltech.com>
http://www.swelltech.com
Web Caching Appliances and Support
Received on Sun Feb 24 2002 - 16:20:22 MST
