[squid-users] Squid vs httpd mod_cache

From: Neil Gunton <neil_at_nilspace.com>
Date: Mon, 24 Nov 2008 11:25:35 -0800

Hi all,

I'm running a LAMP community website (Debian Lenny, Apache 2.2.9, MySQL,
mod_perl) which gets around 100,000 page requests per day. I currently
use two builds of Apache - one lightweight front-end caching reverse
proxy, and a heavy back-end mod_perl. This worked well for years while I
was using Apache 1.3, since I was using Igor Sysoev's mod_accel and
mod_deflate modules to do the reverse proxy and caching. Now that I have
upgraded to Apache 2.2, I can't use his modules any more, so I've been
trying to use the stock mod_cache. The server is a dual Opteron 265
(i.e. 4 cores), 4GB RAM, 4x10k SCSI drives in RAID0 (I know it's risky,
but I need the space and performance, and backup is instantaneous with
MySQL replication).

Everything's working fine, mostly, but I'm having some issues with the
cache management. In a nutshell, htcacheclean just doesn't seem to be
able to keep up with managing the cache pruning (i.e. keeping it down to
a reasonable size). If I run htcacheclean from cron, then it takes
hours to complete its run, and while running it hogs the disks and
produces big iowait times. If I run it in daemon mode, then it just sits
there and produces about half the iowait (if I run with the -n "nice"
option), in which case it just isn't keeping up with the cache growth.

I'm concerned about the cache structure - it's a 3-level directory, and
it seems to take a long time just to traverse it. Even doing a simple du
on it seems to take forever, currently about 3 hours or more, and that's
for about 10GB of cache. I'd prefer to keep the cache down to more like
1GB at the most. In fact, that's what I have htcacheclean set to -
1000MB. But it doesn't seem to be doing the job.
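
To put numbers on the traversal cost described above, here is a minimal
Python sketch that walks a cache tree and tallies its size, roughly what
du has to do. The function name cache_usage is hypothetical, and the
sketch sums apparent file sizes rather than allocated disk blocks, so
its total will not match du exactly.

```python
import os


def cache_usage(root):
    """Return (total_bytes, file_count, dir_count) for everything under root.

    Sums apparent file sizes (st_size), not allocated blocks as du does.
    """
    total_bytes = 0
    file_count = 0
    dir_count = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        path = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dir_count += 1
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total_bytes += entry.stat(follow_symlinks=False).st_size
                    file_count += 1
    return total_bytes, file_count, dir_count
```

Calling cache_usage() on the cache root and comparing the first value
against 1000 * 2**20 would show how far the cache is over the 1000MB
limit; the file and directory counts give a feel for how many entries
any cleaner has to visit.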

I've been asking around the Apache and mod_perl lists about ways to
improve this. Someone suggested using Squid instead. So here I am - I've
never used Squid, mostly because I always used Apache and really need
the mod_rewrite capabilities for doing things like blocking image
hotlinking from other sites. I really need a front-end reverse proxy
that has capability to do access control stuff like this, as well as
redirects for old content etc - you know, all the things you can do with
mod_rewrite. I really don't want to have to pass all that back to the
mod_perl processes.
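
To make the hotlink-blocking requirement concrete, here is a minimal
Python sketch of the kind of Referer check such a rule performs. It is
only an illustration of the logic, not a mod_rewrite or Squid
configuration; the host names and the function name is_hotlink are
assumptions. Requests with no Referer are let through, which is a common
choice since many clients omit the header.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts allowed to embed the site's images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request should be treated as a hotlink."""
    if not referer:
        # No Referer header: allow, since many clients omit it.
        return False
    host = urlsplit(referer).hostname  # lowercased by urlsplit, or None
    return host is None or host not in allowed_hosts
```

A front end with this capability would refuse or redirect the request
whenever the check returns True, without involving the back end.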

I would like to know how good Squid's cache management (i.e. pruning)
is. I get the impression that mod_cache in Apache 2.2 is not very mature
- some of the cache management features don't even seem to be
implemented yet. I assume that Squid is a much more mature product, and
thus I'd hope that it has cache management pretty much down pat.

How does Squid manage its disk cache? Does it consume a lot of disk I/O
when doing so?

Has anybody else here migrated from using Apache's mod_cache to Squid,
and if so do you have any insights?

Lastly, if I do decide to use Squid, is the O'Reilly book from 2004
still relevant, or is it out of date now? I know there's a lot of stuff
online, but I like to have a handy book reference, plus a well-written
book often has a good intro to the tool. This book seems to get only
5-star reviews on Amazon. Is it still up to date?

Thanks in advance,

Received on Mon Nov 24 2008 - 19:25:37 MST
