Re: [squid-users] Squid vs httpd mod_cache

From: Elli Albek <elli_at_sustainlane.com>
Date: Tue, 25 Nov 2008 02:52:02 -0800 (PST)

I found the online help, especially the wiki, top notch. I don't know about the book, but if it does not cover Squid 2.6 and up then I would not get it, even though most things probably still apply. The reverse proxy configuration changed in 2.6, and I guess that's what you need most.

In terms of disk, you can limit the maximum size of the on-disk cache in the configuration, and that's about all you need to do. By default the directory structure is two subdirectories deep; other layouts may be possible, since Squid has more than one storage implementation.
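
For illustration, a minimal sketch of the squid.conf lines involved, assuming the default "ufs" storage type. The path /var/spool/squid is a hypothetical example, and the 1000 MB limit simply mirrors the 1GB target from your message; check the values against the documentation for your Squid version:

```
# cache_dir <type> <directory> <size in MB> <L1 dirs> <L2 dirs>
# 16 first-level directories, each holding 256 second-level
# directories, gives the two-level layout mentioned above.
# The path and the 1000 MB size are assumptions for this example.
cache_dir ufs /var/spool/squid 1000 16 256

# Percent-full marks at which Squid starts evicting objects (low)
# and evicts more aggressively (high). 90 and 95 are the defaults.
cache_swap_low 90
cache_swap_high 95
```

Squid prunes the cache itself as it approaches these marks, so there is no separate cleanup tool like htcacheclean to schedule.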

As for the slow file system, which file system do you use? We use ReiserFS on another server that does something similar on the file system (an images directory) and it's fast.

----- Original Message -----
From: Neil Gunton <neil_at_nilspace.com>
To: squid <squid-users_at_squid-cache.org>
Sent: Mon, 24 Nov 2008 11:25:35 -0800 (PST)
Subject: [squid-users] Squid vs httpd mod_cache

Hi all,

I'm running a LAMP community website (Debian Lenny, Apache 2.2.9, MySQL,
mod_perl) which gets around 100,000 page requests per day. I currently
use two builds of apache - one lightweight front end caching reverse
proxy, and a heavy back-end mod_perl. This worked well for years while I
was using Apache 1.3, since I was using Igor Sysoev's mod_accel and
mod_deflate modules to do the reverse proxy and caching. Now I have
upgraded to Apache 2.2, I can't use his modules any more, so I've been
trying to use the stock mod_cache. The server is a dual Opteron 265
(i.e. 4 cores), 4GB RAM, 4x10k SCSI drives in RAID0 (I know it's risky,
but I need the space and performance, and backup is instantaneous with
MySQL replication).

Everything's working fine, mostly, but I'm having some issues with the
cache management. In a nutshell, htcacheclean just doesn't seem to be
able to keep up with managing the cache pruning (i.e. keeping it down to
a reasonable size). If I run htcacheclean in cron mode, then it takes
hours to complete its run, and while running it hogs the disks and
produces big iowait times. If I run it in daemon mode, then it just sits
there and produces about half the iowait (if I run with the -n "nice"
option), in which case it just isn't keeping up with the cache growth.

I'm concerned about the cache structure - it's a 3-level directory, and
it seems to take a long time just to traverse it. Even doing a simple du
on it seems to take forever, currently about 3 hours or more, and that's
for about 10GB of cache. I'd prefer to keep the cache down to more like
1GB at the most. In fact, that's what I have htcacheclean set to -
1000MB. But it doesn't seem to be doing the job.

I've been asking around the Apache and mod_perl lists about ways to
improve this. Someone suggested using Squid instead. So here I am - I've
never used Squid, mostly because I always used Apache and really need
the mod_rewrite capabilities for doing things like blocking image
hotlinking from other sites. I really need a front-end reverse proxy
that has capability to do access control stuff like this, as well as
redirects for old content etc - you know, all the things you can do with
mod_rewrite. I really don't want to have to pass all that back to the
mod_perl processes.

I would like to know how good Squid's cache management (i.e. pruning)
is. I get the impression that mod_cache in Apache 2.2 is not very mature
- some of the cache management features don't even seem to be
implemented yet. I assume that Squid is a much more mature product, and
thus I'd hope that it has cache management pretty much down pat.

How does Squid manage its disk cache? Does it consume a lot of disk io
when doing it?

Has anybody else here migrated from using Apache's mod_cache to Squid,
and if so do you have any insights?

Lastly, if I do decide to use Squid, is the O'Reilly book from 2004
still relevant, or is it out of date now? I know there's a lot of stuff
online, but I like to have a handy book reference, plus a well-written
book often has a good intro to the tool. This book seems to get only
5-star reviews on Amazon. Is it still up to date?

Thanks in advance,

Neil
Received on Tue Nov 25 2008 - 10:52:06 MST
