Re: Can I *not* have an on-disk cache?

From: Scott Hess <scott@dont-contact.us>
Date: Tue, 13 Jul 1999 12:53:31 -0700

Clifton Royston <cliftonr@lava.net> wrote:
> Steve Willer writes:
> > On Tue, 13 Jul 1999, Scott Hess wrote:
> > > At worst, put the cache on a ramdisk...
> >
> > Well, it's an interesting idea, but currently the kernel is my
> > bottleneck. Not the disk. Putting the files in ramdisk still involves
> > system calls, path parsing, etc. It would probably be a bit better,
> > but I was really hoping for a way to avoid system calls entirely in
> > this case.
>
> I'd almost be prepared to bet you're wrong on this. Synchronizing
> blocks to the disk, with the associated writes and calculations, is
> very likely dominating over the system calls.

I'd lean that way, also, though it's obviously hard to make such a statement
without seeing the system in question. I do know that I've pushed in excess
of 100 pages/second through a Squid setup on one of our machines (P-III 450
w/512M, fast SCSI disks and controllers, mostly dynamic pages), which works
out to perhaps 3.6M pages/day, given typical web site usage patterns (9am
and 4pm peaks, most access between 6am and 6pm PDT). I didn't even spend
much time on tuning (my first target would have been to tell Squid not to
bother caching anything but images and downloads). In any case, I'd have
put in multiple Squid boxes behind a load balancer long before reaching
that point...
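
(100 pages/second sustained around the clock would be 8.6M/day; 3.6M just
reflects the load being concentrated in those daytime peaks.) For what it's
worth, that images-and-downloads tuning would only be a couple of lines of
squid.conf - something along these lines, untested, with the extension list
purely as an illustration:

    # Cache only static-looking objects; pass everything else through.
    acl static urlpath_regex -i \.(gif|jpe?g|png|zip|gz|tar|exe)$
    no_cache allow static
    no_cache deny all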

It would probably be worth trying a different OS. We're using FreeBSD 3.1
and 3.2, and it seems to work well. I like Linux, also, but have never
tried to see how far Squid would go on Linux.

> > Small rant: I've been a bit frustrated over the apparent inflexibility
> > in some portions of Squid. Why is it that we _must_ have an access_log,
> > for example? I could write to /dev/null, but Squid is still going to
> > build the log line in its buffer and make the kernel calls to output to
> > the log. Also, why is it necessary that I have an on-disk cache? Surely
> > there are others who are caching very small amounts of data but for whom
> > performance is critical...what about us?
>
> It is, after all, as the authors have pointed out, a free product
> designed primarily for research, even if lots of people are using it to
> do serious work. If you really want to get into it, you could always
> #ifdef the access log code in the source, or add a special-case check
> for "none" as done with store.log.
>
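
On the access_log point above, a "none" special case would only take a few
lines. A rough sketch of the pattern - the identifiers here are
illustrative, not Squid's actual ones:

    #include <stdio.h>
    #include <string.h>

    static int access_log_enabled = 1;

    void access_log_init(const char *path) {
        /* A configured path of "none" disables logging outright,
           the way store.log already allows. */
        access_log_enabled = (path != NULL && strcmp(path, "none") != 0);
    }

    void access_log_entry(const char *client, const char *url) {
        if (!access_log_enabled)
            return;                      /* no line formatting, no write() */
        printf("%s %s\n", client, url);  /* stand-in for the real format */
    }

Checking the flag before formatting is the important part - that's what
skips building the log line in the buffer, not just the output call.
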
> Or optimize the algorithms for main-RAM storage - I admit I'm still
> shaken by the revelation that the code deciding what Squid will keep
> cached in main RAM is, by the authors' own admission, sub-optimal. I
> think that explains a lot of the performance bottlenecks there.

What you'd really want in this case is not a RAM-optimized cache, but a
RAM-optimized accelerator. Everything I've ever heard or seen about web
access patterns suggests that temporal locality is not nearly good enough
to make a RAM-based cache worthwhile, except for very specific access
patterns. In a CPU cache, you expect to hit the cache in excess of 90% of
the time - with a Squid cache, you're talking more like 30%.
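
A quick back-of-envelope shows why that hit ratio matters so much. The
service times below are pure assumptions, just to show the shape of the
math:

    /* Average fetch time vs. cache hit ratio; the 5ms/500ms service
       times are assumed, purely for illustration. */
    #include <stdio.h>

    int main(void) {
        const double t_hit = 0.005;    /* serve from cache (assumed) */
        const double t_miss = 0.500;   /* fetch from origin (assumed) */
        const double ratios[] = { 0.90, 0.30 };
        int i;

        for (i = 0; i < 2; i++) {
            double avg = ratios[i] * t_hit + (1.0 - ratios[i]) * t_miss;
            printf("hit ratio %2.0f%% -> avg %5.1f ms\n",
                   ratios[i] * 100.0, avg * 1000.0);
        }
        return 0;
    }

At 90% you've cut the average fetch from 500ms down to about 55ms; at 30%
you're still at roughly 350ms, so the cache barely dents the average.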

> If you wanted to get fancy, you could even add some scripts to copy the
> RAM disk off to a disk partition at Squid shutdown, and back at boot
> time.

Eh, probably not worth that much, given the types of error situations you
could get into - besides, you generally don't expect to shut down, much
less in a controlled fashion!

A similar alternative to the ramdisk would be to fully enable soft updates
on the Squid cache partition, and then run squid -z on every startup (or,
potentially, only on those startups not preceded by a clean shutdown). That
would hopefully minimize the time spent waiting on seeks due to write
ordering.
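
Concretely, that might look something like this (device name and mount
point are made up; check tunefs(8) on your FreeBSD version before trusting
the flags):

    # One time, with the cache filesystem unmounted:
    umount /cache
    tunefs -n enable /dev/da1s1e   # turn on soft updates
    mount /cache

    # At each boot, treat the cache contents as expendable:
    rm -rf /cache/squid/*
    squid -z                       # rebuild the swap directories
    squid                          # then start the cache as usual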

Later,
scott