Squid performance - SquidFS

From: Eric Stern <estern@dont-contact.us>
Date: Thu, 27 Aug 1998 05:01:44 -0400 (EDT)

I've been reading the "squid performance wish list" thread for the last
week or so with some interest, especially the parts about squidfs (since
that is my particular area of interest). Lots of interesting stuff.

Some of you might recall that I am working on a squidFS patch myself
(well, vaguely... I'm kind of on hiatus at the moment, but I do intend to
return to it). So, I felt it was time to throw in some thoughts.

I think most of us will agree that the current 1-object-per-file method is
definitely not the optimal way to do things. Opening and closing files is
just an expensive operation, and we have to do it far too often.

No FS you can name handles the way Squid works really well, due to its
nature of lots of objects coming *and going* all the time. Not even CNFS,
since we have objects being removed from all over the store, not in
batches like news does.

So, using one big file (or going right to a partition) should be a much
better way to do things. This is how my patch works. You create one big
file to store all objects in. Each StoreEntry has a new member called
"offset", which is simply the byte offset for that object within the file.
You want to read the object? Seek to "offset" and read x bytes (more than
one read is generally required). And removing an object from the store
requires a single write of 512 bytes (if we weren't worried about crash
recovery, we wouldn't even have to do this).
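
To make that concrete, here's a rough sketch of what the read path might
look like (not the actual patch code; store_fd, the offset and the length
are just placeholder names):

/* Rough sketch only -- store_fd is the fd of the single big store file,
 * and offset/len come from the StoreEntry.  Names are placeholders, not
 * the real patch's identifiers. */
#include <sys/types.h>
#include <unistd.h>

static ssize_t
storeReadObject(int store_fd, off_t offset, char *buf, size_t len)
{
    /* Seek to the object's byte offset within the store file... */
    if (lseek(store_fd, offset, SEEK_SET) == (off_t) -1)
        return -1;
    /* ...then read it; a big object may need more than one read(). */
    return read(store_fd, buf, len);
}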

Now, this more or less works famously. The only real problem is
fragmentation. Again, due to the nature of objects coming and going, there
is no way to avoid fragmentation; it's just life. My patch takes steps to
reduce this as much as possible, but it'll never be perfect, and I noticed
severe fragmentation starting to occur in a fairly short time. Now, over a
longer period of time the fragmentation might "level out", but I can't say
for sure.

Someone pointed out that you can perform defragmentation, but that takes
CPU time and disk I/O that might have a significant impact on a busy
cache. However, I got to thinking that no cache in the world will be busy
to capacity 24 hours a day. So what I'm thinking is that the
defragmentation can be performed during the "off hours", when we can
afford to do it.
So, the only trick here is determining the off hours. The simplest way
would be to just add it to the config and let the admin set it, e.g.
off_hours_start 1am
off_hours_end 9am
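
Something along these lines would be enough to gate the defrag pass (a
hypothetical sketch; the two hour values would be parsed out of squid.conf
from the options above):

/* Sketch: start_hour/end_hour are 0-23, parsed from the config; handles
 * windows that wrap past midnight (e.g. 22 -> 6). */
#include <time.h>

static int
inOffHours(int start_hour, int end_hour)
{
    time_t now = time(NULL);
    int hour = localtime(&now)->tm_hour;
    if (start_hour <= end_hour)
        return hour >= start_hour && hour < end_hour;
    return hour >= start_hour || hour < end_hour;
}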

Another method would be to either measure internally or let the admin set
a "peak load" in requests/min. Since we track that statistic internally,
we can compare:
if (current_load < busy_load)
        storeDefrag();
Then, busy_load can be set in the config, or perhaps something like
busy_load = peak_load / 2;

I think this would be better than defining off hours, since you could take
advantage of a lull in the middle of the day to get some defragmenting
done. I'm thinking there are *at least* 12 hours a day when it wouldn't
impact response time very much if we did some defragmenting.
On a machine that is severely overloaded, this process might fall apart.
But then, if a machine is THAT overloaded, what the hell can you do
anyway?
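
Roughly, the trigger could look like this (a sketch only; storeDefrag() is
the imagined defrag pass, and the load numbers are whatever we already
track in requests/min):

/* Sketch of the load-based trigger.  busy_load comes from squid.conf if
 * set, otherwise defaults to half the highest load we've seen so far. */
extern void storeDefrag(void);

static void
storeMaybeDefrag(double current_load, double peak_load, double configured_busy_load)
{
    double busy_load = configured_busy_load > 0.0
        ? configured_busy_load
        : peak_load / 2.0;
    if (current_load < busy_load)
        storeDefrag();  /* quiet enough right now, do some work */
}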

Defragmenting can be done either by actually physically moving data, or
maybe just by releasing some objects. This could probably be decided
semi-intelligently by examining the object in question and seeing if it
might be a good candidate for releasing (i.e., if it's been in the cache a
while and has lots of hits, we probably don't want to release it).
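
As a sketch, that decision might boil down to something like this (the age
and hit-count thresholds are made up for illustration, not taken from the
patch):

/* Sketch of the "release or relocate?" decision -- thresholds invented
 * for illustration only. */
#include <time.h>

static int
shouldRelease(time_t stored_at, int hit_count)
{
    time_t age = time(NULL) - stored_at;
    /* An object that's been around a while and gets lots of hits is
     * worth keeping; move it during defrag instead of dropping it. */
    if (age > 24 * 3600 && hit_count > 10)
        return 0;   /* relocate */
    return 1;       /* release it and reclaim the space */
}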

/-----------------------------------------------------------------------/
/ Eric Stern - PacketStorm Technologies - (519) 837-0824 /
/ http://www.packetstorm.on.ca /
/ WebSpeed - a transparent web caching server - available now! /
/-----------------------------------------------------------------------/