Re: Squid performance wish-list

From: Stephen R. van den Berg <srb@dont-contact.us>
Date: Mon, 24 Aug 1998 13:45:01 +0200

Kevin Littlejohn wrote:
>Stephen van den Berg wrote:
>> Stewart Foster wrote:
>> >3584 + 63 * 8192 + 64 * (8192 / 4) * 8192 = 1.0005 GB (big enough for any
>> >conceivable caching purposes for the next 10 years I'm guessing).
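[The capacity arithmetic quoted above can be checked directly; a quick sketch, with the figures taken verbatim from the quoted message (3584 direct bytes, 63 single-indirect 8192-byte blocks, 64 double-indirect blocks each addressing 8192/4 = 2048 block pointers). The "GB" in the quote turns out to mean GiB:]

```python
# Re-check the quoted maximum-file-size arithmetic.
direct = 3584                       # bytes stored directly
single = 63 * 8192                  # 63 single-indirect blocks
double = 64 * (8192 // 4) * 8192    # 64 double-indirect blocks,
                                    # each holding 2048 4-byte pointers
total = direct + single + double
print(total, round(total / 2**30, 4))  # 1074261504 1.0005
```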

>> This I find dangerously low. Think everyone having 34Mb/s connections
>> and everyone pulling in MPEG movies.

>I guess the question is, are you going to want to dedicate more than 1Gb of
>your on-disk space to cacheing a single item?

When you have 500GB of disk, you probably will. And in ten years, that
is likely, very likely.

>> On another note, what is to be gained by using direct and indirect
>> block pointers?

>Heh - that was one of my questions :) If you have direct links, you don't
>have to chain through each file on deletion to find the blocks that need
>free'ing up - you just grab the block numbers from the inode.

Ok, unlink speed goes up. OTOH, unlink is not time-critical and can be
done by a background process which squeezes its accesses into the
elevator queueing mechanism. *And*, since most files consist of
only one or at most two chunks, the chain walk is short for most unlink
operations anyway.
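[That chain-walking deletion is cheap to sketch. A minimal model, using the chunk header described further down in this mail (8-byte header: flags, 3-byte length, 4-byte next-block pointer); `read_header`, `free_block`, and the 0-as-end-of-chain sentinel are assumptions for illustration:]

```python
def unlink_chain(first_block, read_header, free_block, block_size=1024):
    # Walk the chunk chain, freeing the blocks each chunk occupies.
    # One extra read per chunk -- so one or two reads for most files.
    block = first_block
    while block != 0:  # 0 = end-of-chain sentinel (an assumption)
        flags, length, nxt = read_header(block)
        # a chunk of `length` data bytes plus its 8-byte header
        # spans this many consecutive blocks:
        nblocks = (length + 8 + block_size - 1) // block_size
        for b in range(block, block + nblocks):
            free_block(b)
        block = nxt

# demo: a two-chunk file, chunks starting at blocks 10 and 50
headers = {10: (0, 1016, 50), 50: (0, 500, 0)}
freed = []
unlink_chain(10, headers.__getitem__, freed.append)
print(freed)  # [10, 50]
```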

>> How about a much simpler scheme, a linked list like:
>>
>> Say, 1KB block sizes, chunks consist of one or more blocks,
>> whereas every chunk starts with:
>>
>> [ 1-byte: misc flags (reserved) ]
>> [ 3-bytes: length of this chunk in bytes ]
>> [ 4-bytes: blocknumber of the next chunk in the chain ]
>> [ 1016-bytes: data for this chunk ]
>> [ 1024-bytes: optional data to extend this chunk (no header) ]
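[The 8-byte header layout quoted above can be sketched as a pack/unpack pair; the big-endian byte order is an assumption, the field widths are from the layout. Note the 3-byte length field is what caps a single chunk at 2^24 bytes, i.e. the 16MB figure mentioned later in this mail:]

```python
HEADER_SIZE = 8  # 1 flags + 3 length + 4 next-block

def pack_header(flags, length, next_block):
    # 3-byte length field: a chunk can be at most 2**24 - 1 (~16 MB) bytes.
    assert 0 <= length < 1 << 24
    return (bytes([flags])
            + length.to_bytes(3, "big")
            + next_block.to_bytes(4, "big"))

def unpack_header(buf):
    flags = buf[0]
    length = int.from_bytes(buf[1:4], "big")
    next_block = int.from_bytes(buf[4:8], "big")
    return flags, length, next_block

hdr = pack_header(0, 1016, 42)
print(unpack_header(hdr))  # (0, 1016, 42)
```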

>One loss with this is the 50-odd% items that are within that first read. If
>you can manage to cram close to 4K data into the inode, you cut most of your
>off-disk serves to a single disk access.

Ummm..., the _inode_ in this case is the 8-byte header at the front
of a file's first chunk. I.e. we can, and will, be doing
only one disk access for the majority of files. Better still,
as soon as Linux 2.2 becomes mainstream, we'll be able to take advantage
of the copyfd() function and do single-system-call flushes from
disk to net and from net to disk with sizes up to 16MB without
intervention from squid.
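[Historical note: the copyfd() call hoped for here did not become a standard Linux interface; sendfile(2) is the closest widely available analogue for the disk-to-net direction. A minimal sketch of the idea via os.sendfile, with a socketpair standing in for the client connection:]

```python
import os
import socket
import tempfile

def flush_file_to_socket(path, sock, count):
    # One kernel-side transfer: the data never passes through
    # user space, so squid would not have to touch it at all.
    with open(path, "rb") as f:
        return os.sendfile(sock.fileno(), f.fileno(), 0, count)

# demo: push a 1016-byte "chunk" straight from disk to a socket
a, b = socket.socketpair()
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1016)
    path = tmp.name
sent = flush_file_to_socket(path, a, 1016)
print(sent)  # 1016
os.unlink(path)
```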

>The other question that arose was fragmentation - I _think_ we decided
>that was better handled with bigger blocks, but Stew probably has more of
>that documentation in front of him :)

There are blocks, and there are chunks. In the FS I propose, the blocks
are just the allocation unit; the chunks make up the files. I think
it will be fairly easy to experiment with different block sizes and
pick the one that gives acceptable fragmentation and good disk utilisation.

>The basic aim was to serve files in as few disk accesses as possible.
>Removing the filename->inode lookup is a big part of that, then cramming

The FS I propose does that. There are no separate inodes. We only
have the squid-maintained store entries, each of which points directly at
the block where its first chunk starts.

>data in with each inode (if you can get ~4K in) is another big win. The

In the chunk system, you can extend the initial chunk to be as long as you
want (much longer than 4KB, up to 16MB if you like, the limit of the
3-byte length field), provided you can get the space allocated.

>next one was to put direct links in for as much as possible to speed up
>file deletion.

File deletion is a steady background process, not exactly high-priority
or latency-sensitive.

-- 
Sincerely,                                                          srb@cuci.nl
           Stephen R. van den Berg (AKA BuGless).
This signature third word omitted, yet is comprehensible.
Received on Tue Jul 29 2003 - 13:15:52 MDT
