Re: Do not make a compatible squid file system from Eric Stern on 2000-02-02 (squid-dev)

From: Eric Stern <estern@dont-contact.us>
Date: Wed, 2 Feb 2000 17:40:05 -0500 (EST)

On Wed, 2 Feb 2000, Henrik Nordstrom wrote:

> A bit. However, to turn Squid into a really high performance proxy, more
> changes than only the object store are required. There are also very
> obvious bottlenecks in the networking and data forwarding.

Could you be a bit more specific? From a commercial point of view I'm
interested in making squid perform better. For now I'm working on the
filesystem, but after that I'll go after other areas. I was going to do
some profiling to find the problem areas, but if you already know what
they are it'll save me a bunch of time. :)

> Have a large can of ideas on how to make a cache object store. The
> cyclic one is only one of the designs. In its purest circular form it
> would require a lot of disk to maintain a good hit ratio, however with
> some minor modifications and wasting some disk space you can still
> preserve most of the properties of LRU, which should make it more
> economic in terms of disk. The disk storage is cyclic, but object
> lifetime more like LRU.

I think I know what you are getting at here. In a pure cyclic filesystem,
a well used object that is staying in the cache will get overwritten as
the writing cycle comes around. The obvious solution here is to move the
object back to the beginning of the store every time it is hit, thus
keeping the whole store in LRU order. The downside here is all the extra
writes involved for each hit.
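
A toy sketch of that tradeoff, not COSS code: the store is modelled as a
ring of fixed-size slots, and every name here (CyclicStore, move_on_hit,
demo) is made up for illustration. It only shows that re-appending on a hit
keeps a popular object alive at the cost of extra writes.

```python
class CyclicStore:
    """Toy ring of slots; one slot holds one object key (an assumption)."""

    def __init__(self, slots, move_on_hit):
        self.slots = [None] * slots
        self.where = {}               # key -> slot index of the live copy
        self.head = 0                 # next slot the write cycle overwrites
        self.move_on_hit = move_on_hit
        self.writes = 0

    def _append(self, key):
        old = self.where.pop(key, None)
        if old is not None:
            self.slots[old] = None    # the old copy becomes a hole
        victim = self.slots[self.head]
        if victim is not None:
            del self.where[victim]    # overwritten by the write cycle
        self.slots[self.head] = key
        self.where[key] = self.head
        self.head = (self.head + 1) % len(self.slots)
        self.writes += 1

    def add(self, key):
        self._append(key)

    def hit(self, key):
        if key not in self.where:
            return False              # already overwritten: a miss
        if self.move_on_hit:
            self._append(key)         # move to the front of the store
        return True


def demo(move_on_hit):
    store = CyclicStore(4, move_on_hit)
    for key in "abcd":
        store.add(key)
    store.hit("a")                    # "a" is popular
    store.add("e")                    # the write cycle wraps around
    return store.hit("a"), store.writes


print(demo(False))  # (False, 5): pure cyclic store loses "a"
print(demo(True))   # (True, 7): "a" survives, two extra writes
```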

I've combated this in COSS in two ways. First of all, writes are combined.
COSS keeps large memory buffers (~1MB). As new objects are added to the
store (or hits moved to the front), they are added to a membuffer. When
the membuffer overflows, it is written to disk and a new membuffer is
created. Thus, with a 10k average object size, every 100 hits result in
only 1 extra disk write, which isn't too bad. (This is already
implemented.) We can combat this further by deciding that a hit object
will only be moved to the front of the store if it is NOT within the
"top" 50% of the store. We assume that if it is within the top 50%, it's
not in immediate danger of being overwritten and we can safely leave it
there for a while. (This part hasn't been done yet.)
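
A rough sketch of both ideas together, under loud assumptions: this is not
the COSS source, the class and field names are invented, "top 50%" is read
as the most recently written half of the store, and eviction of overwritten
objects is not modelled. It only counts membuffer flushes and decides
whether a hit gets re-appended.

```python
class CombiningStore:
    """Hypothetical model: appends go to a membuffer, flushed in one write."""

    def __init__(self, store_size, membuf_size=1024 * 1024):
        self.store_size = store_size
        self.membuf_size = membuf_size
        self.write_pos = 0        # total bytes ever appended
        self.buffered = 0         # bytes sitting in the current membuffer
        self.offsets = {}         # key -> position where it was last appended
        self.disk_writes = 0

    def _append(self, key, size):
        if self.buffered + size > self.membuf_size:
            self.disk_writes += 1  # membuffer overflows: one disk write
            self.buffered = 0      # start a new membuffer
        self.offsets[key] = self.write_pos
        self.write_pos += size
        self.buffered += size

    def add(self, key, size):
        self._append(key, size)

    def hit(self, key, size):
        """Return True if the hit caused the object to be re-appended."""
        age = self.write_pos - self.offsets[key]  # bytes written since append
        if age > self.store_size // 2:
            self._append(key, size)  # in the older half: move to the front
            return True
        return False                 # in the newer half: leave it alone


store = CombiningStore(store_size=10**9)
for i in range(1000):
    store.add(i, 10 * 1024)
print(store.disk_writes)  # 9 flushes for 1000 objects of 10 KiB
```

With binary units a 1 MiB buffer holds 102 objects of 10 KiB, so 1000
appends cost 9 flushes here, with the remainder still in memory.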

Honestly, I can't imagine anything that could be more efficient than COSS.

- disk utilization is nearly 100%. No inodes, or other directory crap, 0%
fragmentation (all objects are stored 100% contiguous, one after another).
The only "wasted" space would be objects that get expired by age rather
than being overwritten. That creates a little bubble that is wasted, until
the cycle comes around and overwrites it.
- worst read case is 1 seek/read per hit (COSS reads the entire object
into memory at once). Actually, I guess this isn't the worst case, it's
the ONLY case.
- typical write case is 1 seek/write per 100 objects (assuming membuffer
size of 1MB and 10k average object size).

I did a little math today comparing UFS to COSS. This is based on 10k avg.
obj size, 40% hit rate and 10000 requests.

UFS:
        - number of seeks: 25000
        - number of reads: 12000
        - number of writes: 13000

COSS:
        - number of seeks: 4600
        - number of reads: 4000
        - number of writes: 600

I did that from memory; I think it's right.
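
For anyone who wants to redo the COSS side of that arithmetic, here is a
small sketch. It assumes decimal units (10k = 10,000 bytes, 1MB = 1,000,000
bytes), one seek per read or write, and a hypothetical rewrite_hits switch
for whether hits are also re-appended to the store. The function name and
parameters are illustrative only.

```python
def coss_io(requests, hit_percent, obj_size, membuf_size, rewrite_hits):
    """Estimate COSS disk operations; each read or write costs one seek."""
    hits = requests * hit_percent // 100
    misses = requests - hits
    # Objects appended to the store: all misses, plus hits if they are moved.
    appended = misses + (hits if rewrite_hits else 0)
    objs_per_buf = membuf_size // obj_size
    writes = -(-appended // objs_per_buf)   # ceiling division
    return {"reads": hits, "writes": writes, "seeks": hits + writes}


print(coss_io(10_000, 40, 10_000, 1_000_000, rewrite_hits=False))
# {'reads': 4000, 'writes': 60, 'seeks': 4060}
print(coss_io(10_000, 40, 10_000, 1_000_000, rewrite_hits=True))
# {'reads': 4000, 'writes': 100, 'seeks': 4100}
```

Under these assumptions the sketch gives 60 to 100 writes rather than the
600 listed above. That figure was quoted from memory, so the gap may come
from assumptions not spelled out in this message.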

Yes, COSS does use more memory. It's a tradeoff, and I am more interested
in obtaining performance than saving memory. It's not that bad anyway: in
some informal tests with Polygraph to compare UFS vs COSS, the squid
process hovered near 15 MB with UFS, vs 22 MB with COSS.

/-----------------------------------------------------------------------/
/ Eric Stern - Industrial Code & Logic Inc. - (519) 249-0508 /
/ http://www.indcl.com /
/ WebSpeedWare - web caching doesn't have to be expensive! /
/-----------------------------------------------------------------------/
Received on Wed Feb 02 2000 - 15:50:47 MST