Re: unresponsive cache and stats with a loaded cache

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Tue, 18 Nov 1997 14:06:38 +0200 (EETDST)

> > >> Once I chased the way squid uses its L1/L2 directory structures, and
> > >> it appeared to me that excessive amount of directories slows things
> > >> down quite a bit. I don't recall exactly, but I have an impression that
>
> > >that a configurable number of files (default 512, maybe a little too high)
> > >are written to a directory before going to the next one. Alternating
> > >among all cache_dirs is still done, too.
> >
> > I decided to apply the patch on a real production cache this weekend and
> > have gathered one day of business-hours (08:00h-18:00h) usage data (data
> > has been gathered with my Squid timer patch). The results are quite good
> > if you compare the results of last Friday and today (Monday):
> .
> .
> .
> > open(2)'s for write have dropped from 35ms to 5ms, open(2)'s for read have
> > dropped from 24ms to 8ms, read(2)'s have dropped from 12ms to 8ms. This
> > all leads to 46.3% idle time, i.e. Squid is waiting in select(2) for
> > something to happen. Waiting for disk I/O (diskr/w+openr/w) has dropped
> > from 68.37% to 25.93%.
>
> A question though - isn't this only going to be useful on a
> new, empty cache (or a cache with only a few files in it)?
>
> Given a totally full cache you are still going to get the original
> distribution, aren't you? So for a cache like ours (totally stuffed full
> the whole time) you wouldn't see any real benefit...

 It depends on what you mean by full. There are two distinct meanings:
 full by volume and full by file count. If you spread files across all
 precreated dirs from the start, then by the time the cache is full by volume
 you may have a great many partly filled dirs, and you are already reusing
 files. If you instead fill dirs one after another, then by the time you are
 full by volume you have used only a fixed number of dirs, and once files
 start being reused you keep operating on that much smaller set of dirs, even
 if many more have been created.
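 One rough way to see the difference is to count how many dirs are in use
 once the cache is full by volume. The sketch below is a hypothetical
 illustration, not Squid code: the function names are invented, and it
 assumes a cache that holds 200,000 objects when full by volume, 16 x 256 =
 4096 precreated L2 dirs, and 512 files per dir for the fill-in-a-row policy
 (the default mentioned in the quoted patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Scatter policy (hypothetical): file k goes to dir k % total_dirs,
 * so consecutive files land in different dirs. */
static long dir_scatter(long k, long total_dirs)
{
    return k % total_dirs;
}

/* Fill-in-a-row policy (hypothetical): the first files_per_dir files go
 * into dir 0, the next files_per_dir into dir 1, and so on. */
static long dir_in_a_row(long k, long files_per_dir)
{
    return k / files_per_dir;
}

/* Count distinct dirs used by the first n_objects files, i.e. the working
 * set of dirs once the cache is full by volume and files start being
 * reused. Returns -1 if the scratch array cannot be allocated. */
static long count_dirs_used(long n_objects, long total_dirs,
                            long files_per_dir, bool in_a_row)
{
    assert(n_objects >= 0 && total_dirs > 0 && files_per_dir > 0);
    bool *used = calloc((size_t)total_dirs, sizeof *used);
    if (used == NULL)
        return -1;
    long count = 0;
    for (long k = 0; k < n_objects; k++) {
        long d = in_a_row ? dir_in_a_row(k, files_per_dir)
                          : dir_scatter(k, total_dirs);
        if (d >= total_dirs)
            break;              /* ran past the last precreated dir */
        if (!used[d]) {
            used[d] = true;
            count++;
        }
    }
    free(used);
    return count;
}
```

 With these assumed numbers, count_dirs_used(200000, 4096, 512, false)
 touches all 4096 dirs (fewer than 50 files each), while the same call with
 in_a_row = true touches only 391 dirs, each filled to 512 files.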

 I believe the random allocation of files was introduced when there was no
 fixed number of files per dir; back then it was the only way to guarantee
 that no single directory accumulated an excessive number of files. Once the
 fixed dir structure was introduced, this behaviour actually became a limiting
 factor, IMHO.

 Squid currently uses a bit array for fileno allocation. Its size is fixed to
 accommodate 2^21 (2M) files. If we always use the smallest free fileno and
 map it to a filename in a fixed manner, we operate with a fixed set of
 dirnames, and we gain a useful property: if the cache can hold only some
 fraction of 2M files, we simply never reach the higher-numbered dirs and
 start reusing old files instead. Possibly we don't need unlinkd at all; we
 can just overwrite old files with new data, since LRU already frees the
 "right" files for us. We'd need to unlink only to clean up very rarely used
 dirs. We also don't need to create all the dirs beforehand; we can create
 them on the fly as needed. That would only be necessary with an empty cache,
 or when the average object size drops and the cache can accommodate more
 objects; either way, it would happen relatively rarely.
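 The smallest-free-fileno rule can be sketched with a plain bit array. This
 is a hypothetical stand-in, not Squid's actual file map code: the names
 filemap, filemap_alloc_lowest, filemap_release and filemap_reset are
 invented for illustration, and only the 2^21 size comes from the text. What
 it shows is that a released fileno (say, one freed by LRU) is the next one
 handed out, so its file can simply be overwritten rather than unlinked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILEMAP_BITS  (1L << 21)            /* 2^21 (2M) filenos */
#define WORD_BITS     32
#define FILEMAP_WORDS (FILEMAP_BITS / WORD_BITS)

/* Hypothetical file map: one bit per fileno, 1 = in use. */
static uint32_t filemap[FILEMAP_WORDS];

/* Return the smallest free fileno and mark it used, or -1 if full. */
static long filemap_alloc_lowest(void)
{
    for (long w = 0; w < FILEMAP_WORDS; w++) {
        if (filemap[w] == UINT32_MAX)
            continue;                       /* all 32 filenos here are taken */
        for (int b = 0; b < WORD_BITS; b++) {
            uint32_t mask = (uint32_t)1 << b;
            if (!(filemap[w] & mask)) {
                filemap[w] |= mask;
                return w * WORD_BITS + b;
            }
        }
    }
    return -1;
}

/* Release a fileno (e.g. when LRU evicts its object) so it can be reused. */
static void filemap_release(long fileno)
{
    assert(fileno >= 0 && fileno < FILEMAP_BITS);
    filemap[fileno / WORD_BITS] &= ~((uint32_t)1 << (fileno % WORD_BITS));
}

/* Mark every fileno free, as for an empty cache. */
static void filemap_reset(void)
{
    memset(filemap, 0, sizeof filemap);
}
```

 After three allocations (0, 1, 2), releasing fileno 1 makes the next
 allocation return 1 again. The linear scan is the simplest way to find the
 lowest free bit; a real allocator would probably keep a hint to the lowest
 possibly-free word instead of rescanning the whole 256 KB map each time.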

 The optimal structure seems to be: derive MaxItemsPerDir from the optimal
 disk block size (making sure that MaxItemsPerDir entries fit in a single
 block), create one L1 dir and one L2 dir, and start working. Increment L2 as
 needed, and when it reaches MaxItemsPerDir, go to L1++.
 Simply put, the user should provide only the number of cache_dirs and the
 disk block size; everything else is Squid's job.
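 The proposed layout can be sketched as below. Everything here is an
 assumption for illustration: the helper names, the "%02lX/%02lX/%08lX" path
 format (chosen to resemble Squid's hexadecimal L1/L2/fileno naming), and the
 entry_size parameter, an assumed average on-disk directory entry size that
 in reality depends on the filesystem and the filename length. Two slots are
 reserved for "." and "..", and MaxItemsPerDir is used both as the number of
 files per L2 dir and as the number of L2 dirs per L1 dir.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: entries that fit in one directory block, minus "." and "..". */
static long max_items_per_dir(long block_size, long entry_size)
{
    assert(block_size > 0 && entry_size > 0);
    assert(block_size / entry_size > 2);
    return block_size / entry_size - 2;
}

struct swap_loc {
    long l1;    /* first-level dir */
    long l2;    /* second-level dir inside l1 */
};

/* Fill in a row: each L2 dir takes max filenos; after max L2 dirs under
 * one L1 dir, move on to L1 + 1. */
static struct swap_loc fileno_to_loc(long fileno, long max)
{
    long l2_seq = fileno / max;         /* L2 dir number counted across all L1s */
    struct swap_loc loc = { l2_seq / max, l2_seq % max };
    return loc;
}

/* Format a path such as "/cache/00/01/000000FE"; returns snprintf's result. */
static int loc_to_path(const char *cache_dir, struct swap_loc loc,
                       long fileno, char *buf, size_t len)
{
    return snprintf(buf, len, "%s/%02lX/%02lX/%08lX", cache_dir,
                    (unsigned long)loc.l1, (unsigned long)loc.l2,
                    (unsigned long)fileno);
}
```

 With 4 KB blocks and the assumed 16-byte entries, MaxItemsPerDir comes out
 at 254, so fileno 254 opens the second L2 dir and fileno 254 * 254 = 64516
 opens the second L1 dir. Creating each dir on demand the first time it is
 needed is left out of the sketch.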

 Eventually the lowest-numbered dirs would become the hotspot on each
 cache_dir, every L1/L2 dir would be optimally filled, and we would have the
 minimal number of directories overall, which is good for dir caching.

 The problem is not so much the algorithm that maps fileno to filename, but
 rather the fileno/filename usage pattern from the OS's point of view.

 ----------------------------------------------------------------------
  Andres Kroonmaa mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online Tel: 6308 909
  Tallinn, Sakala 19 Pho: +372 6308 909
  Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
 ----------------------------------------------------------------------

Received on Tue Jul 29 2003 - 13:15:44 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:11:30 MST