Re: unresponsive cache and stats with a loaded cache

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Mon, 10 Nov 1997 14:15:18 +0200 (EETDST)

Date sent: Fri, 7 Nov 1997 15:08:24 +0200
From: Oskar Pearson <oskar@is.co.za>
To: squid-dev@nlanr.net

> One of my caches just became quite unresponsive - here is
> what a strace -f -c revealed... (It's not for a long period of time,
> though, since I am afraid it's lagging squid)
>
> cache1:
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  27.02    3.412204         131     26048           oldselect
>  21.74    2.745975        1281      2143           open
>  19.03    2.403282         261      9223         3 write
>  11.46    1.447688         236      6126        61 read
>   0.32    0.040916        1137        36           getdents
>   0.11    0.014438        1604         9           stat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   12.630594                 93117      1739 total
>
>
> It appears that people are right about the 'open' being a
> problem... Why are we calling 'getdents' anyway?

 Once I chased the way squid uses its L1/L2 directory structures, and
 it appeared to me that an excessive number of directories slows things
 down quite a bit. I don't recall the details exactly, but my impression
 is that squid visits every directory before returning to any one of
 them a second time: for each new file it alternates both the L1 and L2
 dirs, so as to distribute load evenly between disks and between the L1
 and L2 directories on each disk.
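
 To illustrate the pattern (just a sketch, not squid's actual code --
 the real mapping differs, but the rotation behaviour is the point),
 assume consecutive swap file numbers are spread round-robin over disks
 and then over the L1 dirs; two files written back to back then almost
 never share a directory:

    /* illustrative only: rotate consecutive file numbers over disks and
     * L1/L2 dirs, roughly like the behaviour described above */
    #include <stdio.h>

    #define NDISKS 2
    #define L1     16
    #define L2     256

    static void swap_path(int filn, char *buf, size_t len)
    {
        int disk = filn % NDISKS;            /* spread across disks      */
        int seq  = filn / NDISKS;            /* per-disk sequence number */
        int d1   = seq % L1;                 /* rotate through L1 dirs   */
        int d2   = (seq / L1) % L2;          /* ... then through L2 dirs */
        snprintf(buf, len, "/cache%d/%02X/%02X/%08X", disk, d1, d2, filn);
    }

    int main(void)
    {
        char path[128];
        int filn;
        /* successive files land in different L1 dirs, so the directory
         * just written to is not touched again for a long time */
        for (filn = 0; filn < 8; filn++) {
            swap_path(filn, path, sizeof(path));
            printf("%s\n", path);
        }
        return 0;
    }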

 The bad news is that the Unix FS tries to cache directory entries to
 speed up file lookups (open) within a directory, and this pass-through
 access pattern effectively defeats Unix's directory caching. By the
 time squid returns to a once-used directory, quite a lot of time has
 passed and the directory data is no longer in the cache. By default
 squid creates 4096 dirs per disk, and with a full cache each dir is
 about 4KB in size. To keep the whole dir cache in RAM you'd need 16MB
 of spare RAM per disk, which is very rarely the case.
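
 For the record, the arithmetic behind that figure (the 16x256 default
 split is my assumption; only the 4096 total is stated above):

    /* back-of-the-envelope check of the 16MB-per-disk figure */
    #include <stdio.h>

    int main(void)
    {
        const int l1 = 16, l2 = 256;     /* assumed default dirs per disk */
        const int dir_kb = 4;            /* ~4KB per full directory       */
        int dirs = l1 * l2;              /* 4096 directories              */
        printf("dirs per disk: %d\n", dirs);
        printf("ram to cache them: %d MB\n", dirs * dir_kb / 1024);
        return 0;
    }
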
 Thus, I believe that with every single file open-for-write squid takes
 a directory cache miss and needs to do physical disk io for the
 directory. open-for-read is random in nature and cannot be optimized
 very much. All this is the reasoning behind the suggestion in
 squid.1.1.relnotes: you want the minimum number of directories needed
 to hold the maximum possible number of objects on your disks. You'd
 also want to increase the DNLC cache and keep lots of unused RAM so
 that Unix can cache directory io. Disabling inode access-time updates
 and synchronous mode for dir io also helps a lot, but you know what
 you are risking there...
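
 As a rough illustration of what that relnotes advice works out to (the
 13KB average object size and the 256-objects-per-dir target below are
 assumptions, not numbers I'd swear by):

    /* size L1 so that the directories can just hold the cache, instead
     * of creating far more dirs than there will ever be objects */
    #include <stdio.h>

    int main(void)
    {
        long cache_mb      = 8192;       /* cache_dir size on one disk */
        long avg_obj_kb    = 13;         /* assumed mean object size   */
        long files_per_dir = 256;        /* target objects per L2 dir  */
        long l2            = 256;        /* keep the usual L2 fan-out  */

        long objects = cache_mb * 1024 / avg_obj_kb;
        long dirs    = (objects + files_per_dir - 1) / files_per_dir;
        long l1      = (dirs + l2 - 1) / l2;  /* smallest L1 that fits */

        printf("~%ld objects -> ~%ld dirs -> L1 >= %ld at L2 = %ld\n",
               objects, dirs, l1, l2);
        return 0;
    }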

 Personally, I'd suggest changing squid's store logic a bit so that it
 always uses the smallest-numbered file available, instead of
 distributing load between L1/L2 dirs. Clearly, distributing load
 between disks is a separate issue and doesn't add much to dir-cache
 load. With a fixed number of files per directory there is no speed
 loss, but for caches that have 100 times more directories created than
 there could ever be objects in them, it is IMHO more reasonable to
 have a few directories full of 256 objects each than zillions of dirs
 with a few files in each. This would also make it possible to resize
 L1/L2 upwards on the fly when needed, and would avoid creating
 directories that are never used.
 It would also be lovely if there were a separate filemap for each
 disk, with its corresponding swap index file written to that
 cache_dir's root. This would allow squid to tolerate failed disks and
 to add or remove disk space on the fly. It would also allow
 distributing load between disks more precisely, based on actual
 historical bytes/sec io if you like. A rough sketch of such a per-disk
 filemap follows.
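
 Something along these lines is what I have in mind -- only a sketch,
 the names are made up and not squid's actual structures:

    /* per-disk filemap that always hands out the smallest free file
     * number, so one directory fills up completely before the next one
     * is touched */
    #include <stdlib.h>

    typedef struct {
        unsigned char *bits; /* one bit per possible swap file on disk */
        int nfiles;          /* capacity of this cache_dir             */
    } disk_filemap;

    disk_filemap *filemap_create(int nfiles)
    {
        disk_filemap *fm = malloc(sizeof(*fm));
        fm->nfiles = nfiles;
        fm->bits = calloc((nfiles + 7) / 8, 1);
        return fm;
    }

    /* return the smallest unused file number, or -1 if the disk is
     * full; a real version would keep a low-water mark instead of
     * scanning from zero every time */
    int filemap_allocate(disk_filemap *fm)
    {
        int fn;
        for (fn = 0; fn < fm->nfiles; fn++) {
            if (!(fm->bits[fn / 8] & (1 << (fn % 8)))) {
                fm->bits[fn / 8] |= (unsigned char) (1 << (fn % 8));
                return fn;
            }
        }
        return -1;
    }

    void filemap_release(disk_filemap *fm, int fn)
    {
        fm->bits[fn / 8] &= (unsigned char) ~(1 << (fn % 8));
    }

 With the failed-disk idea above, you'd create one of these per
 cache_dir and write its swap index next to it, so losing a disk only
 loses that disk's map.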

 ----------------------------------------------------------------------
  Andres Kroonmaa                       mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online        Tel:  6308 909
  Tallinn, Sakala 19                    Pho: +372 6308 909
  Estonia, EE0001   http://www.online.ee     Fax: +372 6308 901
 ----------------------------------------------------------------------
