ACL based selection of cache_dir

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Wed, 20 Sep 2000 12:06:09 +0200

 What do you think of allowing cache_dir selection to be controlled
 by configuration, and specifically by ACLs?

 Ideally, we should also be able to select cache_dir based on object
 size, but for now this is not possible.

 This would allow for some quite interesting options.

 Disks are fastest at the outer areas of a spindle, and when seeks are
 limited. Large disks are becoming cheaper, so it will soon not be uncommon
 to have squid boxes with 150GB of disk. The problem is that squid consumes
 a lot of RAM per object, and such a box would need about 2-4GB of RAM.
 This can be worked around if the average object size in squid is fairly
 large. Yet as large objects are relatively rare, it is desirable to keep
 them separate from typical web objects, that is, to be able to split
 large objects and small objects into different cache_dirs.
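
 A rough back-of-the-envelope sketch of that trade-off follows. The
 per-object RAM cost and the average object sizes are illustrative
 assumptions, not measured squid figures, and index_ram_bytes is a
 hypothetical helper; substitute your own numbers.

```python
def index_ram_bytes(disk_bytes, avg_object_bytes, ram_per_object_bytes):
    """Estimate RAM needed to index every object a disk of this size can hold."""
    objects = disk_bytes // avg_object_bytes
    return objects * ram_per_object_bytes

GB = 1024 ** 3
KB = 1024

# ASSUMPTION: about 200 bytes of RAM per cached object (illustrative only).
RAM_PER_OBJECT = 200

small = index_ram_bytes(150 * GB, 10 * KB, RAM_PER_OBJECT)   # typical web objects
large = index_ram_bytes(150 * GB, 100 * KB, RAM_PER_OBJECT)  # mostly large objects

print(f"avg 10KB objects:  {small / GB:.2f} GB of RAM")
print(f"avg 100KB objects: {large / GB:.2f} GB of RAM")
```

 Under these assumed figures, a tenfold larger average object size
 means a tenfold smaller index for the same disk.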

 Regex-based ACLs for selecting a cache_dir are easy to implement, and
 that alone would already be quite a helpful feature.
 We have cheap local bandwidth here and expensive international bandwidth;
 I guess this is very common. People tend to use the cache for everything,
 and we try hard to keep the cache from getting polluted by local objects.
 Yet it would be nice to have some store for local objects too, but it
 should be strictly controlled.

 We could have an ACL that matches an object and places it into a separate
 cache_dir of limited size. Or we could configure the hottest objects to be
 placed on a ramdrive.

 We could reserve a separate cache_dir for .com objects, and keep the rest
 elsewhere, in less optimal places on the disks (inner parts of a spindle).
 Having less frequent references, they have less impact on overall
 performance, and by keeping the hottest objects tightly together we can
 increase performance for the most often used ones.

 Also, on very large disks (like 73GB), we can force known large
 object types (mp3, zip, exe, cab, etc.) onto the vast inner area of the
 disk, increasing the average object size there and keeping those large
 files for a longer time. Infrequent access to them, and their large size,
 reduce the performance impact of their not being cached by the OS
 filesystem.
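
 The regex-based selection described above could be sketched roughly as
 follows. This is not squid code: the rule table, the cache_dir names,
 the treatment of .ee as "local", and select_cache_dir itself are all
 hypothetical, chosen only to mirror the examples in this message.

```python
import re

# HYPOTHETICAL rule table: (URL regex, cache_dir name). First match wins.
RULES = [
    # Known large file types go to the big, slow inner area.
    (re.compile(r"\.(mp3|zip|exe|cab)(\?.*)?$", re.I), "large_inner"),
    # ASSUMPTION: .ee hosts stand in for "local" objects, kept in a small store.
    (re.compile(r"^https?://[^/]+\.ee(:\d+)?(/|$)", re.I), "local_limited"),
    # .com objects get their own cache_dir.
    (re.compile(r"^https?://[^/]+\.com(:\d+)?(/|$)", re.I), "hot_outer"),
]
DEFAULT_DIR = "general"

def select_cache_dir(url):
    """Return the cache_dir name of the first rule whose regex matches the URL."""
    for pattern, cache_dir in RULES:
        if pattern.search(url):
            return cache_dir
    return DEFAULT_DIR

for u in ("http://example.com/song.mp3",
          "http://www.example.ee/index.html",
          "http://example.org/page"):
    print(u, "->", select_cache_dir(u))
```

 Rule order matters in this sketch: the file-type rule is listed first so
 that a large file on a .com host still lands in the large-object store.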

 Ideally, we should be able to distinguish between objects of
 <16KB and >16KB. This would allow us to force small objects into
 a squid-fs that is optimised for small objects (like fifo-fs) and
 let the rest be handled by UFS.
 In the ideal case, we wouldn't even need to bother with fragmentation
 in squid-fs: we would simply use several FSs with differing block sizes,
 and let squid place each object into the FS whose block size fits it best.
 We could make one FS with a block size of 512 bytes, another with 2KB,
 one more with 8KB, etc. We wouldn't even need to handle subblock
 fragments, multiblock objects, etc. We'd have a direct mapping
 between file number and object location on disk.
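
 A minimal sketch of that idea, assuming one object per fixed-size block.
 The block sizes are the ones mentioned above; pick_store, disk_offset
 and the start_offset parameter are hypothetical names for illustration.

```python
# Block-size classes from the text: one fixed-block store per class.
BLOCK_SIZES = [512, 2048, 8192]

def pick_store(object_size):
    """Return the index of the smallest block size that holds the object.

    Returns None when the object is too large for every class, in which
    case it would be left to a general-purpose FS such as UFS.
    """
    for i, block in enumerate(BLOCK_SIZES):
        if object_size <= block:
            return i
    return None

def disk_offset(filenumber, block_size, start_offset=0):
    """Direct mapping: with one object per block, location is pure arithmetic."""
    return start_offset + filenumber * block_size

store = pick_store(1500)
print(store, disk_offset(10, BLOCK_SIZES[store]))
```

 No allocation map or fragment handling is needed in this scheme, at the
 cost of wasting the unused tail of each block.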

 To implement this, we need to buffer at least some amount of an object
 before we start swapout. Ideally, this size should be an option
 in the config file. We would buffer this amount of the object in RAM,
 and if the object exceeds the max size, we start swapout to an FS that
 is configured to handle large files. If the object fits entirely
 into this RAM buffer, we can use ACLs that match min/max size to
 select the cache_dir, and do the swapout in one shot into an
 optimal FS.
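
 The buffering decision could look roughly like this. It is only a sketch
 of the logic described above, not squid's swapout code; choose_swapout,
 its parameters, and the cache_dir names are hypothetical.

```python
def choose_swapout(chunks, max_buffer, size_rules, large_dir):
    """Buffer incoming chunks, then pick a cache_dir.

    size_rules: list of (min_size, max_size, cache_dir), bounds inclusive.
    Returns (cache_dir, buffered_bytes, complete), where complete is False
    if the buffer overflowed before the whole object arrived.
    """
    buffered = bytearray()
    for chunk in chunks:
        buffered.extend(chunk)
        if len(buffered) > max_buffer:
            # Larger than the buffer: start swapout to the large-file FS now.
            return large_dir, bytes(buffered), False
    # Whole object is in RAM, so its exact size is known.
    size = len(buffered)
    for lo, hi, cache_dir in size_rules:
        if lo <= size <= hi:
            return cache_dir, bytes(buffered), True
    # No size rule matched: fall back to the large-file FS.
    return large_dir, bytes(buffered), True

# HYPOTHETICAL configuration: 64KB buffer, 16KB small/large boundary.
rules = [(0, 16384, "small_fs"), (16385, 65536, "ufs")]
print(choose_swapout([b"a" * 1000], 65536, rules, "big_ufs")[0])
```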

 Comments and criticism welcome.

 I'm planning to implement ACL-based selection, and while looking
 at the code I see several things to solve.
 First, I'd need to add squid.conf directives; cache_dir_access
 seems a logical name for the ACLs. Currently, the FS type is configured
 on the same line as cache_dir. Maybe we should split this, as fifo-fs
 has no L1/L2 configuration, and we might want other config
 directives depending on FS type, e.g. starting and ending disk
 block for a direct-mapped fs, max and min object size, etc.
 Therefore, maybe it is reasonable to add a cache_dir_conf
 directive that defines all the specifics for the cache_dir?
 Or should we make configuration directives optional on the
 cache_dir line, as in cache_peer configuration?

 Another problem is disk load sharing. Currently, disk selection
 builds a list of cache_dirs and load-shares between them. When
 adding ACL selection to that, should we build the list of matched
 cache_dirs before load sharing, or apply the ACLs to the list after
 load sharing, or even both before and after?
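
 To make the question concrete, here is a sketch of just one of those
 options: filter by ACL first, then load-share among the survivors. It
 does not reflect how squid's disk selection actually works; pick_dir,
 the load metric, and the cache_dir names are hypothetical.

```python
def pick_dir(cache_dirs, allowed):
    """Filter by ACL result first, then load-share among what is left.

    cache_dirs: dict of name -> current load (ASSUMPTION: lower is better).
    allowed: set of cache_dir names the ACLs permit for this object.
    Returns the least-loaded permitted cache_dir, or None if none match.
    """
    candidates = {name: load for name, load in cache_dirs.items()
                  if name in allowed}
    if not candidates:
        return None
    # Sorting the names first makes ties resolve deterministically.
    return min(sorted(candidates), key=candidates.get)

loads = {"disk1": 5, "disk2": 2, "disk3": 1}
print(pick_dir(loads, {"disk1", "disk2"}))
```

 Filtering afterwards instead would mean running the existing selection
 first and rejecting its choice if the ACLs forbid it, which may leave an
 object with no cache_dir even when a permitted one exists.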

 How would placing objects on different cache_dirs of different
 sizes and content interact with replacement policies? I'm planning
 to play with this on 2.3, or should I reconsider and try with 2.4?

 The ability to force objects onto different disks could easily result
 in installations with dozens of cache_dirs. How good is squid
 at handling very many cache_dirs?

 thanks,

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia
Received on Wed Sep 20 2000 - 04:08:46 MDT
