Do not make a compatible squid file system

From: wang_daqing <wang_daqing@dont-contact.us>
Date: Wed, 2 Feb 2000 12:42:24 +0800

This message may have been sent more than once, because the ircache
mail system has an anti-spam feature and blocks mail from some SMTP
servers. I had to use several SMTP servers to send it, and I am still
not sure whether it got through.

If this disturbs you, I am sorry, but it is not my fault. I also
suggest that the mailing list administrator add an auto-reply feature
to the mailing list system.

Hello Squid,

  I have noticed that in this year's bake-off Squid is slower than
  the other products. The comparison may be unfair, but Wessels said
  that Squid could do better if the filesystem were not a bottleneck.
  Wessels also said they are working on a new filesystem that remains
  compatible with the Unix filesystem. I don't think that is a good
  idea. The question is: who needs it? By that I mean a new
  filesystem compatible with the UNIX filesystem, like VFS.

  As we know, a cache object store does not need a filename or URL to
  access a cached object, does not need users, groups, or ACL
  control, does not use the file's last-modified date (only the
  cached object's last-modified date), and needs almost nothing a
  conventional filesystem provides except the file size. So why
  implement a storage system compatible with the UNIX filesystem, or
  any other filesystem? What is the benefit?

  In my opinion, the best way is to create a Cache Object Storage
  System, call it a "Hash File System", directly on the device file.
  (It is still a filesystem, but not similar to any existing one.)

  A cache storage system behaves very differently from a normal
  filesystem. First, it does not need a file name or URL to open a
  cached object. Currently Squid uses an MD5 hash key (although I
  think such a complex method is unnecessary), so you do not need a
  normal directory structure or directory tree; just seek to the
  directory item's position by the hash key. Second, before you save
  a cache object to disk, you usually already know its size, so you
  can allocate the disk blocks as contiguously as possible, leaving
  fewer fragments on disk. You do not need an i-node per file: just a
  pointer to the data and a flag indicating whether the object is
  stored in one piece. If it is stored in several pieces, add a node
  table just before the file (the cache object) that points to the
  remaining pieces; if that table is not large enough, chain another
  one. A separate i-node is unnecessary because you never access a
  cache file randomly and usually only read it from the beginning.
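
  As a rough sketch, the pointer-plus-flag entry and the optional
  node table might look like this in C (all names and sizes here are
  my own illustration, not anything Squid defines):

    /* Hypothetical on-disk layout: a pointer plus a flag, with an
     * extent ("node") table stored just before fragmented data. */
    #include <stdint.h>

    #define OBJ_ONE_PIECE 0x1        /* object stored contiguously */

    struct obj_ref {
        uint32_t first_block;        /* pointer to the data */
        uint32_t flags;              /* OBJ_ONE_PIECE, ... */
    };

    /* Present on disk only when OBJ_ONE_PIECE is not set. */
    struct node_table {
        uint32_t piece_block[15];    /* blocks of remaining pieces */
        uint32_t next_table;         /* chain to next table; 0 = none */
    };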

  In my imagination, a Cache Storage System should have a header
  area, a bit-FAT area, and a data area. The number of directories
  must be a power of two (1024, 2048, 4096, ...) so you can easily
  determine which one to open from the key, and the directories
  should be distributed evenly across the entire data area. Directory
  items are indexed by hash key and contain the cache object's
  metadata (I wish the metadata size could be reduced or compressed).
  Each file in a directory is allocated disk blocks as contiguously
  as possible, unless it is too big (in which case contiguity matters
  less), and as near as possible to the directory that contains it,
  as in ext2fs. Each disk block may be 512 bytes to 16 KB; I think
  1 KB, 2 KB, or 4 KB is better for now. Each directory must be
  larger than 4 KB, perhaps 4 KB to 32 KB or more (up to the point
  where reading a whole directory takes noticeably longer). The
  directory size need not be a multiple of 4 KB, but more space
  should be allocated than is needed, to keep directories from
  filling up.
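
  With a power-of-two directory count, picking a directory from the
  key is a single mask. A minimal sketch (the constant is an assumed
  example, not a recommendation):

    #include <stdint.h>

    #define N_DIRS 4096u                /* must be a power of two */

    /* The low bits of the hash key select the directory, so
     * objects spread evenly across all directories. */
    static uint32_t dir_index(uint64_t key)
    {
        return (uint32_t)(key & (N_DIRS - 1));
    }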

  I have calculated the probability distribution of the number of
  files in a directory; it follows a Gaussian distribution (my
  mathematical terminology may be off, since I am a Chinese speaker).
  If the average number of items in one directory is N, then the
  probability that the count exceeds N + 3*sqrt(N) is 0.00135,
  exceeds N + 4*sqrt(N) is about 3e-5, exceeds N + 5*sqrt(N) is about
  3e-7, and exceeds N + 6*sqrt(N) is about 1e-9. So if N = 100 and
  you allocate space for 200 items, the probability of exceeding the
  limit is very, very low. If it is exceeded, you can move the file
  located just after the directory to another place (dividing it into
  pieces if it is large) to allocate more blocks for the directory,
  or simply remove the oldest file in this directory if the file just
  after the directory does not belong to it.
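
  These figures are the one-sided tails of the normal approximation,
  P(count > N + k*sqrt(N)) ~ erfc(k/sqrt(2))/2, which a few lines of
  C can confirm (my own check, not part of the proposal):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Normal tail: P(X > N + k*sqrt(N)) ~ erfc(k/sqrt(2))/2 */
        for (int k = 3; k <= 6; k++)
            printf("k=%d  P ~ %.2e\n", k, 0.5 * erfc(k / sqrt(2.0)));
        return 0;  /* 1.35e-03, 3.17e-05, 2.87e-07, 9.87e-10 */
    }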

  Before you create the Cache Storage System, you need to estimate
  how many objects it will hold. That depends on the disk (or
  partition) size and the mean object size. I think a mean object
  size of 7.5 KB is a conservative estimate. The mean object size
  depends on maximum_object_size and on how the cache is used: for
  example, in my cache the mean object size is only 7.17 KB, because
  it is only allowed for browsing. If maximum_object_size is limited
  to 400 KB, the mean object size may be only 5.5 KB. Then, from the
  object count and the directory size, decide how many directories
  are needed. Directories that are too large will sort slowly(?) and
  read slowly (if not all directories are cached in memory);
  directories that are too small give low cache-memory utilization.
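
  A back-of-the-envelope sizing, with every number below assumed for
  illustration only:

    #include <stdint.h>
    #include <stdio.h>

    /* Round up to the next power of two. */
    static uint32_t pow2_ceil(uint32_t n)
    {
        uint32_t p = 1;
        while (p < n)
            p <<= 1;
        return p;
    }

    int main(void)
    {
        uint64_t disk_bytes = 8ULL << 30;   /* 8 GB partition */
        uint64_t mean_obj   = 7500;         /* ~7.5 KB mean object */
        uint64_t objects    = disk_bytes / mean_obj;
        uint32_t per_dir    = 100;          /* target mean items, N */
        uint32_t ndirs      = pow2_ceil((uint32_t)(objects / per_dir));

        printf("%llu objects -> %u directories\n",
               (unsigned long long)objects, ndirs);
        return 0;  /* ~1.1M objects -> 16384 directories */
    }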

  Of course, you may use more than one disk: just create one storage
  system across several disks, as if they were a single disk. But
  each disk should have its own header, bit-FAT, and data area, for
  safety, and a file's blocks should be allocated on the same disk as
  its directory where possible. If you lose one disk, you may lose
  more cached objects than were on that disk alone, but not too many
  more. You would then run a utility to redistribute the directories
  and recalculate the bit-FAT; adding a disk works the same way. For
  this reason, you may need a root directory that points to every
  hash directory and records its position (rather than computing it
  by seek), its item count, and its block count. (The bit-FAT and
  root directory should always be resident in memory.) Then adding
  and removing disks will be quick (perhaps the directories could
  even be redistributed online). You can also create utilities to
  import or export cache objects to another filesystem. The
  conversion may be difficult if you have only one disk, unless you
  borrow a disk or use the network temporarily, but a new compatible
  filesystem would face exactly the same problem.
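
  The root directory could be a simple in-memory table with one
  record per hash directory; a sketch, with all field names assumed:

    #include <stdint.h>

    /* Hypothetical root-directory record, kept resident in
     * memory alongside the bit-FAT. */
    struct root_entry {
        uint8_t  disk;       /* which disk holds this directory */
        uint32_t position;   /* block number of the directory */
        uint32_t items;      /* current number of directory items */
        uint32_t blocks;     /* blocks allocated to the directory */
    };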

  When you accept a request from the browser, first generate the key,
  then compute which directory it belongs to and visit it. Read the
  directory items (that is, the metadata) and decide whether to visit
  the origin web server (based on the request URL and the three
  dates). The directory items are sorted by key; if two URLs have the
  same key, just store them next to each other, so you do not need a
  128-bit key. I think 8 bytes is enough. If the key matches but the
  URL does not, what do you lose? Just time. (I assume the URL is
  stored in the file header before the content, and is verified
  before the object is given to the customer.) The directory items
  may also carry flags, such as negative (no cache object), partial
  (still downloading), and contiguous (allocated in one piece), plus
  flags for heap replacement such as querypage (contains /cgi-bin/ or
  ?) and cgi (.asp, .cgi, .php3, .exe?), etc. (I think these dynamic
  pages should be replaced first.)
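
  The lookup inside a directory then reduces to a binary search over
  items sorted by the 8-byte key, walking forward through any
  duplicates until the URL in the file header matches. A sketch, with
  an assumed item type:

    #include <stddef.h>
    #include <stdint.h>

    struct dir_item {
        uint64_t key;   /* truncated 8-byte hash of the URL */
        /* ... file pointer, size, dates, flags ... */
    };

    /* Return the index of the first item with this key, or n if
     * absent. Items with equal keys sit adjacently, so the caller
     * walks forward, reading each candidate's file header, until
     * the stored URL matches the request. */
    static size_t dir_find(const struct dir_item *items, size_t n,
                           uint64_t key)
    {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (items[mid].key < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (lo < n && items[lo].key == key) ? lo : n;
    }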

  In this design there is no swap.state file holding all the
  cached-object metadata, and the metadata is not all loaded into
  memory unless you have memory to spare. It works like Novell
  BorderManager. If someone does not have enough memory, a request
  needs 1.5 disk reads on average at a 50% hit ratio (one read for
  the directory on every request, plus one read for the object on the
  half that hit), which may be three times slower than caching all
  directories in memory. Of course, you then have to manage the cache
  yourself (I guess UNIX does not cache the device file for you), and
  cached directories should have higher priority than cached files,
  since the probability of those files being reused is very low.

  I think 32 bytes per directory item (metadata entry) is enough: an
  8-byte hash key, a 4-byte file pointer, and a 4-byte file size (I
  don't think you should cache an object larger than 2 GB; even if
  you have the capacity, that is not the right way to use a cache).
  Content-Length can be stored in the file header, because you do not
  need it before handing the object to the browser (depending on
  whether your IMS policy checks Content-Length), or one byte can
  hold the file header size (times 8 or 16) from which Content-Length
  can be computed; the file header cannot be very large. Then 4 bytes
  for the object's (not the file's) last-modified date, 4 bytes for
  expires, 4 bytes for the last validation date, and 4 bytes for
  flags and so on. If you think 4 bytes is not enough to store a
  date: since Squid uses minutes to calculate its refresh policy, I
  think compressing dates is easy and will not cost any performance.
  I know you must have your reasons for using 74 bytes, but keeping
  all the metadata in memory really wastes memory, since part of it
  may never be used. Perhaps you can use a technique like cache
  digests to compress it for loading into memory.
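
  Packed as a C struct, the proposed 32-byte item might look like
  this (the field names are mine):

    #include <stdint.h>

    /* Proposed 32-byte directory item, one per cached object. */
    struct dir_item {
        uint64_t key;         /*  8: truncated hash of the URL      */
        uint32_t file_ptr;    /*  4: first data block on disk       */
        uint32_t file_size;   /*  4: object size (< 2 GB)           */
        uint32_t last_mod;    /*  4: object's last-modified date    */
        uint32_t expires;     /*  4: expires date                   */
        uint32_t last_valid;  /*  4: last validation date           */
        uint32_t flags;       /*  4: negative/partial/contiguous... */
    };                        /* total: 32 bytes                    */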

  I think if you build the storage system this way, you need only one
  disk read when the cached object is valid. So no one can build a
  system 20% (maybe even 5%) faster than yours unless they store
  objects only in memory. And anyone can run Squid with a large disk
  without worrying about having enough memory. It will also suit
  low-load caches that share the machine with other services.

  Back to the beginning: the question is, who needs a compatible
  filesystem? First, Squid gains nothing from compatibility; it does
  not need filenames, a directory tree, random access, or the ability
  to grow a file (which is what requires an i-node). Second, who
  would use a SquidFS for other programs, when there are already so
  many filesystems? I doubt even a web server would use it. The only
  remaining advantage is that when the cache is not full, people can
  store other files in it. But the people who most want a SquidFS
  must be under heavy load, and I think their caches are already
  full. For everyone else, I worry that a new filesystem may be
  difficult to install, and I think a non-compatible system can use a
  utility to grow or shrink its space on disk to meet that need. That
  may be harder than on a compatible filesystem, but running Squid
  has never been as easy as a GUI program, despite all the work you
  have done on it.

  I am a programmer (mainly in C++), and I do not know UNIX very
  well, but I think everyone wants their program to be the best. What
  do you think? If I am wrong, or if you agree, please tell me.

  Some people are talking about a cyclic filesystem. If the mean
  object lifetime in a heavily loaded cache is less than a week, I
  wonder who can benefit from it; it is only useful for people with
  very, very large disks.

Best regards,
 Wang_daqing mailto:wang_daqing@163.net