Re: Do not make a compatible squid file system

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Wed, 02 Feb 2000 20:40:41 +0100

wang_daqing wrote:

> I have noticed that in this year bake-off the squid is slowly than
> other product. Although this compare must be unfair.

Perhaps. I haven't looked at the results yet (I didn't know the results
were available yet).

> But Wessels said that squid can do better if there are no
> bottleneck of the filesystem.

A bit. However, turning Squid into a really high performance proxy
requires more changes than only to the object store. There are also very
obvious bottlenecks in the networking and data forwarding.

> Wessels also said they are working on a new filesystem
> but still remaining compatible with the Unix filesystem. I don't
> think it's a good idea. The question is who need it? I mean a new
> filesystem compatible with UNIX filesystem like VFS.

I think you may have misunderstood Duane there.

Squid will continue to have a file based object store as one option in
the foreseeable future, but there will be other options not relying on
having a filesystem or directory structures.

We are not talking about implementing a new filesystem in the kernel,
only about allowing Squid to use storage methods other than a normal
filesystem.

> In my opinion, I think the best way is to create a Cache Object
> Storage System or called "Hash File System" directly on device file.
> (Although it's still a filesystem, but not similar to any existing
> filesystem)

What makes you say that it isn't similar to any existing filesystem? In
the lower layers of the UFS family of filesystems a file is named by
its inode, which is an abstract number. The directories and filenames
are a way to index the inodes.

> A cache storage system behavior is very different with normal
> filesystem. First, it did not need a file name or URL to open a
> cached object. Currently you use MD5 hash key (although I think so
> complex method is not necessary). So you don't need normal directory
> structure or directory tree.

Agreed.

> Just seek the directory item position by the hash key.

Or any other key the object is stored by. As you say, naming schemes
other than the MD5 hash are possible. For Squid this may actually be
required, as a number of objects can exist for the same MD5 hash (when
an object gets replaced with a newer one while there are still readers
of the older version).
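
A minimal sketch of the two points above: seeking a directory slot
directly from the hash key, and keeping keys unique when two versions of
an object share one MD5 hash. The slot table geometry (SLOT_COUNT,
SLOT_SIZE), the function names and the version counter are all
hypothetical illustrations, not Squid code.

```python
import hashlib

SLOT_COUNT = 1 << 16  # hypothetical number of directory slots on the device
SLOT_SIZE = 32        # hypothetical size in bytes of one directory slot


def slot_for_key(url: str) -> int:
    """Map a URL to a directory slot index using its MD5 digest."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Interpret the first 8 digest bytes as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % SLOT_COUNT


def slot_offset(url: str) -> int:
    """Byte offset to seek to on the device for this URL's slot."""
    return slot_for_key(url) * SLOT_SIZE


def store_key(url: str, version: int) -> tuple:
    """Key that stays unique while an old and a new version coexist."""
    return (hashlib.md5(url.encode("utf-8")).hexdigest(), version)
```

Two URLs can still land in the same slot, so a real design would need
some collision handling, which this sketch leaves out.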

> Secondary, before you save the cache object to disk, most time
> you know the file size already, so you can allocate the disk block
> as continuous as possible.

Agreed. This is being worked on (done).
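
As an illustration of why knowing the size up front helps, here is a
minimal first-fit sketch that places an object in a single contiguous
run of blocks. BLOCK_SIZE, the free-extent list and the function names
are assumptions made for the example; this is not how Squid does it.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


def blocks_needed(size_bytes: int) -> int:
    """Round an object size up to whole blocks."""
    return (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def allocate_contiguous(free_extents, size_bytes):
    """First fit: carve the object out of the first extent large enough.

    free_extents is a list of (start_block, length) tuples and is
    modified in place. Returns the start block, or None if no single
    free extent can hold the whole object.
    """
    need = blocks_needed(size_bytes)
    for i, (start, length) in enumerate(free_extents):
        if length >= need:
            if length == need:
                del free_extents[i]          # extent fully consumed
            else:
                free_extents[i] = (start + need, length - need)
            return start
    return None
```

When None comes back, the caller would have to fall back to storing the
object in several pieces.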

> There will be less pieces in disk. You don't need a i-node for
> files, just use a pointer to data and a flag (indicate whether
> store in one piece), if it stores in several pieces, add a node
> table just before the file (cache object) to point rest pieces.

As you say, you will need some kind of file node, for consistency
validation reasons if nothing else.

> If this table size is not enough, then use a chain. A separate
> i-node is not necessary for you never randomize access the cache
> file and usually only read from begin.

Agreed.
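
The layout discussed above (a table in front of the object pointing to
the remaining pieces, chained when one table is not enough) could be
modeled roughly as follows. This is only an in-memory sketch: the
PieceTable type, TABLE_SLOTS and the function names are hypothetical,
and a missing table stands in for the "stored in one piece" flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

TABLE_SLOTS = 4  # hypothetical capacity of one piece table


@dataclass
class PieceTable:
    """Table kept just before the object data; lists the remaining pieces."""
    pieces: List[Tuple[int, int]] = field(default_factory=list)  # (start, length)
    next_table: Optional["PieceTable"] = None  # chain used when a table is full


def build_tables(pieces) -> PieceTable:
    """Split a list of pieces into a chain of tables of TABLE_SLOTS entries."""
    head = PieceTable()
    tail = head
    for piece in pieces:
        if len(tail.pieces) == TABLE_SLOTS:
            tail.next_table = PieceTable()
            tail = tail.next_table
        tail.pieces.append(piece)
    return head


def read_object(disk: bytes, first: Tuple[int, int],
                table: Optional[PieceTable]) -> bytes:
    """Read front to back: the first piece, then every listed piece in order."""
    start, length = first
    data = bytearray(disk[start:start + length])
    while table is not None:
        for start, length in table.pieces:
            data += disk[start:start + length]
        table = table.next_table
    return bytes(data)
```

Because objects are only ever read from the beginning, walking the chain
in order is enough and no random-access index is needed.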

[lots of interesting stuff deleted]

> Back to begin, the question is who need a compatible filesystem?

We don't. It is not the issue.

> Although I am a programer(mainly in C++). And I don't know UNIX too
> much. But I think every one want his program be the best. What will
> you think about? If I am wrong or you think so, please tell me.

Fully agree with you.

> Someone is talking about a cyclic filesystem, I wonder if a heavily
> load cache mean object life is less than a week, who can take
> advantage from it. It's only useful for people who have very very
> large disks.

That would be me I guess. It spun off from the thesis that a cache
filesystem ought to be optimized for writing, with reasonable throughput
for reading.

I have a large can of ideas on how to make a cache object store. The
cyclic one is only one of the designs. In its purest circular form it
would require a lot of disk to maintain a good hit ratio. However, with
some minor modifications and wasting some disk space you can still
preserve most of the properties of LRU, which should make it more
economical in terms of disk. The disk storage is cyclic, but object
lifetime is more like LRU.
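
The message does not spell out what the modifications are. One possible
reading, shown here purely as an assumption, is a second-chance scheme:
when the write position catches up with an object that has been hit
since it was written, the object is rewritten at the head instead of
being dropped. CyclicStore and its methods are hypothetical names, and
capacity is assumed to be at least 1.

```python
from collections import deque


class CyclicStore:
    """Toy cyclic store: objects are written at the head, reclaimed at the tail."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # number of object slots in the ring (>= 1)
        self.ring = deque()       # keys ordered from tail (oldest) to head
        self.referenced = {}      # key -> hit since it was last written?

    def hit(self, key) -> bool:
        """Record a cache hit; returns False on a miss."""
        if key in self.referenced:
            self.referenced[key] = True
            return True
        return False

    def write(self, key) -> None:
        """Append an object at the head, reclaiming space from the tail."""
        if key in self.referenced:
            self.ring.remove(key)  # replace an existing copy
            del self.referenced[key]
        while len(self.ring) >= self.capacity:
            old = self.ring.popleft()
            if self.referenced[old]:
                # Recently used: rewrite at the head rather than dropping it.
                self.referenced[old] = False
                self.ring.append(old)
            else:
                del self.referenced[old]
        self.ring.append(key)
        self.referenced[key] = False
```

The rewrite costs extra disk writes and space, which is the trade
mentioned above: storage stays cyclic while eviction order moves closer
to LRU.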

A quick summary of the storage ideas buzzing around in my head:

* Cyclic storage with LRU modifications
* Chunked storage
* Chunked cyclic storage
* Looking into the ideas of purely log-based filesystems
* On-disk hash-based indexing without a memory index (i.e. like what you
proposed)
* and a couple of other things

We could sure make use of more programmers or designers in this area.
You are more than welcome to join the work of making a better disk
storage system if you want to.

--
Henrik Nordstrom
Squid hacker
http://squid.sourceforge.net/
Received on Wed Feb 02 2000 - 13:00:31 MST
