Re[2]: Do not make a compatible squid file system

From: wang_daqing <wang_daqing@dont-contact.us>
Date: Tue, 8 Feb 2000 14:41:23 +0800

Hello Henrik,

HN> wang_daqing wrote:

>> But I also do not agree with this. You are creating a new system
>> that is not like other filesystems, so why operate them the same
>> way? I suggest abstracting at the object-store layer, not at the
>> filesystem layer.

HN> The API is not modelled like the UFS API, it is modelled around
HN> Squid's needs for object storage. Then there is a storage
HN> implementation mapping this to UFS calls.

Maybe you are saying the same thing I said, maybe not. The question
is at what level you abstract: at the filesystem level, purely for
high performance, or at a full object-oriented level? In other words,
can somebody plug in another filesystem for a different purpose, or
does this API only work well with your SquidFS?

Another question is whether Squid can work with UFS and SquidFS at
the same time while running. That would also tell me at which level
you abstract.
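To show what I mean by abstracting at the object-store level, here is
a minimal sketch. All of the names are made up for illustration; none
of them come from the real Squid source:

    /* Hypothetical sketch: a table of object-store operations that each
       store type (UFS, SquidFS, ...) fills in, so several stores can be
       active at the same time and Squid never cares where the bytes go. */
    typedef struct store_ops {
        void *(*open)(const char *key);            /* find an object by key */
        int   (*read)(void *obj, char *buf, int len, int off);
        int   (*append)(void *obj, const char *buf, int len);
        int   (*unlink)(const char *key);          /* drop an object        */
    } store_ops;

    /* One table per cache_dir, chosen when the cache_dir is configured. */
    extern store_ops ufs_store_ops;
    extern store_ops squidfs_store_ops;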

>> I don't know what you are thinking or how you feel about inodes. I
>> think a bit FAT is enough (one bit per disk block) for fsck.

HN> That is a block bitmap, not a FAT. A FAT also contains
HN> file-specific information like block sequences.

In a compressed DOS filesystem that structure is called a BITFAT.
Maybe you think that name is wrong.

Let me look back at my suggestion and summarize.

1. I want to reduce Squid's memory usage (including the metadata
loaded in memory and the filesystem cache) by storing the metadata and
the directory entry in the same place as the file data. Most files are
small, with an 8-14K average object size, so you can easily store each
one in a single piece without a separate inode, and you rarely need to
grow a file and never need to truncate one. Maybe this is not what
interests you, but I think it does not conflict at all with getting
higher performance.
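As an illustration of point 1, this is roughly the record layout I
have in mind. The field names are made up; the point is only that the
metadata sits in the same disk block(s) as the object body, so there
is no separate inode to keep in memory or read separately:

    /* Hypothetical layout for point 1: metadata, directory entry and
       object body written as one contiguous record. */
    struct object_record {
        unsigned char  key[16];     /* MD5 of the URL, the lookup key       */
        unsigned int   total_len;   /* length of headers + body that follow */
        unsigned int   expires;     /* expiry time, kept with the object    */
        unsigned short flags;
        unsigned short checksum;    /* lets an fsck-style scan verify it    */
        /* HTTP headers and object body follow in the same block(s) */
    };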

I think saving memory is more important for leaf caches and low-load
caches, and a cyclical filesystem gives them no advantage there. Put
another way, you currently spend about the same money on memory as on
disk. If memory usage can be reduced to 40% (I hope for 25%), you can
save money or do more with the same money.

2. I also want a high-performance cache if it does not cost more. I
think everyone will agree the best case is (almost) one block write
and one block read per object, when needed. The question is then how
to write and how to read.

I am not against a cyclic system, but the question is how much benefit
we can get and what price it costs. (Compare it against a storage
system that does one disk read/write per object with randomized
placement, like my suggestion; comparing against a normal filesystem
is unfair.)

I noticed that someone used a 40% hit ratio in an example. That means
the filesystem sees roughly 60% of its traffic as writes and 40% as
reads (just a rough description). But what happens if the hit ratio is
65%? (That seems impossible in a backbone cache, but it is possible in
a leaf cache, and you do want the hit ratio to improve, don't you?)
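The rough arithmetic I have in mind, assuming every miss causes one
disk write and every hit one disk read (memory hits ignored):

    /* Rough model only: miss -> one disk write, hit -> one disk read. */
    #include <stdio.h>

    int main(void) {
        double ratios[] = { 0.40, 0.65 };
        for (int i = 0; i < 2; i++) {
            double hit = ratios[i];
            printf("hit ratio %2.0f%%: %2.0f%% of disk traffic is writes, %2.0f%% reads\n",
                   hit * 100.0, (1.0 - hit) * 100.0, hit * 100.0);
        }
        return 0;
    }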

If the hit ratio improves, the benefit of the cyclic system shrinks
and its cost grows, because it must write more objects than a normal
cache does (for garbage collection) and it wastes disk. Another
question is that it may perform badly with, or be incompatible with,
heap replacement policies like GDSF or LFUDA, which may improve the
hit ratio.

Here is my thinking about how object storage behaves for disk writes
and reads.

Read requests may have some time correlation in a mathematical sense,
but it is small, so writing objects close together as they arrive
gains only a little over randomized placement. Storing objects close
together by the time correlation of requests is of course better than
storing them at random, so I always agree with storing objects closely
rather than randomly.

But storing many objects in one continuous write gives better
performance than merely storing them close together, and that is why
you want a cyclic system (the other advantages are not important). Of
course this many-objects-per-write behaviour only helps disk writes;
for disk reads I think the difference from storing objects close
together by time correlation is very small and can be ignored (the
time correlation between writes and later reads will be very low).

Another point is that you don't really need a cyclic system; you only
need a large enough contiguous free space for each chunk write. You
just think the best way to get such a large block is a cyclic layout,
so that is what you want.

If you store objects at random positions, assume the average disk seek
time is 9 ms and the mean object size is 13K; then the write
performance is about 1.44MB/s. At http://moat.nlanr.net/Dskwtst/ there
are continuous disk write tests: some disks reach about 13MB/s, but
some only 3.8MB/s. My question is: if you store objects as close
together as possible but not contiguously, what is the performance?
(Does anyone really know how a disk uses its cache when writes are
close together but not contiguous?)
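Where the 1.44MB/s figure comes from, under my assumption of one full
average seek per object, with rotational latency and transfer time
ignored:

    /* 13K moved per 9 ms seek, nothing else counted. */
    #include <stdio.h>

    int main(void) {
        double seek_s = 0.009;          /* 9 ms average seek      */
        double object_bytes = 13000.0;  /* ~13K mean object size  */
        printf("random placement: %.2f MB/s\n",
               object_bytes / seek_s / 1e6);
        return 0;
    }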

The test I have in mind would write objects of random size, with a
mean of about 13K, separated by random gaps of similar size (how large
the gap is depends on how many disk blocks are free at the write
position). If I have time, I'll try to test it; a sketch follows.
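Here is the shape of that test, under my own assumptions (a plain file
rather than a raw device, arbitrary sizes and counts, not a careful
benchmark):

    /* Write random-sized records (mean ~13K) with a random gap of
       similar size between them, then report the achieved write rate. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        static char buf[64 * 1024];
        memset(buf, 0xAB, sizeof(buf));
        int fd = open("writetest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        srand((unsigned)time(NULL));
        long long total = 0;
        time_t start = time(NULL);
        for (int i = 0; i < 20000; i++) {
            int size = 2048 + rand() % 22528;   /* ~2K..24K, mean ~13K */
            int gap  = rand() % 22528;          /* gap of similar size */
            if (write(fd, buf, size) != size) { perror("write"); return 1; }
            if (lseek(fd, gap, SEEK_CUR) < 0) { perror("lseek"); return 1; }
            total += size;
        }
        fsync(fd);                      /* wait for the data to hit disk */
        close(fd);
        double secs = difftime(time(NULL), start);
        printf("%lld bytes in %.0f s: %.2f MB/s\n",
               total, secs, secs > 0 ? total / 1e6 / secs : 0.0);
        return 0;
    }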

If the difference is not too large, I suggest storing objects close
together (elevator optimization?) with a burst write. It costs nothing
and avoids some potential problems of the cyclic system, like
garbage-collection rewrites, heap-replacement compatibility, and maybe
more. A sketch of what I mean is below.
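By a burst write with elevator ordering I mean something like this
sketch (again, every name is made up for illustration): queue the
objects to swap out for a short interval, sort the queue by intended
disk position, and issue all the writes in one sweep of the head.

    /* Illustration only: pending writes sorted by disk offset before
       being issued, so the head moves in one pass instead of seeking
       randomly between them. */
    #include <stdlib.h>
    #include <unistd.h>

    struct pending_write {
        off_t  offset;      /* chosen position, close to recent writes */
        size_t len;
        char  *data;
    };

    static int by_offset(const void *a, const void *b) {
        const struct pending_write *pa = a, *pb = b;
        return (pa->offset > pb->offset) - (pa->offset < pb->offset);
    }

    /* Called once per swap-out interval. */
    void flush_burst(int fd, struct pending_write *q, int n) {
        qsort(q, n, sizeof(*q), by_offset);
        for (int i = 0; i < n; i++)
            (void)pwrite(fd, q[i].data, q[i].len, q[i].offset);
    }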

I am still reading your discussion about the cyclic system.

Someone said that caching many objects wastes disk because most of
them will never be re-referenced. But if you want to improve the hit
ratio, you have to do exactly that. (Why do we need a cache at all? I
don't think 30% is enough.)

Best regards,
 Wang_daqing mailto:wang_daqing@163.net