Store I/O interface

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Sat, 18 Sep 1999 17:52:38 +0200

I have been thinking about the design of Store I/O again.

First a potential problem:

How is the borderline between what is swapped out and what is kept in
memory managed? I assume that the memory object low position mark will
be kept at the lowest position of any reader, but what about new
readers? Isn't there a race condition for new readers where data may
have been removed from the memory object but not yet actually written to
the filesystem object? Remember: writes are asynchronous without
notification, so it is very hard to know how much has actually been
written and can be read back.
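
One way to picture a fix for the race above is the sketch below. It assumes
the filesystem module can report how far its writes have actually completed,
which is exactly the notification that is missing today. All names
(mem_window, mem_window_written and so on) are made up for illustration and
are not existing Squid code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one object that is being swapped out. */
typedef struct {
    size_t mem_lo;          /* lowest offset still held in memory */
    size_t queued_offset;   /* bytes handed to the filesystem so far */
    size_t durable_offset;  /* bytes the filesystem has confirmed as written */
} mem_window;

/* Record that data up to 'upto' has been queued for writing. */
static void mem_window_queued(mem_window *w, size_t upto)
{
    if (upto > w->queued_offset)
        w->queued_offset = upto;
}

/* Assumed write-completion notification from the filesystem module. */
static void mem_window_written(mem_window *w, size_t upto)
{
    assert(upto <= w->queued_offset);
    if (upto > w->durable_offset)
        w->durable_offset = upto;
}

/*
 * Free memory only below BOTH the lowest reader position and the
 * confirmed-on-disk position. Returns the new low mark.
 */
static size_t mem_window_trim(mem_window *w, size_t lowest_reader)
{
    size_t limit = lowest_reader < w->durable_offset
        ? lowest_reader : w->durable_offset;
    if (limit > w->mem_lo)
        w->mem_lo = limit;
    return w->mem_lo;
}

/* A new reader at 'offset' is served from memory or from disk. */
static int mem_window_can_serve(const mem_window *w, size_t offset)
{
    return offset >= w->mem_lo || offset < w->durable_offset;
}
```

Since mem_lo never passes durable_offset, a new reader starting at any
offset finds its data either in memory or already readable from disk.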

Then some design issues:

Back to the basic design issue discussed some time ago. If we are to be
able to implement efficient custom filesystems then it ought to be up to
the filesystem module to assign the file number part of swapfileno. This
is to allow the filesystem to use the StoreEntry as its directory structure
to minimize the amount of indirect lookups needed to reach the
filesystem object. There isn't a single reason why the file part of
swapfileno namespace needs to be managed at a higher level as it does
not identify a StoreEntry, only the on-disk object.

This may be accomplished by splitting the open operation into three
slightly different operations:
a) Open an existing object for reading, with a given integer ID.
b) Create a new object. An asynchronous callback will tell what the
integer ID of the new object is sometime while the object is open,
guaranteed before the close callback is called.
c) Open a read handle from a write handle. This makes it possible to open
read handles to an object being written without having to know the
numerical ID (which may not have been assigned yet). It is also of
importance for the filesystem code when allowing readers while writing.
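
As a rough sketch of how the three operations could look in C (the names
sio_open_read, sio_create, sio_open_read_from and the toy_* helpers are
invented here, and the "filesystem" is only a counter that hands out IDs):

```c
#include <assert.h>
#include <stddef.h>

#define SIO_NO_ID (-1)

/* Callback by which the filesystem reports the ID it picked. */
typedef void sio_id_cb(void *data, int file_id);

typedef struct sio_handle {
    int file_id;                /* SIO_NO_ID until the filesystem picks one */
    int writable;
    struct sio_handle *writer;  /* set on read handles derived from a writer */
    sio_id_cb *id_cb;
    void *cb_data;
} sio_handle;

static int toy_next_id = 100;   /* numbering is owned by the filesystem */

/* (a) Open an existing object for reading, given its integer ID. */
static sio_handle sio_open_read(int file_id)
{
    sio_handle h = { file_id, 0, NULL, NULL, NULL };
    return h;
}

/* (b) Create a new object; the ID arrives later through 'cb'. */
static sio_handle sio_create(sio_id_cb *cb, void *data)
{
    sio_handle h = { SIO_NO_ID, 1, NULL, cb, data };
    return h;
}

/* (c) Open a read handle from a write handle; no ID is needed. */
static sio_handle sio_open_read_from(sio_handle *writer)
{
    assert(writer->writable);
    sio_handle h = { SIO_NO_ID, 0, writer, NULL, NULL };
    return h;
}

/* Stand-in for the filesystem deciding on placement at some later time. */
static void toy_assign_id(sio_handle *h)
{
    if (h->file_id != SIO_NO_ID)
        return;
    h->file_id = toy_next_id++;
    if (h->id_cb)
        h->id_cb(h->cb_data, h->file_id);
}

/* Derived read handles see the writer's ID once it exists. */
static int sio_handle_id(const sio_handle *h)
{
    return h->writer ? h->writer->file_id : h->file_id;
}

/* Close makes sure the ID callback has fired, then returns the ID. */
static int sio_close(sio_handle *h)
{
    if (h->writable)
        toy_assign_id(h);
    return sio_handle_id(h);
}

/* Example callback: store the reported ID in an int owned by the caller. */
static void record_id(void *data, int file_id)
{
    *(int *) data = file_id;
}
```

The caller would only record the file number in the StoreEntry from inside
the callback, so the higher level never has to invent one itself.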

Other missing Store I/O operations which would make sense are:

* An abort operation to abort storage of an object. This builds on the
assumption that most objects released within the first few seconds of
their storage time are aborted during storage. If this were signalled
with an abort instead of a close followed by a remove, the filesystem
could often avoid storing the object at all, and there would be less
stress on the store state log.
* Size hint while writing. The object is expected to get this size when
all data is written. Important for buffer management and storage layout
planning. Also allows for different strategies based on object size.
* Size hint when opening an object for read. For some filesystems,
knowing the size when an object is opened could be very helpful both for
read-ahead buffer management, and in some cases for knowing how large
the object actually is without having to walk through on-disk structures.
* An operation similar to stat(), to verify that an on-disk object
actually exists, and to get its size and creation date.
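
To illustrate how abort, the write size hint and a stat-like call could work
together, here is a toy model. It is only a sketch: the toy_* names, the
4096 byte threshold and the policy of holding small objects in memory until
close are assumptions made for the example, not a proposal for the actual
layout code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define SIZE_UNKNOWN (-1L)
#define SMALL_OBJECT_LIMIT 4096L   /* arbitrary threshold for this sketch */

typedef struct {
    long size_hint;    /* expected final size, or SIZE_UNKNOWN */
    long buffered;     /* bytes accepted but not yet on disk */
    long on_disk;      /* bytes already written */
    int log_records;   /* entries added to the store state log */
    time_t created;
    int exists;        /* set once the object has been committed */
} toy_object;

typedef struct {
    long size;
    time_t created;
} toy_stat_info;

static void toy_create(toy_object *o, long size_hint, time_t now)
{
    o->size_hint = size_hint;
    o->buffered = 0;
    o->on_disk = 0;
    o->log_records = 0;
    o->created = now;
    o->exists = 0;
}

/* Objects hinted as small stay in memory until close; others go to disk. */
static void toy_write(toy_object *o, long len)
{
    assert(len >= 0);
    o->buffered += len;
    if (o->size_hint == SIZE_UNKNOWN || o->size_hint > SMALL_OBJECT_LIMIT) {
        o->on_disk += o->buffered;
        o->buffered = 0;
    }
}

/* Normal completion: flush what is left and log the object once. */
static void toy_close(toy_object *o)
{
    o->on_disk += o->buffered;
    o->buffered = 0;
    o->exists = 1;
    o->log_records++;
}

/*
 * Abort: drop everything and log nothing. A real filesystem would also
 * have to reclaim any blocks that were already written.
 */
static void toy_abort(toy_object *o)
{
    o->buffered = 0;
    o->on_disk = 0;
    o->exists = 0;
}

/* stat()-like check: 0 and filled 'out' if the object exists, else -1. */
static int toy_stat(const toy_object *o, toy_stat_info *out)
{
    if (!o->exists)
        return -1;
    out->size = o->on_disk;
    out->created = o->created;
    return 0;
}
```

In this model an aborted small object never touches the disk or the log,
which is the saving the abort operation is meant to make possible.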

It would also make sense if the sio interface were modelled more like
normal file I/O, with read/write operations that operate at the current
position and a seek operation to move around. This would simplify
buffer management and such things at the filesystem level. From what I
can tell, seeking is only needed when serving a range request; all other
operations are entirely sequential from start to end.
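
A minimal sketch of what position-based calls could look like, using a small
in-memory buffer in place of a real filesystem object. The toy_sio names and
the fixed 256 byte buffer are assumptions for the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char data[256];   /* stand-in for the on-disk object */
    size_t size;      /* bytes stored so far */
    size_t pos;       /* current position shared by read and write */
} toy_sio;

/* Write at the current position and advance it; returns bytes written. */
static size_t toy_sio_write(toy_sio *s, const char *buf, size_t len)
{
    size_t room = sizeof(s->data) - s->pos;
    if (len > room)
        len = room;
    memcpy(s->data + s->pos, buf, len);
    s->pos += len;
    if (s->pos > s->size)
        s->size = s->pos;
    return len;
}

/* Read from the current position and advance it; returns bytes read. */
static size_t toy_sio_read(toy_sio *s, char *buf, size_t len)
{
    size_t avail;
    assert(s->pos <= s->size);
    avail = s->size - s->pos;
    if (len > avail)
        len = avail;
    memcpy(buf, s->data + s->pos, len);
    s->pos += len;
    return len;
}

/* Move the position; only needed for things like range requests. */
static int toy_sio_seek(toy_sio *s, size_t offset)
{
    if (offset > s->size)
        return -1;
    s->pos = offset;
    return 0;
}
```

A sequential reader or writer never calls toy_sio_seek at all, so the
filesystem can treat a seek as a hint that its read-ahead state is stale.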

/Henrik
Received on Tue Jul 29 2003 - 13:16:00 MDT
