Re: io assumptions

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 05 Dec 2002 09:26:10 +0100

Robert Collins wrote:

> storeDiskdOpen for instance, if the diskd shmget fails, cleans up the
> request and returns NULL - indicating a failure. If the request is
> queued, then yes it currently returns after the next io loop. BUT:
> overlapped IO (or any OS-callback based IO) could potentially call back
> immediately if the file metadata is in cache - breaking the current calling code.

I don't agree here.

The callback on the SIO may only occur when the store is being polled
or there is specific I/O activity, not randomly at any time. It should
probably be limited to polling only, to avoid I/O error races.

If your underlying I/O mechanism has a built-in callback mechanism
where the callback is made asynchronously without polling, then this
callback must only be into the "fs" driver, not on the SIO object, and
the "fs" implementation needs to queue the event until it can be
processed in a sane manner. You also need to employ some kind of safe
locking in such a case; such designs are probably only safe when the
callback occurs in a new thread.
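
Roughly, the "fs" side could then look like the sketch below (the names
here are made up for illustration, not the actual storeio types): the
OS completion handler only queues the event, and the SIO callbacks are
made from the driver's polling hook on the main loop.

#include <pthread.h>
#include <stdlib.h>

typedef struct _completion {
    void (*sio_callback) (void *sio_data, int errflag);
    void *sio_data;
    int errflag;
    struct _completion *next;
} completion;

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static completion *queue_head = NULL;

/* Called from the OS / worker thread when the I/O finishes.  Only
 * queues the event; no SIO callback is made from here. */
static void
fs_async_complete(completion * c)
{
    pthread_mutex_lock(&queue_lock);
    c->next = queue_head;       /* a real driver would keep FIFO order */
    queue_head = c;
    pthread_mutex_unlock(&queue_lock);
}

/* Called from the store polling loop on the main thread; this is the
 * only place SIO callbacks fire. */
static void
fs_poll(void)
{
    completion *c;
    pthread_mutex_lock(&queue_lock);
    c = queue_head;
    queue_head = NULL;
    pthread_mutex_unlock(&queue_lock);
    while (c) {
        completion *next = c->next;
        c->sio_callback(c->sio_data, c->errflag);
        free(c);
        c = next;
    }
}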

> Another example: storeUfsOpen returns NULL on open failure, an object on
> open success.
>
> And storeAufsOpen returns NULL to shed IO load, and an object that can
> have reads queued - but that may not actually open successfully - if the
> request gets queued.

Yes?

Same thing on create.

> I think we would be better served by:
> * void return type.
> * open always calls back, with error (failure of some sort) or good object (success on open).
> * And the callback is allowed to occur immediately.

The second part of the second point (a callback to return the "good"
object) is not acceptable, and it won't solve your problem unless you
are willing to paint us into an FS design corner which the current
design deliberately avoids.

There is no callback today on successful open as this carries no
significant information in the current API.

There is a callback on successful create of an object identity. The
purpose of this callback is NOT to signal that the create operation was
successful but to signal that this object has now been assigned an
identity in the fs layer. This callback can occur at any time from
storeCreate() up until the SIO has been destroyed, even after
storeClose().

There is also a callback on close. This callback is also used for
signalling I/O errors, including failed open/create.

What should perhaps be done is to separate I/O errors from close, and
not automatically destroy the SIO on I/O errors. In terms of the storeio
API a failed open/create is just a kind of I/O error. There is nothing
special about a failed open/create. The same thing happens if a read or
write fails.
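
As a rough sketch (again with made-up names, not the current storeio
API), separating errors from close could look like this: the error
callback only reports the failure, the SIO stays around, and
storeClose() is still what gets the close callback made and the SIO
destroyed.

/* Sketch only; invented typedefs and struct, not the current API. */
typedef void STERRCB(void *data, int errflag);  /* I/O error, SIO stays */
typedef void STCLCB(void *data, int errflag);   /* SIO is going away */

typedef struct {
    STERRCB *error_callback;
    STCLCB *close_callback;
    void *callback_data;
    int error_pending;
} sketch_sio;

/* Driver side: report a failed open/create/read/write.  The SIO is
 * not freed here; the caller still has to call storeClose() and gets
 * the close callback as today. */
static void
sketch_report_error(sketch_sio * sio, int errflag)
{
    sio->error_pending = 1;
    sio->error_callback(sio->callback_data, errflag);
}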

This design is intentionally done such that the core does not rely on
when/how/why the FS layer opens/closes files, assigns object identities
etc, only that it gets done.

Yes, this makes life slightly more complex in the storeio layer, but
this is very much intentional, as other designs paint you into corners
where many interesting object store designs cannot be done without a
great deal of complexity.

The property I am defending here:

The object identity assignment part of storeCreate() should be allowed
to be delayed until a sufficient amount of data has been sent to the
storeio layer, possibly the whole object contents. This is to allow for
storeio implementations which use the object identity as a pointer to
where the data is stored and not as an indirect name (a UNIX file name
is an indirect name, a block pointer is not), and to be allowed to
assign this identity when the data can be laid out on disk.
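
For illustration only (invented names, not any existing fs), such an
implementation might look like this: it buffers writes and only assigns
the identity, and makes the identity callback, once it knows where on
disk the data will actually live.

#include <stddef.h>

/* Invented names; just to show why the identity can only be assigned
 * once the on-disk layout is known. */
typedef void STIDCB(void *data, unsigned int ident);

typedef struct {
    STIDCB *ident_callback;
    void *callback_data;
    unsigned int block_ptr;     /* the identity: where the data sits */
    int ident_assigned;
    size_t bytes_buffered;
} sketch_create_sio;

/* Placeholder allocator; a real fs would pick a free extent on disk. */
static unsigned int
allocate_extent(size_t nbytes)
{
    static unsigned int next_block = 1;
    (void) nbytes;
    return next_block++;
}

static void
sketch_write(sketch_create_sio * sio, const char *buf, size_t len)
{
    (void) buf;                 /* data would be buffered here */
    sio->bytes_buffered += len;
    if (!sio->ident_assigned && sio->bytes_buffered >= 4096) {
        /* Enough data to decide the layout; the identity now exists,
         * so the callback can finally be made. */
        sio->block_ptr = allocate_extent(sio->bytes_buffered);
        sio->ident_assigned = 1;
        sio->ident_callback(sio->callback_data, sio->block_ptr);
    }
}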

What I can accept as a change in the storeio API here is that
storeOpen() always returns a SIO and only the callback is used for
signalling "I/O errors" such as load shedding or other events where
the object cannot be accessed. But I see no good reason to do this,
and it increases the overhead significantly in the load shedding case.

I can also accept that I/O errors do not automatically close the SIO
and storeClose() must be called unless it has already been called for
the SIO, but this mainly makes the storeio implementation more complex
as it then needs to have two slightly different paths for dealing with
I/O errors (one if the SIO is currently open, another if the SIO has
already been closed and the caller is waiting for all writes to
complete).

Regards
Henrik