Re: Store state logs

From: Stewart Forster <slf@dont-contact.us>
Date: Wed, 04 Feb 1998 11:52:44 +1100


> My idea on how the store state maintenance should work:
>
> 1. Keep a transaction based log of the swap state. We should be able to
> quickly recover the state from the state logs even if Squid restarts
> several times while starting, and this without corrupting the cache or
> losing track of cached objects.

        This is what my jumbo patch to beta11 added.

> 2. Duplicate the metadata that is not deduced from the object data
> inside each swap file. Most notably the URL.

        Indeed.
 
> All extra overhead while the swap state is recovered should be avoided
> if possible (and touching each swap object is a HUGE overhead).

        Agreed. This was the point of my posting yesterday.

> The object based metadata is primarily for verification and recovery
> purposes.
>
> - Verify that a swapped in object is really the object we think it
> should be (the correct URL), to guarantee that Squid in no circumstances
> gives the wrong object to the clients, even if the cache is corrupted.

        This is pretty much all that needs to be added after the transaction
based swap.state file patch.

> - If using hash based store keys, the metadata does not contain the
> object URL and we should be able to recover this somehow. The URL is
> needed for proper Hit-metering and possibly other operations as well,
> but it is not required in-memory. And this guards against any unexpected
> hash collisions, which gives the hash based keys a higher trust factor.

        True. An fstat on the object to get the expected size, coupled with
the hash-based store key, would "almost guarantee" no collisions. It's almost
like getting a collision on two separate hashes, and in fact that may be a
way to go. Use URLs for ABSOLUTE safety, two hashes and an fstat on the
size for extremely good safety, a single hash and an fstat for reasonable
safety, and a straight hash for passable safety. I don't know how many of
these options you'd want, but you could definitely make them configurable.
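
        A rough sketch of how those configurable levels could look. Nothing
here is taken from the patch or from Squid itself: verify_level,
index_entry, request_key and verify_object() are hypothetical names, and
making each level include the checks of the weaker ones is just one
possible arrangement.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical safety levels, weakest to strongest. */
typedef enum {
    VERIFY_HASH_ONLY,            /* passable: straight hash */
    VERIFY_HASH_AND_SIZE,        /* reasonable: one hash plus fstat size */
    VERIFY_TWO_HASHES_AND_SIZE,  /* extremely good: two hashes plus size */
    VERIFY_FULL_URL              /* absolute: compare the stored URL too */
} verify_level;

/* What the in-memory index remembers about a cached object (assumed). */
typedef struct {
    uint32_t key_hash;
    uint32_t second_hash;
    off_t expected_size;
} index_entry;

/* What the incoming request provides (assumed). */
typedef struct {
    uint32_t key_hash;
    uint32_t second_hash;
    const char *url;
} request_key;

/*
 * Return 1 if the swap file open on fd may be served for request r.
 * stored_url is the URL read back from the swap file's own metadata;
 * it is only consulted at VERIFY_FULL_URL.
 */
int verify_object(verify_level level, const index_entry *e,
                  const request_key *r, int fd, const char *stored_url)
{
    struct stat sb;

    if (r->key_hash != e->key_hash)
        return 0;
    if (level == VERIFY_HASH_ONLY)
        return 1;

    /* The on-disk size must match what the index expects. */
    if (fstat(fd, &sb) != 0 || sb.st_size != e->expected_size)
        return 0;
    if (level == VERIFY_HASH_AND_SIZE)
        return 1;

    if (r->second_hash != e->second_hash)
        return 0;
    if (level == VERIFY_TWO_HASHES_AND_SIZE)
        return 1;

    /* Strongest level: the URL stored with the object must match. */
    return stored_url != NULL && r->url != NULL
        && strcmp(r->url, stored_url) == 0;
}
```

        A failed check would simply be treated as a miss, so the object is
fetched from the source again.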

> - If the swap state log is lost, Squid could slowly rebuild the cache
> from the disk objects. This should be done at a moderate speed to not
> saturate the system by the cache rebuild.

        Agreed.
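
        One way to keep such a rebuild at a moderate speed is to examine
only a bounded number of swap files per pass of the main loop. A minimal
sketch, assuming hypothetical names throughout (rebuild_state,
rebuild_step() and the examine_stub() stand-in are not Squid functions):

```c
#include <stddef.h>

/* Progress of a slow rebuild over swap file numbers 0..total_files-1. */
typedef struct {
    size_t next_file;    /* next swap file number to examine */
    size_t total_files;  /* number of swap file slots to scan */
    size_t recovered;    /* objects successfully re-indexed so far */
} rebuild_state;

/* Callback that inspects one swap file; returns 1 if it was recovered. */
typedef int (*examine_fn)(size_t file_no);

/* Stand-in for real inspection: pretends even-numbered files are good. */
int examine_stub(size_t file_no)
{
    return file_no % 2 == 0;
}

/*
 * Examine at most max_per_tick files, then return so that normal
 * request handling can continue. Returns 1 while work remains.
 */
int rebuild_step(rebuild_state *st, size_t max_per_tick, examine_fn examine)
{
    size_t done = 0;

    while (st->next_file < st->total_files && done < max_per_tick) {
        if (examine(st->next_file))
            st->recovered++;
        st->next_file++;
        done++;
    }
    return st->next_file < st->total_files;
}
```

        The caller would invoke rebuild_step() once per event-loop pass and
tune max_per_tick to how much disk load is acceptable.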

> The requirements for the state logs depend a bit on how we do the store
> object validation. Without validation the transaction model of the state
> logs needs to be strict and include expunged objects. If we add
> validation on swapin (combined with a graceful fall back to fetch the
> object when the validation fails) we can handle the state more loosely,
> with the main purpose of quickly knowing at least which objects we have
> in the store. In this situation it does not matter much if we think we
> have a few objects that in fact are not there, as this will be recovered
> gracefully when encountered.
>
> And in either case, the URL validation is more or less required, to
> reliably handle store hits while recovering the swap state from log
> files.

        The less strict approach is what my patch uses. Objects are marked
invalid while being read in, but once the swap.state files have been read,
they are all pretty much valid and are marked as such without doing any
extra work. The rest is left to fstat(), which catches object clashes, and
to storeSwapInObject() failures, which cause the object to be fetched from
the source. It all works cleanly. The code reports when it needs to do
either, and from what we have running here, it happens VERY rarely: only
occasionally during a dirty rebuild read-in and almost never after that.
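
        To illustrate the idea (this is not the patch code; the record
layout and every name below are made up for the example): replay the
add/delete records into the index with entries flagged as unvalidated, then
flip everything that survived to valid in one pass once the log has been
read.

```c
#include <stddef.h>

/* Hypothetical log record: an object was added to or deleted from a slot. */
typedef enum { LOG_ADD, LOG_DEL } log_op;

typedef struct {
    log_op op;
    int file_no;   /* swap file number the record refers to */
    long size;     /* object size, meaningful for LOG_ADD only */
} log_record;

typedef enum { SLOT_EMPTY, SLOT_UNVALIDATED, SLOT_VALID } slot_state;

typedef struct {
    slot_state state;
    long size;
} slot;

/* Apply records in order; later records override earlier ones. */
void replay_log(slot *table, size_t nslots, const log_record *log, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const log_record *r = &log[i];

        /* Skip records that point outside the table (damaged log). */
        if (r->file_no < 0 || (size_t)r->file_no >= nslots)
            continue;
        if (r->op == LOG_ADD) {
            table[r->file_no].state = SLOT_UNVALIDATED;
            table[r->file_no].size = r->size;
        } else {
            table[r->file_no].state = SLOT_EMPTY;
            table[r->file_no].size = 0;
        }
    }
}

/* After the whole log is read, mark the survivors valid; return the count. */
size_t finish_rebuild(slot *table, size_t nslots)
{
    size_t i, valid = 0;

    for (i = 0; i < nslots; i++) {
        if (table[i].state == SLOT_UNVALIDATED) {
            table[i].state = SLOT_VALID;
            valid++;
        }
    }
    return valid;
}
```

        Anything the log got wrong is then caught lazily, at swap-in time,
rather than by touching every object during startup.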

> Some ideas on how the store logs can be handled:
>
> The store state logs are kept in files named
> swap.state.<nnnn>
>
> The base log is called
> swap.state.1
>
> Updates are written to transaction logs named
> swap.state.<timestamp>
> where timestamp is when the log was opened (for example when Squid
> starts).
>
> Periodically write the swap state to new.swap.state. When this is done
> rename it to swap.state.1 and remove all previous transaction logs (all
> but the current one).

        Take a look at the patch code. It simply uses storeWriteCleanLogs
to do this task (whenever a log rotate is done) to write out clean
non-transaction logs (which are then appended to with transactions).
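
        The general write-then-rename technique looks something like the
sketch below. This is not the body of storeWriteCleanLogs; clean_entry,
write_clean_log() and the one-line-per-object text format are assumptions
made for the example.

```c
#include <stdio.h>

/* Hypothetical clean-log entry: one line per object currently in the store. */
typedef struct {
    int file_no;
    long size;
} clean_entry;

/*
 * Write a full snapshot to tmp_path, then rename it over final_path.
 * On POSIX systems rename() replaces the target atomically, so a reader
 * sees either the old clean log or the complete new one.
 * Returns 0 on success, -1 on failure (the temporary file is removed).
 */
int write_clean_log(const char *tmp_path, const char *final_path,
                    const clean_entry *entries, size_t n)
{
    size_t i;
    FILE *fp = fopen(tmp_path, "w");

    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (fprintf(fp, "%08X %ld\n",
                    (unsigned)entries[i].file_no, entries[i].size) < 0) {
            fclose(fp);
            remove(tmp_path);
            return -1;
        }
    }
    if (fclose(fp) != 0) {
        remove(tmp_path);
        return -1;
    }
    if (rename(tmp_path, final_path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

        New transactions are then appended to the renamed file, and older
transaction logs can be removed once the rename has succeeded.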

        Stew.

Received on Tue Jul 29 2003 - 13:15:45 MDT