update: Re: Validation code.

From: Robert Collins <robert.collins@dont-contact.us>
Date: Fri, 1 Dec 2000 10:29:29 +1100

Just adding a note to my earlier reply...
----- Original Message -----
From: "Robert Collins" <robert.collins@itdomain.com.au>
To: "Duane Wessels" <wessels@squid-cache.org>
Cc: <squid-dev@squid-cache.org>
Sent: Friday, December 01, 2000 10:05 AM
Subject: Re: Validation code.

>
> ----- Original Message -----
> From: "Duane Wessels" <wessels@squid-cache.org>
> To: "Robert Collins" <robert.collins@itdomain.com.au>
> Cc: <squid-dev@squid-cache.org>
> Sent: Friday, December 01, 2000 9:41 AM
> Subject: Re: Validation code.
>
>
> >
> >
> > On Sun, 26 Nov 2000, Robert Collins wrote:
> >
> > > so the cleanup code that calls doublecheck could call it on a new
> > > file being written, thus causing a problem.
> >
> > Can you be more specific. What bad things are going to happen in this
> > case? Unlinking an open file is not bad (at least not on unix) because
> > any other thread that has the file open for reading and writing can
> > continue reading and writing.
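
(A side note for the archive: here's a standalone sketch, not Squid code,
of the unlink-while-open behaviour Duane describes. On unix the inode
survives until the last descriptor closes, so readers and writers carry
on unaffected.)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    const char *path = "/tmp/unlink-demo";
    char buf[16];
    ssize_t n;
    int wfd, rfd;

    wfd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    rfd = open(path, O_RDONLY);

    write(wfd, "hello", 5);
    unlink(path);               /* name is gone, inode still live */
    write(wfd, " world", 6);    /* writer carries on unaffected */

    n = read(rfd, buf, sizeof(buf) - 1);
    buf[n] = '\0';
    printf("read after unlink: %s\n", buf);    /* prints "hello world" */

    close(wfd);
    close(rfd);
    return 0;
}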
>
> We lose the swapfile, and if it's a big one, that will affect our hit ratio. (For example: I reboot squid and a client starts
> downloading a service pack. Yes, the file is written, but it's released as soon as the client finishes.)

Forget this - it's been solved.

> >
> > > Finally, is there any reason the storeCleanup code can't be part of the
> > > rebuild? Now that the disk checking is a background task, the Cleanup
> > > routine just counts the file sizes and sets the VALIDATED bit.
> >
> > Actually I was thinking that storeCleanup() can probably go away
> > entirely. The VALIDATED bit is less useful now that the
> > swapin code is more robust (checking size, MD5, URL).
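
(Another note: by the more robust swapin checks I read something like the
sketch below. The names swap_entry_t and swapInSanityCheck are invented
for illustration, not actual Squid symbols; the point is that if every
swap-in verifies size and MD5 key against the index entry, a separate
post-rebuild VALIDATED pass buys us little.)

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

typedef struct {
    size_t swap_file_sz;        /* expected object size */
    unsigned char md5_key[16];  /* expected MD5 of the request */
} swap_entry_t;

/* Return 1 if the on-disk object matches the in-memory index entry. */
static int
swapInSanityCheck(const char *path, const swap_entry_t *e,
                  const unsigned char *md5_from_disk_metadata)
{
    struct stat sb;

    if (stat(path, &sb) < 0)
        return 0;               /* file missing or unreadable: reject */
    if ((size_t) sb.st_size != e->swap_file_sz)
        return 0;               /* size mismatch: reject */
    if (memcmp(md5_from_disk_metadata, e->md5_key, 16) != 0)
        return 0;               /* wrong object filed under this key */
    return 1;
}

int
main(void)
{
    const char *path = "/tmp/swapin-demo";
    swap_entry_t e = { 10, { 0 } };     /* zeroed key for the demo */
    unsigned char disk_key[16] = { 0 };
    FILE *fp = fopen(path, "w");

    fputs("0123456789", fp);            /* 10-byte "object" */
    fclose(fp);
    printf("valid: %d\n", swapInSanityCheck(path, &e, disk_key));  /* 1 */
    return 0;
}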
>
> Do you mean the swap file size check or the sane metadata check? It was the file size checking that you pointed out as slowing
> the whole thing down - so I got rid of that again and put it in the background check.
>
> > Also I think there is really no difference between a clean and
> > dirty rebuild, so that can disappear as well.
> >
>
> In reverse order (it'll make more sense).
>
> The clean and dirty rebuilds seem quite different to me: the clean code just reads the log file into memory. It doesn't perform
> any checks for duplicate entries, wrong MD5s, etc., so it is very fast. The rebuild from dirty checks each record against
> previously read-in entries for conflicting URIs, cancelled downloads, and so on. Getting rid of the clean vs. dirty distinction
> means those checks would occur every time, slowing the rebuild down.
>
> As an example, my workstation running win2k rebuilds around 6000 entries per second on a clean rebuild, and around 5000 on a
> dirty one. But it may make more of a difference for a large store on a big server.
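
(To make the difference concrete, here's a self-contained toy -- all names
invented, this is not the actual rebuild code. The clean path trusts every
log record and does a straight insert; the dirty path has to look up each
key first so newer records can supersede older ones and released or
cancelled entries get dropped. That extra lookup per record is roughly
where the ~6000 vs ~5000 entries/second shows up.)

#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 1024

typedef struct {
    char key[33];   /* hex MD5 of the URL */
    int  deleted;   /* a SWAP_LOG_DEL-style release record */
} record_t;

static record_t idx[MAX_ENTRIES];
static int nidx;

static record_t *
index_lookup(const char *key)
{
    int i;
    for (i = 0; i < nidx; i++)
        if (strcmp(idx[i].key, key) == 0)
            return &idx[i];
    return NULL;
}

/* clean rebuild: every record is trusted, straight insert */
static void
add_clean(const record_t *r)
{
    idx[nidx++] = *r;
}

/* dirty rebuild: duplicates, conflicts and releases must be resolved */
static void
add_dirty(const record_t *r)
{
    record_t *old = index_lookup(r->key);
    if (r->deleted) {
        if (old != NULL)
            *old = idx[--nidx];     /* drop it: move last entry in */
        return;
    }
    if (old != NULL)
        *old = *r;                  /* duplicate key: newest record wins */
    else
        idx[nidx++] = *r;
}

int
main(void)
{
    record_t log[] = {
        { "aaaa", 0 },  /* object stored... */
        { "aaaa", 1 },  /* ...then released (e.g. cancelled download) */
        { "bbbb", 0 },
    };
    int i;

    for (i = 0; i < 3; i++)
        add_clean(&log[i]);
    printf("clean insert kept %d records (stale duplicates included)\n", nidx);

    nidx = 0;                       /* start over for the dirty pass */
    for (i = 0; i < 3; i++)
        add_dirty(&log[i]);
    printf("dirty rebuild kept %d record(s)\n", nidx);  /* just "bbbb" */
    return 0;
}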
>
> Onto the VALIDATED bit: if we can get rid of that, and assume that an object is valid once it's in memory and the store log has
> been completely rebuilt, then let's get rid of it. I do suggest that we only bring the store dir online once the rebuild is
> finished, to prevent trying to serve out a stale hit (which the rebuild-from-dirty code corrects by the time the log is fully
> read).
>
> Maybe on a rebuild from directory we could mark the store as hit-only immediately, because we know there will be no collisions
> between the objects, and then allow writes once the directories are checked?
>
> This will still allow removal of the storeCleanup() routine, and should provide earlier hits on rebuilds than we get today.
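
(A sketch of the staged bring-up I have in mind; the enum and helpers are
hypothetical, not existing Squid symbols. A store dir starts offline while
the log/directories are read, goes hit-only as soon as a rebuild from
directory guarantees no key collisions, and only accepts new objects once
checking is complete.)

#include <stdio.h>

typedef enum {
    SD_OFFLINE,     /* rebuilding: serve nothing from this dir */
    SD_HIT_ONLY,    /* rebuild from directory: safe to serve hits */
    SD_ONLINE       /* fully checked: hits and new stores */
} sd_state_t;

static int
sd_may_read(sd_state_t s)
{
    return s == SD_HIT_ONLY || s == SD_ONLINE;
}

static int
sd_may_write(sd_state_t s)
{
    return s == SD_ONLINE;
}

int
main(void)
{
    sd_state_t s = SD_HIT_ONLY;
    printf("reads: %d writes: %d\n", sd_may_read(s), sd_may_write(s));
    return 0;
}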
>
> Comments? I'm happy to do this as part of the store_check stuff... or maybe it should be a 2.5 project?
>
Received on Thu Nov 30 2000 - 16:21:54 MST
