RE: How long does a disk-cache double-check last?

From: Chemolli Francesco (USI) <ChemolliF@dont-contact.us>
Date: Mon, 16 Oct 2000 16:31:34 +0200

> On Wed, Oct 11, 2000, Chemolli Francesco (USI) wrote:
> > Yesterday I started a cache double-check.
> > After 14 hours, it was still crunching, the
> > disk cache at the moment was 15 gigs big,
> > distributed among 5 diskd-based dirs on different
> > HDDs.
> >
> > CPU usage was very low, disk usage very high,
> > response by squid was very sluggish.
> >
> > So.. how long should I expect it to run?
> > 1 hour/gig seems a pretty big time...
>
> It's synchronous. This is probably your problem.

Actually my problem is that it's badly designed.
With help from Robert Collins, I figured out the problem:

When started with -S, squid stat()s every file it knows
about, and compares its known size with the on-disk size.
If they differ, it prints an error message to cache.log.
IT DOESN'T PERFORM ANY OTHER OPERATION.
Once it is finished going through all the files, it
will abort, upon failing the assertion that storeerrors==0.
It will then start over, with the same result, in
an endless loop.
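
To make the behaviour above concrete, here is a minimal Python sketch
(not Squid's actual code) of a check that only reports. The names
double_check and known_sizes are hypothetical, and treating a missing
file as a mismatch is an assumption of the sketch.

```python
import os

def double_check(known_sizes):
    """Compare each recorded size with the on-disk size.

    known_sizes: hypothetical dict mapping file path -> expected size.
    Returns a list of (path, expected, actual) for every mismatch;
    actual is None when the file does not exist (assumption).
    Nothing is repaired here: problems are only reported.
    """
    mismatches = []
    for path, expected in known_sizes.items():
        try:
            actual = os.stat(path).st_size
        except FileNotFoundError:
            actual = None
        if actual != expected:
            mismatches.append((path, expected, actual))
    return mismatches
```

Since nothing on disk or in the index changes, a second run over the
same data reports exactly the same mismatches, which is why the
restart loop never ends.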

How the operation currently has to be handled (maybe it's in the
FAQ; if not, maybe it should be added):

1) Start squid -S
2) once it fails on the assert, kill it
3) grep cache.log looking for broken entries
4) forcibly rm broken entries
5) for each cache_dir, rm swap.state
6) restart squid. swap.state will be rebuilt

This is not good(TM). Too much manual intervention.

How it should be changed short-term (IMO):

The above operations should be performed automatically. When
squid -S finds some problem, it forcibly removes the broken
swap-file. Once it's done checking them all, it forces a
swap.state rebuilding.
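
A rough Python sketch of this short-term behaviour, under stated
assumptions: check_and_repair and known_sizes are hypothetical names,
and returning a cleaned index stands in for the swap.state rebuild
(the real swap.state format is not modelled here).

```python
import os

def check_and_repair(known_sizes):
    """Check every entry and remove the broken ones as they are found.

    known_sizes: hypothetical dict mapping file path -> expected size.
    Returns (clean, removed): the entries that passed the check, and
    the paths that were dropped from the index.
    """
    clean = {}
    removed = []
    for path, expected in known_sizes.items():
        try:
            actual = os.stat(path).st_size
        except FileNotFoundError:
            actual = None          # nothing on disk to delete
        if actual == expected:
            clean[path] = expected
            continue
        if actual is not None:
            os.remove(path)        # forcibly remove the broken swap-file
        removed.append(path)
    return clean, removed
```

After one pass, checking the returned index again finds no errors, so
there is nothing left to loop on.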

How it should be changed long-term (IMO):

- add a cache_object option to double-check the storage
- when invoked, there are two cases:
  a) only one cache_dir. Act like the short-term fix above.
  b) more than one cache_dir. Read on.
- for each cache dir, flag it as "unselectable". This means,
  it can be read from but it will never be selected to swap new
  objects on.
- once it's marked as "unselectable", fork. The child process
  immediately closes all accept-ports and open files, then goes
  on to work on the just-selected directory. It creates a
  working-copy of swap.state, and double-checks each file in
  that cache-dir. If mismatches are found, it removes the offending
  file from both the swap medium and the swap.state.
  The child can be completely synchronous. It doesn't need to care
  about select, delay, anything. Just get the job done fast.
- Once the child is done, it dies, possibly returning some status-code.
  The parent checks the return-code, and if all went OK unmarks the
  "unselectable" dir, re-syncs to the new swap.state, and goes
  on to work on the next.
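
The fork-per-directory scheme above could look roughly like the
following Python sketch. Everything named here is hypothetical: the
index file index.txt (one "<size> <name>" line per object) is only a
stand-in for swap.state, and the parent simply blocks in waitpid(),
whereas the real parent would keep serving requests meanwhile. The
sketch has no sockets, so the "close all accept-ports" step is omitted.

```python
import os

INDEX = "index.txt"  # hypothetical stand-in for swap.state

def read_index(cache_dir):
    """Load the per-directory index into a dict of name -> size."""
    entries = {}
    with open(os.path.join(cache_dir, INDEX)) as f:
        for line in f:
            size, name = line.split(maxsplit=1)
            entries[name.strip()] = int(size)
    return entries

def child_check(cache_dir):
    """Runs in the child: a plain synchronous loop, no event handling."""
    kept = {}
    for name, expected in read_index(cache_dir).items():
        path = os.path.join(cache_dir, name)
        try:
            actual = os.stat(path).st_size
        except FileNotFoundError:
            continue                 # already gone: drop it from the index
        if actual == expected:
            kept[name] = expected
        else:
            os.remove(path)          # remove the offending file from disk
    # Write a working copy of the index, then swap it in atomically.
    tmp = os.path.join(cache_dir, INDEX + ".new")
    with open(tmp, "w") as f:
        for name, size in kept.items():
            f.write(f"{size} {name}\n")
    os.replace(tmp, os.path.join(cache_dir, INDEX))

def check_all(cache_dirs):
    """Check each directory in a forked child, one directory at a time."""
    selectable = {d: True for d in cache_dirs}
    indexes = {}
    for d in cache_dirs:
        selectable[d] = False        # readable, but gets no new objects
        pid = os.fork()
        if pid == 0:                 # child process
            status = 1
            try:
                child_check(d)
                status = 0
            finally:
                os._exit(status)     # report success or failure to parent
        _, wstatus = os.waitpid(pid, 0)
        if os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0:
            indexes[d] = read_index(d)   # re-sync to the new index
            selectable[d] = True         # unmark "unselectable"
    return selectable, indexes
```

A directory whose child fails stays unselectable in this sketch; what
the parent should do in that case is left open above.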

-- 
	/kinkie
Received on Mon Oct 16 2000 - 08:35:38 MDT
