Re: 2.3STABLE - stability issues

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Thu, 7 Sep 2000 10:09:11 +0200

On 6 Sep 2000, at 21:55, Duane Wessels <wessels@squid-cache.org> wrote:
> On Mon, 4 Sep 2000, Andres Kroonmaa wrote:
>
> > I recently ran into a problem. The Squid swap disk got nearly full,
> > to the extent that Squid didn't have enough temporary space
> > to write swap.state.new (or swap.state.clean), and it bailed
> > out with a fatal error. Squid is unable to recover on its own,
> > although this isn't a truly fatal error.
> > Manual recovery is quite dirty and involves removing the contents
> > of several L2 dirs by hand.
> >
> > Wouldn't it be appropriate to delay writeout of the clean swaplog
> > when the swapdir is too close to disk full?
>
> I think what you are asking for implies that you delete the old
> swap.log, then write the new one. Your assumption being that
> removing the old log frees up enough disk space for writing the
> new one?

 No, I didn't mean that.

> I think it's a bad idea if there is a point in time where NO
> swap state file exists on the disk. If squid crashes before
> writing the new log completes, you have a partial record, or
> perhaps no record at all.

 I agree. But I don't see much of a difference. If Squid can't
 write a clean swap state, it fails with a fatal error, and this implies
 a dirty rebuild. And if it starts up dirty, it doesn't really matter
 whether the swap state file exists. It will be rebuilt. The problem
 is that once it is rebuilt, Squid may again fail to write out
 the clean state.

> Maybe a better idea is for Squid to refuse to start unless
> there is a certain amount of free disk space.

 This isn't good either. We expect Squid caches to run unattended.

> Or it disables the cache_dir (for reading and writing) until some
> space is freed up.

 This one would be better, but is also not ideal.
 Look, if something other than Squid fills up the cache_dir's disk, then
 this is a DoS. The default Squid conf assumes that cache swap is on
 /usr/local. So this can happen, even unintentionally.

 There are situations when Squid itself fails to remove certain files
 from swap. In my case, cache swap usage suddenly went up, although I
 had limited Squid's max disk usage to safe levels. I left over 500M of
 disk free (swap.state takes 100M), and still, due to something beyond
 me, it got full. Almost simultaneously on 2 different boxes.

 Squid's current logic is: start up, look for swap.state in cache_dir,
 and load it in. While verifying it, service requests and even write new
 objects to disk. Then write a clean swap.state, and on any failure during
 that, panic. The RunCache script restarts Squid, and the loop closes
 into an endless panic, each time starting up with a dirty rebuild.
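
 To make the loop concrete, here is a toy model of it. This is only a
 sketch of the sequence described above, not Squid code: start_once and
 run_cache are names I made up, and the byte counts are arbitrary.

```python
def start_once(free_bytes, needed_bytes):
    """One startup in the toy model: rebuild, then try to write the clean log."""
    steps = ["dirty rebuild"]
    if free_bytes < needed_bytes:
        # No room for swap.state.new: this is where the fatal error happens.
        steps.append("fatal: cannot write clean swap.state")
        return steps, False
    steps.append("clean swap.state written")
    return steps, True


def run_cache(free_bytes, needed_bytes, max_restarts=3):
    """Hypothetical stand-in for a restart wrapper such as RunCache.

    Nothing frees disk space between attempts, so a start that fails
    once fails on every restart.
    """
    attempts = 0
    for _ in range(max_restarts):
        attempts += 1
        _, ok = start_once(free_bytes, needed_bytes)
        if ok:
            return attempts, True
    return attempts, False
```

 With less free space than the log needs, every attempt ends the same way,
 no matter how many restarts are allowed.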

 Normally, when disk usage is above the configured limits, Squid enters
 cleanup mode, removing objects until it reaches normal disk space levels.
 Squid can even detect disk-full errors and reduce the configured limits
 to continue. But this happens only after a successful startup.
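
 Roughly, the cleanup mode behaves like the sketch below. It assumes a
 plain least-recently-used order and a single byte limit; the cleanup
 function and the store layout are my own illustration, not Squid's
 actual replacement code.

```python
from collections import OrderedDict


def cleanup(store, limit_bytes):
    """Evict the oldest objects until total size is at or below limit_bytes.

    store maps object key -> size in bytes, ordered oldest first.
    Returns the remaining usage and the list of evicted keys.
    """
    used = sum(store.values())
    evicted = []
    while used > limit_bytes and store:
        key, size = store.popitem(last=False)  # oldest entry first
        used -= size
        evicted.append(key)
    return used, evicted
```

 For example, with objects of 40, 30 and 30 bytes and a limit of 60,
 only the oldest object has to go.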

 What I propose is that Squid should estimate the disk space needed to write
 swap.state.new, and if there isn't enough, NOT panic but enter storage cleanup
 mode, delaying the write of the clean swap.state until there is enough
 disk space. Perhaps it should keep the rebuilding flag set and skip writing
 new objects to disk while in this mode. Then, when enough disk space is freed
 to fit swap.state, write it out and be happy.
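
 A minimal sketch of the check I have in mind follows. All names here are
 hypothetical, and RECORD_SIZE is a placeholder, not the real size of a
 swap.state entry; the 10% safety margin is also just an assumption.

```python
import shutil

RECORD_SIZE = 72  # placeholder bytes per swap.state entry (assumption)


def estimate_swap_state_bytes(object_count, record_size=RECORD_SIZE):
    """Rough size of a clean swap.state: one fixed-size record per object."""
    return object_count * record_size


def free_bytes_on(path):
    """Free space on the filesystem holding path (e.g. the cache_dir)."""
    return shutil.disk_usage(path).free


def next_mode(free_bytes, object_count):
    """Decide between writing the clean log and read-only cleanup mode."""
    needed = estimate_swap_state_bytes(object_count)
    needed_with_margin = needed + needed // 10  # assumed 10% headroom
    if free_bytes >= needed_with_margin:
        return "write_clean_log"
    # Not enough room: keep the rebuilding flag set, stop storing new
    # objects, free space, and check again later instead of panicking.
    return "cleanup_read_only"
```

 At startup one would call next_mode(free_bytes_on(cache_dir), object_count)
 and repeat the check after each cleanup pass until it returns
 "write_clean_log".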

 Even if it never reaches enough free disk, it is better for Squid to
 run in read-only recovery mode than to loop endlessly in panic,
 denying service to lots of pissed people until manual recovery is done.

 I know there is a config option to define where to place swap.state,
 and one could place it on another disk, but this isn't really a solution,
 as a totally unrelated disk can also get full. Besides, this state file
 really belongs to cache_dir.

 Basically, I wish that Squid would recover by itself from as many
 failures as possible. Disk full is a ridiculous reason for a fatal failure.
 At the least, Squid should be able to fall back to a run-from-RAM mode.

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia
Received on Thu Sep 07 2000 - 02:12:31 MDT
