Re: Problems with filedescriptor leaks under Solaris

From: WWW server manager <webadm@dont-contact.us>
Date: Mon, 18 Aug 1997 22:51:17 +0100 (BST)

Chris Tilbury wrote:
> >
> > Filedescriptor leaks for Squid running under Solaris are a problem,
> > it would appear, for people using the NOVM version under reasonable
> > loads (>250 concurrent requests), causing the Squid process to run
> > out of FDs and stop working completely after a while.

and in reply Edward Henigin wrote:
> I had an FD leakage problem with 1.NOVM.11, Solaris 2.5/2.5.1,
> and the FD leakage has completely stopped when I upgraded to 1.NOVM.15.

The fix (which Duane Wessels developed in response to information from some
experimentation here) seems to have cured the original file descriptor leak.
However, I suspect that adding Michael Pelletier's connection retry patch
has reintroduced some variant of the same problem. (The patch applies almost
cleanly; manual fixup is needed for the version string and for one source
file where there's an adjacent change in 1.NOVM.15.) The original problem
was that part of connection setup was not protected by a timeout, so cleanup
was incomplete if there was an interruption at that point.
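
To illustrate the general idea only (this is not code from Squid or from the
retry patch, and it is Python rather than Squid's C), here is a minimal
sketch of a connection attempt where the setup phase is bounded by a timeout
and the descriptor is released on every failure path. The function name and
parameters are hypothetical.

```python
import errno
import select
import socket


def connect_with_timeout(host, port, timeout_s):
    """Return a connected socket, or None on failure or timeout.

    Hypothetical helper: whatever goes wrong during setup, the
    descriptor is closed before returning, so nothing is leaked.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ok = False
    try:
        sock.setblocking(False)
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            return None  # immediate failure, e.g. connection refused
        if err != 0:
            # Connection in progress: wait until writable or timed out.
            _, writable, _ = select.select([], [sock], [], timeout_s)
            if not writable:
                return None  # timed out during connection setup
            if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
                return None  # the asynchronous connect failed
        ok = True
        return sock
    finally:
        if not ok:
            sock.close()  # cleanup runs on every non-success path
```

The point of the try/finally is that there is no path out of the setup phase
that skips the close, which is the property that seems to be missing in the
case described above.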

What I've been seeing is that 1.NOVM.15 (or in my case, a pre-release
1.NOVM.15 plus Duane's draft versions of the patches) seemed to fix the FD
leak completely. It could just have been misleading good luck (still broken,
but no leaks happened to occur), but it ran for days on a couple of
occasions with no problems, so it looks hopeful. However, with the
connection retry patch added (both to the pre-release version and now to the
final release of 1.NOVM.15), I'm seeing symptoms similar to the old problem.
Initially there are a number of zero-length cache files, already old by the
time I notice them, each with a pair of file descriptors listed in the
cachemgr.cgi output, one open for reading and one for writing. There is also
a persistent vm_objects summary entry for each such case. After HUPing Squid
to reread its configuration, the reading FDs seem to get cleaned up, but the
writing FDs remain until Squid is shut down and restarted.
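
For anyone wanting to check for the same symptom, a rough sketch of how the
old zero-length cache files could be listed follows. It is an illustration
in Python, not a Squid tool; the function name, the directory argument and
the age threshold are all assumptions, and it only finds the empty files, it
does not show which descriptors Squid has open on them.

```python
import os
import stat
import time


def find_stale_empty_files(cache_dir, min_age_s, now=None):
    """Return sorted paths of zero-length regular files at least min_age_s old.

    Hypothetical helper: cache_dir is wherever the cache files live.
    """
    if now is None:
        now = time.time()
    stale = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished mid-scan; the cache is live
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size == 0 and now - st.st_mtime >= min_age_s:
                stale.append(path)
    return sorted(stale)
```

Called as, say, find_stale_empty_files(cache_dir, 3600), it returns the
empty files untouched for at least an hour, which can then be compared by
hand against the FD list in the cachemgr.cgi output.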

Question 1: does anyone see long-term stuck file descriptors with unmodified
Squid 1.NOVM.15? [Does it happen with anything other than Solaris 2 (2.4 in
my case)?]

Question 2: does anyone who is using the connection retry patch with
1.NOVM.15 see FD leaks as described above, especially (if you've tried with
and without the patch) in cases where the problem didn't arise with
unpatched 1.NOVM.15?

I'm guessing that the retry patch may have introduced some extra timing
window in the NOVM case where things can get left in the same sort of mess
as happened routinely with unpatched copies of earlier versions of Squid. I
believe the patch was written with standard, non-NOVM Squid as its target,
although the NOVM changes don't directly affect how it applies. Just a
guess, but that's how it looks from what I've seen.

Feedback on what other people are seeing with standard and retry-patched
1.NOVM.15 could help pin down where to look for the problem (i.e. Squid or
the connection retry patch...).

                                John Line

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk
Received on Mon Aug 18 1997 - 15:04:21 MDT
