Re: Squid connection retry problem -- fd leak? from WWW server manager on 1997-09-15 (squid-users)

From: WWW server manager <webadm@dont-contact.us>
Date: Mon, 15 Sep 1997 15:42:05 +0100 (BST)

Michael Pelletier wrote:
> On Mon, 15 Sep 1997, John Line wrote:
> > Duane's analysis of the original problem was that a connection problem
> > occurred at a stage in connection setup when Squid didn't have a timeout
> > set to allow it to clean up after problems, leaving things in a mess. It
> > looks as if the retry patch may introduce a similar problem.
>
> Duane, or someone, do you have the details of what was done to correct the
> problem originally?

Here are the patches Duane sent me to evaluate, which were incorporated in
1.NOVM.15; they should make it clearer which changes were for the FD
problem. (Both are against http.c.)

=====
Index: http.c
===================================================================
RCS file: /surf1/CVS/squid/src/http.c,v
retrieving revision 1.143.2.14
diff -w -u -r1.143.2.14 http.c
--- http.c 1997/07/11 21:51:17 1.143.2.14
+++ http.c 1997/07/17 23:23:26
@@ -918,6 +918,11 @@
     comm_add_close_handler(httpState->fd,
         httpStateFree,
         (void *) httpState);
+    commSetSelect(httpState->fd,
+        COMM_SELECT_TIMEOUT,
+        httpReadReplyTimeout,
+        (void *) httpState,
+        Config.connectTimeout);
     request->method = orig_request->method;
     xstrncpy(request->host, e->host, SQUIDHOSTNAMELEN);
     request->port = e->http_port;

Index: http.c
===================================================================
RCS file: /surf1/CVS/squid/src/http.c,v
retrieving revision 1.143.2.14
diff -w -u -r1.143.2.14 http.c
--- http.c 1997/07/11 21:51:17 1.143.2.14
+++ http.c 1997/07/25 22:25:28
@@ -1011,6 +1016,11 @@
     comm_add_close_handler(httpState->fd,
         httpStateFree,
         (void *) httpState);
+    commSetSelect(httpState->fd,
+        COMM_SELECT_TIMEOUT,
+        httpReadReplyTimeout,
+        (void *) httpState,
+        Config.connectTimeout);
     httpState->ip_lookup_pending = 1;
     ipcache_nbgethostbyname(request->host,
         httpState->fd,
=====

Duane noted "So when you're done, there should be a call to

         commSetSelect(httpState->fd,
                COMM_SELECT_TIMEOUT,
                ...

in both functions httpStart() and proxyhttpStart()."

> I thought I was taking care of timeouts correctly in
> the connection-retry patch, but maybe not. The timeouts I'm setting
> relate to the connection establishment, and I think that the connection
> timeout handler is set when the filehandle is first initialized, right?

I can't comment on that, but the patches may answer the question. I tried
comparing Duane's patches with what the retry patch does, but the retry patch
appears to work "at a lower level", updating timeouts in data structures
rather than passing them as arguments to connection setup functions.

> Perhaps when the fd is being dup2()'d for the next attempt, something odd
> is happening.

Hmm... that rings alarm bells for me (on Solaris 2.5). A recent change in the
Apache web server ran into trouble on Solaris 2 because dup-ing sockets causes
problems there (in some unspecified circumstances, perhaps not in general); the
comment in the Apache source says:

    /* Solaris (probably versions 2.4, 2.5, and 2.5.1 with various levels
     * of tcp patches) has some really weird bugs where if you dup the
     * socket now it breaks things across SIGHUP restarts. It'll either
     * be unable to bind, or it won't respond.
     */

I don't know whether that is relevant or totally unrelated. (Apache appears
to dup the socket FD using fcntl rather than dup2, though; I don't know if
that makes a difference.)

Is anyone seeing the problem on anything other than Solaris 2?

> One important question is: when you see these stuck write fd's, do you
> see anything in the log file pertaining to connection retries on that
> address?

After my earlier messages, I tried 1.NOVM.16 plus the retry patch (Oskar
Pearson's version) for a few hours, but backed out to plain 1.NOVM.16 when it
quickly became clear the problem was still there. cache.log does not show
*any* retries during the period I was running with the retry patch, yet the
problem still occurred...

John Line

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk

Received on Mon Sep 15 1997 - 07:50:02 MDT
