Re: TCP Connection failures

From: David Luyer <luyer@dont-contact.us>
Date: Thu, 30 Oct 1997 19:04:35 +0800

Ralph Rudolph wrote:
> On Wed, 29 Oct 1997 09:58:04 +0800, David Luyer <luyer@ucs.uwa.edu.au> wrote:
> >(re: "TCP connection to xxx/3128 failed" problems)
> >
> >It's most likely a linux bug - up to 2.0.31 you can get "bind - address
> >already in use" from autobinds or binds to port zero, meaning you fail
> >connections which are meant to just come from random TCP ports.
> (...)
> >It's probably better to fix the network/machine/OS problems than to patch
> >squid to handle them.
>
> David, and all the others,
> does that mean that upgrading to linux 2.0.(>31) will fix the problem?

Possibly, but maybe only partially. I'm still seeing TCP connections to the
parent failing, but I think those are all the parent's fault now...
 
> Upgrading all linux boxes would take me at least a day, but I'm willing to do
> it if chances are that it will fix the problem.

You'd only have to upgrade the boxes running caches.

> Is there anybody who could affirm this? I mean, who had the lost TCP
> connections before upgrading, and solved it by upgrading to linux 2.0.31++ ?

Yes and no, as above. I'm seeing fewer clearly false failures, but I'm
still seeing failures. You're probably better off applying the patch I've
included below, BY HAND. It won't apply cleanly as-is, since I've just cut
the relevant parts out of a larger patch.

And, in fact, I just noticed it's a reversed patch (i.e., in each hunk the
top section is the "fixed" code and the bottom section is the old, bad code).

The algorithm should be self-explanatory: give a parent (or peer) cache 10
chances before marking it truly dead. As soon as it fails the first time,
fire off a connection attempt; if that succeeds, restore it to 10 fresh
chances.
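
For anyone who wants to see the counting scheme in isolation, here is a
minimal standalone sketch of it in C. It is not Squid code: struct
peer_sketch, probe_pending and the peer_*() helpers are hypothetical names
made up for illustration, and the event scheduling that eventAdd() does in
the real code is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>

#define TCP_UP_CHANCES 10   /* same value the patch assigns to tcp_up */

/* Hypothetical stand-in for Squid's peer struct; only the counter is modelled. */
struct peer_sketch {
    int tcp_up;         /* chances left; 0 means the peer is treated as dead */
    int probe_pending;  /* 1 while a check connection is (notionally) scheduled */
};

/* A new peer starts with a full set of chances and no probe scheduled. */
static void peer_init(struct peer_sketch *p)
{
    assert(p != NULL);
    p->tcp_up = TCP_UP_CHANCES;
    p->probe_pending = 0;
}

/* Call when a TCP connection to the peer fails. */
static void peer_connect_failed(struct peer_sketch *p)
{
    assert(p != NULL);
    if (!p->tcp_up)
        return;                          /* already marked dead */
    p->tcp_up--;
    if (p->tcp_up != TCP_UP_CHANCES - 1)
        return;                          /* only the first failure starts a probe */
    p->probe_pending = 1;                /* real code would schedule an event here */
}

/* Call when the check connection completes; ok is nonzero on success. */
static void peer_probe_done(struct peer_sketch *p, int ok)
{
    assert(p != NULL);
    if (ok) {
        p->tcp_up = TCP_UP_CHANCES;      /* back to 10 fresh chances */
        p->probe_pending = 0;
    } else {
        p->probe_pending = 1;            /* real code would reschedule the probe */
    }
}

/* The peer is usable as long as it has at least one chance left. */
static int peer_alive(const struct peer_sketch *p)
{
    assert(p != NULL);
    return p->tcp_up > 0;
}
```

In this sketch, ten calls to peer_connect_failed() with no successful probe
in between take a peer from alive to dead, and a single successful
peer_probe_done() brings it back to the full count.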

Anyone is welcome to make a 'clean' patch out of this.

David.

PART I: in neighborAdd() increase tcp_up to 10
***************
*** 885,891 ****
      e->acls = NULL;
      e->icp_version = ICP_VERSION_CURRENT;
      e->type = parseNeighborType(type);
!     e->tcp_up = 10;
  
      /* Append peer */
      if (!Peers.peers_head)
--- 879,885 ----
      e->acls = NULL;
      e->icp_version = ICP_VERSION_CURRENT;
      e->type = parseNeighborType(type);
!     e->tcp_up = 1;
  
      /* Append peer */
      if (!Peers.peers_head)

PART II: in peerCheckConnectDone() change structure a bit, reduce time to 60
***************
*** 1131,1144 ****
  peerCheckConnectDone(int fd, int status, void *data)
  {
      peer *p = data;
!
!     if (status == COMM_OK) {
!       p->tcp_up = 10;
        debug(15, 0, "TCP connection to %s/%d succeeded\n",
            p->host, p->http_port);
      } else {
        p->ck_conn_event_pend++;
!       eventAdd("peerCheckConnect", peerCheckConnect, p, 60);
      }
      comm_close(fd);
      return;
--- 1125,1137 ----
  peerCheckConnectDone(int fd, int status, void *data)
  {
      peer *p = data;
!     p->tcp_up = status == COMM_OK ? 1 : 0;
!     if (p->tcp_up) {
        debug(15, 0, "TCP connection to %s/%d succeeded\n",
            p->host, p->http_port);
      } else {
        p->ck_conn_event_pend++;
!       eventAdd("peerCheckConnect", peerCheckConnect, p, 80);
      }
      comm_close(fd);
      return;

PART III: in peerCheckConnectStart() change structure, reduce time to 30
***************
*** 1150,1161 ****
      if (!p->tcp_up)
        return;
      debug(15, 0, "TCP connection to %s/%d failed\n", p->host, p->http_port);
!     p->tcp_up--;
!     if (p->tcp_up != 9)
!       return;
      p->last_fail_time = squid_curtime;
      p->ck_conn_event_pend++;
!     eventAdd("peerCheckConnect", peerCheckConnect, p, 30);
  }
  
  static void
--- 1143,1152 ----
      if (!p->tcp_up)
        return;
      debug(15, 0, "TCP connection to %s/%d failed\n", p->host, p->http_port);
!     p->tcp_up = 0;
      p->last_fail_time = squid_curtime;
      p->ck_conn_event_pend++;
!     eventAdd("peerCheckConnect", peerCheckConnect, p, 80);
  }
  
  static void
Received on Thu Oct 30 1997 - 03:15:34 MST