squid3: we're out of fds - deferring io

From: Duane Wessels <wessels@dont-contact.us>
Date: Fri, 26 May 2006 13:47:26 -0600 (MDT)

Hi,

I spent some time tracking down (and finding) problems in squid-3
when Squid starts to run out of FDs.

Problem #1: comm_accept_check_event() scheduled incorrectly.

     from fdc_t::acceptOne():

         eventAdd("comm_accept_check_event", comm_accept_check_event, this,
                  1000.0 / (double)(accept.accept.check_delay), 1, false);

     There are at least two problems here. First, the division is
     inverted. It should be check_delay / 1000. Second, check_delay
     is never set for the HTTP accept socket because fdc_open() is
     not called for it, so the division by an unset (zero) check_delay
     passes the value "infinity" to eventAdd().
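
     To make the arithmetic concrete, here is a minimal sketch. It
     assumes check_delay is in milliseconds and that eventAdd() takes
     its delay in seconds. The two helper functions are hypothetical
     and are not part of Squid.

```cpp
#include <cmath>

// Hypothetical helpers, not Squid code.
// Assumption: check_delay is in milliseconds, the event delay is in seconds.

// The expression as currently written: 1000.0 / check_delay.
// With check_delay == 0 this is a floating-point division by zero,
// which yields +infinity on IEEE 754 platforms.
double buggyDelaySeconds(int check_delay_ms)
{
    return 1000.0 / (double)check_delay_ms;
}

// The intended conversion from milliseconds to seconds.
double fixedDelaySeconds(int check_delay_ms)
{
    return check_delay_ms / 1000.0;
}
```

     With a check_delay of 500, the current expression gives 2 seconds
     where 0.5 was intended. With a check_delay of 0 it gives infinity,
     so the event never fires.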

     Also, the comm_accept_check_event is cancelled by a comm_close().

Problem #2: comm_accept_check_event() and AcceptLimiter accomplish the
     same thing

     Squid has two ways to make sure it doesn't accept new connections
     when short on FDs. One is httpAccept(), okToAccept, and the AcceptLimiter
     class. The other is acceptOne() and comm_accept_check_event().
     The first checks FD resources after a new connection has been
     accepted. The acceptOne() method also checks FD resources and
     might try the comm_accept_check_event trick *before* calling
     accept(2).

     I don't really see the point to having both of these competing
     methods in the code.
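
     For illustration only, a single deferral mechanism could look
     roughly like the sketch below. This is not Squid's AcceptLimiter.
     The class name, its methods, and the max/reserved counters are
     all hypothetical. A listener asks before accepting, gets queued
     when free FDs fall below the reserve, and is resumed when an FD
     is closed.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch, not Squid's AcceptLimiter: one place that decides
// whether accepting is allowed and resumes deferred listeners later.
class SimpleAcceptLimiter
{
public:
    SimpleAcceptLimiter(int max_fds, int reserved_fds)
        : max_fds_(max_fds), reserved_fds_(reserved_fds), open_fds_(0) {}

    // True when enough descriptors remain free to accept another connection.
    bool okToAccept() const { return max_fds_ - open_fds_ >= reserved_fds_; }

    // A listener calls this when it wants to accept. The callback runs
    // immediately if descriptors are available, otherwise it is queued.
    void request(std::function<void()> acceptFn)
    {
        if (okToAccept())
            acceptFn();
        else
            deferred_.push_back(std::move(acceptFn));
    }

    // Bookkeeping hook: a descriptor was opened.
    void fdOpened() { ++open_fds_; }

    // Bookkeeping hook: a descriptor was closed. Resumes one deferred
    // listener if there is room again.
    void fdClosed()
    {
        --open_fds_;
        if (!deferred_.empty() && okToAccept()) {
            std::function<void()> fn = std::move(deferred_.front());
            deferred_.pop_front();
            fn();
        }
    }

    std::size_t deferredCount() const { return deferred_.size(); }

private:
    int max_fds_;
    int reserved_fds_;
    int open_fds_;
    std::deque<std::function<void()>> deferred_;
};
```

     Because the resume is driven by a descriptor being closed rather
     than by a timer, there is no delay value to get wrong.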

Problem #3: comm_poll.cc: assert(shutting_down)

    Due to problem(s) #1, Squid will get into a state where there
    is no handler for the incoming HTTP FD, and no way to get one
    back. The comm_accept_check_event() will never be called either
    because it is scheduled at infinity, or because comm_close()
    cancels it. After some time all open file descriptors get closed
    and comm_poll() will assert because there are no FDs to poll on.
    The assertion is that the only time there should be no FDs to
    poll on is during shutdown.

    Now if problem #1 is fixed, it may be very unlikely that this
    condition could happen again. i.e., it is unlikely that all
    sockets and files would get closed before the event happens.
    But this only makes it less likely, not impossible, to happen.

Problem #4: reconfigure during deferred accept doesn't work

    Both accept-deferring techniques assume that the incoming HTTP
    FD does not change between the time deferring starts and ends.
    If Squid is reconfigured, the FD will likely change. This should
    be easy, but ugly, to fix.

It seems to me that AcceptLimiter works okay, and that
comm_accept_check_event() is a mess. Can someone justify keeping
comm_accept_check_event()?

Duane W.
Received on Fri May 26 2006 - 13:47:27 MDT
