[PREVIEW] cache_peer standby=N

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 07 Feb 2014 14:14:30 -0700

Hello,

    The attached patch implements a new Squid feature: an actively
replenished pool of unused peer connections. This is similar to the
Squid2 idle=N connection feature, with a few key differences discussed
below. I am posting this patch as a PREVIEW, not a PATCH, because it has
not been ported to recent trunk and checked for "make check" failures
(and also has no support for SSL peers).

If you have any high-level comments about this work, please share them.
I tried to disclose and explain all known high-level controversial
decisions and changes below. Please review the squid-dev "steady peer
connection pool" thread for background, to avoid rehashing the same
issues as much as possible.

A more detailed description and discussion of the changes follows.

Scope
-----

The focus of this feature is to instantly provide a ready-to-use
connection to a cooperating cache peer, virtually at all times. This is
useful when connection establishment is "too slow" and/or when
infrequent peer use prevents Squid from combating slow connection
establishment with the regular idle connection pool.
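For illustration, a minimal squid.conf fragment enabling the feature
might look like this (the peer hostname, ports, and pool size here are
hypothetical):

```
# Keep up to 5 unused, ready-to-use connections open to this parent
# peer at virtually all times. standby=N is the new cache_peer option
# introduced by this patch.
cache_peer peer.example.com parent 3128 0 standby=5
```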

What about Squid2 idle=N?
-------------------------

The feature is similar to the Squid2 idle=N feature, but there are key
differences:

* Standby connections are available virtually at all times, while Squid2
unused "idle" connections are available only for a short time after a
peer request.

* The N standby connections are not all opened at once, reducing the
chance of the feature being mistaken for a DoS attack on a peer.

* More consistent support for peers with multiple IP addresses (peer IPs
are cycled through, just like during regular Squid request forwarding).

Besides, "idle" is a poor choice of adjective for an unused connection
pool name because the same term is used for used persistent connections,
which have somewhat different properties, are stored in a different
pool, may need a distinct set of tuning options, etc. It is better to
use a dedicated term for the new feature.

Max-conn=M with standby=N
-------------------------

The relationship between the max-conn limit and standby/idle connections
is a complex one. After several rewrites and tests, Squid now obeys the
max-conn limit when opening new standby connections and accounts for
standby connections when checking whether to allow peer use. This often
works OK, but it leads to standby guarantee violations when non-standby
connections approach the limit.
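Concretely, with a hypothetical configuration like the one below, the
standby connections count against the shared limit, so the pool cannot
stay full once regular traffic nears it:

```
# Hypothetical numbers: standby connections are counted against
# max-conn, so once ~40 regular connections are in use, Squid can no
# longer keep all 10 standby connections open, violating the standby
# guarantee.
cache_peer peer.example.com parent 3128 0 max-conn=50 standby=10
```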

The alternative design, where the standby code ignores max-conn, works
better, but it is really difficult to explain and advocate because an
admin expects max-conn to cover all connections, and because of the idle
connection accounting and maintenance bugs. We may need to come back to
this when the idle connection code is fixed.

Fixed max-conn documentation and XXXed a peerHTTPOkay() bug (now in
peerHasConnAvailable()) that results in max-conn limit preventing the
use of a peer with idle persistent connections.

Using standby conns for POSTs and PUTs
--------------------------------------

Decided to use standby connections for non-retriable requests. Avoiding
standby connections for POSTs and such would violate the main purpose of
the feature: providing an instant, ready-to-use connection. A user does
not care whether they are waiting too long for a GET or a POST request.
Actually, a user may care more when their POST requests are delayed
(because canceling and retrying them is often scary from the user's
point of view).

The idea behind standby connections is that the admin is responsible for
avoiding race conditions by properly configuring the peering Squids. If
such proper configuration is not possible or the consequences of rare
races (e.g., due to peer shutdown) are more severe than the consequences
of slow requests, the admin should not use standby=N.

This choice may become configurable in the future.

TODO: Peer probing
------------------

Teach the peer probing code to push successful probing connections into
the standby pool (when enabled). Should be done as a followup project
because of the differences between the standby and probe connection
opening code, especially when SSL peers are supported. Will require some
discussion.

TODO: Send a fake request after opening a connection
----------------------------------------------------

In some cases, it is a good idea to send a Squid-originated HTTP request
on the just-opened connection, placing it in the idle connection pool on
the peer side. In other cases, that would not really work well because
the peer is going to close those idle pconns too soon. It also
introduces availability delays. This should probably be a configurable
feature.

I think this is best done when we make client side more modular and can
reuse its code for this feature. Otherwise, we would have to duplicate a
lot of complex ClientStreams/Store integration logic in this code.

Why whole PconnPool for only one peer?
--------------------------------------

Unlike the ICAP code, a standby pool uses a full-blown PconnPool object
for storage instead of the smaller IdleConnList. The primary reasons for
this design were:

* A peer may have multiple addresses and those addresses may change.
PconnPool has code to deal with multiple addresses while IdleConnList
does not. I do not think this difference is really used in this
implementation, but I did not want to face an unknown limitation. Note
that ICAP does not support multiple ICAP server addresses at all.

* PconnPool has reporting (and cache manager integration) code that we
should eventually improve to report standby-specific stats. When this
happens, PconnPool will probably become abstract and spawn two kids, one
for pconn pools and one for standby pools.

Added reconfiguration support to RegisteredRunner API
-----------------------------------------------------

Added a RegisteredRunner::sync() method for use during Squid
reconfiguration. The existing run() method and destructor are great for
the initial configuration and final shutdown, but they do not work well
for reconfiguration, when you do not want to completely destroy and then
recreate the state. The sync() method (called via SyncRegistered) can be
used for that.

Eventually, the reconfiguration API should present the old "saved"
config and the new "current" config to RegisteredRunners so that they
can update their modules/features intelligently. For now, they just see
the new config.

If the feature is merged, this change will go in a separate sub-commit.

Other seemingly unrelated changes triggered by standby=N addition
-----------------------------------------------------------------

* Removed PconnPool from fde.h. We used to create immortal PconnPool
objects. Now, standby pools are destroyed when their peer is destroyed,
so sharing raw pointers to such pools is too dangerous. We could use
smart pointers, but PconnPools do not really belong in such a low-level
object as fde, IMHO.

* Added FwdState::closeServerConnection() to encapsulate server
connection closing code, including the new noteUses() maintenance. Also
updated FwdState::serverClosed() to do the same maintenance.

* Encapsulated commonly reused ToS and NfMark code into
GetMarkingsToServer(). May need more work in FwdState where some
similar-but-different code remains.

* Close all connections in IdleConnList upon deletion. The old code did
not care because we never deleted PconnPools (although there may have
been minor bugs related to ICAP service connection pools which use
IdleConnList directly).

* Fixed PconnPool::dumpHash(). It was listing the first entry twice
because the code misused the misnamed hash_next().

* Removed unnecessary hard-coded limit on the number of PconnPools. Use
std::set for their storage.

* Fixed very stale PconnPool::pop() documentation and polished its code.

Thank you,

Alex.
