Re: Introduction / accelerator feature ideas

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 20 Feb 2003 22:07:58 +0100

Hi, and welcome to Squid-dev.

On Thursday 20 February 2003 20.03, Flemming Frandsen wrote:

> A) Race conditions exist in the web application (not that uncommon,
> I guess), which means that two identical requests running at the
> same time in different Apache processes will either result in one
> of them blowing up or simply return the wrong result.

Hmm. This is a new twist on an old problem: basically a variant of the
problem that Squid may initiate multiple requests for the same URL,
but with the added twist that you only want to limit it per user.

The general problem that Squid may request the same URL multiple times
before knowing that the result is cacheable should be addressed for
accelerator setups where the resulting content is expected to be
cacheable.
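The general idea of collapsing concurrent identical requests, so that
only one actually reaches the backend while later callers wait for and
share its result, can be sketched roughly like this. This is an
illustrative Python sketch only, not Squid code; `fetch_fn` is a
placeholder for whatever actually forwards the request:

```python
import threading

class RequestCoalescer:
    """Coalesce concurrent fetches of the same URL: the first caller
    performs the fetch, later callers wait and share its result.
    Illustrative sketch; error handling is deliberately minimal."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn
        self.lock = threading.Lock()
        self.pending = {}  # url -> (event, result box)

    def get(self, url):
        with self.lock:
            entry = self.pending.get(url)
            if entry is None:
                # No identical request in flight: this caller leads.
                entry = (threading.Event(), {})
                self.pending[url] = entry
                leader = True
            else:
                leader = False
        event, box = entry
        if leader:
            try:
                box['value'] = self.fetch_fn(url)
            finally:
                with self.lock:
                    del self.pending[url]
                event.set()  # wake any waiters sharing this result
        else:
            event.wait()
        return box['value']
```

Flemming's per-user variant would additionally key the table on
(user, URL), and would need a policy for how long a pending entry
survives after its original client aborts.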

I am not entirely sure about your twist, however. It smells more like
a band-aid in the reverse proxy to work around an inherently broken
application. For this to work it must be very carefully specified how
to identify that a request is exactly the same and should be allowed
to take over the pending result of a previously aborted identical
request, and when to keep waiting for responses to aborted requests
in the hope that the user simply retried the same request.

The exact same problem will be seen if the application is published
directly on the Internet with no reverse proxy in front of it, even
under light loads which the server can perfectly well handle on its
own.

> B) When a client hits a web server it's more or less random which
> web server he hits. My application does a lot of caching, so the
> first time a client hits another Apache process it's a much harder
> hit than if the client had hit a recently used one.

This, unfortunately, is a bit harder to do anything about. Squid has
no means of indicating which web server process should accept a
request on the same port.

If the connections are kept alive, then yes, this can be done. Such
binding of server connections to clients (users or connections) is
also needed for proxying NTLM authentication, and is of interest. It
will however increase the demand on your backend servers, as more
connections will be needed between Squid and the web server; but
maybe a good balance can be found, allowing connections to be shared
while keeping good user locality per connection. Intuitively,
however, I feel this is better solved by having a better per-user
information cache in the application on the web server, and by being
persistent about which web server (if you have more than one) each
user is sent to.

The latter (per-user persistent selection of a web server from a farm
of servers) can be implemented in many ways. In our eMARA reverse
proxy we have a simple weighted hashing scheme using either the
username or the source IP address as the key, which has proven very
effective for always sending the same user to the same web server in
a farm of web servers.
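One common way to implement such weighted hashing is
highest-random-weight (rendezvous) hashing. A minimal sketch,
assuming the farm is given as (name, weight) pairs; the actual eMARA
scheme may well differ:

```python
import hashlib

def pick_server(key, servers):
    """Pick a backend for `key` (username or client IP) by weighted
    hashing, so the same key always maps to the same web server.
    `servers` is a list of (name, weight) pairs."""
    best, best_score = None, -1.0
    for name, weight in servers:
        # Hash key and server name together, scale by weight
        # (rendezvous / highest-random-weight hashing).
        h = int(hashlib.md5((key + name).encode()).hexdigest(), 16)
        score = weight * (h / float(2 ** 128))
        if score > best_score:
            best, best_score = name, score
    return best
```

A nice property of this scheme is that removing one server only
remaps the users that were on that server; everyone else keeps their
existing backend.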

> C) When the backlog is long enough clients will get impatient and
> abort the connection, but squidie seems more than happy to keep
> serving the request (I don't quite know if this is true or the
> clients just give up when the request is being run).

Well, this actually already has a partial solution. See squid.conf
(hint: half_closed_clients).
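For reference, the directive looks like this in squid.conf (a sketch;
check the default value and exact semantics for your Squid version):

```
# Treat clients that have closed their half of the connection as
# gone, so Squid aborts the pending request instead of completing it.
half_closed_clients off
```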

Also related to 'A' I think.

> D) Almost 100% of the content on the site is dynamically generated;
> the only static bits are CSS files and a tiny bit of graphics on
> very few pages, so very few requests will be cache hits, and all
> this writing everything to disk business seems a little wasted.

Objects which are not cacheable should not be written to disk. What
makes you think they are? I have seen no evidence that they are in
all my hacking on Squid.

> B) When users are identified by their session ID it's relatively
> easy to maintain a list of the 5-10 latest server processes that
> the client has talked to (this calls for the server connections to
> be kept alive, but Squid already does this, right?). The number of
> open server connections will need to be limited; I haven't found
> that option anywhere.

Yes, Squid keeps connections persistent where possible within
HTTP/1.0 + keep-alive. By default, all open connections to a server
act as a pool to which requests can be forwarded, and only if there
is no idle connection does Squid open a new one.
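That pooling behaviour can be sketched like this (illustrative Python
only; `connect_fn` is a placeholder for opening a TCP connection to
the backend):

```python
import collections

class ServerConnPool:
    """Sketch of persistent-connection pooling: an idle keep-alive
    connection to a server is reused for the next request, and a new
    connection is opened only when no idle one exists."""

    def __init__(self, connect_fn):
        self.connect_fn = connect_fn
        self.idle = collections.defaultdict(collections.deque)

    def acquire(self, server):
        pool = self.idle[server]
        # Reuse an idle connection if available, else open a new one.
        return pool.popleft() if pool else self.connect_fn(server)

    def release(self, server, conn):
        # The connection stayed keep-alive; return it to the pool.
        self.idle[server].append(conn)
```

Binding connections to users, as discussed above, would mean keying
the idle pool on (server, user) instead of just the server, at the
cost of more concurrent backend connections.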

Regards
Henrik
Received on Thu Feb 20 2003 - 14:06:26 MST

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:19:16 MST