Re: Introduction / accelerator feature ideas

From: Robert Collins <robertc@dont-contact.us>
Date: 21 Feb 2003 22:30:35 +1100

On Fri, 2003-02-21 at 06:03, Flemming Frandsen wrote:

> A short recap of the problems:
> A) Race conditions exist in the web application (not that uncommon, I
> guess), which means that having two identical requests running at the
> same time in different Apache processes will either result in one of
> them blowing up or simply returning the wrong result.

I certainly hope it's uncommon! Squid really isn't the place to fix race
conditions: synchronisation is your friend, in the server. That, or a
loosely coupled approach to your server processes that avoids races.
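
To make that concrete: if I were fixing this in the application, I'd
serialise requests per session there. A rough sketch in Python (the
helper names are mine, not anything from squid or apache):

    import threading

    _locks = {}
    _locks_guard = threading.Lock()

    def _session_lock(session_id):
        # Create the per-session lock on first use, guarded so two
        # threads can't end up with two different locks for one session.
        with _locks_guard:
            return _locks.setdefault(session_id, threading.Lock())

    def handle_request(session_id, do_work):
        # Requests within one session serialise here; requests for
        # different sessions still run in parallel.
        with _session_lock(session_id):
            return do_work()

Note that across separate apache processes you'd need a shared lock (a
file lock, or a row lock in the database) rather than an in-process
mutex, but the shape is the same.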

Anyway, assuming you want to shoe-horn squid into solving this for you,
I'll try not to duplicate what Henrik has already said :}.

> B) When a client hits a webserver it's more or less random which
> webserver he hits. My application does a lot of caching, so the first
> time a client hits another apache process it's a much harder hit than
> if the client had hit a recently used one.

Do you mean webserver as in IP X vs. IP Y, or as in forked Apache
process X vs. forked Apache process Y?

> C) When the backlog is long enough, clients will get impatient and
> abort the connection, but squidie seems more than happy to keep
> serving the request (I don't quite know if this is true, or if the
> clients just give up while the request is being run).

Squid will stop serving once a write() to the client returns an error.
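
For what it's worth, the mechanism is just the OS reporting the dead
connection on a write. Roughly (plain blocking sockets for clarity, not
squid's actual event-driven internals):

    def relay(client_sock, chunks):
        # Push response data at the client; a client that has gone away
        # shows up as an error (EPIPE/ECONNRESET) on a later write, at
        # which point we stop feeding it.
        try:
            for chunk in chunks:
                client_sock.sendall(chunk)
        except (BrokenPipeError, ConnectionResetError):
            return False  # client aborted
        return True

Note the abort is only noticed on the *next* write after the disconnect
- a response still being generated upstream can keep running until
squid next tries to write to the client.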

> D) Almost 100% of the content on the site is dynamically generated;
> the only static bits are css files and a tiny bit of graphics on very
> few pages. So very few requests will be cache hits, and all this
> writing everything to disk business seems a little wasted.

Squid offloads disk I/O, so writing cachable data to disk won't affect
performance much. If your app is sending non-cachable data marked as
cachable, then you have a bug!
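
If you want to be explicit about which is which, have the app mark its
dynamic responses as such. A sketch of the two cases (the headers are
standard HTTP; the handler itself is made up):

    def emit_headers(out, cachable):
        out.write(b"HTTP/1.1 200 OK\r\n")
        if cachable:
            # Static bits (css, graphics): let squid keep them around.
            out.write(b"Cache-Control: public, max-age=3600\r\n")
        else:
            # Dynamic pages: squid won't write these to disk at all.
            out.write(b"Cache-Control: no-store\r\n")
        out.write(b"Content-Type: text/html\r\n\r\n")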

> The solutions (if I can wrap my mind around squids guts)
> A) Add a lock on the requesting bit so a user can only have one
> request running in the webserver at any one time (users are identified
> by a session id in a cookie); this should take care of the race
> conditions. Doing it in Apache is not an option, as you have already
> lost if you tie up an Apache process.

This seems very painful to me - you will slow down graphics as well as
database pages.

> B) When users are identified by their session id it's relatively easy
> to maintain a list of the 5-10 latest server processes that the client
> has talked to (this calls for the server connections to be kept alive,
> but squid already does this, right?). The number of open server
> connections will need to be limited; I haven't found that option
> anywhere.

There isn't an option for limiting the parallelism to the upstream in
squid today, AFAIK. There may be one in the recent rproxy merges to
HEAD, though.
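
The usual shape of such a limit, if you wanted to prototype it, is a
counted pool - something like this (invented names, not squid code):

    import threading

    class UpstreamPool:
        """Hand out at most `limit` concurrent upstream connections."""

        def __init__(self, limit, connect):
            self._slots = threading.Semaphore(limit)  # caps parallelism
            self._connect = connect                   # factory: () -> conn

        def checkout(self):
            self._slots.acquire()   # block until a slot frees up
            return self._connect()

        def checkin(self, conn):
            conn.close()
            self._slots.release()   # wake one queued waiter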

> C) Maybe it would be possible to keep the client from disconnecting
> while waiting for the request to complete, the only way I can think of
> is to send the http header and keep appending a character to a special
> header (like X-calm-down-beavis).

This won't work. If you have *any* downstream proxies, they are unlikely
to start sending the response to their clients until they get a complete
HTTP header set. So the user will get no response from that proxy, and
will either hit F5, click the link again, or their browser may
resubmit.
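
The reason is in how the header block is framed: a proxy buffers until
it sees the blank line that ends the headers, so drip-feeding header
bytes never reaches the browser. Roughly:

    def read_header_block(sock):
        # A downstream proxy typically buffers until the CRLF CRLF that
        # terminates the header block before forwarding anything.
        buf = b""
        while b"\r\n\r\n" not in buf:
            data = sock.recv(4096)
            if not data:
                raise ConnectionError("peer closed mid-headers")
            buf += data
        return buf

Your trickled "X-calm-down-beavis" bytes just keep that loop spinning.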

> D) Writing every response to disk seems like a big waste of time and
> file descriptors when almost none of the response are ever going to be
> needed again.

This has already been covered, but bears mentioning again:
Squid ONLY EVER writes CACHABLE responses to disk - and a cachable
response is, by definition, able to be used again.

Now, for a point-by-point review of the solution points from your web
page:

Request is received by the scheduler and put in a RAM buffer.
The scheduler sends the HTTP/1.1 header and starts sending the
"X-hold-on" header to the client; the cache will append a character to
that header once per 10 seconds to keep the connection alive and make
sure that an error occurs if the client gives up.

* See above - this will not achieve the results you appear to want.

If the request can be satisfied from the cache (only static content is
cached) then the output process is set up to feed content from there.
If the request needs to hit a webserver then it is put in a run queue of
requests to serve.

* This is exactly what happens today.

When a webserver becomes available or a job is inserted in the run
queue, the request is fed to an apache (however, only one apache may be
running requests from a single session at any one time, so if a session
already has a running request it is bypassed when selecting the job
from the queue).

* This needs two new (and worthwhile) concepts in squid -
1) session ID awareness, and
2) an access list for allowing connection reuse on a
per-forwarding-attempt basis.
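
To make 1) concrete, the queue discipline being proposed looks roughly
like this (nothing like it exists in squid today; all names invented):

    from collections import deque

    class SessionQueue:
        """Run queue that never dispatches two requests for one
        session at the same time."""

        def __init__(self):
            self._queue = deque()
            self._running = set()  # session ids with a request in flight

        def push(self, session_id, request):
            self._queue.append((session_id, request))

        def pop(self):
            # Take the first request whose session is idle; requests
            # for busy sessions are skipped and retried on the next pop.
            for i, (sid, req) in enumerate(self._queue):
                if sid not in self._running:
                    del self._queue[i]
                    self._running.add(sid)
                    return sid, req
            return None  # every queued request is for a busy session

        def done(self, session_id):
            self._running.discard(session_id)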

When the apache returns the result, it is first pulled out of the
webserver; the webserver is returned to the pool of idle webserver
connections, and an output process is started to send the content to
the client.

* Ah, squid doesn't start new processes :}. Anyway, this is exactly what
squid does today, with one exception: squid doesn't read the entire
object in advance of the client - it only reads a few KB ahead - to
avoid huge memory use. This is tunable IIRC.
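
The read-ahead behaviour amounts to a bounded pump between the server
and client sockets. Sketch (blocking I/O for brevity; squid is of
course event-driven):

    def pump(server_sock, client_sock, read_ahead=16 * 1024):
        # Never hold more than `read_ahead` bytes of un-sent response,
        # so one slow client can't balloon the proxy's memory use.
        while True:
            chunk = server_sock.recv(read_ahead)
            if not chunk:
                break                   # server finished the response
            client_sock.sendall(chunk)  # blocks until the client drains

I believe the squid.conf knob for this is read_ahead_gap.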

When an output process finishes its work (either due to error or
normally) it checks the cachability of the response it served and
removes it unless it's supposed to stay in the cache.

* Again, this is *exactly* what squid does today.

When a request is put on the run queue the system first checks to see
if there is a webserver connection available right now; the connections
are sorted so the one most recently used by the client is checked
first. If no webserver is available and we are not at the limit, then
we start a new webserver connection and feed it the request; if we are
at the webserver connection limit, the request is left in the queue
until a webserver becomes available.

* Other than limiting the number of upstream connections (which may be
done as I already mentioned), this is already what squid does.
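
The selection step you describe might look something like this
(illustrative only; `last_used` is an invented per-connection map of
session id to timestamp):

    def pick_connection(idle_conns, session_id, at_limit, open_new):
        # Prefer the idle connection this session used most recently,
        # so the apache with a warm per-session cache gets the request.
        best = max(idle_conns,
                   key=lambda c: c.last_used.get(session_id, 0),
                   default=None)
        if best is not None:
            return best
        if not at_limit:
            return open_new()   # grow the pool while under the limit
        return None             # leave the request on the queue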

Only responses that are cachable will ever get an entry in the cache
table.

* This is what squid does.

Rob

-- 
GPG key available at: <http://users.bigpond.net.au/robertc/keys.txt>.
