Re: [RFC] Have-Digest and duplicate transfer suppression

From: Henrik Nordström <henrik_at_henriknordstrom.net>
Date: Mon, 15 Aug 2011 23:17:55 +0200

Mon 2011-08-15 at 09:50 -0600, Alex Rousskov wrote:

> I do not like aborted retrievals as the default method of handling a
> digest-based hit. Aborted transactions have negative side-effects and
> some of those effects are not controlled by Squid (e.g., monitoring
> software may trigger an alert if too many requests are aborted).
>
> I agree that we can switch from entities to instances, provided we are
> OK with excluding 206, 302, and similar non-200 responses from the
> optimization. By instance definition, Squid would not be able to compute
> or use an instance digest if the response is not 200 OK. We can hope
> that the vast majority of non-200 responses are either not cachable or
> are very small and not worth optimizing.

The bulk bandwidth where you would find duplicates is in positive GET
responses.

Not being able to support 206 duplicate detection without caching the
full 200 in the "topmost" cache is a little annoying, however.
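
As a rough illustration of why a range response cannot be matched on its
own, here is a minimal Python sketch. The instance_digest() helper and the
"SHA-256=<base64>" value format are assumptions made for this example, not
something specified in this thread.

```python
import base64
import hashlib


def instance_digest(body: bytes) -> str:
    """Return a digest value of the form 'SHA-256=<base64>' (assumed format)."""
    raw = hashlib.sha256(body).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


# The complete instance, as a 200 OK response would carry it.
full_body = b"0123456789" * 100

# The bytes a 206 response for the range 100-199 would carry.
partial = full_body[100:200]

# Digests of full instances that a cache already knows about.
known = {instance_digest(full_body)}

# The range hashes to a different value than the full instance, so it
# cannot be recognised as a duplicate without the whole instance at hand.
print(instance_digest(partial) in known)
```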

> > In requests you can optionally add a digest-based condition similar to
> > If-None-Match but here If-None-Match already serves the purpose quite
> > well, so use of the digest condition should probably be limited to cases
> > where there is no ETag.
>
> Or to cases where ETag lies about response content changes.

True, but I kind of doubt there is much bandwidth to be found in those
cases.

> > To optimize bandwidth loss due to unneeded transmission a slow start
> > mechanism can be used where the sending part waits a couple RTTs before
> > starting to transmit the body of a large response where an instance
> > digest is presented. This allows the receiving end to check the received
> > instance digest and abort the request if not interested in receiving the
> > body.
>
> Besides my general dislike for aborted transactions becoming a norm (see
> above), "a couple RTT" delay is a high price to pay because each RTT is
> a few seconds already.

Seconds? What kind of network is this?

I see now that there was one condition omitted from the above, "if the
body size is above a certain limit".
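
A minimal sketch of that sender-side rule, in Python. The function name,
the 1 MB threshold and the default of two RTTs are hypothetical values
chosen for illustration only.

```python
def body_delay_seconds(has_instance_digest: bool,
                       body_size: int,
                       rtt_seconds: float,
                       size_limit: int = 1_000_000,   # hypothetical limit
                       rtts_to_wait: int = 2) -> float:
    """How long the sender holds back the body after sending the headers.

    The delay applies only when an instance digest is presented AND the
    body is above the size limit; otherwise the body is sent at once.
    """
    if not has_instance_digest or body_size <= size_limit:
        return 0.0
    return rtts_to_wait * rtt_seconds


# Large body with a digest: wait a couple of RTTs so the receiver can abort.
print(body_delay_seconds(True, 5_000_000, 0.25))
# Small body: not worth delaying.
print(body_delay_seconds(True, 10_000, 0.25))
```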

> I am not interested in supporting case (A) at this time (in part for the
> same reasons you mention above), but others might be.

I think for case (A) sending the digest is good enough.

> We wanted to be able to optimize transfer of non-200 responses and be
> able to update headers. In other words, we wanted the child cache to be
> able to restore the exact origin server response, including status code
> and header details.

OK, that would be a bit beyond a normal cache, then.

> Fixing relevant parts of If-None-Match support can indeed be a part of
> the project. I do not think we can do (A), but I agree that it would be
> nice if our solution for (B) can be, at least theoretically, extended to
> (A) later.

Just making sure there is a digest is a long step along the path to
enabling (A).
 1. It makes it possible to track exactly how much redundant data is
being seen at a given node.
 2. It enables implementation of the scheme I proposed for dealing with
(A).
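
Point 1 could be sketched roughly as follows, in Python. The
RedundancyTracker class and its fields are hypothetical; it only assumes
that each response passing through the node carries a digest and a body
size.

```python
from dataclasses import dataclass, field


@dataclass
class RedundancyTracker:
    """Counts how many body bytes at this node repeat an already seen digest."""
    seen: set = field(default_factory=set)
    total_bytes: int = 0
    redundant_bytes: int = 0

    def record(self, digest: str, body_size: int) -> bool:
        """Account for one response; return True if its digest was seen before."""
        self.total_bytes += body_size
        duplicate = digest in self.seen
        if duplicate:
            self.redundant_bytes += body_size
        else:
            self.seen.add(digest)
        return duplicate


tracker = RedundancyTracker()
tracker.record("SHA-256=aaa", 100)   # first sighting
tracker.record("SHA-256=aaa", 100)   # same instance again: redundant
tracker.record("SHA-256=bbb", 50)    # a different instance
print(tracker.redundant_bytes, "of", tracker.total_bytes, "bytes were redundant")
```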

The mentioned delay is an optimization, not a requirement.

> I agree that the original proposal for (B) should not cause
> Reload-into-IMS-problems. However, if we go the adjusted route of 304
> responses and ignoring certain origin server response headers, we may
> create similar, albeit less likely problems: The client will receive the
> right message body with wrong/stale headers because 304 responses
> prohibit inclusion of certain headers.

We can easily extend HTTP via a Cache-Control directive or similar
(including an out-of-band configuration setting) to allow caches to
always send full 304 responses for digest-based conditions. Digest
conditions are strong validators, so omitting entity headers is only a
recommendation, not a requirement. Sending additional headers is not a
violation, just not recommended by default (SHOULD NOT, not MUST NOT):

> > If the conditional GET used a strong cache validator (see section
> > 13.3.3), the response SHOULD NOT include other entity-headers.

Regards
Henrik
Received on Mon Aug 15 2011 - 21:18:01 MDT