Re: [RFC] Have-Digest and duplicate transfer suppression

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 11 Aug 2011 12:12:50 -0600

On 08/11/2011 12:53 AM, Amos Jeffries wrote:
> On 11/08/11 16:58, Alex Rousskov wrote:
>> On 08/10/2011 09:29 PM, Amos Jeffries wrote:
>>> On Thu, 11 Aug 2011 15:12:56 +1200, Amos Jeffries wrote:
>>>> On Thu, 11 Aug 2011 11:09:38 +1200, Robert Collins wrote:
>>>>> (But for clarity - I'm fine with what you proposed, I just wanted to
>>>>> consider whether the standards would let us do it more directly, which
>>>>> they -nearly- do AFAICT).
>>>>>
>>>>> -Rob
>>>>
>>>> Same. I don't mind this type of extension ...BUT...
>>>>
>>>> I think fixing bug 2112 (lack of If-None-Match support) and bug 2617
>>>> (wrong ETag validation handling) should be done first before any
>>>> extensions are tried. That will allow you to see how much of a problem
>>>> (or not) the potential failure cases actually are in practice.
>>>>
>>>> Amos
>>>
>>> Want-Digest: and Digest: validation mechanism from
>>> http://tools.ietf.org/html/rfc3230 covers the remainder of the proposal.
>>> So no custom extensions needed to meet all the requirements.
>>
>> We did reuse a few ideas from that part of Jeff Mogul's larger work
>> I mentioned earlier, but I believe RFC 3230 Digest and Want-Digest
>> headers differ from what is being discussed here:
>>
>> - Their digests are for instances while our digests are for entities.
>
> I fail to see where entity MD5 can be better than instance MD5.

If I got the definitions right, the instance MD5 is useless when the
instance differs from the cached or response entity: The child cache
cannot restore the response even if the instance did not change (but
the entity has) and the parent cache cannot compute the instance
checksum because all it has is a potentially new entity.

In most cases, the instance is going to be the same as the entity, of
course, but the cache may not really know whether both are the same in
corner cases. This may not matter today, but as Squid becomes smarter
about partial caching, the difference will become critical.
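
A rough illustration of the distinction, assuming the RFC 3230 reading
where the instance is the whole representation and the entity is what a
particular message carries (here, a hypothetical byte range). The sample
bytes and names are made up for the example and are not Squid code:

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 of a byte string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical full representation (the "instance").
instance = b"0123456789" * 10

# Hypothetical body of a partial response for bytes 20-49 (the "entity"
# that a cache holding only that range would actually have).
entity = instance[20:50]

instance_digest = md5_hex(instance)
entity_digest = md5_hex(entity)

# A cache holding only `entity` cannot reproduce `instance_digest`.
digests_differ = instance_digest != entity_digest
```

When the cache holds the whole representation, the two digests coincide,
which is the common case mentioned above.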

> I would think there is a good case for Squid universally supplying MD5
> or SHA or both on all HIT/200. It can be (already is?) stored in the
> meta data for fast validation.

Computing the checksum is expensive and not all store modules support
updating metadata after the last byte is received. Moreover,
intermediaries MUST NOT set Content-MD5 and similar end-to-end checksum
headers.

>> - Their headers are end-to-end while ours are hop-by-hop.
>
> By my reading there is no end-to-end requirement.

All HTTP headers are end-to-end by default. IMO, they should have
specified that explicitly, but they did not. Reading their discussion
about content validation and related checks, I am inclined to think they
were thinking end-to-end.

> Leaking the Digest: reply header when used as per the spec is no loss
> and some potential benefit.
> Leaking a Digest: request header

IMO, just like with Content-MD5, intermediaries MUST NOT set end-to-end
checksums or such checksums would become useless for what they were
designed for.

>> - AFAICT, their Digest header is meant mostly for responses, while our
>> If-None-Match or Have-Digest header is used in requests.
>
> Yes the most obvious use is in responses. However I do not see anything
> actually mentioning a request/reply direction in the RFC.

It is not clear indeed, but I suspect they meant that a Digest in the
request would apply to the request body rather than to the [cached]
response body that the request is refreshing.

> IMO the case for Have-Digest is a good case for Digest: request header.
> (and a few bytes less to salve the bloat complaints). Used as a (strong)
> conditional request validator it makes the first hop which can satisfy
> it the "end".
> i.e., if we implement this in Squid, how is the parent proxy to know that
> its own parent in turn won't assist with the optimization when it can't?
>
> There is no guarantee that any recipient parent will act on it to
> produce 304 instead of 200. So worst-case is the status quo.
>
> If you insist on this being hop-by-hop you are always free to add it to
> the Connection: header to be stripped at the next hop.

IMO, we should not violate the intent of the RFC authors when reusing
their stuff. If we do not agree on what their intent was, we can ask them
to clarify, but even that is not 100% safe as there might be competing
implementations out there that would use a different interpretation. For
small things, it may not matter much, but end-to-end guarantees are no
small things, IMO.
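
For reference, the Connection-based stripping mentioned in the quoted
text could be sketched as follows. This is only an illustration of the
generic HTTP rule that headers named in Connection are removed before
forwarding; the function name and the header list representation are
assumptions, and a real proxy also drops the fixed set of hop-by-hop
headers (Keep-Alive, Transfer-Encoding, and so on), which this sketch
ignores:

```python
def strip_connection_listed(headers):
    """Return the headers to forward to the next hop.

    `headers` is a list of (name, value) tuples. Any header named in a
    Connection header, and Connection itself, is dropped.
    """
    listed = set()
    for name, value in headers:
        if name.lower() == "connection":
            for token in value.split(","):
                token = token.strip().lower()
                if token:
                    listed.add(token)
    return [
        (name, value)
        for name, value in headers
        if name.lower() != "connection" and name.lower() not in listed
    ]

# Hypothetical request carrying a hop-by-hop Have-Digest header.
request_headers = [
    ("Host", "example.com"),
    ("Have-Digest", "md5=abc"),
    ("Connection", "Have-Digest"),
]
forwarded = strip_connection_listed(request_headers)
```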

>> Want-Digest or a similar support advertisement is wasteful in the common
>> case, but is also useful to prevent sending If-None-Match or Have-Digest
>> requests to servers that do not understand them. This is something we
>> may want to add.
>
>
> So what you want the child to be sending is:
> If-None-Match: FOO
> Digest: md5=BLAH
>
> or
> Digest: md5=BLAH

If we use the If-None-Match approach, it would be just

    If-None-Match: edigest_md5=foo

or similar.
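
A minimal sketch of how a child cache might build such a request header.
The "edigest_md5=" prefix comes from the example above; the hex encoding
of the digest, the surrounding double quotes (entity tags are quoted
strings in HTTP), and the function names are assumptions made for
illustration only:

```python
import hashlib

def edigest_tag(entity_body: bytes) -> str:
    """Build a quoted entity tag carrying the MD5 of the cached entity."""
    return '"edigest_md5=%s"' % hashlib.md5(entity_body).hexdigest()

def refresh_request_headers(cached_body: bytes) -> dict:
    """Headers a child could add when refreshing a cached entity."""
    return {"If-None-Match": edigest_tag(cached_body)}

# Hypothetical cached entity body.
headers = refresh_request_headers(b"cached response body")
```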

> Response from the parent would be 304 or 200 + new object.

If we use the If-None-Match approach, then the response will be 304 or
something else, just like in the RFC.

> Noting that
> in response to a Digest validated request (strong validator) the 304
> can send new Expires, Cache-Control, Date, and Vary values to update
> the child object headers. Without a body.

Yes, that is common to both If-None-Match and Have-Digest-based approaches.
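
The parent side of that exchange could look roughly like the sketch
below. It assumes the hypothetical quoted "edigest_md5=<hex>" tag format,
treats a mismatch as a plain 200 (the reply could be something else), and
limits the 304 to the four headers named in the quoted text; none of this
is taken from Squid:

```python
import hashlib

# Headers the quoted text says a 304 may refresh on the child.
REFRESHABLE = ("Expires", "Cache-Control", "Date", "Vary")

def parent_reply(if_none_match, fresh_body, fresh_headers):
    """Return (status, headers, body) for a digest-validated request."""
    tag = '"edigest_md5=%s"' % hashlib.md5(fresh_body).hexdigest()
    candidates = [t.strip() for t in (if_none_match or "").split(",")]
    if tag in candidates:
        # Same entity: send updated metadata only, no body.
        refreshed = {k: v for k, v in fresh_headers.items() if k in REFRESHABLE}
        return 304, refreshed, b""
    # Different entity (or no validator): send the new object.
    return 200, dict(fresh_headers), fresh_body

body = b"hello"
child_tag = '"edigest_md5=%s"' % hashlib.md5(body).hexdigest()
fresh = {"Date": "Thu, 11 Aug 2011 18:12:50 GMT", "Content-Type": "text/plain"}
hit = parent_reply(child_tag, body, fresh)
miss = parent_reply('"edigest_md5=stale"', body, fresh)
```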

Thank you,

Alex.
Received on Thu Aug 11 2011 - 18:13:11 MDT
