Re: [patch] refresh bugs

From: Henrik Nordstrom <henrik@dont-contact.us>
Date: Tue, 02 May 2006 11:29:28 +0200

tis 2006-05-02 klockan 15:52 +1200 skrev Doug Dixon:

> RFC2616 (13.5.3) says that ALL end-to-end headers in the 304 should
> override those in the cached entry

Yes.

> In the patch I supplied, the existing entry's headers are always
> overridden on receipt of a 304 from the origin server. This is done
> with a call to HttpReply::updateOnNotModified() which in turn calls
> httpHeaderUpdate() with a Denied304HeadersMask that says which
> headers should not be updated.

Unfortunately this only gets it partway done. This updates the in-memory
header struct, but isn't recorded in the persistent cache. So the update
is only remembered for as long as the object remains in memory. After
that only the timestamp updates is remembered.

In 2.5 it's even worse, there the in-memory header struct is not used
and the update actually gets nowere..

We need to respool the updated headers to the cache on update, not only
the memory object.

> However..... ulgh... looking at it further, the headers in the
> Denied304HeadersMask are completely wrong. Looks like it allows
> updating of many hop-by-hop headers (by not denying them), and
> prevents updating of many end-to-end headers. In fact, all it does is
> PREVENT end-to-end headers being updated that SHOULD be updated!
> Behold evil:
>
> static http_hdr_type Denied304HeadersArr[] =
> {
> HDR_ALLOW, HDR_CONTENT_ENCODING, HDR_CONTENT_LANGUAGE,
> HDR_CONTENT_LENGTH,
> HDR_CONTENT_LOCATION, HDR_CONTENT_RANGE, HDR_LAST_MODIFIED,
> HDR_LINK,
> HDR_OTHER
> };
>
> The only reason my patch appears to work is that Cache-Control,
> Expires and Date are mercifully not listed here!

Most of the above (save for possibly HDR_OTHER which lumps together
anything unknown) is entity headers, which MUST/SHOULD NOT be seen in a
304 response (SHOULD NOT if there was a strong cache validator, MUST NOT
if there only was a weak validator). See 10.3.5.

Updating these on a cached entry would be dangerous, as it's quite
likely they are not correct if seen in a 304 response. This is most
obvious for the Content-Length header which would be disastrous to allow
updates of.

The exception is Last-Modified which while technically MUST NOT be seen
in a 304 while quite often is, and also quite often is synthetic to hint
to HTTP/1.0 cache heuristics how long the object may be kept fresh.. but
as discussed a few days ago perhaps this falls under the second last
paragraph there..

> How about renaming it and put all the hop-by-hop headers (as defined
> in section 13.5.1 of the spec) in it?

Agreed there needs to be a clear-cut hop-by-hop definition. But we also
needs to be careful about which headers are updated by a 304 and when
(in future) joining of partial responses.

> HDR_KEEP_ALIVE
> HDR_TE
> HDR_TRAILERS

Unrelevant sidenote: Keep-Alive technically doesn't really need to be
there as it's always used together with Connection: Keep-Alive, but it
doesn't hurt to have it there.

> As I say, the HTTP headers should be updated, not just the store
> entry's attributes.

And what I am saying is that the timestamp attributes and their use
needs to be reviewed. Last time I looked at this the freshness algorithm
used by Squid diverts quite a bit from the RFC on time drift. Also there
is several cache-controls we should use but don't do yet..

> In any case, I'm assuming the timestamp issue is not limited to
> refreshes, and so should be fixed separately. Is there an open bug
> for this btw?

It's not very meaningful to look at until the headers have been fixed..
too easy to get sidetracked mixing the two problems otherwise. The
current timestamp management is kind of tailored as an approximation of
and HTTP/1.0 best effort without header updates. When there is header
updates I think a much cleaner timestamp model can be used.

Regards
Henrik

Received on Tue May 02 2006 - 03:29:46 MDT

This archive was generated by hypermail pre-2.1.9 : Thu Jun 01 2006 - 12:00:04 MDT