Re: Store_url_rewrite for squid 3+

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 09 Sep 2012 20:49:27 +1200

On 9/09/2012 4:19 a.m., Alex Rousskov wrote:
> On 09/07/2012 09:13 PM, Amos Jeffries wrote:
>
>> Also, any revalidation requests done later must be done on the
>> original request URL. Not the stored URL nor the potentially different
>> current client request URL.
> This sounds like a very important point that could justify storing the
> original request URL -- exactly the kind of information I was asking
> for, thank you!
>
> Why do we have to use the original request URL for revalidation instead
> of the current one? We use current, not original request headers (we do
> not store the original ones), right? Is it better to combine current
> headers with the original URL than it is to use the current URL with
> current headers?

Revalidation requires very precise variant targeting to ensure that
updated headers received from the revalidation do not corrupt the
cached object copy. Regardless of what people may think, the YouTube
URLs and other sites being de-duplicated with store-url *are* actually
pointing at different files on different servers, with potentially
different hashes or encoding details. This is particularly so in the
cases where the HD and standard definition variants of a video are
store-url mapped to the same cache object.
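
To make the collision concrete, here is a minimal Python sketch of a
store-url style mapping. The hostnames, the `id` and `quality` query
parameters, and the `store_key` function are all hypothetical; this is
not Squid's helper protocol or any real site's URL scheme.

```python
from urllib.parse import urlsplit, parse_qs

def store_key(url: str) -> str:
    """Hypothetical de-duplication rule: ignore the mirror host and every
    query parameter except the video id."""
    parts = urlsplit(url)
    video_id = parse_qs(parts.query)["id"][0]
    return f"http://video.store.invalid/{video_id}"

# Two distinct upstream resources (different server, different encoding).
hd = "http://mirror1.example.com/get?id=abc123&quality=hd"
sd = "http://mirror2.example.com/get?id=abc123&quality=sd"

# Both collapse onto one cache key, so the key alone cannot tell us
# which variant is actually sitting in the cache.
same_key = store_key(hd) == store_key(sd)
```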

The URL and ETag are both critical details to preserve here, as is
anything else which is used for specific Squid->upstream identification
of the resource being revalidated.
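
As a rough sketch of what "preserve" could mean in practice (Python,
with a made-up `CachedEntry` record and `build_revalidation` helper,
not Squid's actual store structures): the entry keeps the URL and ETag
from the fetch that filled it, and the conditional request is built
from those rather than from whatever the current client asked for.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class CachedEntry:
    # Hypothetical record; field names are illustrative only.
    original_url: str      # URL of the upstream fetch that filled the cache
    etag: Optional[str]    # validator returned with that response, if any
    body: bytes

def build_revalidation(entry: CachedEntry, client_url: str) -> Dict[str, object]:
    """Build a conditional GET aimed at the variant that is actually cached.

    client_url is deliberately ignored for targeting: it may name a
    different variant that merely shares the same store key.
    """
    headers: Dict[str, str] = {}
    if entry.etag is not None:
        headers["If-None-Match"] = entry.etag
    return {"method": "GET", "url": entry.original_url, "headers": headers}
```

With this shape, a request for the standard definition URL that hits an
entry filled from the HD URL would still revalidate against the HD URL
and its ETag.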

> The store URL rewriting feature essentially assumes that any request URL
> that maps to URL X is equivalent and, hence, any response to any request
> URL that maps to URL X is equivalent. Why not use that assumption when
> revalidating? If we receive a 304, we can keep using the stored content.
> If we receive new response content, should not we assume that the stored
> content [under the original URL] is stale as well?

Assumes is the right word. They are equivalent only in the proxy
administrator's thoughts, which may be right or wrong. We have to let
them be wrong sometimes and cause client display problems, but we
should not let them cause local cache corruption, with revalidation
updating a cached object's metadata from an incorrect variant source.

>
> Again, I am not trying to say that using original URL for revalidation
> is wrong -- I am just trying to understand what the design constraints are.

We could simply re-fetch and store a new copy from the new client
request details. Revalidation is an optimization, but it requires
correct identification of the particular resource and variant we have
in cache. That goes for anything in cache; store-url is just tricky in
that the client-side request cannot give us the accurate details for
the server-side request.
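
Put as a decision rule, purely as an illustration (Python, with a
hypothetical `choose_upstream_action` helper that does not correspond
to any existing Squid function): revalidate only when we know exactly
which upstream resource the cached copy came from, otherwise fall back
to a plain re-fetch using the current client request.

```python
from typing import Optional, Tuple

def choose_upstream_action(original_url: Optional[str],
                           client_url: str) -> Tuple[str, str]:
    """Pick between the revalidation optimization and a full re-fetch.

    original_url is the URL recorded when the cached object was fetched,
    or None if that information was not kept.
    """
    if original_url is not None:
        # The cached variant is identified: a conditional request is safe.
        return ("revalidate", original_url)
    # The variant is unknown: fetch a fresh copy for the current request.
    return ("refetch", client_url)
```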

>
> Thank you,
>
> Alex.
> P.S. The above still does not justify storing the rewritten URL(s), of
> course.

No. I think those are only useful for key purposes and can be discarded
once the object in cache is located for a HIT, or stored for a MISS.

Amos
Received on Sun Sep 09 2012 - 08:49:37 MDT
