Re: What is The logic of Vary Headers cachiness?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 24 Jul 2013 10:01:50 -0600

On 07/24/2013 03:43 AM, Eliezer Croitoru wrote:
> On 07/17/2013 02:34 PM, Amos Jeffries wrote:
>> On 17/07/2013 10:29 p.m., Eliezer Croitoru wrote:
>>> As I have seen some issues that "indicate" that the Vary headers cause
>>> some problems while caching..
>>>
>>> I want to make sure I understand how a Vary headers should be treated
>>> before diving into some code.
>>
>> Um, the "should be" rather than what Squid is currently doing with its
>> bugs...
>>
>> On request:
>> * lookup the URL / store-ID in the index
>> ==> finds the x-vary-* object created by Squid for storing the
>> vary_headers details
>
> Here, or one step before it, Squid should only look up the hash of the
> URL, then open the file if it exists and verify whether it is a Vary
> object at all.

That is what Squid does today, bugs notwithstanding. If the found store
entry is the special "Vary" entry, then Squid does another lookup, with
the appropriate header values added to the store key hash.
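A rough sketch of that two-step lookup, in Python rather than Squid's C++. Everything named here (store_key, lookup, the "x-vary-stub" marker, the dict used as an index, SHA-256 as the hash) is a stand-in chosen for illustration, not Squid's actual code or key format:

```python
import hashlib

# Hypothetical marker for the special "Vary" entry; not Squid's real format.
VARY_MARKER = "x-vary-stub"


def store_key(url, vary_values=None):
    """Hash the URL, plus any Vary-selected request header values."""
    h = hashlib.sha256(url.encode())
    for name, value in sorted((vary_values or {}).items()):
        h.update(f"\n{name.lower()}={value}".encode())
    return h.hexdigest()


def lookup(cache, url, request_headers):
    """Return the cached entry for this request, or None on a miss."""
    # First lookup: regular key, no Vary-controlled header values added.
    entry = cache.get(store_key(url))
    if entry is None or entry.get("kind") != VARY_MARKER:
        return entry  # a miss, or a regular (non-Vary) object

    # The special entry tells us which request headers matter.
    req = {k.lower(): v for k, v in request_headers.items()}
    selected = {name: req.get(name, "") for name in entry["vary_headers"]}

    # Second lookup: same URL, with the selected header values in the key.
    return cache.get(store_key(url, selected))
```

The point of the sketch is only the ordering: the header names needed for the second key are not available until the first lookup has returned the special entry.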

> If the server has Vary then we can still serve non-Vary objects, but
> first make sure that there is a reason to serve a Vary-like object from
> origin rather than just invalidating any cache option that exists because
> of bogus Vary headers.

You lost me here. The initial lookup described above is unavoidable. At
the first lookup time, Squid does not yet know whether the cached
response(s) have Vary or not, so Squid has to use a regular lookup (no
Vary-controlled header values added to the store key).

Instead of returning a special Vary object during this first lookup,
Squid could return a regular cached object (one of the Vary variants),
with some special Vary flag set, but that is not how the code was
written. The current implementation is probably a lot simpler than the
alternatives, albeit probably not the most efficient.

> From my reading of the code, and after coding StoreID, I know that
> there are two lookups, for HEAD and GET, and they are not the same object.
> So why not just use 3 object level checking??
> This will add a little overhead on the CPU but not on memory, since the
> objects are already there.
> For example, on a 16-core system there is more CPU than Squid uses, so we
> can afford this one lookup and forget about losing performance.
> Objections?

I am afraid I do not know what you mean by a "3 object level checking".
IIRC Squid does not look up HEAD and GET for the same request under
normal conditions. Lookups with multiple methods happen for HEAD
requests and for purging. Both categories are relatively rare.

If you are suggesting that Squid start with a lookup using Vary-listed
request header values, please note that Squid does not know which
request headers are under Vary control. Squid gets that information from
the special Vary object which is found during the first regular lookup.

Finally, as we are migrating to per-cache store indexes, more store
lookups should be avoided when possible because the number of mandatory
lookups has to be multiplied by (the number of cache_dirs plus one for
the memory cache index) to check all the indexes.
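To put numbers on that multiplication, here is a back-of-the-envelope helper. The function name and the example figures are made up for illustration, and it assumes the worst case where every index has to be consulted for every mandatory lookup:

```python
def worst_case_index_lookups(mandatory_lookups, cache_dirs):
    """Index lookups needed when every per-cache index must be checked.

    One index per cache_dir, plus one for the memory cache index.
    """
    return mandatory_lookups * (cache_dirs + 1)


# Hypothetical setup: the two Vary lookups against four cache_dirs.
print(worst_case_index_lookups(2, 4))
```

Each extra mandatory lookup therefore costs (cache_dirs + 1) index checks in this worst case, not one.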

HTH,

Alex.

>> * looks up the URL+vary / store_id+vary in the index
>> ==> finds the real object
>> ==> deal with caching using that variant's response headers and the
>> request headers, same as any non-variant object would be handled if the
>> x-vary step had not taken place.
>>
>> Also, I think if the variant needs to be invalidated Squid is currently
>> coded to drop all variants and/or the main x-vary-* stub object during
>> revalidation. The HTTP/1.1 specs need to be checked to see if that is
>> right or if only the one variant object should be invalidated and the
>> others left for later requests to alter.
>>
>>
>>>
>>> My assumption is that there are "Vary" headers that the servers might be
>>> considering while answering the request.
>>>
>>> If the above is indeed right, and my assumption about how Squid caches
>>> Vary headers holds, there is a problem either in finding the
>>> corresponding responses or somewhere else.
>>> Please help me understand how the logic of the Vary headers works.
>>>
>>> Let's say I have a request:
>>> ----
>>> GET /resource.dll HTTP/1.1
>>> Host: example.net
>>> Accept-Encoding: *.*
>>> -----
>>>
>>> the corresponding request will be:
>>> ----
>>> GET /resource.dll HTTP/1.1
>>> Host: example.net
>>> -----
>>>
>>> since there is no Vary field that indicates there is a Vary header
>>> present in the request, or the existence of a so called "Vary" header
>>> like "Accept-Encoding" is like stating "consider this vary header"?
>>
>> The request *never* contains any indication of a Vary header. This is
>> why the stub x-vary-* object exists. Only after Squid looks up its cache
>> index and finds that x-vary stub object does it know that variants exist
>> on this object, and the x-vary object's "vary_headers" details tell it
>> how to look up the particular variant needed by this request.
>>
>>> if the request will be the same then OK.
>>> if the response is not the same then we have a problem.
>>> Would we calculate the Vary based on the request only or based on also
>>> the response?? and then when the response is not matching the request
>>> what will be the hash of the request?
>>
>> Calculated based on the request headers (only) using the key details
>> stored in x-vary fake response object.
>>
>> Amos
>>
Received on Wed Jul 24 2013 - 16:02:10 MDT
