Re: What is The logic of Vary Headers cachiness?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 25 Jul 2013 22:06:08 +1200

On 25/07/2013 8:58 p.m., Henrik Nordström wrote:
> tor 2013-07-25 klockan 18:53 +1200 skrev Amos Jeffries:
>
>> Which problem specifically? that churn exists? that it can grow big +
>> churn? races between clients? or that letting it out to disk can cause
>> churn to be slooow?
> In the design used by Squid-2 there is quite a bit of churn in the
> x-vary object, and it's seen growing quite big in some extreme cases
> ("Vary: cookie" iirc).
>
> Races between clients have been seen.
>
> Also conflicts between x-vary updates and clients aborting, causing the
> new x-vary object to also be discarded making Squid forget the map, but
> that's a bug.
>
> Proper handling of cache validations is the main concern.
>
>> I have been playing with the idea of locking these into memory cache, or
>> using a dedicated memory area just for them to avoid the speed issues. A
>> specialized store for them will also allow us to isolate the
> secondary-lookup logic in that store's lookup process - it can identify
>> the variant and recurse down to other stores for the final selection
>> using the extra key bits.
> What to use as permanent store?

The options were a disk-backed mmap, or something like rock store, or
nothing at all (regenerate from a scan of the existing cache on every
startup).

> And you want to store each 304 mapping response separately so a scan can
> rebuild the map?

That should only be necessary between swap.state cleanups.

> And what about stores not having an index? IIRC we have as goal to
> optionally not have an in-memory cache index at all.

Yes. This is not a completely rounded-out idea yet. They would need
something else.
If they operate like rock and gradually build a new index on load, they
would still work, adjusting the x-vary during that operation.

>
>> I believe that they can be generated from a disk scan and if necessary
>> we can add swap.state TLV entries for the missing x-vary meta details to
>> be reloaded quickly.
> The x-vary meta is not very small. For each request header combination
> it's
> - request header contents

I don't see why those are necessary. At the x-vary level all that is
needed are the response details to search for in the request headers,
i.e. if x-vary says a variant has "Content-Encoding:gzip" then search
for "Accept-Encoding:gzip" in the request headers for a possible match.
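
A rough sketch of that lookup, to make the idea concrete. This is not
Squid code: the entry layout, the function names and the variant labels
are all made up for illustration, and q=0 refusals are not handled.

```python
def request_accepts(request_headers, header_name, token):
    """True if `token` is listed in the comma-separated value of `header_name`."""
    for name, value in request_headers:
        if name.lower() != header_name.lower():
            continue
        for item in value.split(","):
            # Drop parameters such as ";q=0.8"; a q=0 refusal is ignored in this sketch.
            if item.split(";")[0].strip().lower() == token.lower():
                return True
    return False


# Hypothetical x-vary content: only the detail that makes the variant distinct.
XVARY = [
    {"request_header": "Accept-Encoding", "token": "gzip", "variant": "variant-gzip"},
]


def select_variant(request_headers, xvary=XVARY, default="variant-identity"):
    """Return the first variant whose distinguishing detail is found in the request."""
    for entry in xvary:
        if request_accepts(request_headers, entry["request_header"], entry["token"]):
            return entry["variant"]
    return default
```

Note that nothing from the original request headers is stored, only
the token to look for.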

> - timing details for validation

Only if you are doing validation using the x-vary alone. If we select
the variant and then go deeper before revalidation, we have the full
variant's response headers to work with, including those.

> - which object variant to map to

That would be either the lookup key pattern adjustment/addition or the
explicit store+fileno details. The latter is slightly risky, but
doable and much faster, and the risk is lower when the x-vary is built
from a store-dir load scan instead of simply loaded from a long-term
cache.

> And there is also a map of known object variants and their ETag values
> and also Content-Encoding, the latter to work around dynamic gzip
> brainfart in many major web servers including Apache.

If you have read the Key header specification this should be clearer. I
am thinking the x-vary acts like a list of ETag values (for ETag match),
Digest (for Digest match), and Key patterns (for Key/Vary matches), with
Vary being a special vague case of Key. It stores a set of variants
using *only* the exact details causing that variant to exist in the set,
and none of the request-header garbage like the entire Accept: header or
ignored field-values.
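
As a minimal sketch of that shape, assuming invented names throughout
(Variant, XVary, store_key, key_tokens are hypothetical, and the Key
patterns are reduced to plain "header must list this token" pairs
rather than the real Key syntax):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Variant:
    store_key: str                      # hypothetical pointer to the stored variant
    etag: Optional[str] = None          # for ETag matches
    digest: Optional[str] = None        # for Digest matches
    # Simplified stand-in for Key patterns: request header -> token it must list.
    key_tokens: Dict[str, str] = field(default_factory=dict)


@dataclass
class XVary:
    variants: List[Variant] = field(default_factory=list)

    def by_etag(self, etag: str) -> Optional[Variant]:
        return next((v for v in self.variants if v.etag == etag), None)

    def by_request(self, headers: Dict[str, str]) -> Optional[Variant]:
        """headers: lowercase header name -> value. First variant whose tokens all match."""
        def lists_token(name: str, token: str) -> bool:
            items = headers.get(name, "").split(",")
            return any(i.split(";")[0].strip().lower() == token.lower() for i in items)

        for v in self.variants:
            if all(lists_token(n, t) for n, t in v.key_tokens.items()):
                return v
        return None
```

A variant with no key_tokens matches any request here, so in this
sketch it acts as the fallback and should be listed last.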

Even the Cookie case should be greatly reduced with the use of the Key
header, although that will take some time to occur. We can avoid that a
bit by hacking in an omit for Vary:Cookie sites just like Varnish does,
unless they use Key to specify *which* cookie detail to inspect.
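
Something along these lines, as a sketch only. The function name is
invented, and it only tests whether a Key header is present at all; it
does not check that the Key actually names a cookie detail.

```python
def should_skip_variant_caching(response_headers):
    """response_headers: dict of lowercase header name -> value (assumed layout)."""
    vary = response_headers.get("vary", "")
    varies_on_cookie = any(f.strip().lower() == "cookie" for f in vary.split(","))
    # Omit Vary:Cookie responses unless the origin also sent a Key header.
    return varies_on_cookie and "key" not in response_headers
```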

>
>> That would make them churn particularly badly on
>> startup, but avoid the necessity to store anywhere long-term, and help
>> detect obsolete variants undeleted from disk.
> The total system churn at startup is already majorly bad with both ufs
> and rock stores.. caches are growing quite large today with current disk
> & memory prices.

Sigh. Yeah. The auto-generated x-vary is a nice dream. It is just an
optimization on top of the restructured layout though. The existing 2.7
design of a static cached x-vary mostly works, even if it does have a few
issues.

Amos
Received on Thu Jul 25 2013 - 10:06:25 MDT
