Re: I would like to work into "caching of partial responses"

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 20 Apr 2012 22:09:38 -0600

On 04/20/2012 05:30 AM, Amos Jeffries wrote:
> On 20/04/2012 4:40 p.m., Paolo Malfatti wrote:
>> Hi, I would like to expand on Alex Rousskov's idea:
>>> I am not sure changing the URL is the best or even easiest way forward.
>>> Instead, I would try to change how cache key is computed by adding
>>> Range information to the hashing function and then adjust the "does
>>> the cached store entry match the request" code to account for Range
>>> request headers.
>>
>> Can I create an object that stores a list of range hashes instead of
>> the original data?

Sure, although it may be better, performance-wise, to keep this
map-of-ranges metadata together with one of the ranges. This way, you
would not have to fetch two cache entries from disk to get one range. It
would complicate the storage format though. Probably best left to Phase
2 when the storage API is ready to handle object updates.
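
A minimal sketch of the simpler variant, where the map of ranges is its
own object stored under the "normal" key. This is only an illustration,
not Squid code: RangeIndex, range_key() and find_covering() are
hypothetical names, and SHA-256 stands in for whatever hash the store
actually uses. It only matches a request that falls inside one cached
range and does not merge adjacent ranges.

```python
import hashlib
from dataclasses import dataclass, field


def range_key(url: str, first: int, last: int) -> str:
    """Hypothetical per-range store key: hash of the URL plus the byte range."""
    return hashlib.sha256(f"{url}|bytes={first}-{last}".encode()).hexdigest()


@dataclass
class RangeIndex:
    """Hypothetical 'list of ranges' object stored under the normal key."""
    url: str
    ranges: dict = field(default_factory=dict)  # (first, last) -> range key

    def add(self, first: int, last: int) -> str:
        """Record a cached range and return the key it is stored under."""
        key = range_key(self.url, first, last)
        self.ranges[(first, last)] = key
        return key

    def find_covering(self, first: int, last: int):
        """Return ((first, last), key) of a cached range containing the request, or None."""
        for (f, l), key in self.ranges.items():
            if f <= first and last <= l:
                return (f, l), key
        return None


index = RangeIndex("http://example.com/video.mp4")
index.add(0, 999)
index.add(1000, 4999)
hit = index.find_covering(1200, 1800)   # inside the second cached range
miss = index.find_covering(900, 1100)   # spans two ranges; merging is not handled
```

Serving a hit this way costs two lookups (the index, then the range),
which is the extra disk fetch mentioned above.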

> Yes. Architecturally I would like to see the "variant pointer" object
> created in store for Vary: alternatives become exactly that type of
> sub-index object. With a pair of (key-format, hash).
>
> In the range cases key-format would include the URI, ETag, some keys
> from the *response* values from Vary: details, and range.

> NP: Fileno is tempting long-term but short-term hits the problem that
> individual cache entries (fileno) can be replaced with unrelated entry
> without the variant pointer being updated. hash is slow but fine for
> initial use.

However, disappearing and corrupted ranges must be handled even if we
store full keys. Squid already checks that the store key matches the
entry when we begin to swap the entry in. That check will have to be
there for ranges as well and the range manipulation code would have to
deal with swap-in failures.
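
A rough sketch of what "dealing with it" could look like, assuming a
dict-backed store where each entry records its own key. The names
swap_in(), fetch_range() and SwapInError are made up for illustration
and do not correspond to Squid functions. A range that has disappeared
or been replaced is dropped from the range list and treated as a miss.

```python
class SwapInError(Exception):
    """Raised when a stored entry is missing or does not match the expected key."""


def swap_in(store: dict, key: str) -> bytes:
    """Hypothetical swap-in: the entry records its own key, which must match."""
    entry = store.get(key)
    if entry is None:
        raise SwapInError(f"entry {key} disappeared")
    stored_key, body = entry
    if stored_key != key:
        raise SwapInError(f"entry {key} was replaced or corrupted")
    return body


def fetch_range(store: dict, ranges: dict, wanted: tuple):
    """Return the cached body for `wanted`, or None after pruning a stale entry."""
    key = ranges.get(wanted)
    if key is None:
        return None
    try:
        return swap_in(store, key)
    except SwapInError:
        del ranges[wanted]  # forget the stale range so the caller refetches it
        return None


store = {"k1": ("k1", b"abc"), "k2": ("other", b"xyz")}
ranges = {(0, 2): "k1", (3, 5): "k2", (6, 8): "k3"}
good = fetch_range(store, ranges, (0, 2))      # key matches
replaced = fetch_range(store, ranges, (3, 5))  # slot now holds an unrelated entry
gone = fetch_range(store, ranges, (6, 8))      # entry no longer in the store
```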

A bigger problem with fileno reuse is that, conceptually, fileno belongs
to each store module. Ideally, core code should not really manipulate or
even see that internal-to-module information. If we want range handling
to be done in core, independently from store modules, then it is best to
stay away from fileno.

For example, a store module like COSS that rewrites large slices of its
storage might want to _move_ a bunch of entries, which will result in
changing their fileno.
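
To illustrate with a toy model (SlotStore is hypothetical and far
simpler than COSS or any real store module): slot numbers play the role
of fileno and are private to the module. After the module compacts its
storage, a fileno remembered by core code would be stale, while a lookup
by key still works.

```python
class SlotStore:
    """Hypothetical store module; slot numbers (fileno) are private to it."""

    def __init__(self):
        self._slots = []    # slot number -> (key, body), or None for a hole
        self._fileno = {}   # key -> slot number

    def put(self, key, body):
        self._fileno[key] = len(self._slots)
        self._slots.append((key, body))

    def remove(self, key):
        self._slots[self._fileno.pop(key)] = None

    def get(self, key):
        fileno = self._fileno.get(key)
        return None if fileno is None else self._slots[fileno][1]

    def fileno_of(self, key):
        return self._fileno.get(key)

    def compact(self):
        """Rewrite storage without holes; surviving entries get new slot numbers."""
        live = [entry for entry in self._slots if entry is not None]
        self._slots = live
        self._fileno = {key: i for i, (key, _) in enumerate(live)}


store = SlotStore()
store.put("range-a", b"aaa")
store.put("range-b", b"bbb")
before = store.fileno_of("range-b")
store.remove("range-a")
store.compact()
after = store.fileno_of("range-b")   # differs from `before`
body = store.get("range-b")          # lookup by key is unaffected
```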

Cheers,

Alex.

>> I want to store every request individually, indexed with a hash
>> calculated from the range, but I will keep track of them with another
>> object (indexed with the "normal" key) which will maintain the list of
>> ranges (and keys). That would allow a HIT on a subset of a wider
>> range, merging of ranges, etc.
>>
>> The flow could be something like this:
>>
>>   Client requests range x-y
>>            |
>>            V
>>   Store computes the "normal" key and
>>   looks for it in the hash table
>>            |
>>            V
>>   If the object is a HIT and it is a
>>   "list of ranges" object (1), the store looks
>>   in the list for a match (subset, superset, whole object)
>>            |
>>            |------> If there is a match, it retrieves the corresponding
>>            |        object(s) from disk/mem and sends them to the client.
>>            |
>>            |------> If not, it retrieves the range from the origin,
>>                     stores it under a "range" key, and adds it
>>                     to the list in (1).
>> I would really appreciate it if someone could tell me whether this is
>> a doable idea, or whether there is a better solution.
>
> Very doable. See also
>
> http://www.squid-cache.org/mail-archive/squid-dev/201203/0004.html
> http://www.squid-cache.org/mail-archive/squid-dev/201203/0001.html
>
> Amos
Received on Sat Apr 21 2012 - 04:09:51 MDT
