RE: range request cache

From: Zhu, Shan <Shan.zhu_at_ccur.com>
Date: Thu, 1 Mar 2012 18:09:02 -0500

Thanks Alex and Amos for your quick response.

Including the Range request in the cache key calculation sounds more generic than hacking the range into the object file names, so I am moving in that direction. However, there may be an ambiguity problem.

Suppose Squid is able to cache and store ranges. If we receive multiple HTTP requests for the same object, some with Range headers and some without, we may well end up with both of the following cached:
1. A whole object file
2. A range of the object file

Now, if we receive another HTTP request for the same range of the same object, we need to decide whether to respond with stored content 2 or to extract that range from stored content 1, since 1 contains 2.

I haven't found a good way to solve this problem. Any ideas?
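To make the ambiguity concrete, here is a minimal C++ sketch of one possible selection policy. It is not Squid code: CachedEntry, ByteRange, covers and pickEntry are hypothetical names, and the sketch assumes that both entries are fresh and belong to the same version of the object (for example, the same ETag), which a real implementation would have to verify before concluding that 1 contains 2.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of what the cache holds for one URL: either the whole
// object or a single byte range [first, last], inclusive as in HTTP.
struct CachedEntry {
    bool isWholeObject;   // true for a stored full (200) response
    std::uint64_t first;  // first byte offset (ignored for whole objects)
    std::uint64_t last;   // last byte offset, inclusive (ignored for whole objects)
    std::uint64_t id;     // illustrative identifier of the stored entry
};

struct ByteRange {
    std::uint64_t first;
    std::uint64_t last;   // inclusive
};

// True if the entry can supply every byte of the requested range.
// Assumption: a requested range never extends past the object's end.
bool covers(const CachedEntry &e, const ByteRange &r) {
    if (e.isWholeObject)
        return true;
    return e.first <= r.first && r.last <= e.last;
}

// One possible policy (an assumption, not Squid behaviour): prefer an exact
// range match, then the smallest covering range, then the whole object.
std::optional<std::uint64_t> pickEntry(const std::vector<CachedEntry> &entries,
                                       const ByteRange &r) {
    const CachedEntry *best = nullptr;
    for (const auto &e : entries) {
        if (!covers(e, r))
            continue;
        if (!e.isWholeObject && e.first == r.first && e.last == r.last)
            return e.id; // exact match: serve the stored range unchanged
        const bool betterThanBest =
            !best ||
            (best->isWholeObject && !e.isWholeObject) ||
            (!best->isWholeObject && !e.isWholeObject &&
             e.last - e.first < best->last - best->first);
        if (betterThanBest)
            best = &e;
    }
    if (best)
        return best->id;
    return std::nullopt; // nothing cached can satisfy the request
}
```

Preferring the exact entry avoids slicing a larger body, while the whole object remains a valid fallback for any range inside it; whether that trade-off is right is exactly the open question here.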

-----Original Message-----
From: Alex Rousskov [mailto:rousskov_at_measurement-factory.com]
Sent: Thursday, February 23, 2012 12:02 PM
To: Zhu, Shan
Cc: squid-dev_at_squid-cache.org
Subject: Re: range request cache

On 02/21/2012 05:09 PM, Zhu, Shan wrote:
> I have an urgent need for caching requested ranges so I want to do a quick "hack" on this topic before the new feature becomes available.
>
> What I want to achieve is for Squid to cache a range request without pre-fetching and caching the whole object, so that if the cached range is requested again it can be served from the cache.
>
> What I want to do is to map a URL that carries a range request to a unique file name, so that once the response is received from the back-end server, it can be cached as a single object.
> The workflow would be:
>
> (1) Change URL with range request into something like: "[original URL]_[range start]_[range end]", internally to Squid only.
>
> (2) Squid checks cache to see if it is cached. If yes, Squid responds to the client with the cached object. If no, go to Step 3.
>
> (3) When Step 2 is a cache miss, Squid forwards the original URL, with the range request, to the back-end server as normal. (It does not pre-fetch the whole object file.)
>
> (4) Squid receives the back-end server's response for the requested range only and recognizes it as the response corresponding to the changed URL "[original URL]_[range start]_[range end]".
>
> (5) Squid caches the range according to the changed URL "[original URL]_[range start]_[range end]", like a single object file.
>
> (6) Squid responds to the client for the URL with range request, as normal.
>
> Here I may have simplified the problem and omitted the time-stamp issue, etc.
>
> Is this doable? How difficult would it be? Can I get any suggestion on how to proceed? I am starting from scratch on the source code change.
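Step (1) of the workflow quoted above could look roughly like the following C++ sketch. The function names are hypothetical (not Squid APIs), and it handles only a single, fully specified "bytes=first-last" range; any other Range form is treated as "do not cache", so a partial response can never be stored under the whole-object URL.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Parses a single, fully specified "bytes=first-last" Range value.
// Suffix ranges ("bytes=-500"), open ranges ("bytes=500-") and multi-range
// requests ("bytes=0-1,5-6") are deliberately rejected in this sketch.
// std::stoull may throw std::out_of_range for absurdly large numbers.
std::optional<std::pair<std::uint64_t, std::uint64_t>>
parseSimpleRange(const std::string &value) {
    const std::string prefix = "bytes=";
    if (value.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;
    const std::string spec = value.substr(prefix.size());
    const auto dash = spec.find('-');
    if (dash == std::string::npos || dash == 0 || dash + 1 == spec.size() ||
        spec.find_first_not_of("0123456789-") != std::string::npos ||
        spec.find('-', dash + 1) != std::string::npos)
        return std::nullopt;
    const std::uint64_t first = std::stoull(spec.substr(0, dash));
    const std::uint64_t last = std::stoull(spec.substr(dash + 1));
    if (last < first)
        return std::nullopt;
    return std::make_pair(first, last);
}

// Builds the internal-only key "[original URL]_[range start]_[range end]".
// Returns the plain URL when there is no Range header, and std::nullopt
// (meaning: do not cache) when the Range form is unsupported.
std::optional<std::string> internalStoreUrl(const std::string &url,
                                            const std::optional<std::string> &rangeHeader) {
    if (!rangeHeader)
        return url; // full-object request: key is the URL itself
    const auto range = parseSimpleRange(*rangeHeader);
    if (!range)
        return std::nullopt; // unsupported Range form: skip caching
    return url + "_" + std::to_string(range->first) + "_" + std::to_string(range->second);
}
```

One weakness of the mangled-URL approach shows up here: a real resource whose URL happens to end in "_0_1023" would collide with the internal key, which is one reason to fold Range into the key computation instead.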

Adding support for range caching is doable, of course. It is a difficult project though, even if you limit that support to the absolute minimum.

I am not sure changing the URL is the best or even easiest way forward.
Instead, I would try to change how the cache key is computed by adding Range information to the hashing function, and then adjust the "does the cached store entry match the request" code to account for Range request headers. As Amos has mentioned already, looking at Vary support may be helpful here. You can kind of treat Range responses as having an implicit "Vary: Range" header.
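A minimal C++ sketch of this suggestion, treating Range as an implicit "Vary: Range". RequestInfo, normalizeRange, cacheKey and entryMatches are hypothetical names, std::hash stands in for whatever digest the real store key uses, and the whitespace-only normalization is an assumption; a real implementation would need a more thorough canonical form.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical request summary holding only the fields relevant here.
struct RequestInfo {
    std::string method;
    std::string url;
    std::string range; // raw Range header value, empty if absent
};

// Strips spaces and tabs so "bytes=0-99, 200-299" and "bytes=0-99,200-299"
// produce the same key. This is deliberately simplistic.
std::string normalizeRange(const std::string &range) {
    std::string out;
    for (char c : range)
        if (c != ' ' && c != '\t')
            out += c;
    return out;
}

// Range participates in the key as if the response carried "Vary: Range".
// The '\n' separators keep ("GET", "a", "b") distinct from ("GET", "ab", "").
std::size_t cacheKey(const RequestInfo &req) {
    const std::string material =
        req.method + '\n' + req.url + '\n' + normalizeRange(req.range);
    return std::hash<std::string>{}(material);
}

// The "does the cached entry match the request" check, mirroring the key
// rule: compare the real fields, because different inputs can share a hash.
bool entryMatches(const RequestInfo &stored, const RequestInfo &req) {
    return stored.method == req.method && stored.url == req.url &&
           normalizeRange(stored.range) == normalizeRange(req.range);
}
```

Because hashes can collide, entryMatches compares the actual fields rather than trusting the key alone.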

I suspect the most difficult parts would be correctly adjusting the code responsible for computing the expected/actual/maximum entry size to respect Content-Range limits, and adjusting the swap code to expect/write/read the right number of bytes. The response size-related code is really messy. Squid v3.2 has some improvements in that area, but we are still a long way from a good API.
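For the size accounting, the central point is that a 206 entry's body length comes from Content-Range, not from the full object length. The C++ helpers below are hypothetical (not Squid's API): they parse a single "bytes first-last/complete-length" value and return the byte count the swap code would need to expect. Multipart/byteranges responses are out of scope for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Parsed form of a 206 response's Content-Range, e.g. "bytes 100-199/1000".
struct ContentRange {
    std::uint64_t first;
    std::uint64_t last;                          // inclusive
    std::optional<std::uint64_t> completeLength; // empty when sent as "*"
};

static bool allDigits(const std::string &s) {
    return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}

std::optional<ContentRange> parseContentRange(const std::string &value) {
    const std::string prefix = "bytes ";
    if (value.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;
    const std::string rest = value.substr(prefix.size());
    const auto dash = rest.find('-');
    const auto slash = rest.find('/');
    if (dash == std::string::npos || slash == std::string::npos || dash > slash)
        return std::nullopt; // also rejects unsatisfied "bytes */1000"
    const std::string firstStr = rest.substr(0, dash);
    const std::string lastStr = rest.substr(dash + 1, slash - dash - 1);
    const std::string lengthStr = rest.substr(slash + 1);
    if (!allDigits(firstStr) || !allDigits(lastStr))
        return std::nullopt;
    ContentRange cr;
    cr.first = std::stoull(firstStr);
    cr.last = std::stoull(lastStr);
    if (cr.last < cr.first)
        return std::nullopt;
    if (lengthStr != "*") {
        if (!allDigits(lengthStr))
            return std::nullopt;
        cr.completeLength = std::stoull(lengthStr);
        if (cr.last >= *cr.completeLength)
            return std::nullopt; // the range must lie inside the object
    }
    return cr;
}

// Body bytes to expect when writing and reading back a 206 entry:
// the length of the range, not the length of the whole object.
std::uint64_t expectedBodyBytes(const ContentRange &cr) {
    return cr.last - cr.first + 1;
}
```

For "bytes 100-199/1000" the swap code would expect 100 body bytes, even though the object itself is 1000 bytes long.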

HTH,

Alex.
Received on Thu Mar 01 2012 - 23:09:06 MST

This archive was generated by hypermail 2.2.0 : Fri Mar 02 2012 - 12:00:10 MST