Re: New functionality: Caching PUT bodies

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 19 Feb 2014 14:02:15 -0700

On 02/19/2014 12:42 PM, Rajiv Desai wrote:
> On Wed, Feb 19, 2014 at 11:09 AM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 02/19/2014 03:11 AM, Rajiv Desai wrote:
>>> I am interested in adding functionality to squid to optionally add
>>> objects from PUT requests to cache. Has there been any related work
>>> done in the past or is being pursued currently that I can use as
>>> reference?
>>
>> Just to make sure we are all on the same page, do you want Squid to take
>> the body of a PUT request and store it in the cache so that subsequent
>> GET requests for the same URI will result in a cache hit?
>
> Yes.
>
>> If yes, what response headers do you want Squid to use when caching
>> that PUT body?
>>
>
> The GET response only requires Content-Length to be accurate. Other
> time values can use Date from the PUT request header.
> The expiry time does not matter but can be set to a very large value
> (never expires).

I believe the cached PUT entity should get entity headers from the PUT
request, reusing them as response entity headers. RFC 2616 Section 9.6
seems to suggest that.
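
To illustrate the idea, here is a minimal C++ sketch that picks the
entity headers out of a PUT request so they can be reused for the cached
reply. The Headers type and both function names are hypothetical and are
not Squid APIs; the field list is the entity-header list from RFC 2616
Section 7.1.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <string>

// Hypothetical header container for illustration; not Squid's HttpHeader.
using Headers = std::map<std::string, std::string>;

// Lowercases a field name so that lookups are case-insensitive.
static std::string lowerCase(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True for the entity header fields listed in RFC 2616 Section 7.1.
static bool isEntityHeader(const std::string &name) {
    static const std::set<std::string> names = {
        "allow", "content-encoding", "content-language", "content-length",
        "content-location", "content-md5", "content-range", "content-type",
        "expires", "last-modified"
    };
    return names.count(lowerCase(name)) > 0;
}

// Copies only the entity headers of a PUT request; request-only fields
// such as Host, Accept, and Expect are left out.
Headers entityHeadersForCachedReply(const Headers &putRequest) {
    Headers reply;
    for (const auto &field : putRequest) {
        if (isEntityHeader(field.first))
            reply.insert(field);
    }
    return reply;
}
```

For the PUT request quoted below, this sketch would keep Content-MD5,
Content-Type, and Content-Length and drop the rest. General headers such
as Date are not entity headers, so a real implementation would have to
decide separately how to produce them.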

>> Will the PUT body contain response headers?
>>
> The PUT body does not contain response headers. It simply contains the object.
> The PUT request has the following headers:
>
> PUT /mag-1363987602-cmbogo/c9e935e0-10812585 HTTP/1.1
> Host: s3-us-west-1.amazonaws.com
> Accept: */*
> Content-MD5: o8VChHm6LUVSQNSFg57DSA==
> Content-Type: application/octet-stream
> Date: Wed, 19 Feb 2014 19:30:19 GMT
> Content-Length: 10256
> Expect: 100-continue
>
>
>> What is your use case? That is, why do you want this feature?
>>
>
> I currently use squid as a caching gateway (forward proxy) for
> uploads (PUTs) and downloads (GETs) to/from an object store (e.g., AWS
> S3).
> In a branch office when one client uploads content, other clients (or
> even the same client) should be able to fetch content from the squid
> cache to accelerate downloads.
> These objects are typically 64KB in size and are immutable so no
> freshness/expiry checks are required. So, if a PUT request is accepted
> by the server, the object uploaded should be cached by squid and
> subsequent GETs for these objects should be HITs.

Thank you for detailing your use case.

I believe this can be supported, but it will not be easy. You probably
should add write-to-store support to the Squid HTTP server (the code
currently residing in client_side*cc and related files), but all of the
existing examples of writing to the store live in Squid HTTP clients
(the code currently residing in Server.cc and http.cc). Yes, I know this
sounds backwards. It will take some effort to extract reusable code (if
any) into a class and use that class in both servers and clients, but it
is possible.

The alternative approach is to add write-request-body-to-store to Squid
client code that already deals with writing response to store. However,
I believe that doing so will be even more confusing and, technically,
wrong because the next hop may not even be HTTP in some use cases. You
should store the body as Squid gets it from the HTTP client, not when
Squid forwards it to the next hop.
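
A rough C++ sketch of that flow, under loud assumptions: ToyCache and
PutBodyRecorder are hypothetical names and have nothing to do with the
real Store API, the body is buffered in memory for simplicity, and
"accepted by the server" is taken to mean any 2xx final status.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a cache keyed by URI; not Squid's Store.
class ToyCache {
public:
    void put(const std::string &uri, const std::string &body) {
        entries_[uri] = body;
    }

    // Returns the cached body, or nullptr on a miss.
    const std::string *get(const std::string &uri) const {
        const auto it = entries_.find(uri);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, std::string> entries_;
};

// Hypothetical helper that records a PUT body on the side facing the
// HTTP client and commits it only after the next hop accepts the request.
class PutBodyRecorder {
public:
    PutBodyRecorder(ToyCache &cache, const std::string &uri)
        : cache_(cache), uri_(uri) {}

    // Call for each body chunk as it arrives from the HTTP client,
    // regardless of how (or whether) it is forwarded to the next hop.
    void noteBodyChunk(const std::string &chunk) { body_ += chunk; }

    // Call once the final response status is known.
    // Assumption: only a 2xx status means the upload was accepted.
    void noteFinalStatus(int status) {
        if (status >= 200 && status < 300)
            cache_.put(uri_, body_);
        body_.clear(); // the recorded copy is no longer needed either way
    }

private:
    ToyCache &cache_;
    std::string uri_;
    std::string body_;
};
```

Real code would stream the body into a store entry instead of buffering
it and would need to abort that entry on errors, which is where most of
the corner cases are likely to come from.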

I am not aware of any existing code in that direction, but you should
double check by searching old postings and Squid2 change logs. I know
this question has been asked several times before (and some recent
answers contradict my suggestions in this email :-).

If you decide to code this feature, you may want to start by looking at
ServerStateData::setFinalReply() and ServerStateData::storeReplyBody().
Reading those two methods and analyzing an ALL,9 cache.log recorded while
Squid caches a simple response may help you find most of the necessary
Store APIs. Again, handling all corner cases correctly is not going to
be easy.

HTH,

Alex.
Received on Wed Feb 19 2014 - 21:02:26 MST
