Re: [RFC] post-cache REQMOD

From: Tsantilas Christos <chtsanti_at_users.sourceforge.net>
Date: Fri, 11 Jul 2014 20:46:58 +0300

On 07/11/2014 05:47 PM, Alex Rousskov wrote:
> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>
>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>
> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
> adapted variants. Please let me know if I missed how post-cache RESPMOD
> can do that.

I did not correctly read the problem you want to solve. I had in mind
a proxy which caches the original content and then adapts the cached
content according to client rules.
But you want to cache the adapted content.

However, I am still not sure I understand how post-cache REQMOD
will help.
Assume the following scenario:
    - Client A requests the original web page
    - Client B requests the optimized web page (spaces and comments removed)

I am expecting a solution which will store two copies of the web page
in the cache: the optimized copy and the original copy.
One solution could be a mechanism similar to Vary headers, for example
defining an ICAP-set header which is then included in Vary. I have not
looked at the StoreID feature, but it could probably be used for the
same purpose.
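
To make the idea more concrete, here is a rough sketch of how such a
Vary-based mechanism could look (the X-Optimize header name is only an
illustration, not an existing Squid or ICAP header):

    Request tagged by the client or by a REQMOD service:
        GET /page.html HTTP/1.1
        X-Optimize: small

    Adapted response that Squid would store:
        HTTP/1.1 200 OK
        Vary: X-Optimize
        ... optimized body ...

Because the response varies on X-Optimize, Squid would keep one cache
entry per header value, i.e. the original and the optimized copies side
by side.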

>
> The key here is that PageSpeed and similar services want to create (and
> cache) many adapted responses out of a single virgin response. Neither
> HTTP itself nor the Squid architecture support that well. Post-cache
> REQMOD allows basic PageSpeed support (the first request for "small"
> adapted content gets "large" virgin content, but the second request for
> small content fetches it from the PageSpeed cache, storing it in Squid
> cache). To optimize PageSpeed support further (so that the first request
> can get small content), we will need to add another generally useful
> feature, but I would rather not bring it into this discussion (there
> will be a separate RFC if we get that far).

Probably I did not understand well how PageSpeed works or what a
PageSpeed cache means. But in the above scenario it looks like Squid
will store only one version of the content (the small content).
Is that all that is required?
What am I missing?

>
> The alternative is to create a completely new interface (not a true
> vectoring point) that allows an adaptation service to push multiple
> adapted responses into the Squid cache _and_ tell Squid which of those
> responses to use for the current request. While I have considered
> proposing that, I still think we would be better off supporting
> "standard" and "well understood" building blocks (such as standard
> adaptation vectoring points) rather than such highly-specialized
> interfaces. Please let me know if you disagree.
>
>
>> Is
>> the post-cache REQMOD just a first step to support all post-cache
>> vectoring points?
>
> You can certainly view it that way, but I do not propose or promise
> adding post-cache RESPMOD :-).
>
>
> Thank you,
>
> Alex.
>
>
>
>> On 07/11/2014 01:15 AM, Alex Rousskov wrote:
>>> Hello,
>>>
>>> I propose adding support for a third adaptation vectoring point:
>>> post-cache REQMOD. Services at this new point receive cache miss
>>> requests and may adapt them as usual. If a service satisfies the
>>> request, the service response may get cached by Squid. As you know,
>>> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.
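
For illustration only, a hypothetical squid.conf fragment for such a
service might look like this, assuming the new vectoring point would be
named reqmod_postcache by analogy with the existing reqmod_precache
(the service URL is just an example; no such vectoring point works in
current Squid):

    icap_service optimizer reqmod_postcache icap://127.0.0.1:1344/optimize
    adaptation_access optimizer allow all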
>>>
>>>
>>> We have received many requests for post-cache adaptation support
>>> throughout the years, and I personally resisted the temptation of adding
>>> another layer of complexity (albeit an optional one) because it is a lot
>>> of work and because many use cases could be addressed without post-cache
>>> adaptation support.
>>>
>>> The last straw (and the motivation for this RFC) was PageSpeed[1]
>>> integration. With PageSpeed, one can generate various variants of
>>> "optimized" content. For example, mobile users may receive smaller
>>> images. Apache and Nginx support PageSpeed modules. It is possible to
>>> integrate Squid with PageSpeed (and similar services) today, but it is
>>> not possible for Squid to _cache_ those generated variants unless one is
>>> willing to pay for another round trip to the origin server to get
>>> exactly the same unoptimized content.
>>>
>>> The only way to support Squid caching of PageSpeed variants without
>>> repeated round trips to the origin server is using two Squids. The
>>> parent Squid would cache origin server responses while the child Squid
>>> would adapt parent's responses and cache adapted content. Needless to
>>> say, running two Squids (each with its own cache) instead of one adds
>>> significant performance/administrative overheads and complexity.
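
For reference, the child Squid in that two-Squid workaround could be
configured roughly as follows (host names, ports, and the ICAP URL are
examples only):

    # child squid.conf: adapt and cache the adapted content,
    # fetch all misses through the parent Squid that caches virgin content
    cache_peer parent.example.net parent 3128 0 no-query no-digest
    never_direct allow all
    icap_enable on
    icap_service svcOptimize respmod_precache icap://127.0.0.1:1344/optimize
    adaptation_access svcOptimize allow all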
>>>
>>>
>>> As far as internals are concerned, I am currently thinking of launching
>>> an adaptation job for this vectoring point from FwdState::Start(). This
>>> way, its impact on the rest of Squid would be minimal and some adapters
>>> might even affect FwdState routing decisions. The initial code name for
>>> the new class is MissReqFilter, but that may change.
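
To illustrate the intended control flow only, here is a simplified,
self-contained C++ sketch; the types below are stand-ins and not the
real Squid classes, signatures, or async job framework:

    // Illustrative control-flow sketch only; simplified stand-ins, not
    // the real Squid classes or job framework.
    #include <functional>
    #include <iostream>
    #include <memory>
    #include <string>

    struct HttpRequest { std::string uri; };   // stand-in
    struct HttpReply   { std::string body; };  // stand-in

    // Stand-in for a post-cache REQMOD service: it may satisfy the miss
    // itself or let the (possibly modified) request be forwarded.
    struct PostCacheReqmodService {
        std::unique_ptr<HttpReply> adapt(HttpRequest &req) const {
            if (req.uri.find("optimized=1") != std::string::npos)
                return std::make_unique<HttpReply>(HttpReply{"<small html/>"});
            return nullptr;
        }
    };

    // Rough analogue of the proposed MissReqFilter job. In the proposal
    // it would be launched from FwdState::Start(); here start() is a
    // plain synchronous call.
    class MissReqFilter {
    public:
        MissReqFilter(HttpRequest req, PostCacheReqmodService svc,
                      std::function<void(const HttpRequest &)> forward,
                      std::function<void(const HttpReply &)> cacheAndDeliver)
            : req_(std::move(req)), svc_(svc),
              forward_(std::move(forward)),
              cacheAndDeliver_(std::move(cacheAndDeliver)) {}

        void start() {
            if (auto reply = svc_.adapt(req_))
                cacheAndDeliver_(*reply); // service satisfied the miss; may be cached
            else
                forward_(req_);           // continue with normal miss forwarding
        }

    private:
        HttpRequest req_;
        PostCacheReqmodService svc_;
        std::function<void(const HttpRequest &)> forward_;
        std::function<void(const HttpReply &)> cacheAndDeliver_;
    };

    int main() {
        PostCacheReqmodService svc;
        MissReqFilter job(HttpRequest{"http://example.com/?optimized=1"}, svc,
            [](const HttpRequest &r) { std::cout << "forward " << r.uri << "\n"; },
            [](const HttpReply &r)   { std::cout << "cache+deliver " << r.body << "\n"; });
        job.start();
        return 0;
    }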
>>>
>>>
>>>
>>> The other candidate location for plugging in the new vectoring point is
>>> the Server class. However, that class is already complex. It handles
>>> communication with the next hop (with child classes doing
>>> protocol-specific work and confusing things further) as well as
>>> pre-cache RESPMOD vectoring point with caching initiation on top of
>>> that. The Server code already has trouble distinguishing various content
>>> streams it has to juggle. I am worried that adding another vectoring
>>> point there would make that complexity significantly worse.
>>>
>>> It is possible that we would be able to refactor/encapsulate some of the
>>> code so that it can be reused in both the existing Server and the new
>>> MissReqFilter classes. I will look out for such opportunities while
>>> trying to keep the overall complexity in check.
>>>
>>>
>>> Any objections to adding post-cache REQMOD or better implementation
>>> ideas?
>>>
>>>
>>> Thank you,
>>>
>>> Alex.
>>> [1] https://developers.google.com/speed/pagespeed/
>>>
>
>