Re: [RFC] post-cache REQMOD from Amos Jeffries on 2014-07-10 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 11 Jul 2014 15:12:53 +1200

On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
> Hello,
>
> I propose adding support for a third adaptation vectoring point:
> post-cache REQMOD. Services at this new point receive cache miss
> requests and may adapt them as usual. If a service satisfies the
> request, the service response may get cached by Squid. As you know,
> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.

Just to clarify: you mean this to be the vectoring point which receives
MISS-only traffic, whereas the existing one(s) receive HIT+MISS traffic?

> We have received many requests for post-cache adaptation support
> throughout the years, and I personally resisted the temptation of adding
> another layer of complexity (albeit an optional one) because it is a lot
> of work and because many use cases could be addressed without post-cache
> adaptation support.
>
> The last straw (and the motivation for this RFC) was PageSpeed[1]
> integration. With PageSpeed, one can generate various variants of
> "optimized" content. For example, mobile users may receive smaller
> images. Apache and Nginx support PageSpeed modules. It is possible to
> integrate Squid with PageSpeed (and similar services) today, but it is
> not possible for Squid to _cache_ those generated variants unless one is
> willing to pay for another round trip to the origin server to get
> exactly the same unoptimized content.

Can you show how they are violating standard HTTP variant caching? The
HTTPbis WG should probably be informed of the problem.
If it is actually within the standard, then it would seem to be a missing
feature of Squid to cache them properly. We would do better by
fixing Squid to cache more compliant traffic.
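
For reference, "standard HTTP variant caching" here means selecting a
stored response by the request headers named in its Vary header. The
following is only a minimal sketch of that idea, not Squid code: the
function name variant_key, the example URL, and the User-Agent strings
are all made up for illustration.

```python
def variant_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named by Vary.

    Hypothetical helper for illustration; real caches also normalize
    header values and handle repeated header fields.
    """
    if vary.strip() == "*":
        # "Vary: *" never matches a stored response, so there is no usable key.
        return None
    # Header names are case-insensitive; sort them so the key is stable.
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A header missing from the request is recorded as an empty value.
    return (url, tuple((n, lowered.get(n, "")) for n in names))


# Two clients asking for the same URL get distinct keys when the response
# varies on User-Agent, so each variant can be stored and served separately.
url = "http://example.com/logo.png"
mobile = variant_key(url, {"User-Agent": "ExamplePhone/1.0"}, "User-Agent")
desktop = variant_key(url, {"User-Agent": "ExampleDesktop/1.0"}, "User-Agent")
print(mobile != desktop)
```

If the generated variants can be told apart this way, a single cache
can hold them without further help from the adaptation layer.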

>
> The only way to support Squid caching of PageSpeed variants without
> repeated round trips to the origin server is using two Squids. The
> parent Squid would cache origin server responses while the child Squid
> would adapt parent's responses and cache adapted content. Needless to
> say, running two Squids (each with its own cache) instead of one adds
> significant performance/administrative overheads and complexity.
>
>
> As far as internals are concerned, I am currently thinking of launching
> adaptation job for this vectoring point from FwdState::Start(). This
> way, its impact on the rest of Squid would be minimal and some adapters
> might even affect FwdState routing decisions. The initial code name for
> the new class is MissReqFilter, but that may change.
>

Given that FwdState is the global selector that determines where MISS
content comes from, this sounds reasonable.

I think the best position is after the miss_access tests. We need to split
the miss_access lookup off into an async step anyway, so that it can be a
slow lookup.
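
The ordering suggested above could be sketched roughly as follows. This
is an illustration only and assumes nothing about the eventual Squid
implementation: start_forwarding and its three callback parameters are
hypothetical names, and the real code would be asynchronous C++ inside
FwdState rather than plain function calls.

```python
def start_forwarding(request, miss_access_allows, post_cache_reqmod,
                     fetch_from_next_hop):
    """Handle one cache miss in the order discussed above (sketch only)."""
    # 1. miss_access decides whether this miss may be forwarded at all.
    if not miss_access_allows(request):
        return ("denied", None)
    # 2. The post-cache REQMOD service sees only requests that passed step 1.
    #    It returns either an adapted request or a response of its own.
    kind, message = post_cache_reqmod(request)
    if kind == "response":
        # The service satisfied the request; no next hop is contacted.
        return ("from_service", message)
    # 3. Otherwise the (possibly adapted) request goes to the next hop.
    return ("from_next_hop", fetch_from_next_hop(message))


# Example with stand-in callbacks: the service rewrites the request and
# the next hop answers it.
result = start_forwarding(
    "GET /image.png",
    lambda req: True,
    lambda req: ("request", req + " (adapted)"),
    lambda req: "response to " + req,
)
print(result)
```

Running the ACL check first means a denied miss never reaches the
adaptation service.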

>
> The other candidate location for plugging in the new vectoring point is
> the Server class. However, that class is already complex. It handles
> communication with the next hop (with child classes doing
> protocol-specific work and confusing things further) as well as
> pre-cache RESPMOD vectoring point with caching initiation on top of
> that. The Server code already has trouble distinguishing various content
> streams it has to juggle. I am worried that adding another vectoring
> point there would make that complexity significantly worse.

Agreed. Bad idea.

>
> It is possible that we would be able to refactor/encapsulate some of the
> code so that it can be reused in both the existing Server and the new
> MissReqFilter classes. I will look out for such opportunities while
> trying to keep the overall complexity in check.
>
>
> Any objections to adding post-cache REQMOD or better implementation ideas?

Just the above details about variant caching.

Amos
Received on Fri Jul 11 2014 - 03:13:06 MDT