Re: Problem with squid interpreting gzipped content from Apache

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 17 Apr 2013 23:21:48 +1200

On 17/04/2013 9:58 p.m., anita wrote:
> Hi Amos,
>
> I realise this could be a development related post. I will repost it there.
> Sorry for the inconvenience.
>
> I am raising this request internally in Squid code to fetch some URLs that
> are present in the already replied object.
> Say I am using a client to request fetchcompress.html (without any encoding
> set). The Squid fetches this fetchcompress.html from the origin
> server(apache) and returns it to the client.
> At the same time, it parses the fetchcompress.html to see if it has any
> prefetchable urls.
>
> In my case, fetchcompress.html has a prefetchable link compress.html.
>
> To fetch this, I setup a fake request header with "Accept-Encoding: gzip" in
> it. This is done internally by the squid code itself. I believe this is
> successfully done as I can see it in the tcpdump (refer to "Request sent to
> Apache (tcpdump)" section in my prev post).
> When I retrieve this object using a StoreClientCopy(), it gives me an empty
> object ie. Object length was 0.
>
> a) Now did this happen because I simply retrieved the object based on the
> url alone?
> b) Why is the Content-Encoding tag absent from the reply header?

Aha. The answer then is *because* you fetched the object from cache via
URL alone. This URL points at a variant resource. The Squid cache entry
for the URL alone points at an internal vary marker object. The real
object is stored at a hash location built from the URL plus the
Accept-Encoding header text "gzip".
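
As a rough illustration of the keying described above, here is a minimal
Python sketch. It is a simplified model, not Squid's actual code: the
function name store_key, the choice of MD5, the example URL, and the way
the header text is appended to the URL are all assumptions made purely
for illustration.

```python
import hashlib

def store_key(url, vary_text=None):
    """Hypothetical key function: hash the URL alone, or the URL plus the
    request-header text selected by Vary (assumed: simple concatenation)."""
    material = url
    if vary_text is not None:
        material += "\n" + vary_text
    return hashlib.md5(material.encode("utf-8")).hexdigest()

url = "http://example.com/compress.html"  # hypothetical URL

marker_key = store_key(url)            # where the vary marker would live
variant_key = store_key(url, "gzip")   # where the gzipped body would live
```

In this model a lookup by URL alone can only ever reach marker_key, which
is consistent with getting back something other than the gzipped body.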

You need to treat these vary marker objects as a signal that:
  a) the pre-fetcher needs to make several lookups for this URL, and
  b) the marker object's Vary: header decides which permutations of
which headers the prefetcher should use on its follow-up fetches.
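
The two points above could be sketched roughly as follows. This is a toy
model under stated assumptions, not Squid's store API: the VaryMarker type,
the plan_lookups and variant_key helpers, the dict-based store, and the
example URL are all hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Tuple

class VaryMarker(NamedTuple):
    """Stand-in for the internal vary marker entry (hypothetical type)."""
    header_names: Tuple[str, ...]  # names listed in the stored Vary: header

def variant_key(url, request_headers, header_names):
    # Assumption: the key is the URL plus the text of each varying header.
    # Header names are matched exactly here to keep the sketch short.
    text = "|".join(request_headers.get(name, "") for name in header_names)
    return url + "+" + text

def plan_lookups(store, url, candidate_requests):
    """Return (request_headers, 'HIT' or 'MISS') for each lookup needed."""
    entry = store.get(url)
    if not isinstance(entry, VaryMarker):
        # No entry yet, or a plain non-variant object: one lookup by URL.
        return [({}, "HIT" if entry is not None else "MISS")]
    results = []
    for headers in candidate_requests:
        key = variant_key(url, headers, entry.header_names)
        results.append((headers, "HIT" if key in store else "MISS"))
    return results

url = "http://example.com/compress.html"  # hypothetical URL
store = {
    url: VaryMarker(("Accept-Encoding",)),
    url + "+gzip": b"<gzipped body>",
}
candidates = [{"Accept-Encoding": "gzip"}, {"Accept-Encoding": "deflate"}]
plan = plan_lookups(store, url, candidates)
```

The point of the sketch is only that finding a marker at the bare URL turns
one lookup into one lookup per candidate header set.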

Here is a short sequence of client transactions, with the store contents
to expect after each one, for your edification:

For simplicity the start state is a new URL with no stored contents.

1) your prefetch pulls using Accept-Encoding:gzip.

Store determines MISS on the URL.

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzipped response

2) client fetches URL using "Accept-Encoding:gzip,deflate"

Store loads the vary marker, determines MISS on the URL+"gzip,deflate"

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzipped response
    + HASH(URL+"gzip,deflate") --> server gzipped response

3) client fetches URL using "Accept-Encoding:deflate"

Store loads the vary marker, determines MISS on the URL+"deflate"

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzip encoded response
    + HASH(URL+"gzip,deflate") --> server gzip encoded response
    + HASH(URL+"deflate") --> server deflate encoded response

4) client fetches URL using "Accept-Encoding:sdch"

Store loads the vary marker, determines MISS on the URL+"sdch"

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzip encoded response
    + HASH(URL+"gzip,deflate") --> server gzip encoded response
    + HASH(URL+"deflate") --> server deflate encoded response
    + HASH(URL+"sdch") --> server sdch encoded response
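
The four steps above can be replayed with a small Python model. It is only
a sketch: the fetch function, the dict used as the store, and the string
keys standing in for HASH(...) are assumptions for illustration, and the
stored values are placeholders rather than real response bodies.

```python
def fetch(store, url, accept_encoding):
    """Model one transaction: return 'HIT' or 'MISS' and update the store."""
    key = url + "+" + accept_encoding   # stands in for HASH(URL+header text)
    if url in store and key in store:
        return "HIT"
    # MISS: the server responds, so store the marker and this variant.
    store[url] = "vary marker"
    store[key] = "server response to Accept-Encoding:" + accept_encoding
    return "MISS"

store = {}
requests = ["gzip", "gzip,deflate", "deflate", "sdch", "gzip"]
results = [fetch(store, "URL", ae) for ae in requests]

for key in store:
    print("+ HASH(" + key + ") -->", store[key])
```

Only the fifth request, which repeats the exact text "gzip", is a HIT in
this model. A header such as "gzip, deflate" (with a space) would be keyed
separately from "gzip,deflate" and count as another MISS.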

Repeat for all possible combinations of all encoding types (including
whitespace padding permutations), with an occasional HIT when a client
repeats an Accept-Encoding header that was used before.
This is why prefetching is not very popular. You have to prefetch at
least 4 variants of the encoding header just to cover the most popular
browsers - I will leave it to you to figure out what those are.

HTH
Amos
Received on Wed Apr 17 2013 - 11:22:04 MDT
