RE: Problem with squid interpreting gzipped content from Apache

From: <anita.sivakumar_at_wipro.com>
Date: Wed, 17 Apr 2013 13:11:33 +0000

Hi Amos,

That's an eye opener for me.

So I take it this iterative lookup must already exist somewhere in the code path for normal requests, right?

I ask because Firefox can send a request with "gzip,deflate" as its Accept-Encoding.
The object would already have been stored in the cache under just url + "accept-encoding", without any encoding actually applied. I noticed this when requesting an object from an Apache server that has compression enabled in its configuration, while the client did not ask for compression. In short, the cache still maintains 2 entries for the same object, but the second entry holds an uncompressed version with "http://10.145.66.205/anita.htmaccept-encoding" alone.

I still managed to retrieve this object through Firefox without an additional entry being registered in the cache.

Can you point me to the API in the code where this logic lives?

Thanks.

Regards,
Anita

-----Original Message-----
From: Amos Jeffries [mailto:squid3_at_treenet.co.nz]
Sent: 17 April 2013 16:52
To: squid-dev_at_squid-cache.org
Subject: Re: Problem with squid interpreting gzipped content from Apache

On 17/04/2013 9:58 p.m., anita wrote:
> Hi Amos,
>
> I realise this could be a development related post. I will repost it there.
> Sorry for the inconvenience.
>
> I am raising this request internally in the Squid code to fetch some URLs that
> are present in the already replied object.
> Say I am using a client to request fetchcompress.html (without any encoding
> set). Squid fetches this fetchcompress.html from the origin
> server (apache) and returns it to the client.
> At the same time, it parses the fetchcompress.html to see if it has any
> prefetchable urls.
>
> In my case, fetchcompress.html has a prefetchable link compress.html.
>
> To fetch this, I set up a fake request header with "Accept-Encoding: gzip" in
> it. This is done internally by the squid code itself. I believe this is
> successfully done as I can see it in the tcpdump (refer to "Request sent to
> Apache (tcpdump)" section in my prev post).
> When I retrieve this object using a StoreClientCopy(), it gives me an empty
> object, i.e. the object length was 0.
>
> a) Now did this happen because I simply retrieved the object based on the
> url alone?
> b) Why is the Content-Encoding tag absent from the reply header?

Aha. The answer then is *because* you fetched the object from cache via
URL alone. This URL points at a variant resource. The Squid cache entry
for the URL alone points at that internal/vary marker object. The real
object is stored at hash location built from URL plus the
Accept-Encoding header text "gzip".

You need to use these vary marker objects as a tool to determine:
  a) that the pre-fetcher needs to make several lookups for this URL, and
  b) from the marker object's Vary: header, what permutations of
which headers the prefetcher should use on its followup fetches.
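
The two-step lookup described above can be sketched as a toy model. This is not Squid's store API: `store_key`, `VaryMarker`, `lookup`, the SHA-1 hash and the example.com URL are all hypothetical names chosen for illustration, and the store is just a dictionary.

```python
import hashlib


def store_key(url, vary_value=None):
    """Toy stand-in for HASH(URL) or HASH(URL + header text); an assumption, not Squid's key function."""
    text = url if vary_value is None else url + vary_value
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class VaryMarker:
    """Placeholder stored under HASH(URL); records which request headers select a variant."""

    def __init__(self, vary_headers):
        self.vary_headers = vary_headers  # e.g. ["accept-encoding"]


def lookup(store, url, request_headers):
    """Return the cached body for this request, or None on a MISS."""
    entry = store.get(store_key(url))
    if not isinstance(entry, VaryMarker):
        return entry  # a plain object, or None when nothing is stored
    # Second lookup: build the variant key from the headers the marker names.
    vary_value = ",".join(request_headers.get(h, "") for h in entry.vary_headers)
    return store.get(store_key(url, vary_value))


URL = "http://example.com/compress.html"  # hypothetical URL
store = {
    store_key(URL): VaryMarker(["accept-encoding"]),
    store_key(URL, "gzip"): b"<gzipped body>",
}

by_url_only = store.get(store_key(URL))                        # the marker, not the page
hit = lookup(store, URL, {"accept-encoding": "gzip"})          # the real object
miss = lookup(store, URL, {"accept-encoding": "deflate"})      # no such variant yet
```

In this model a lookup by URL alone returns the marker, which carries no page body. That matches the empty object seen by the prefetcher.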

Here is a little sequence of client transactions and what to expect in
the store contents state for your edification:

For simplicity the start state is a new URL with no stored contents.

1) your prefetch pulls using Accept-Encoding:gzip.

Store determines MISS on the URL.

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzipped response

2) client fetches URL using "Accept-Encoding:gzip,deflate"

Store loads the vary marker, determines MISS on the URL+"gzip,deflate"

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzipped response
    + HASH(URL+"gzip,deflate") --> server gzipped response

3) client fetches URL using "Accept-Encoding:deflate"

Store loads the vary marker, determines MISS on the URL+"deflate"

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzip encoded response
    + HASH(URL+"gzip,deflate") --> server gzip encoded response
    + HASH(URL+"deflate") --> server deflate encoded response

4) client fetches URL using "Accept-Encoding:sdch"

Store loads the vary marker, determines MISS on the URL+"sdch"

After the server responds Squid store contains
    + HASH(URL) --> vary marker
    + HASH(URL+"gzip") --> server gzip encoded response
    + HASH(URL+"gzip,deflate") --> server gzip encoded response
    + HASH(URL+"deflate") --> server deflate encoded response
    + HASH(URL+"sdch") --> server sdch encoded response
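
The four transactions above can be replayed with a small simulation. It is only a sketch: `ToyStore` is a hypothetical class, plain tuples stand in for the hashed keys, and a fifth request that repeats "gzip" is added purely to show a HIT.

```python
class ToyStore:
    """Toy model of the store contents; tuples replace HASH(URL + header text)."""

    def __init__(self):
        self.entries = {}  # key -> short description of what is stored
        self.log = []      # "HIT" or "MISS" per request, in order

    def fetch(self, url, accept_encoding):
        variant_key = (url, accept_encoding)
        # A HIT needs both the marker under the URL and the exact variant.
        if url in self.entries and variant_key in self.entries:
            self.log.append("HIT")
            return
        self.log.append("MISS")
        # After the server responds, the marker and the new variant are stored.
        self.entries[url] = "vary marker"
        self.entries[variant_key] = "response to Accept-Encoding:" + accept_encoding


store = ToyStore()
for value in ["gzip", "gzip,deflate", "deflate", "sdch", "gzip"]:
    store.fetch("URL", value)

for key, description in store.entries.items():
    print(key, "-->", description)
print(store.log)
```

The store ends up with one marker plus four variants, and only the repeated "gzip" request is served from cache.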

Repeat for all possible combinations of all encoding types (including
whitespace padding permutations), with an occasional HIT when a client
repeats the Accept-Encoding header.
This is why prefetching is not very popular. You have to prefetch at
least 4 variants of the encoding header just to cover the most popular
browsers - I will leave it to you to figure out what those are.
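
To illustrate why the permutations multiply, here is a minimal sketch assuming the variant key is built from the raw header text, as in the HASH(URL+"gzip") notation above. The `store_key` helper, the SHA-1 hash, the example.com URL and the header values listed are illustrative assumptions, not a list of what real browsers send.

```python
import hashlib


def store_key(url, accept_encoding):
    """Toy stand-in for HASH(URL + header text); an assumption, not Squid's key function."""
    return hashlib.sha1((url + accept_encoding).encode("utf-8")).hexdigest()


URL = "http://example.com/compress.html"  # hypothetical URL

# The same two encodings, written four different ways.
same_encodings = ["gzip,deflate", "gzip, deflate", "gzip , deflate", "deflate,gzip"]

keys = {store_key(URL, value) for value in same_encodings}
print(len(keys), "distinct store keys for", len(same_encodings), "header spellings")
```

Each spelling lands on its own key in this model, so a variant prefetched under one spelling is a MISS for a client that sends another.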

HTH
Amos

Received on Wed Apr 17 2013 - 13:11:41 MDT
