Re: [squid-users] Accelerating proxy not matching cgi files from Amos Jeffries on 2011-08-25 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 26 Aug 2011 02:22:34 +1200

On 25/08/11 21:38, Mateusz Buc wrote:
> 2011/8/24 Amos Jeffries<squid3_at_treenet.co.nz>:
>>
>> Maybe. We would need to see the HTTP headers produced by gen.cgi to be sure.
>> From the description of how index.cgi/gen.cgi interact, I think it highly
>> likely that the lack of Cache-Control and Last-Modified information from gen.cgi
>> is causing the cache algorithms to determine it's unsafe to store.
>>
>
> I gained access to the code of gen.cgi and made a few changes:
>
> printf("Cache-Control: max-age=600, s-maxage=300\n");
> printf("Last-Modified: %s\n",mdate);
>
> It now fetches the timestamp from the URL, converts it to the appropriate
> format and outputs it as the Last-Modified header. I also added
> Cache-Control. The results are noticeable: I now get mostly
> TCP_REFRESH_UNMODIFIED/304 on my test page (the gen.cgi links don't
> change there, so all the timestamps stay the same all the time).
>
> Thank you a lot for these suggestions!
>
> However, I still can't get these URLs/images cached in my Squid. Is
> there any chance they can be served directly from the Squid cache when
> they do not change? Network bandwidth is obviously reduced now, but I'm
> not sure about CPU load: it still takes almost the same time to load
> the URL (about 8 seconds).

Halfway there. Stage 1 complete after a fashion.

Meaning of "TCP_REFRESH_UNMODIFIED/304":
  - TCP_ = TCP transport used.
  - REFRESH = If-Modified-Since sent to the origin (aka gen.cgi).
  - UNMODIFIED = full object came back, headers + body apparently
    identical to the known cached copy.
  - /304 = converted to a 304 "no change" response for the client half
    of the transaction.
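When reading access.log by hand it can help to split that result field mechanically. Below is a minimal C sketch, assuming the "TAG/status" layout shown above; split_result_code and describe_tag are hypothetical helpers (not Squid code), and describe_tag only knows the few tags mentioned in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Split a Squid access.log result field such as "TCP_REFRESH_UNMODIFIED/304"
 * into the cache tag and the HTTP status code sent to the client.
 * Returns 0 on success, -1 if the field is malformed or tag_len is too small. */
static int split_result_code(const char *field, char *tag, size_t tag_len,
                             int *status)
{
    const char *slash = strchr(field, '/');
    size_t n;

    if (slash == NULL)
        return -1;
    n = (size_t)(slash - field);
    if (n >= tag_len)
        return -1;
    memcpy(tag, field, n);
    tag[n] = '\0';
    return sscanf(slash + 1, "%d", status) == 1 ? 0 : -1;
}

/* Rough meaning of the tags discussed in this thread (hypothetical helper);
 * anything else is reported as "other". */
static const char *describe_tag(const char *tag)
{
    if (strcmp(tag, "TCP_REFRESH_UNMODIFIED") == 0)
        return "IMS sent to origin, cached copy still valid";
    if (strcmp(tag, "TCP_HIT") == 0)
        return "served from cache without contacting origin";
    if (strcmp(tag, "TCP_MISS") == 0)
        return "fetched from origin";
    return "other";
}
```

For "TCP_REFRESH_UNMODIFIED/304" this yields the tag TCP_REFRESH_UNMODIFIED and status 304: the tag describes the Squid<->gen.cgi half, the number the client<->Squid half.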

The 304 portion going across client<->Squid is where you are getting
*all* the bandwidth savings right now.

As I said earlier:

>>
>> At this point incoming requests will either be requesting brand new content or
>> have an If-Modified-Since: header containing the cached object's Last-Modified: timestamp.
>>
>> NOTE: You will not _yet_ see any reduction in the 200 requests. Potentially you might
>> actually see an increase as "must-revalidate" causes middleware caches to start working better.

The difference between what you are seeing and what I predicted is caused
by your use of max-age instead of must-revalidate.

max-age allows browsers to cache the graphs for 600 seconds, so you will
get _zero_ repeat traffic from a browser that already has a copy for that
duration: the exact opposite of what must-revalidate would do for you.
On top of that, you cannot see Squid serving HITs because of s-maxage.
It's set to 300, so Squid's copy expires before the browser's copy does.
When the browser _does_ send an IMS request, the Squid copy has already
expired, which forces a contact to gen.cgi to check for updates.
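To make that timing concrete, here is a small C model of the scenario. It assumes the browser and Squid both receive the graph at t = 0, ignores transit delay and the Age header, and uses the usual rule that a copy is fresh while its age is below its freshness lifetime; is_fresh and squid_can_answer_browser_ims are hypothetical names.

```c
#include <stdbool.h>

/* A cached copy is fresh while its age is below its freshness lifetime. */
static bool is_fresh(long age_seconds, long lifetime_seconds)
{
    return age_seconds < lifetime_seconds;
}

/* Browser and Squid both got the graph at t = 0 (assumption). The browser
 * reuses it for max_age seconds and then sends an If-Modified-Since request.
 * Returns true if Squid's own copy is still fresh at that moment (Squid can
 * answer without contacting gen.cgi), false if gen.cgi must be asked. */
static bool squid_can_answer_browser_ims(long max_age, long s_maxage)
{
    long browser_revalidates_at = max_age;  /* browser copy just went stale */
    return is_fresh(browser_revalidates_at, s_maxage);
}
```

With max-age=600 and s-maxage=300, Squid's copy is 600 seconds old when the browser revalidates, twice its 300-second lifetime, so the request goes through to gen.cgi. Any s-maxage above 600 reverses that.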

Okay, fine: use max-age and s-maxage. To get HITs under the current
circumstances, set s-maxage larger than max-age. Or omit s-maxage and have
Squid cache for the same length of time as any browser. Squid's copy is
shared by all clients, so you will get some HITs, but not a lot more.
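Applied to the printf lines quoted earlier, the choice comes down to which Cache-Control line gen.cgi prints. Below is a minimal C sketch; cache_control_header is a hypothetical helper, and 600/900 are only example values (what matters is s-maxage above max-age, or no s-maxage at all).

```c
#include <stdio.h>

/* Build the Cache-Control line gen.cgi prints.
 * max_age:  lifetime for browsers (private caches).
 * s_maxage: lifetime for shared caches such as Squid; pass 0 to omit it,
 *           in which case Squid uses max_age just like a browser would.
 * Returns the line length, or -1 if buf is too small. */
static int cache_control_header(char *buf, size_t len, int max_age, int s_maxage)
{
    int n;

    if (s_maxage > 0)
        n = snprintf(buf, len, "Cache-Control: max-age=%d, s-maxage=%d\n",
                     max_age, s_maxage);
    else
        n = snprintf(buf, len, "Cache-Control: max-age=%d\n", max_age);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, cache_control_header(line, sizeof line, 600, 900) followed by fputs(line, stdout) emits "Cache-Control: max-age=600, s-maxage=900", so Squid's copy outlives the browser's by five minutes.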

>
> Do you have any further tips?
>

Just this: Keep going.

You are roughly up to the end of Step 1 of my earlier instructions.
Step 2 is where the CPU benefits start appearing.

Every time gen.cgi can decide that the If-Modified-Since timestamp is no
older than the graph data, it can reply with a 304 and save all the
graph-production CPU time AND a graph's worth of bandwidth.
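That decision could look roughly like the following C sketch. It assumes gen.cgi knows when the underlying graph data last changed (data_mtime, hypothetical) and prints that same time as Last-Modified; http_date and can_skip_render are also hypothetical names. Rather than parsing the If-Modified-Since date, it uses a simpler, stricter check: an exact string match against the Last-Modified value gen.cgi would send now. If a cache does not echo the value back verbatim, the check just falls back to regenerating the graph.

```c
#include <string.h>
#include <time.h>

/* Format t as an HTTP date in RFC 1123 form, always in GMT.
 * %a and %b give English names only in the default "C" locale. */
static int http_date(time_t t, char *out, size_t len)
{
    const struct tm *utc = gmtime(&t);

    if (utc == NULL || strftime(out, len, "%a, %d %b %Y %H:%M:%S GMT", utc) == 0)
        return -1;
    return 0;
}

/* Decide whether gen.cgi may answer 304 instead of rendering the graph.
 * ims:        If-Modified-Since request value, or NULL if the header is absent.
 * data_mtime: (hypothetical) time the graph's source data last changed.
 * Returns 1 = send 304, 0 = render the graph, -1 = error (render anyway). */
static int can_skip_render(const char *ims, time_t data_mtime)
{
    char lastmod[64];

    if (ims == NULL)
        return 0;                 /* client has no copy to revalidate */
    if (http_date(data_mtime, lastmod, sizeof lastmod) != 0)
        return -1;
    /* Exact match with the Last-Modified we would send now means the data
     * has not changed since the cached copy was made. */
    return strcmp(ims, lastmod) == 0 ? 1 : 0;
}
```

In a CGI program the request header arrives in the HTTP_IF_MODIFIED_SINCE environment variable. When can_skip_render returns 1, gen.cgi would print "Status: 304 Not Modified" followed by a blank line and exit before doing any graph work.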

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.14
   Beta testers wanted for 3.2.0.10

Received on Thu Aug 25 2011 - 14:22:47 MDT
