[squid-users] Caching a Script

From: David Groden <squid-cache@dont-contact.us>
Date: Thu, 18 Jul 2002 22:45:43 -0500

Packages (RedHat 7.3):
squid 2.4.STABLE6-6.7.3
squirm 1.23-7
apache 1.3.23-14

Need:
My website consists of a single script (http://xxx/xxx.xxx). The querystring or POST
data determines what content is displayed. To save loading time and
processor usage, I want to cache the responses to requests for my script
whose querystrings match certain patterns.

Changes I've made in squid.conf:
1. I want to run apache on another port and squid on port 80 in "httpd
accelerator" mode so I:
   changed some ACLs under "ACCESS CONTROLS"
   changed "http_port" from "3128" to "80"
   changed "icp_port" from "3130" to "0"
   changed "httpd_accel_port" from "80" to "virtual"
   changed "httpd_accel_uses_host_header" from "off" to "on"

2. I need it to cache hits with querystrings so I:
   commented out the line "acl QUERY urlpath_regex cgi-bin \?"
   commented out the line "no_cache deny QUERY"
   changed "hierarchy_stoplist" from "cgi-bin ?" to "cgi-bin"
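To make the effect of step 2 concrete, here is a rough Python model of the default rule being commented out (acl QUERY urlpath_regex cgi-bin \? plus no_cache deny QUERY). It assumes that urlpath_regex behaves like an unanchored regex search over the URL with the scheme and host removed (query string included), and that an acl line with several patterns matches if any of them does. The host www.example.com and the script name app.cgi are hypothetical stand-ins.

```python
import re

# Patterns from the default rule "acl QUERY urlpath_regex cgi-bin \?".
# Assumption: an acl line with several patterns matches if ANY pattern matches.
DEFAULT_QUERY_PATTERNS = [re.compile(r"cgi-bin"), re.compile(r"\?")]


def url_path(url: str) -> str:
    """Return the URL minus scheme and host, keeping any '?' and query string."""
    after_scheme = url.split("://", 1)[-1]
    slash = after_scheme.find("/")
    return after_scheme[slash:] if slash != -1 else "/"


def denied_by_default_rule(url: str) -> bool:
    """True if the default "no_cache deny QUERY" rule would keep this URL out of the cache."""
    path = url_path(url)
    return any(p.search(path) for p in DEFAULT_QUERY_PATTERNS)
```

With the default rule in place, every request to the single script that carries a querystring would be excluded, which is why removing it is the first step.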

3. I'm using squirm so I:
   added the line "redirect_program /usr/lib/squid/squirm"
   changed "redirect_rewrites_host_header" from "on" to "off"
   changed "redirect_children" from "5" to "10"
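For anyone unfamiliar with what a redirect_program such as squirm does, below is a minimal sketch of a redirector helper, assuming the Squid 2.x helper convention: one request per input line ("URL ip-address/fqdn ident method"), answered with one output line that is either a new URL or a blank line meaning "leave it unchanged". The rewrite rule (public host to a backend on 127.0.0.1:8080) and all names are hypothetical; this is not squirm's code or configuration format.

```python
import re

# Hypothetical rule: send requests for the public host to apache on another port.
RULES = [
    (re.compile(r"^http://www\.example\.com(:80)?/"), "http://127.0.0.1:8080/"),
]


def rewrite_line(line: str) -> str:
    """Map one redirector input line to one output line (blank = URL unchanged)."""
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]  # remaining fields: client ip/fqdn, ident, method
    for pattern, replacement in RULES:
        if pattern.search(url):
            return pattern.sub(replacement, url, count=1)
    return ""


def serve(infile, outfile) -> None:
    """Answer each request line immediately; squid waits on the helper otherwise."""
    for line in infile:
        outfile.write(rewrite_line(line) + "\n")
        outfile.flush()
```

A real helper would call serve(sys.stdin, sys.stdout). Note that a redirector only rewrites URLs; it does not decide what gets cached.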

So what's the problem(s)?
1. How do I make squid decide whether or not to cache a hit by examining the
querystring? Is that part of squid, squirm, both, or neither? How is it done?
Does anyone have examples? I plan on exploring this more if I can ever get
past my next problem, which is...
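As an illustration of question 1, here is a hedged Python sketch of a querystring-based cache decision: requests carrying a "never cache" parameter are rejected first, and the rest are cached only if the querystring matches an allowed pattern. The parameter names (page, report, sid) are purely hypothetical. In squid.conf the same kind of decision would be expressed with acl ... urlpath_regex lines combined with a no_cache deny rule, the mechanism the default QUERY rule uses; the exact rules for this site are not shown here.

```python
import re

# Hypothetical: querystrings considered safe to cache.
CACHEABLE_QUERY = [
    re.compile(r"^page=(home|about|catalog)(&|$)"),
    re.compile(r"(^|&)report=\d+(&|$)"),
]
# Hypothetical: anything carrying a session id is never cached.
NEVER_CACHE = [re.compile(r"(^|&)sid=")]


def query_of(url: str):
    """Return the querystring, or None if the URL has no '?'."""
    _, sep, query = url.partition("?")
    return query if sep else None


def should_cache(url: str) -> bool:
    """Deny rules are checked first, then the allow list (first match decides)."""
    query = query_of(url)
    if query is None:
        return True  # plain script URL without a querystring
    if any(p.search(query) for p in NEVER_CACHE):
        return False
    return any(p.search(query) for p in CACHEABLE_QUERY)
```

Checking the deny list before the allow list mirrors the way ordered access rules stop at the first match.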

2. Squid kept logging a "TCP_MISS" for hits to my script and fetching a
fresh copy from apache. I AM ASSUMING <--red flag? :) that this was because
the "Last-Modified" header apache returns with my script is set to the current
time on every request, causing squid to "miss" on the IMS
(If-Modified-Since) check. Am I right? The only way I managed to get a
"TCP_HIT" for my script from squid was to add a couple of lines to
httpd.conf that made apache send an "Expires: (now plus 1 month)" header
with the script. As soon as I did that, it was all TCP_HITs. Now squid
successfully returns the cached version of hits to my script, but the
side effect of the "Expires" header, which is that the response gets cached
by every cache along the way as well as by the client's browser (and browser
memory!), is unacceptable. I see that squid can remove headers from outgoing
requests with the "anonymize_headers" option, but it can't remove or modify
the headers of outgoing responses. Is there a patch or feature available that
allows squid to do this? ...Or am I just going about this all wrong
(wouldn't be surprising)? I'm assuming that the only reason my script wasn't
being served from cache was the "Last-Modified" header, and that the
"Expires" header overrides the IMS check and produces a "TCP_HIT". Is
this correct? If so, maybe I'm supposed to make apache always return a
fixed "Last-Modified" header for my script instead of generating a new
one each time?
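To reason about question 2, here is a simplified Python model of how a cache like squid might judge freshness using a refresh_pattern-style rule (min, percent, max). It is not squid's actual code; the order of checks is an approximation, and the default rule "refresh_pattern . 0 20% 4320" is assumed. The point it illustrates: if Last-Modified always equals the fetch time, the heuristic lifetime is zero, whereas an explicit Expires or an older, fixed Last-Modified yields a nonzero lifetime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefreshRule:
    """Values of a refresh_pattern line, converted to seconds / a fraction."""
    min_s: int
    percent: float
    max_s: int


# Assumed default: "refresh_pattern . 0 20% 4320" (min and max are in minutes).
DEFAULT_RULE = RefreshRule(min_s=0, percent=0.20, max_s=4320 * 60)


def is_fresh(now: int, date: int, last_modified: Optional[int] = None,
             expires: Optional[int] = None, rule: RefreshRule = DEFAULT_RULE) -> bool:
    """Simplified freshness check; all times are Unix timestamps in seconds.

    date is when the cached copy was fetched from the origin server.
    """
    age = now - date
    if expires is not None:
        # An explicit Expires header wins over the heuristic in this model.
        return now < expires
    if age > rule.max_s:
        return False
    if last_modified is not None and date > last_modified:
        # Heuristic lifetime: a percentage of how old the document was when fetched.
        return age < rule.percent * (date - last_modified)
    # No usable Last-Modified (e.g. it equals the fetch time): only "min" applies.
    return age <= rule.min_s
```

For example, with a fixed Last-Modified ten days before the fetch, this model keeps the copy fresh for about two days (20% of ten days).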

What you need to know before replying: :)
The website is very complex with all sorts of different areas and screens
and involves a couple hundred databases. Please understand that the script
is not flexible, and I am only asking questions about the use and
functionality of squid itself. Also, yes, I understand that different orders
of the values in the querystring will create redundant files in the cache.
It won't be a problem. The only links in the world to my script containing a
querystring are generated by the script itself and are always in the same
order. I know I'm probably the ten millionth person to want to use a
cache as an accelerator to speed up those (less than) dynamic pages that
normally take 20 seconds or more to process. Every one of those ridiculously
priced website accelerators out there touts this as one of its "most
exciting" features. Yet the only web pages and discussions I've found about
using squid in this manner seem to revolve around arguing about the
principle of caching CGIs, like "fix your slow script" or "make your script
generate a hard-coded page, blah blah blah" ...and other such gibberish.
Whether it's possible with squid, and how, still seems to be the unanswered
question. At least I haven't found the whole answer in one place yet, though
I have seen the question posed about a hundred times. Could some nice person
please reveal the few secret steps it takes to selectively cache a script by
querystring with squid, to me and the rest of the people out there who
don't want to spend $10,000 on a $400 server running some "proprietary
software" that looks suspiciously like squid?
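On the point about parameter order: a cache keys on the full URL, so the same parameters in a different order are stored as separate objects. The sketch below shows one way a front end (for instance a hypothetical redirector) could normalize querystrings by sorting parameters; it is only relevant if links could ever arrive in different orders, which the post says they do not. The host and script names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def canonical_url(url: str) -> str:
    """Return the URL with its query parameters sorted, so equal queries share one cache key."""
    base, sep, query = url.partition("?")
    if not sep:
        return url
    pairs = parse_qsl(query, keep_blank_values=True)
    return base + "?" + urlencode(sorted(pairs))
```

Note that re-encoding may normalize percent-escapes as well as order, which is usually harmless for a script that decodes its own parameters.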

Thank you,
David
Received on Thu Jul 18 2002 - 21:46:37 MDT