Re: [squid-users] cache dynamically generated images

From: Amos Jeffries <>
Date: Wed, 23 Feb 2011 11:24:50 +1300

 On Tue, 22 Feb 2011 11:26:51 -0500, Charles Galpin wrote:
> Hi Amos, thanks so much for the help. More questions and
> clarification needed please
> On Feb 18, 2011, at 5:47 PM, Amos Jeffries wrote:
>> Make sure your config has had these changes:
>> which allows Squid to play with query-string (?) objects properly.
> Yes these were default settings for me. I don't think this is
> necessarily an issue for me though since I am sending URLs that look
> like static image requests, but converting them via mod_rewrite in
> apache to call my script.
>> TCP_REFRESH_MISS means the backend sent a new changed copy while
>> revalidating/refreshing its existing copy.
>> max-age=0 means revalidate that it has not changed before sending
>> anything.
>>> I have set an Expires, Etag, "Cache-Control: max-age=600,
>>> s-max-age=600, must-revalidate", "Content-Length and
>> must-revalidate from the server is essentially the same as max-age=0
>> from the client. It will also lead to TCP_REFRESH_MISS.
> I'll admit I threw in the must-revalidate as part of my increasingly
> desperate attempts to get things behaving the way I wanted, and
> didn't fully understand its ramifications, nor the client side
> max-age=0 implications, but your explanation helps!
>> BUT, these controls are only what is making the problem visible. The
>> server logic itself is the actual problem.
> Agreed!
>> ETag should be the MD5 checksum of the file or something similarly
>> unique. It is used alongside the URL to guarantee version differences
>> are kept separate.
> Yes, this was another desperate attempt to force caching to occur,
> and I will implement something more sane for the actual app. But this
> should have helped, shouldn't it? For my testing this should have
> uniquely identified this image, right?
> I guess I have a fundamental misunderstanding, but my assumption was
> that all these directives were ways to tell squid to not keep asking
> the origin, but serve from the cache until the age expired and at that
> point check if it changed. I totally didn't expect it to check every
> time, and this still doesn't sit well with me. Should it really check
> every time? I know a check is faster than an actual GET but it still
> seems more than necessary if caching parameters have been specified.
>> Your approach is reasonable for your needs. But the backend server
>> system is letting you down by sending back a new copy every
>> validation.
>> If you can get it to present 304 not-modified responses between file
>> update times this will work as intended.
>> This would mean implementing some extra logic in the script to
>> handle If-Modified-Since, If-Unmodified-Since, If-None-Match and
>> If-Match headers.
>> The script itself needs to be in control of whether a local static
>> duplicate is used; apache does not have enough info to do it, as you
>> noticed. Most CMS call this server-side caching.
> Ok, I can return 304 and it gets a cache hit as expected, so this is
> great. I am not sure I'll waste any time making my test script any
> smarter, as it's just a simple perl script and the actual
> implementation will be in java and able to make these
> determinations. But one of the things that has been throwing me off
> is that I see no signs in the apache logs of a HEAD request; they all
> show up as GETs. I assume this is my mod_rewrite rule, but I also
> tried with a direct url to the script and am not getting the
> If-Modified-Since header, for example (the only one I know off the
> top of my head is set by the CGI module).

 Correct. This is a RESTful property of HTTP.
 HEAD is for systems to determine the properties of an object when they
 *never* want the body to come back as the reply. Re-validation requests
 do want changed bodies to come back when relevant so they use GET with
 If-* headers.
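
 To illustrate (the hostname, path, date and ETag value below are just
 placeholders, not taken from your setup), the revalidation Squid sends
 upstream looks roughly like:

   GET /images/logo.png HTTP/1.1
   Host: www.example.com
   If-Modified-Since: Fri, 18 Feb 2011 22:47:00 GMT
   If-None-Match: "d41d8cd98f00b204e9800998ecf8427e"

 and when nothing has changed the backend should answer with a
 header-only reply such as:

   HTTP/1.1 304 Not Modified
   ETag: "d41d8cd98f00b204e9800998ecf8427e"
   Cache-Control: max-age=600

 which is what lets Squid keep serving its stored copy instead of
 fetching a full new body.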

> But either way, this confirms it's just my dumb script to blame :)

 Cool, good to know it's easily fixed.

>>> Lastly, I was unable to set up squid on an alternate port - say
>>> 8081, and use an existing apache on port 80, both on the same box.
>>> This is for testing so I can run squid in parallel with the existing
>>> service without changing the port it is on. Squid seems to want to
>>> use the same port for the origin server as itself and I can't figure
>>> out how to say "listen on 8081 but send requests to port 80 of the
>>> origin server". Any thoughts on this? I am using another server
>>> right now to get around this, but it would be more convenient to use
>>> the same box.
>> cache_peer parameter #3 is the port number on the origin server to
>> send HTTP requests to.
>> Also, to make the Host: header and URL contain the right port number
>> when crossing ports like this you need to set the http_port vport=X
>> option to the port the backend server is using. Otherwise Squid will
>> place its public-facing port number in the Host: header to inform the
>> backend what the client's real URL was.
> Yes I have this but it's still not working. Below are all uncommented
> lines in my squid.conf - can you see anything I have that's messing
> this up? The is an apache virtual host if it
> matters. With this, if I go to
> , squid calls
> instead of

 Hmm, that is a bit of a worry. vport=80 is supposed to be fixing that
 port number up so it disappears completely (implicit :80).
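
 To spell out what should happen (www.example.com standing in for your
 real vhost name): with "http_port 8081 accel vhost vport=80" and a
 cache_peer on port 80, a client request for
 http://www.example.com:8081/img/foo.png should reach apache as

   GET /img/foo.png HTTP/1.1
   Host: www.example.com

 with no :8081 left in the Host: header or in any URL Squid rebuilds
 from it.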

> acl all src all
> acl manager proto cache_object
> acl localhost src
> acl to_localhost dst
> acl http8081 port 8081
> acl local-servers dstdomain
> acl localnet src # RFC1918 possible internal network
> acl localnet src # RFC1918 possible internal network
> acl localnet src # RFC1918 possible internal network
> acl SSL_ports port 443
> acl Safe_ports port 80 # http
> acl Safe_ports port 8081 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 # https
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT

 For a reverse proxy it is a good idea to place http_access and ACL
 controls specific to the reverse proxy at this point in the file.

 What I would add here for your config is this:

  acl imageserver dstdomain
  http_access allow imageserver

 NP: this obsoletes the http8081 limits.
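
 With a placeholder hostname standing in for your image site, that pair
 would look like:

   acl imageserver dstdomain www.example.com
   http_access allow imageserver

 placed above the existing http_access rules, so requests for the
 accelerated site are allowed before the generic port and localnet
 checks run.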

> http_access allow manager localhost
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access allow localnet
> http_access allow http8081
> http_access deny all
> icp_access allow localnet
> icp_access deny all
> http_port 8081 vhost vport=80

 Optional with your Squid, but for future-proofing the upgrade you can
 add the accel mode flag explicitly as the first option on the list:

  http_port 8081 accel vhost vport=80

> cache_peer parent 80 0 no-query originserver default
> hierarchy_stoplist cgi-bin ?
> access_log c:/squid/var/logs/access.log squid
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
> acl shoutcast rep_header X-HTTP09-First-Line ^ICY.[0-9]
> upgrade_http0.9 deny shoutcast
> acl apache rep_header Server ^Apache
> broken_vary_encoding allow apache
> always_direct allow all
> always_direct allow local-servers

 Absolutely remove the always_direct lines if you can. The "allow
 imageserver" line I recommend above will ensure that the website
 requests are always serviced. Squid will pass them on to the
 cache_peer securely *unless* always_direct bypasses the peer link.

 FWIW: When the cache_peer is configured with an FQDN the IPs are looked
 up on every request needing to go there. So a small amount of IP load
 balancing and failover happens there already, the same as you get from
 going direct based on the vhost name.
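
 Pulled together, and again with a placeholder hostname in place of your
 real vhost, the accelerator part of the config would reduce to roughly:

   http_port 8081 accel vhost vport=80
   cache_peer www.example.com parent 80 0 no-query originserver default
   acl imageserver dstdomain www.example.com
   http_access allow imageserver

 with both always_direct lines dropped, so everything for that domain is
 sent through the cache_peer.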
