Re: [squid-users] cache dynamically generated images

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 23 Feb 2011 11:24:50 +1300

 On Tue, 22 Feb 2011 11:26:51 -0500, Charles Galpin wrote:
> Hi Amos, thanks so much for the help. More questions and
> clarification needed, please.
>
> On Feb 18, 2011, at 5:47 PM, Amos Jeffries wrote:
>>
>> Make sure your config has had these changes:
>> http://wiki.squid-cache.org/ConfigExamples/DynamicContent
>>
>> which allows Squid to play with query-string (?) objects properly.
>
> Yes these were default settings for me. I don't think this is
> necessarily an issue for me though since I am sending URLs that look
> like static image requests, but converting them via mod_rewrite in
> apache to call my script.
>
>> TCP_REFRESH_MISS means the backend sent a new changed copy while
>> revalidating/refreshing its existing copy.
>>
>> max-age=0 means revalidate that it has not changed before sending
>> anything.
>>
>>> I have set an Expires, Etag, "Cache-Control: max-age=600,
>>> s-max-age=600, must-revalidate", "Content-Length and
>>
>> must-revalidate from the server is essentially the same as max-age=0
>> from the client. It will also lead to TCP_REFRESH_MISS.
>
> I'll admit I threw in the must-revalidate as part of my increasingly
> desperate attempts to get things behaving the way I wanted, and
> didn't fully understand its ramifications, nor the client-side
> max-age=0 implications, but your explanation helps!
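
 To illustrate: when Squid has to revalidate (because the client sent
 max-age=0, or because must-revalidate means a stale copy cannot be
 used without checking), the request it sends to the backend is a
 conditional GET, something like:

   GET /my/image/path.jpg HTTP/1.1
   Host: imageserver.my.org
   If-Modified-Since: <Last-Modified of the cached copy>
   If-None-Match: "<ETag of the cached copy>"

 If the backend answers "304 Not Modified", Squid serves its cached
 copy; if it answers 200 with a full body every time, the log shows
 TCP_REFRESH_MISS. (The exact headers sent depend on the Squid
 version; this is only an illustration.)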
>
>> BUT, these controls are only what is making the problem visible. The
>> server logic itself is the actual problem.
>
> Agreed!
>
>> ETag should be the MD5 checksum of the file or something similarly
>> unique. It is used alongside the URL to guarantee version differences
>> are kept separate.
>
> Yes, this was another desperate attempt to force caching to occur,
> and I will implement something more sane for the actual app. But this
> should have helped, shouldn't it? For my testing this should have
> uniquely identified this image, right?
>
> I guess I have a fundamental misunderstanding, but my assumption was
> all these directives were ways to tell squid to not keep asking the
> origin, but serve from the cache until the age expired and at that
> point check if it changed. I totally didn't expect it to check every
> time, and this still doesn't sit well with me. Should it really check
> every time? I know a check is faster than an actual GET but it still
> seems more than necessary if caching parameters have been specified.
>
>> Your approach is reasonable for your needs. But the backend server
>> system is letting you down by sending back a new copy every
>> validation.
>> If you can get it to present 304 not-modified responses between file
>> update times this will work as intended.
>>
>> This would mean implementing some extra logic in the script to
>> handle If-Modified-Since, If-Unmodified-Since, If-None-Match and
>> If-Match headers.
>> The script itself needs to be in control of whether a local static
>> duplicate is used, apache does not have enough info to do it as you
>> noticed. Most CMS call this server-side caching.
>
> Ok, I can return 304 and it gets a cache hit as expected, so this is
> great. I am not sure I'll waste any time making my test script any
> smarter, as it's just a simple perl script and the actual
> implementation will be in java and able to make these
> determinations. But one of the things that has been throwing me off
> is that I see no signs in the apache logs of a HEAD request; they all
> show up as GETs. I assume this is my mod_rewrite rule, but I also
> tried with a direct URL to the script and am not getting the
> If-Modified-Since header, for example (the only one I know off the
> top of my head is set by the CGI module).

 Correct. This is a RESTful property of HTTP.
 HEAD is for systems to determine the properties of an object when they
 *never* want the body to come back as the reply. Re-validation requests
 do want changed bodies to come back when relevant, so they use GET with
 If-* headers.
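
 Since the real implementation will be in Java anyway, here is a
 minimal sketch of the kind of If-Modified-Since / If-None-Match
 handling described above. It is purely illustrative, not anyone's
 actual code: the servlet class, the /var/images path and the helper
 methods are made-up placeholders for whatever the application really
 does.

  import java.security.MessageDigest;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class ImageServlet extends HttpServlet {
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
              throws java.io.IOException {
          // Hypothetical helpers: however the app really produces the image.
          // Assumes a servlet mapping that provides a path-info component.
          byte[] body = loadImageBytes(req.getPathInfo());
          long lastModified = lastModifiedFor(req.getPathInfo());
          String etag = "\"" + md5Hex(body) + "\"";

          // Revalidation arrives as a GET carrying If-* headers, not as HEAD.
          String ifNoneMatch = req.getHeader("If-None-Match");
          long ifModifiedSince = req.getDateHeader("If-Modified-Since"); // -1 if absent

          boolean unchanged = (ifNoneMatch != null)
              ? ifNoneMatch.equals(etag)
              : (ifModifiedSince != -1
                  && lastModified / 1000 <= ifModifiedSince / 1000);

          resp.setHeader("ETag", etag);
          resp.setDateHeader("Last-Modified", lastModified);
          resp.setHeader("Cache-Control", "max-age=600");

          if (unchanged) {
              // The cached copy is still good: answer 304 with no body.
              resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
              return;
          }
          resp.setContentType("image/jpeg");
          resp.setContentLength(body.length);
          resp.getOutputStream().write(body);
      }

      private static String md5Hex(byte[] data) {
          try {
              StringBuilder sb = new StringBuilder();
              for (byte b : MessageDigest.getInstance("MD5").digest(data))
                  sb.append(String.format("%02x", b));
              return sb.toString();
          } catch (java.security.NoSuchAlgorithmException e) {
              throw new RuntimeException(e);
          }
      }

      // Placeholder storage details; substitute the real lookup logic.
      private byte[] loadImageBytes(String path) throws java.io.IOException {
          return java.nio.file.Files.readAllBytes(
                  java.nio.file.Paths.get("/var/images", path));
      }
      private long lastModifiedFor(String path) throws java.io.IOException {
          return java.nio.file.Files.getLastModifiedTime(
                  java.nio.file.Paths.get("/var/images", path)).toMillis();
      }
  }

 The point is simply that the script stays in control: it answers 304
 when its copy has not changed, and only sends a full 200 with the
 image body when it has.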

>
> But either way, this confirms it's just my dumb script to blame :)
>

 Cool, good to know it's easily fixed.

>>>
>>> Lastly, I was unable to set up squid on an alternate port - say
>>> 8081, and use an existing apache on port 80, both on the same box.
>>> This is for testing so I can run squid in parallel with the
>>> existing service without changing the port it is on. Squid seems to
>>> want to use the same port for the origin server as itself and I
>>> can't figure out how to say "listen on 8081 but send requests to
>>> port 80 of the origin server". Any thoughts on this? I am using
>>> another server right now to get around this, but it would be more
>>> convenient to use the same box.
>>
>> cache_peer parameter #3 is the port number on the origin server to
>> send HTTP requests to.
>>
>> Also, to make the Host: header and URL contain the right port number
>> when crossing ports like this, you need to set the http_port vport=X
>> option to the port the backend server is using. Otherwise Squid will
>> place its public-facing port number in the Host: header to inform the
>> backend what the client's real URL was.
>
> Yes, I have this but it's still not working. Below are all the
> uncommented lines in my squid.conf - can you see anything I have
> that's messing this up? imageserver.my.org is an apache virtual host,
> if it matters. With this, if I go to
> http://imageserver.my.org:8081/my/image/path.jpg , squid calls
> http://imageserver.my.org:8081/my/image/path.jpg instead of
> http://imageserver.my.org:80/my/image/path.jpg

 Hmm, that is a bit of a worry. vport=80 is supposed to be fixing that
 port number up so it disappears completely (implicit :80).

>
> acl all src all
> acl manager proto cache_object
> acl localhost src 127.0.0.1/32
> acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
> acl http8081 port 8081
> acl local-servers dstdomain .my.org
> acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> acl SSL_ports port 443
> acl Safe_ports port 80 # http
> acl Safe_ports port 8081 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 # https
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT

 For a reverse proxy it is a good idea to place http_access and ACL
 controls specific to the reverse proxy at this point in the file.

 What I would add here for your config is this:

  acl imageserver dstdomain imageserver.my.org
  http_access allow imageserver

 NP: this makes the http8081 limits obsolete.

> http_access allow manager localhost
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access allow localnet
> http_access allow http8081
> http_access deny all
> icp_access allow localnet
> icp_access deny all
> http_port 8081 vhost vport=80 defaultsite=imageserver.my.org

 Optional with your Squid, but to future-proof the upgrade you can add
 the accel mode flag explicitly as the first option on the list:

  http_port 8081 accel vhost vport=80 defaultsite=imageserver.my.org

> cache_peer imageserver.my.org parent 80 0 no-query originserver default
> hierarchy_stoplist cgi-bin ?
> access_log c:/squid/var/logs/access.log squid
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
> acl shoutcast rep_header X-HTTP09-First-Line ^ICY.[0-9]
> upgrade_http0.9 deny shoutcast
> acl apache rep_header Server ^Apache
> broken_vary_encoding allow apache
> always_direct allow all
> always_direct allow local-servers

 Absolutely remove the always_direct lines if you can. The "allow
 imageserver" line I recommend above will ensure that the website
 requests are always serviced. Squid will pass them on to the
 cache_peer securely *unless* always_direct bypasses the peer link.

 FWIW: When the cache_peer is configured with an FQDN, the IPs are
 looked up on every request that needs to go there. So a small amount
 of IP load balancing and failover happens there already, the same as
 you get from going direct based on the vhost name.
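
 Pulling the above together, the accelerator-related part of squid.conf
 would then look roughly like this. It is just a sketch of the lines
 already discussed, not a drop-in config:

  # listen on 8081 in accelerator mode; vport=80 tells the backend the
  # client's real URL uses port 80
  http_port 8081 accel vhost vport=80 defaultsite=imageserver.my.org

  # parameter #3 is the port on the origin server to send requests to
  cache_peer imageserver.my.org parent 80 0 no-query originserver default

  # always service requests for the accelerated site
  acl imageserver dstdomain imageserver.my.org
  http_access allow imageserver

  # ... and no always_direct, so requests reach the origin via the
  # cache_peer link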

 Amos