Re: [squid-users] url rewrite and cache, which URL should be cached?

From: Sylvain Viart <sylvain.viart@dont-contact.us>
Date: Wed, 10 Oct 2007 10:20:49 +0200

Hi Henrik,

Henrik Nordstrom wrote:
> On tis, 2007-10-09 at 17:47 +0200, Sylvain Viart wrote:
>
>> Hi,
>>
>> I use a redirector on an accel proxy config.
>>
>> url_rewrite_program /etc/squid/redirector.pl
>> url_rewrite_children 15
>> url_rewrite_concurrency 0
>> url_rewrite_host_header off
>>
>>
>> It seems that the URL used to store the requested URL is the original
>> URL, not the rewritten one.
>>
>
> The cache is using the rewritten URL.
>
Here are some more details.

The redirector script:
#--------------------------8< ------------------------------------
#!/usr/bin/perl
#
# request_URI client_IP/FQDN username HTTP_method
#
# url_rewrite_program
# URL <SP> client_ip "/" fqdn <SP> user <SP> method <SP> urlgroup <NL>
#
#
# access.log
# 1190984604.736 6 12.34.56.78 TCP_MISS/200 1752 GET http://proxy-03.mydomain.com/thumb/100/default_woman.jpg - ROUNDROBIN_PARENT/php-03 image/jpeg
#
#
# !perl -n -e ' m@(http://[^ ]+)@; print "$1\n";' < /var/log/squid/access.log
$|=1;

$filer = 'filer-01';
@filer_domain = qw /img.mydomain.com/;

$filer_host = join('|', @filer_domain);

# dummy domain name for canonical rewriting
$php = 'static-php';

$n = 0;
while (<>)
{
        if(m[^http://$filer_host/] || m[^http://([^/]+)/(js/static_file|media|thumb)])
        {
                #$host = &get_filer;
                s#http://[^/]+(:[0-9]+)?#http://$filer#;

                # add urlgroup
                s/^/!filer! /;
        }
        else
        {
                if($_ !~ /\.php/)
                {
                        s#http://[^/]+(:[0-9]+)?#http://$php#;
                }

                # add urlgroup
                s/^/!php! /;
        }

        print;
}
#--------------------------8< ------------------------------------

The redirector returns:
echo "http://mes-test2.mydomain.com/js/mailbox.js" | perl t.pl
!php! http://static-php/js/mailbox.js

echo "http://sometestagain.mydomain.com/js/mailbox.js" | perl t.pl
!php! http://static-php/js/mailbox.js
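
The two runs above can be reproduced without piping through the script by wrapping the same rules in a subroutine. This is only a sketch: `rewrite_url` is a name invented for this example, the third test URL is made up, and the redundant `(:[0-9]+)?` from the original substitution is dropped because `[^/]+` already consumes the port. The filer domains are quoted and grouped so that a list with several entries would still match as intended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original redirector): applies the
# same rules as the script above to one URL and returns the reply line.
my $filer      = 'filer-01';
my $php        = 'static-php';
my $filer_host = join('|', map { quotemeta($_) } qw(img.mydomain.com));

sub rewrite_url {
    my ($url) = @_;

    # Static content: send to the filer and tag with the "filer" urlgroup.
    if ($url =~ m{^http://(?:$filer_host)/}
        || $url =~ m{^http://[^/]+/(?:js/static_file|media|thumb)}) {
        $url =~ s{^http://[^/]+}{http://$filer};
        return "!filer! $url";
    }

    # Everything else: canonicalize the host unless it is a .php URL.
    $url =~ s{^http://[^/]+}{http://$php} if $url !~ /\.php/;
    return "!php! $url";
}

for my $url (qw(
    http://mes-test2.mydomain.com/js/mailbox.js
    http://sometestagain.mydomain.com/js/mailbox.js
    http://img.mydomain.com/photo/1.jpg
)) {
    print rewrite_url($url), "\n";
}
```

The first two URLs differ only in host name and both come back as `!php! http://static-php/js/mailbox.js`, which is the canonical URL the cache is expected to use.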

Some store.log entries:

1191938682.998 SWAPOUT 00 000002E5 2AC12C498B97741871A11F0290E927C8 200 1191938682 1180514999 -1 application/x-javascript 418/418 GET http://mes-test.mydomain.com/js/mailbox.js
1191938722.433 SWAPOUT 00 000002E7 C61344FEBB15FC2C7D039A36A2EE552D 200 1191938722 1180514999 -1 application/x-javascript 418/418 GET http://mes-test2.mydomain.com/js/mailbox.js

I would expect both to be stored under a single URL:
http://static-php/js/mailbox.js
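
The two store.log entries carry different hash values (2AC12C49... and C61344FE...), which is consistent with the object being keyed on the original URL. The sketch below illustrates the effect with a toy key. It is an assumption made for this example: `toy_store_key` hashes "METHOD URL" with MD5 and does not claim to reproduce Squid's exact key derivation, so the values it prints will not match the log.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Toy key, for illustration only: an MD5 over "METHOD URL".
# This is an assumption and NOT Squid's exact key derivation.
sub toy_store_key {
    my ($method, $url) = @_;
    return uc(md5_hex("$method $url"));
}

my @original = (
    'http://mes-test.mydomain.com/js/mailbox.js',
    'http://mes-test2.mydomain.com/js/mailbox.js',
);
my $canonical = 'http://static-php/js/mailbox.js';

# Keyed on the original URLs: one key per host name.
print toy_store_key('GET', $_), "  $_\n" for @original;

# Keyed on the canonical URL: a single key whatever host was requested.
print toy_store_key('GET', $canonical), "  $canonical\n";
```

With any key derived from the URL, two host names mean two cached copies of the same file, while the canonical URL would give one.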

The associated ACLs and urlgroups:

# urlgroup matching acl from url_rewrite_program
acl static_doc urlgroup filer
acl php_doc urlgroup php

# filer server access rules
cache_peer_access filer-01 allow static_doc
cache_peer_access filer-01 deny all

# php server access exclusion for static_doc matched on the filer
#cache_peer_access php-01 deny static_doc
#cache_peer_access php-02 deny static_doc
#cache_peer_access php-03 deny static_doc
cache_peer_access php-04 deny static_doc

Could this be related to the urlgroup, which somehow hides the rewriting?

> Sequence is approximately
>
> * Request accepted
> * http_access Access controls
> * URL rewriting, replacing Squid's idea of the URL
> * http_access2 Access controls
> * Cache lookup
> * Forwarding on cache miss
> * http_reply_access Reply access controls
>
>
> Because of this using "url_rewrite_host_header off" can be a very bad
> thing as it makes the requested URL sent to the web server differ from
> the cache URL, and can easily bite you..
>
It seems it does bite :-)

But that is what I want. It worked with the Squid 2.5 redirector, without urlgroup:
 * cache: canonicalized URL
 * peer: original URL

It would be simpler if it worked like that for my config.
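
To make the behaviour I am after concrete, here is a toy model in Perl. It is not Squid code: `%cache`, `@peer_fetches`, `canonical_url` and `handle_request` are names invented for this sketch, and the canonicalization is reduced to replacing the host with static-php.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;          # toy cache, keyed by the canonical URL
my @peer_fetches;   # original URLs that had to be forwarded to a peer

# Replace the host part with the dummy canonical host name.
sub canonical_url {
    my ($url) = @_;
    $url =~ s{^http://[^/]+}{http://static-php};
    return $url;
}

# Look up by canonical URL; on a miss, fetch using the original URL.
sub handle_request {
    my ($url) = @_;
    my $key = canonical_url($url);
    return 'HIT' if exists $cache{$key};
    push @peer_fetches, $url;        # the peer still sees the original URL
    $cache{$key} = "body of $url";
    return 'MISS';
}

print handle_request('http://mes-test.mydomain.com/js/mailbox.js'),  "\n";
print handle_request('http://mes-test2.mydomain.com/js/mailbox.js'), "\n";
print scalar(@peer_fetches), " request(s) forwarded to a peer\n";
```

In this model the second request is served from the cache even though its host name differs, and only the first, original URL is ever sent to a peer.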

Is there more documentation on "url_rewrite_host_header off"?

Regards,
Sylvain.
Received on Wed Oct 10 2007 - 02:20:49 MDT
