Re: [squid-users] Reverse Proxy URL rewriter/hostname scrubber

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 20 Jan 2009 14:18:15 +1300 (NZDT)

> Amos Jeffries wrote:
>>> All,
>>>
>>>
>>> I am looking for a squid based solution for a problem I have. I
>>> have a squid reverse proxy sitting in front of three web servers, set up
>>> so I can test and QA each server by itself without a problem, and yet
>>> still serve the www site as if they were one. While this works fine
>>> inside my security boundary, it really breaks outside due to http being
>>> blocked to the servers themselves.
>>>
>>>
>>> The idea with the proxy is to go internet --> squid --> http farm
>>> and the problem that comes about is that occasionally the servers will
>>> respond with hostname.tld instead of www.tld like they are supposed to.
>>> Since I have it set up to be desirable both ways for my set up, this is
>>> ok inside. I need to get my squid to grab those requests and re-write
>>> them as they are served to the client. This is especially troublesome
>>> for embedded generated code (JavaScript links, and silly stuff like
>>> that). I have basic ideas on how to do this, but I haven't found either
>>> a good step by step telling how to do this, or a good redirect client
>>> that can handle something like this.
>>>
>>>
>>> Any suggestions would help out a lot
>>>
>>>
>>> Thanks,
>>> Seann
>>>
>>>
>>
>> Squid really can't do this properly for replies.
>>
>> You need to find out why the servers are breaking their configuration. It
>> may come down to replacing their software, but using broken apps is not a
>> good idea.
>>
>>
>> Assuming you are using the correct reverse-proxy configuration with no
>> re-writes anywhere:
>> http://wiki.squid-cache.org/SquidFaq/ReverseProxy
>>
>> Amos
>>
> My current squid set up for the reverse proxy (not sure if it is 100%
> right, but when it was done, there was very little I could find on this,
> aside from small entries in the wiki):

Mostly correct. Comments below.
So it's not a configuration problem with Squid. It is definitely the web
server breaking.

As I said, the best fix is to fix the server software.
If you really don't want such mangled pages to be publicly visible, you
could block the replies coming back out. That would let you leave users
with a clean 'sorry something is broken' page instead of a connection
failure.

The config for that would be something like this:
 acl brokenHost rep_header Host -i ^(hikari|minazuki).com
 deny_info http://example.com/sorry_broken_host.html brokenHost
 http_reply_access deny brokenHost
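
A variant worth testing, since Host is normally a request header and may
never show up in a reply: match the Location header instead, which is where
a redirect to the wrong hostname would appear. This is an untested sketch.
The brokenRedirect name and the hostnames in the regex are assumptions, and
whether deny_info is honoured for http_reply_access may depend on your Squid
version. It also does nothing for hostnames embedded in page bodies, which
Squid does not look at.

 # assumed names: haruhi and minazuki under tsukinokage.net are the ones leaking
 acl brokenRedirect rep_header Location -i ^https?://(haruhi|minazuki)\.tsukinokage\.net
 deny_info http://example.com/sorry_broken_host.html brokenRedirect
 http_reply_access deny brokenRedirect

The deny line has to come before your existing 'http_reply_access allow all',
because the first matching line wins.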

>
> # $Id: squid.conf,v 1.1 2005/07/16 22:24:57 jmates Exp $
> # This is the initial configuration file for the reverse proxy that will be hosted on yukiko
> # and will support HTTPS communication for the site external of the network. It will be
> # hardened with firewall software, and chrooting it as well.
> #
> visible_hostname www.tsukinokage.net
>
> http_port 80 accel defaultsite=www.tsukinokage.net
> https_port 443 accel defaultsite=www.tsukinokage.net cert=/etc/pki/tls/certs/haruhi.crt key=/etc/pki/tls/private/haruhi.key
> icp_port 0
> #htcp_port 0
> # Security measure : Chroot the process (may require values?)
> # disable usual block on cgi-bin and ? in URL, to avoid web robots
> # adding ? to requests to bypass the proxy. Little cgi-bin and ? use on
> # my site, so not a problem for me...
> #
> #hierarchy_stoplist cgi-bin ?
> #acl QUERY urlpath_regex cgi-bin \?
> #no_cache deny QUERY
>
> cache_mem 256 MB
> # cache_swap_low 90
> # cache_swap_high 95
>
> maximum_object_size 1024 KB
> # minimum_object_size 0 KB
>
> maximum_object_size_in_memory 64 KB
>
> cache_replacement_policy lru
> #cache_replacement_policy heap lfuda
> memory_replacement_policy lru
>
> cache_dir ufs /var/spool/squid/cache 5000 16 256
>
> logfile_rotate 3
>
> cache_log none
> cache_store_log none
>
> cache_access_log /var/log/squid/access_log
> #cache_log /var/log/squid/cache_log
> #cache_store_log /var/log/squid/store_log
>
> emulate_httpd_log on
> log_ip_on_direct on
> log_mime_hdrs on
>
> log_fqdn on
>
> ftp_user nobody_at_tsuknokage.net
> ftp_sanitycheck on
>
> # use local caching name server, avoid /etc/hosts
> dns_nameservers 127.0.0.1
> hosts_file none
> #redirect_program /usr/local/squirm/bin/squirm
> #redirect_program /etc/squid/rewriter.pl
> #redirect_children 1
> #redirect_program /etc/squid/rewriter/redirector.py
> #redirect_children 2
> redirect_rewrites_host_header on
> #Error code redirects
> error_map http://haruhi.tsukinokage.net/defaults/301.html 301
> error_map http://haruhi.tsukinokage.net/defaults/400.html 400
> error_map http://haruhi.tsukinokage.net/defaults/401.html 401
> error_map http://haruhi.tsukinokage.net/defaults/403.html 403
> error_map http://haruhi.tsukinokage.net/defaults/404.html 404
> error_map http://haruhi.tsukinokage.net/defaults/414.html 414
> error_map http://haruhi.tsukinokage.net/defaults/500.html 500
> error_map http://haruhi.tsukinokage.net/defaults/503.html 503
> error_map http://haruhi.tsukinokage.net/defaults/505.html 505
>
> #location_rewrite_program /etc/squid/rewriter.pl
> # TODO needed??
> auth_param basic children 5
> auth_param basic realm Squid proxy-caching web server
> auth_param basic credentialsttl 2 hours
>
> request_header_max_size 16 KB
> # request_body_max_size 0 KB
>
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern . 0 20% 4320
>
> negative_ttl 1 minutes
> negative_dns_ttl 1 minutes
>
> # connect_timeout 1 minutes
> # peer_connect_timeout 30 seconds
> # read_timeout 10 minutes
> # request_timeout 5 minutes
> # persistent_request_timeout 2 minute
> # half_closed_clients on
>
> ident_timeout 1 seconds
>
> shutdown_lifetime 17 seconds
>
> acl manager proto cache_object
> acl tsuki src 192.168.10.0/28
> acl mariko src 192.168.10.4/255.255.255.255
> acl hotaru src 192.168.10.14/255.255.255.255
> acl haruhicm src 192.168.10.2/255.255.255.255
> acl minazuki src 192.168.10.6/255.255.255.255
> acl yuki-priv src 172.20.1.1/255.255.255.255
> acl localhost src 127.0.0.1/255.255.255.255
> acl to_localhost dst 127.0.0.0/8
> acl all src 0.0.0.0/0.0.0.0
> acl tsuki_sites dstdomain www.tsukinokage.net
> #acl some_site dstdomain www.somesite.com
> acl bigsis urlpath_regex ^/bigsis
> acl mh urlpath_regex ^/misterhouse
> acl zm urlpath_regex ^/zm
> acl base urlpath_regex ^/base
> acl nulog urlpath_regex ^/nulog
> acl squid_yuki urlpath_regex ^/squid
> acl squid_yukiko urlpath_regex ^/yukiko-squid
> acl music1 urlpath_regex ^/pitchfork
> acl music2 urlpath_regex ^/phpMp
> acl music3 urlpath_regex ^/phpMp2
> acl jin urlpath_regex ^/jin
> acl mmm urlpath_regex ^/mmm
> acl lm urlpath_regex ^/Haruhi_sensors
> acl lm2 urlpath_regex ^/Minazuki_sensors
> acl amp urlpath_regex ^/amp
> acl ups urlpath_regex ^/ups
> acl fax urlpath_regex ^/fax
> acl webmail urlpath_regex ^/webmail
> acl otrs urlpath_regex ^/otrs
> acl cacti urlpath_regex ^/cacti
> acl tsukinokage urlpath_regex ^/tsukinokage_new
> acl dnd urlpath_regex ^/dnd
> acl cgi urlpath_regex ^/cgi-bin
> acl dnd_upload urlpath_regex ^/dnd_upload
> acl sysinfo urlpath_regex ^/phpsysinfo
> acl SSL_ports port 443 563
> acl Safe_ports port 80 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 563 # https, snews
> #acl Safe_ports port 70 # gopher
> #acl Safe_ports port 210 # wais
> #acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 1025-65535 # unregistered ports
> #acl Safe_ports port 488 # gss-http
> #acl Safe_ports port 591 # filemaker
> #acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT
>
> http_access allow manager tsuki
> http_access deny manager
> http_access allow tsuki_sites
> # Deny requests to unknown ports
> http_access deny !Safe_ports
> # Deny CONNECT to other than SSL ports
> http_access deny CONNECT !SSL_ports
>
> #server load balancing
> #Directive hostname type proxy port icp port options
> #cache_peer server.com parent/sib 80/443/3128 3130 stuff
> cache_peer 192.168.10.2 parent 80 0 no-query originserver name=haruhi login=PASS round-robin Weight=5
> cache_peer 192.168.10.6 parent 80 0 no-query originserver name=minazuki login=PASS round-robin Weight=1
> #cache_peer 192.168.10.5 parent 80 0 no-query originserver name=hikari
> #cache_peer 192.168.10.3 parent 80 0 no-query originserver name=akari
> cache_peer_access haruhi allow cgi
> cache_peer_access minazuki deny cgi
> cache_peer_access haruhi allow sysinfo
> cache_peer_access minazuki deny sysinfo
> cache_peer_access haruhi allow dnd
> cache_peer_access minazuki deny dnd
> cache_peer_access haruhi allow dnd_upload
> cache_peer_access minazuki deny dnd_upload
> cache_peer_access haruhi allow otrs
> cache_peer_access minazuki deny otrs
> cache_peer_access haruhi allow tsukinokage
> cache_peer_access minazuki deny tsukinokage
> cache_peer_access haruhi allow cacti
> cache_peer_access minazuki deny cacti
> cache_peer_access haruhi allow webmail
> cache_peer_access minazuki deny webmail
> cache_peer_access haruhi allow amp
> cache_peer_access minazuki deny amp
> cache_peer_access haruhi allow ups
> cache_peer_access minazuki deny ups
> cache_peer_access minazuki allow bigsis
> cache_peer_access haruhi deny bigsis
> cache_peer_access minazuki allow fax
> cache_peer_access haruhi deny fax
> cache_peer_access minazuki allow lm2
> cache_peer_access haruhi allow mmm
> cache_peer_access minazuki deny mmm
> cache_peer_access haruhi allow jin
> cache_peer_access minazuki deny jin
> cache_peer_access haruhi deny lm2
> cache_peer_access haruhi allow lm
> cache_peer_access minazuki deny lm
> cache_peer_access minazuki allow mh
> cache_peer_access haruhi deny mh
> cache_peer_access minazuki allow zm
> cache_peer_access haruhi deny zm
> cache_peer_access haruhi allow base
> cache_peer_access minazuki deny base
> cache_peer_access haruhi allow nulog
> cache_peer_access minazuki deny nulog
> cache_peer_access haruhi allow squid_yuki
> cache_peer_access minazuki deny squid_yuki
> cache_peer_access haruhi allow squid_yukiko
> cache_peer_access minazuki deny squid_yukiko
> cache_peer_access haruhi allow music1
> cache_peer_access minazuki deny music1
> cache_peer_access haruhi allow music2
> cache_peer_access minazuki deny music2
> cache_peer_access haruhi allow music3
> cache_peer_access minazuki deny music3
> #http_access deny to_localhost

Access to each peer is evaluated independently. So you can simplify the
above lists a lot (regex ACLs are slow) by removing the 'allow' lines and
adding these at the bottom of each peer's list:

  cache_peer_access haruhi allow tsuki_sites
  cache_peer_access haruhi deny all

  cache_peer_access minazuki allow tsuki_sites
  cache_peer_access minazuki deny all
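
Put together, the per-peer lists would look roughly like this. It is a
sketch built only from the deny lines already in your config, not something
I have tested against your setup. Note one behaviour change: any path that
is not explicitly denied for a peer can now go to either peer, subject to
the round-robin weights.

  # haruhi: refuse the paths that belong to minazuki, accept the rest of the site
  cache_peer_access haruhi deny bigsis
  cache_peer_access haruhi deny fax
  cache_peer_access haruhi deny lm2
  cache_peer_access haruhi deny mh
  cache_peer_access haruhi deny zm
  cache_peer_access haruhi allow tsuki_sites
  cache_peer_access haruhi deny all

  # minazuki: refuse the paths that belong to haruhi, accept the rest of the site
  cache_peer_access minazuki deny cgi
  cache_peer_access minazuki deny sysinfo
  cache_peer_access minazuki deny dnd
  cache_peer_access minazuki deny dnd_upload
  cache_peer_access minazuki deny otrs
  cache_peer_access minazuki deny tsukinokage
  cache_peer_access minazuki deny cacti
  cache_peer_access minazuki deny webmail
  cache_peer_access minazuki deny amp
  cache_peer_access minazuki deny ups
  cache_peer_access minazuki deny mmm
  cache_peer_access minazuki deny jin
  cache_peer_access minazuki deny lm
  cache_peer_access minazuki deny base
  cache_peer_access minazuki deny nulog
  cache_peer_access minazuki deny squid_yuki
  cache_peer_access minazuki deny squid_yukiko
  cache_peer_access minazuki deny music1
  cache_peer_access minazuki deny music2
  cache_peer_access minazuki deny music3
  cache_peer_access minazuki allow tsuki_sites
  cache_peer_access minazuki deny all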

>
> acl okdomains dstdomain tsukinokage.net
> http_access deny !to_localhost
> http_access allow all

These should really have their allow/deny inverted:
  http_access deny !to_localhost -> http_access allow to_localhost
  http_access allow all -> http_access deny all

> # And finally deny all other access to this proxy
> #http_access deny all
>
> http_reply_access allow all
>
> icp_access deny all
> miss_access allow all
>
> ident_lookup_access deny all
>
> cache_mgr webmaster_at_tsukinokage.net
>
> cache_effective_user squid
>
>
>
> forwarded_for on
>
> log_icp_queries off
>
> snmp_port 3494
> acl snmpharuhi snmp_community wintersnow
> snmp_access allow snmpharuhi haruhicm
> snmp_access deny all
> snmp_incoming_address 172.20.1.2
> snmp_outgoing_address 255.255.255.255
>
> # offline_mode off
> coredump_dir /var/squid/core
>
> pipeline_prefetch on
>
>
Received on Tue Jan 20 2009 - 01:18:21 MST
