Re: [squid-users] SQUID store_url_rewrite

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 31 May 2011 01:43:23 +1200

On 30/05/11 00:22, Ghassan Gharabli wrote:
> Hello,
>
> I was trying to cache this website :
>
> http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3
>
> How do you cache it, or rewrite its URL to a static domain? :
> down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com
>
> Does that URL match this regex example, or can anyone help me match this
> Nogomi.com CDN?
>
> #generic http://variable.domain.com/path/filename."ex", "ext" or "exte"

The line above describes what the 'm/' pattern captures into the @y array.
Well, roughly...

$1 is anything: utter garbage. It could even hold a full URL's worth of loose bits, such as:
       "http://evil.example.com/cache-poison?url=http://"

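$2 appears to be meant as a two-part domain name (ie "example.com" as opposed to
a three-part "www.example.com"). In practice it includes the leading dot, and its
trailing .*? runs on to the first '/', so for the nogomi URL $1 is only "down2" and
$2 swallows the whole rest of the host name, random label included.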

$3 is the path plus the file or script name (everything up to the last dot).
$4 is the file extension, 2 to 4 word characters.
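To see this concretely, here is a small self-contained Perl sketch that runs the quoted m/ pattern, copied unchanged, against the nogomi URL, then applies the quoted s/// and print logic. The `captures` sub is a hypothetical helper for illustration, and the `A-A` typo in the quoted s/// is assumed to mean `A-Z`.

```perl
use strict;
use warnings;

# The m/ pattern quoted above, copied unchanged.
my $pattern = qr/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/;

# Hypothetical helper: return ($1, $2, $3, $4), or an empty list on no match.
sub captures {
    my ($url) = @_;
    return $url =~ $pattern ? ($1, $2, $3, $4) : ();
}

my $url = 'http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com'
        . '/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3';
my @y = captures($url);
printf "\$%d = %s\n", $_ + 1, $y[$_] for 0 .. $#y;

# The quoted s/// on $y[0] (A-A assumed to be a typo for A-Z), then the quoted print.
(my $host = $y[0]) =~ s/([a-z][0-9][a-z]dlod[\d]{3})|((cache|cdn)[-\d]*)|([a-zA-Z]+-?[0-9]+(-[a-zA-Z]*)?)/cdn/;
my $store = 'storeurl://' . $host . $y[1] . '/' . $y[2] . '.' . $y[3];
print "$store\n";
```

Only "down2" is turned into "cdn"; the random label stays in $2, so every mirror host would still produce a different store URL.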

> #http://cdn1-28.projectplaylist.com
> #http://s1sdlod041.bcst.cdn.s1s.yimg.com
> } elsif (m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/)
> {
> @y = ($1,$2,$3,$4);
> $y[0] =~
> s/([a-z][0-9][a-z]dlod[\d]{3})|((cache|cdn)[-\d]*)|([a-zA-Z]+-?[0-9]+(-[a-zA-Z]*)?)/cdn/;

I assume you are trying to compress
"down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp" down to
"cdn" without allowing any non-FQDN garbage to compress?

I would use: s/[a-z0-9A-Z\.\-]+/cdn/
and add a fixed portion to ensure that $y[1] is one of the base domains
in the CDN, just in case some other site uses the same host naming scheme.
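A minimal sketch of that suggestion, assuming nogomi.com is the only base domain in this CDN. The `@cdn_domains` list and the `store_url` sub are hypothetical names. The fixed `\.nogomi\.com` portion is anchored right before the path, and the s/// is exactly the one above, so any non-FQDN characters in the host survive the rewrite and keep such URLs from collapsing onto the same store URL.

```perl
use strict;
use warnings;

# Hypothetical list of base domains assumed to belong to the CDN.
my @cdn_domains = ('nogomi.com');

# Return the store URL for $url, or undef to leave the URL alone.
sub store_url {
    my ($url) = @_;
    for my $domain (@cdn_domains) {
        # Fixed portion: the host must end in a known base domain just before the path.
        next unless $url =~ m/^http:\/\/([^\/]+)(\.\Q$domain\E)(\/.*)$/;
        my ($host, $base, $path) = ($1, $2, $3);
        # Collapse the first run of hostname characters; anything else is kept.
        $host =~ s/[a-z0-9A-Z\.\-]+/cdn/;
        return 'storeurl://' . $host . $base . $path;
    }
    return;
}
```

For the nogomi URL the greedy `[^\/]+` backs off to the last ".nogomi.com" before the path, so the whole "down2.nogomi.com.xn55571528...ybp" prefix becomes "cdn".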

> print $x . "storeurl://" . $y[0] . $y[1] . "/" . $y[2] . "." .
> $y[3] . "\n";
>
> I also tried to study regular expressions, but the examples only cover
> simple URLs. I really need to learn more about complex URLs.

Relax. You do not have to combine them all into one regex.

You can start with something simple and efficient, and improve it as your
knowledge grows. If in doubt, play it safe: storeurl rewriting carries at
its core the risk of an XSS attack on your own clients (in the example
above, $y[0] comes very close).
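To illustrate keeping them apart: a sketch of a first-match-wins table with one small, anchored pattern per site. Both site rules, and the assumption that every matching mirror host serves identical content, are illustrative guesses rather than anything confirmed about these CDNs. `@rules` and `store_url_for` are hypothetical names.

```perl
use strict;
use warnings;

# One small, anchored rule per site instead of one catch-all regex.
# Each rule: a pattern over the full URL, and a sub that builds the store
# URL from the captured path. Both rules are illustrative assumptions.
my @rules = (
    # down<N>.nogomi.com.<random label>.nogomi.com -> cdn.nogomi.com
    [ qr{^http://down\d+\.nogomi\.com\.[a-z0-9]+\.nogomi\.com(/.*)$},
      sub { "storeurl://cdn.nogomi.com$_[0]" } ],
    # cdn1-28.projectplaylist.com style numbered mirrors -> cdn.projectplaylist.com
    [ qr{^http://cdn\d+(?:-\d+)?\.projectplaylist\.com(/.*)$},
      sub { "storeurl://cdn.projectplaylist.com$_[0]" } ],
);

# Return the store URL from the first matching rule, or the URL unchanged.
sub store_url_for {
    my ($url) = @_;
    for my $rule (@rules) {
        my ($re, $build) = @$rule;
        if (my @cap = $url =~ $re) {
            return $build->(@cap);
        }
    }
    return $url;
}
```

Adding a site means adding one entry, and anything that matches no rule passes through untouched, which is the safe default.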

The hardest part is knowing for certain what every part of the URL means
to the designers of that website, so that you only erase the useless
trackers and routing tags while keeping everything important.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.7 and 3.1.12.1
Received on Mon May 30 2011 - 13:43:32 MDT