Re: [squid-users] SQUID store_url_rewrite from Ghassan Gharabli on 2011-05-30 (squid-users)

From: Ghassan Gharabli <sounarose_at_googlemail.com>
Date: Tue, 31 May 2011 02:54:05 +0300

Hello again,

        #generic http://variable.domain.com/path/filename."ex", "ext" or "exte"
        #http://cdn1-28.projectplaylist.com
        #http://s1sdlod041.bcst.cdn.s1s.yimg.com
#} elsif (m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/) {
#       @y = ($1,$2,$3,$4);
#       $y[0] =~ s/([a-z][0-9][a-z]dlod[\d]{3})|((cache|cdn)[-\d]*)|([a-zA-Z]+-?[0-9]+(-[a-zA-Z]*)?)/cdn/;
#       print $x . "storeurl://" . $y[0] . $y[1] . "/" . $y[2] . "." . $y[3] . "\n";

Why did we have to use an array in this example?
I understand that m/ indicates a regex match operation, that "\n" ends
the line, and that we assign @y as an array of 4 values, so that $1,
the first captured value, becomes $y[0], and so on. Up to here it is
fine for me. Then we apply a substitution to $y[0]:
$y[0] =~ s/([a-z][0-9][a-z]dlod[\d]{3})|((cache|cdn)[-\d]*)|([a-zA-Z]+-?[0-9]+(-[a-zA-Z]*)?)/cdn/;
...

Please correct me if I'm wrong here. I'm still confused about the
values $1, $2, $3 ...
How does the program know where to find $1 or $2, as there are no
values or $strings assigned anywhere?
As far as I can tell, $1 refers to an element of the URL; for example,
http://cdn1-28.projectplaylist.com can be split into elements. I hope
I'm correct on this one:
http://(cdn1-28) . (projectplaylist) . (com) should be http:// $1 . $2 . $3
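
One way to check this guess is to print the groups. Perl fills $1, $2,
... from the parenthesised groups of the last successful match, counted
by opening parenthesis from left to right. The sketch below is only an
illustration: the path /playlist/track.mp3 and the helper name
capture_groups are made up for the test, and only the host name comes
from the comment in the rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test URL: host from the rule's comment, path invented.
my $url = 'http://cdn1-28.projectplaylist.com/playlist/track.mp3';

# Hypothetical helper: returns the four captured groups, or an empty list.
sub capture_groups {
    my ($u) = @_;
    # Same pattern as the rule above; each (...) fills the next $N.
    if ($u =~ m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/) {
        return ($1, $2, $3, $4);
    }
    return;
}

my @y = capture_groups($url);
print join("\n", map { '$' . ($_ + 1) . " = $y[$_]" } 0 .. $#y), "\n";
# $1 = cdn1-28
# $2 = .projectplaylist.com
# $3 = playlist/track
# $4 = mp3
```

So with this pattern the groups do not follow the three dot-separated
labels: $1 is the first label of the host name, $2 is the rest of the
host name including its leading dot, $3 is the path without the
extension, and $4 is the extension.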

Then let me see if I can solve this one to match this URL
http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3

So I should work on the FQDN and leave the rest as is. Please, if you
find anything wrong with this, correct it for me.
#does that match
#http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3 ??
elsif (m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/)
{
      @y = ($1,$2,$3,$4);
      $y[0] =~ s/[a-z0-9A-Z\.\-]+/cdn/;
      print $x . "storeurl://" . $y[0] . $y[1] . "/" . $y[2] . "." . $y[3] . "\n";

Does this example match the Nogomi.com domain correctly?
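
The rule can be tried outside Squid to see what it produces. A minimal
sketch, under two assumptions: $x is treated as an empty prefix, and
the rule is wrapped in a hypothetical helper called store_url so that
it can be called on a single URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the rule; returns the rewritten URL,
# or undef when the pattern does not match.
sub store_url {
    my ($url, $x) = @_;
    $x = '' unless defined $x;    # assumption: no prefix before the URL
    if ($url =~ m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/) {
        my @y = ($1, $2, $3, $4);
        $y[0] =~ s/[a-z0-9A-Z\.\-]+/cdn/;    # only touches the first host label
        return $x . "storeurl://" . $y[0] . $y[1] . "/" . $y[2] . "." . $y[3];
    }
    return undef;
}

my $url = 'http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com'
        . '/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3';
print store_url($url), "\n";
# storeurl://cdn.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3
```

The pattern does match, but only "down2" ends up in $y[0] and is
replaced by "cdn". The long xn5557... label is captured as part of
$y[1] and is kept unchanged in the result.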

And why did you use s/[a-z0-9A-Z\.\-]+/cdn/ ?

I only understood that you are making sure to find small letters,
capital letters and numbers, but I believe \. searches for one dot
only. What if there are 2 dots, or more than 3 dots, in this case?
You are also matching a dash.
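
A quick test of the substitution on a few strings helps here. Inside
[...] the \. is just one of the allowed characters, and the + after
the class takes the longest run of allowed characters, so any number
of dots or dashes in the run is covered. The sample strings and the
helper name normalise are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution to a copy of the string.
sub normalise {
    my ($s) = @_;
    # Without /g only the first run of letters, digits, dots and dashes
    # is replaced.
    $s =~ s/[a-z0-9A-Z\.\-]+/cdn/;
    return $s;
}

for my $host ('down2', 'a.b.c', 'cdn1-28', 'a_b') {
    print "$host -> ", normalise($host), "\n";
}
# down2 -> cdn
# a.b.c -> cdn
# cdn1-28 -> cdn
# a_b -> cdn_b
```

The last line shows the limit: "_" is not in the class, so the run
stops there and only "a" is replaced.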

The only thing I'm confused about is why we have added /cdn/, since
the URL doesn't contain the word "cdn"?

Why have we used storeurl://? I can see that some of the examples use
http:// instead:
print $x . "http://" . $y[0] . $y[1] . "/" . $y[2] . "." . $y[3] . "\n";

Can you give me an example that adds the $y[1] portion, please?

Which do you prefer: writing one rule that matches as many similar
examples as possible, or writing a separate rule for each FQDN?

For example, sometimes we see
http://down2.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.example.com/folder/filename.ext
or
http://cdn.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.example2.com/xn55571528exgem0o65xymsgtmjiy75924mjqqybp/folder/filename.ext

That is really interesting to me, which is why I would love to match
these as well. If I understood all of these things, everything would
be fine for me.

Again, I want to thank you for answering my questions, as I feel like
I'm writing a magazine, heheheh.

Regards,
Ghassan
Received on Mon May 30 2011 - 23:54:12 MDT
