Re: [squid-users] anyone knows some info about youtube "range" parameter? from Hasanen AL-Bana on 2012-04-27 (squid-users)

From: Hasanen AL-Bana <hasanen_at_gmail.com>
Date: Fri, 27 Apr 2012 11:56:14 +0300

I get around 40,000 req/min. The server is a Dell R510 with a Xeon CPU and
48GB of RAM; all disks are SAS (1.2TB).
Reducing the number of url_rewriters causes squid to stop working, and
cache.log says more url_rewriters are needed... ah, I forgot to say that
I have many URL_REWRITERS besides my store_url rewriters.
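
For a rough sense of scale, here is the arithmetic behind the helper count as a minimal Ruby sketch. It assumes every request passes through a single helper pool and that each helper handles one request at a time; `latency_budget_ms` is a name made up for this example, not anything from Squid.

```ruby
# Hypothetical helper: average time one lookup may take before a pool of
# `helpers` single-request processes falls behind the incoming request rate.
# Assumes all requests are spread evenly over the pool.
def latency_budget_ms(requests_per_minute, helpers)
  requests_per_second = requests_per_minute / 60.0
  (helpers / requests_per_second * 1000).round(1)
end

puts latency_budget_ms(40_000, 40)  # budget per lookup with 40 helpers
puts latency_budget_ms(40_000, 20)  # budget per lookup with 20 helpers
```

40,000 req/min is about 667 req/s, so under these assumptions 40 helpers leave roughly 60 ms per lookup and 20 helpers roughly 30 ms.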

On Fri, Apr 27, 2012 at 10:04 AM, Eliezer Croitoru <eliezer_at_ngtech.co.il> wrote:
> On 27/04/2012 09:52, Hasanen AL-Bana wrote:
>>
>> On Fri, Apr 27, 2012 at 7:43 AM, Eliezer Croitoru<eliezer_at_ngtech.co.il>
>> wrote:
>>>
>>> On 25/04/2012 20:48, Hasanen AL-Bana wrote:
>>>>
>>>>
>>>> Wouldn't it be better if we saved the video chunks? YouTube streams
>>>> files as 1.7MB flv chunks, and the YouTube flash player knows how to
>>>> merge and play them, so the range start and end will always be the
>>>> same for the same video as long as the user doesn't fast forward or do
>>>> something nasty. Even in that case, squid will just cache that
>>>> chunk. That is possible by rewriting the STORE_URL to include the
>>>> range start & end.
>>>>
>>>>
>>>> On Wed, Apr 25, 2012 at 8:39 PM, Ghassan Gharabli
>>>> <sounarose_at_googlemail.com> wrote:
>>>
>>>
>>> <SNIP>
>>>
>>> I have written a small Ruby store_url_rewrite helper that works with
>>> the range argument in the URL (it is at the bottom of this mail).
>>>
>>> It's written in Ruby and I took some of Andre's work at
>>> http://youtube-cache.googlecode.com
>>>
>>> It's not a fancy script and is meant only for this specific YouTube
>>> problem.
>>>
>>> I know that YouTube hasn't changed this range behavior for the whole
>>> globe, because right now I'm working from a remote location that still
>>> has no "range" at all in the URL.
>>> So within the same country you can get two different URL patterns.
>>>
>>> This script is not CPU friendly (it always runs the same number of
>>> regex lookups for every request), but it's not what will bring your
>>> server down!!!
>>
>>
>> That is why I am going to write it in perl. On my server I might need
>> to run more than 40 instances of the script, and perl is about the
>> fastest thing I have ever tested.
>
> I have tried a couple of languages to do almost the same thing.
> What I want is code that is readable and efficient.
> So far I have used Ruby, Perl, Python and Java for this specific task. Ruby
> was fast and usable, but Java was far superior to all the others,
> combining most of what I needed.
> Regex in Java was so different from Perl and the others that I used the
> basic string classes of Java to implement these features.
>
> I hope to see your perl code.
>
> By the way, 40 instances are not really needed for most of the servers I
> have seen so far.
> 20 should be more than you need.
>
> What is the "size" of this server? Req per sec? Bandwidth? CPU? RAM? Cache
> space?
>
> Regards,
> Eliezer
>
>
>>
>>>
>>> This is only a prototype, and if anyone wants to add more domains and
>>> patterns I will be more than glad to make this script better than it is
>>> now.
>>>
>>> This is one hell of a nasty regex script. I could have used the uri and
>>> cgi libs to make the script more user friendly, but I chose to just
>>> build the script skeleton and move on from there using the basic
>>> methods and classes of Ruby.
>>>
>>> The idea of this script is to extract each of the arguments, such as id,
>>> itag and range, one by one rather than with a single regex that extracts
>>> them all, because there are a couple of URL structures being used by
>>> YouTube.
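
The uri library route could look roughly like this. It is only a sketch: `video_params` is a hypothetical name, the example URL is invented, and it assumes the id, itag and range arguments always arrive as ordinary query-string parameters.

```ruby
require "uri"

# Hypothetical helper: pull id, itag and range out of the query string
# one by one, so the order and presence of parameters does not matter.
# Parameters missing from the URL come back as nil; a URL that cannot be
# parsed at all yields an empty hash.
def video_params(url)
  query = URI.parse(url).query || ""
  params = Hash[URI.decode_www_form(query)]
  { id: params["id"], itag: params["itag"], range: params["range"] }
rescue URI::InvalidURIError, ArgumentError
  {}
end

# Invented example URL, only to show the call.
p video_params("http://o-o.example.invalid/videoplayback?itag=34&id=abc123&range=0-1781759")
```

A URL from a location that still has no "range" simply gives `range: nil`, which the caller can check before building a store URL.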
>>>
>>> If someone can help me reorganize this script to make it more flexible
>>> for other sites, with numbered cases per site\domain\url_structure, I
>>> will be happy to get any help I can.
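
One possible shape for the "numbered cases" idea is a rule table, sketched below. Everything here is an assumption for illustration: `RULES`, `rewrite`, the `/videoplayback` path test and the example URLs are not part of the posted script. Only the internal host name and the id/itag/range character classes are taken from it.

```ruby
# Hypothetical rule table: one numbered case per site / URL structure.
# :pattern decides whether the rule applies, :build returns the store URL
# or nil when a required argument is missing.
RULES = [
  {
    case: 1,
    name: "youtube video chunk with range",
    pattern: %r{\Ahttps?://[^/]+/videoplayback\?},  # assumed URL shape
    build: lambda { |url|
      id    = url[/[?&]id=([A-Za-z0-9]+)/, 1]
      itag  = url[/[?&]itag=([0-9]+)/, 1]
      range = url[/[?&]range=([0-9]+-[0-9]+)/, 1]
      next nil unless id && itag && range
      "http://video-srv.youtube.com.SQUIDINTERNAL/id_#{id}_itag_#{itag}_range_#{range}"
    }
  }
]

# Returns the store URL from the first rule that matches and builds a key,
# or the original URL unchanged when no rule applies.
def rewrite(url)
  RULES.each do |rule|
    next unless url =~ rule[:pattern]
    key = rule[:build].call(url)
    return key if key
  end
  url
end

# Invented example URL, only to show the call.
puts rewrite("http://o-o.example.invalid/videoplayback?id=abc123&itag=34&range=0-1781759")
```

Inside the `read_requests` block of the script in this mail, the body would then reduce to `rewrite(r.url)`, and adding a site means adding one more entry to the table. URLs without a range fall through unchanged instead of raising on a failed match.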
>>>
>>> Planned additions to this script for now:
>>> SourceForge: catch all download mirrors into one object
>>> IMDb HQ (480p and up) videos
>>> Vimeo videos
>>>
>>> If more than one person wants them:
>>> bliptv
>>> some Facebook videos
>>> some other image storage sites.
>>>
>>> If you want me to add anything to my "try to cache" list, I will be
>>> happy to hear from you by e-mail.
>>>
>>> Regards,
>>> Eliezer
>>>
>>>
>>> ##code start##
>>> #!/usr/bin/ruby
>>> require "syslog"
>>>
>>> class SquidRequest
>>>   attr_accessor :url, :user
>>>   attr_reader :client_ip, :method
>>>
>>>   def method=(s)
>>>     @method = s.downcase
>>>   end
>>>
>>>   def client_ip=(s)
>>>     @client_ip = s.split('/').first
>>>   end
>>> end
>>>
>>> def read_requests
>>>   # URL<SP> client_ip "/" fqdn<SP> user<SP> method [<SP> kvpairs]<NL>
>>>   STDIN.each_line do |ln|
>>>     r = SquidRequest.new
>>>     r.url, r.client_ip, r.user, r.method, *dummy = ln.rstrip.split(' ')
>>>     (STDOUT << "#{yield r}\n").flush
>>>   end
>>> end
>>>
>>> def log(msg)
>>>   Syslog.log(Syslog::LOG_ERR, "%s", msg)
>>> end
>>>
>>> def main
>>>   Syslog.open('nginx.rb', Syslog::LOG_PID)
>>>   log("Started")
>>>
>>>   read_requests do |r|
>>>     idrx = /.*(id\=)([A-Za-z0-9]*).*/
>>>     itagrx = /.*(itag\=)([0-9]*).*/
>>>     rangerx = /.*(range\=)([0-9\-]*).*/
>>>
>>>     newurl = "http://video-srv.youtube.com.SQUIDINTERNAL/id_" +
>>>       r.url.match(idrx)[2] + "_itag_" + r.url.match(itagrx)[2] +
>>>       "_range_" + r.url.match(rangerx)[2]
>>>
>>>     log("YouTube Video [#{newurl}].")
>>>
>>>     newurl
>>>   end
>>> end
>>>
>>> main
>>> ##code end#
>>>
>>>
>>>
>>> --
>>> Eliezer Croitoru
>>> https://www1.ngtech.co.il
>>> IT consulting for Nonprofit organizations
>>> eliezer<at> ngtech.co.il
>
>
>
> --
> Eliezer Croitoru
> https://www1.ngtech.co.il
> IT consulting for Nonprofit organizations
> eliezer <at> ngtech.co.il
Received on Fri Apr 27 2012 - 08:56:42 MDT