Re: [squid-users] anyone knows some info about youtube "range" parameter?

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Sun, 29 Apr 2012 00:14:10 +0300

On 27/04/2012 11:56, Hasanen AL-Bana wrote:
> I get around 40,000 req/min. The server is a Dell R510 with a Xeon CPU and
> 48GB of RAM, and all disks are SAS (1.2TB).
> Reducing the number of url_rewriters causes squid to stop working, and
> cache.log says more url_rewriters are needed... ah, I forgot to say that
> I have many URL_REWRITERS besides my store_url rewriters.
I must say I'm impressed!
It's only the second server of this size and quality that I have heard about.

If you do have 40,000 req/min, running that many rewriter instances makes
more sense.
For this kind of system a "compiled" solution is much better in terms of
performance and memory footprint.
Java is one step above interpreted scripts/programs.
In my opinion, in your case you should use something other than Perl
for the url_rewrite and store_url_rewrite helpers if the system has fairly
static options.

Regards,
Eliezer
>
> On Fri, Apr 27, 2012 at 10:04 AM, Eliezer Croitoru<eliezer_at_ngtech.co.il> wrote:
>> On 27/04/2012 09:52, Hasanen AL-Bana wrote:
>>>
>>> On Fri, Apr 27, 2012 at 7:43 AM, Eliezer Croitoru<eliezer_at_ngtech.co.il>
>>> wrote:
>>>>
>>>> On 25/04/2012 20:48, Hasanen AL-Bana wrote:
>>>>>
>>>>>
>>>>> Wouldn't it be better if we saved the video chunks? YouTube streams
>>>>> files as 1.7MB FLV chunks, and the YouTube Flash player knows how to
>>>>> merge them and play them, so the range start and end will always be the
>>>>> same for the same video as long as the user doesn't fast forward or do
>>>>> something nasty. Even in that case, squid will just cache that
>>>>> chunk. That is possible by rewriting the STORE_URL and including the
>>>>> range start and end.
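
A minimal sketch of the store-URL idea described above, assuming the query-style URL layout (id, itag and range as query arguments). The helper name `chunk_store_url` and the `cache-a`/`cache-b` example hosts are made up for illustration; only the SQUIDINTERNAL store URL layout is taken from the script at the bottom of this mail.

```ruby
# Hypothetical helper: build a store URL from id, itag and range only,
# so the serving host name no longer affects the cache key.
def chunk_store_url(url)
  # Collect the three arguments in whatever order they appear.
  args = url.scan(/[?&](id|itag|range)=([^&]+)/).to_h
  # Leave the URL untouched when any of the three is missing.
  return url unless args.size == 3

  "http://video-srv.youtube.com.SQUIDINTERNAL/" \
    "id_#{args['id']}_itag_#{args['itag']}_range_#{args['range']}"
end

# Example hosts and ids below are invented for the demonstration.
a = chunk_store_url("http://cache-a.example.invalid/videoplayback?id=0123456789abcdef&itag=34&range=0-1781759")
b = chunk_store_url("http://cache-b.example.invalid/videoplayback?itag=34&range=0-1781759&id=0123456789abcdef")
c = chunk_store_url("http://cache-a.example.invalid/videoplayback?id=0123456789abcdef&itag=34&range=1781760-3563519")

puts a == b   # true: same chunk from two hosts maps to one object
puts a == c   # false: a different range is a different object
```

A fast-forward simply produces a different range value, so it ends up as a separate cached chunk, as described above.
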
>>>>>
>>>>>
>>>>> On Wed, Apr 25, 2012 at 8:39 PM, Ghassan Gharabli
>>>>> <sounarose_at_googlemail.com> wrote:
>>>>
>>>>
>>>> <SNIP>
>>>>
>>>> I have written a small Ruby store_url_rewrite helper that works with the
>>>> range argument in the URL (at the bottom of this mail).
>>>>
>>>> It's written in Ruby, and I took some of Andre's work at
>>>> http://youtube-cache.googlecode.com
>>>>
>>>> It's not a fancy script, and it is meant only for this specific YouTube
>>>> problem.
>>>>
>>>> I know that YouTube hasn't changed this range behavior for the whole
>>>> globe, because right now I'm working from a remote location that still
>>>> has no "range" at all in the URL.
>>>> So in the same country you can get two different URL patterns.
>>>>
>>>> This script is not CPU friendly (it always runs more or less the same
>>>> number of regex lookups), but it's not what will bring your server down!
>>>
>>>
>>> That is why I am going to write it in Perl. On my server I might need
>>> to run more than 40 instances of the script, and Perl is about the
>>> fastest thing I have ever tested.
>>
>> I have tried a couple of languages to do almost the same thing.
>> What I want is for the code to be readable and efficient.
>> So far I have used Ruby, Perl, Python and Java for this specific task. Ruby
>> was fast and usable, but Java was far superior to all the others,
>> combining most of what I needed.
>> Regex in Java was so different from Perl and the others that I used the
>> basic string classes of Java to implement these features.
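
To illustrate the "basic string classes instead of regex" approach, here is a rough sketch of the same idea in Ruby rather than Java. The helper name `query_arg` and the example URL are assumptions for illustration, and it only handles query-style URLs. Whether it is actually faster than the regex version would need measuring on the real workload.

```ruby
# Hypothetical helper: find one query argument using only String#index
# and slicing, with no regular expressions.
def query_arg(url, name)
  needle = name + "="
  pos = url.index("?")          # arguments start after the "?"
  while pos
    start = pos + 1
    if url[start, needle.length] == needle
      value_start = start + needle.length
      # The value runs up to the next "&" or to the end of the URL.
      stop = url.index("&", value_start) || url.length
      return url[value_start...stop]
    end
    pos = url.index("&", start) # jump to the next argument, if any
  end
  nil
end

# Invented example URL.
url = "http://example.invalid/videoplayback?id=0123456789abcdef&itag=34&range=0-1781759"
puts query_arg(url, "itag")    # 34
puts query_arg(url, "range")   # 0-1781759
```

Because the comparison happens only right after "?" or "&", a lookup for "id" does not match an argument whose name merely ends in "id".
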
>>
>> I hope to see your Perl code.
>>
>> By the way, 40 instances are not really needed for most of the servers I
>> have seen until now.
>> 20 should be more than you need.
>>
>> What is the "size" of this server? Requests per second? Bandwidth? CPU?
>> RAM? Cache space?
>>
>> Regards,
>> Eliezer
>>
>>
>>>
>>>>
>>>> This is only a prototype, and if anyone wants to add more domains and
>>>> patterns I will be more than glad to make this script better than it is
>>>> now.
>>>>
>>>> This is one hell of a nasty regex script. I could have used the uri and
>>>> cgi libs to make it more user friendly, but I chose to just build the
>>>> script skeleton and move on from there using the basic methods and
>>>> classes of Ruby.
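
For comparison, a sketch of what the library-based variant mentioned above might look like. It uses only Ruby's standard uri library (`URI.parse` and `URI.decode_www_form`); `CGI.parse` from the cgi library would be an alternative. The function name `youtube_store_url` is an assumption, and this is not the script from this mail, just one possible shape for it.

```ruby
require "uri"

# Hypothetical library-based variant: parse the query string with the
# uri library instead of hand-written regexes.
def youtube_store_url(url)
  query = URI.parse(url).query
  return url unless query

  # decode_www_form returns [key, value] pairs; to_h keeps the last
  # value when a key is repeated.
  args = URI.decode_www_form(query).to_h
  id, itag, range = args.values_at("id", "itag", "range")
  return url unless id && itag && range

  "http://video-srv.youtube.com.SQUIDINTERNAL/id_#{id}_itag_#{itag}_range_#{range}"
rescue URI::InvalidURIError, ArgumentError
  # On an unparsable URL or query string, fall back to the URL as given.
  url
end
```

The trade-off is readability against the cost of a full URL parse per request, which matters at high request rates.
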
>>>>
>>>> The idea of this script is to extract each of the arguments, such as id,
>>>> itag and range, one by one, and not to use one regex to extract them all,
>>>> because there are a couple of URL structures being used by YouTube.
>>>>
>>>> If someone can help me reorganize this script to make it more
>>>> flexible for other sites, with numbered cases per
>>>> site/domain/url_structure, I will be happy to get any help I can.
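
One possible way to organize such per-site cases is a rule table, sketched below. Everything here is an assumption for illustration: the `RULES` layout, the `store_url_for` name, the host patterns, and the second "example-mirrors" rule, which is a placeholder with a made-up host and path layout rather than a real site.

```ruby
require "uri"

# Hypothetical rule table: each rule has a host pattern and a builder that
# returns a store URL, or nil when the request does not fit the rule.
RULES = [
  {
    name: "youtube-range",
    host: /(\A|\.)youtube\.com\z/,
    build: ->(args, _uri) {
      if args["id"] && args["itag"] && args["range"]
        "http://video-srv.youtube.com.SQUIDINTERNAL/" \
          "id_#{args['id']}_itag_#{args['itag']}_range_#{args['range']}"
      end
    }
  },
  {
    # Placeholder second site: collapse numbered mirrors into one object.
    name: "example-mirrors",
    host: /\Amirror\d+\.example\.invalid\z/,
    build: ->(_args, uri) { "http://mirrors.example.invalid.SQUIDINTERNAL#{uri.path}" }
  }
].freeze

# Try each rule in order; the first one that yields a URL wins.
def store_url_for(url)
  uri = URI.parse(url)
  args = URI.decode_www_form(uri.query.to_s).to_h
  RULES.each do |rule|
    next unless uri.host.to_s =~ rule[:host]
    rewritten = rule[:build].call(args, uri)
    return rewritten if rewritten
  end
  url
rescue URI::InvalidURIError, ArgumentError
  url
end
```

Adding a site then means adding one entry to the table, without touching the request loop.
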
>>>>
>>>> Planned for now to be added to this script:
>>>> SourceForge: catch all download mirrors into one object
>>>> IMDb HQ (480p and up) videos
>>>> Vimeo videos
>>>>
>>>> If more than just one person wants them:
>>>> blip.tv
>>>> some Facebook videos
>>>> some other image storage sites.
>>>>
>>>> If you want me to add anything to my "try to cache" list, I will be happy
>>>> to hear from you by e-mail.
>>>>
>>>> Regards,
>>>> Eliezer
>>>>
>>>>
>>>> ##code start##
>>>> #!/usr/bin/ruby
>>>> require "syslog"
>>>>
>>>> # One request line as handed to the helper by squid.
>>>> class SquidRequest
>>>>   attr_accessor :url, :user
>>>>   attr_reader :client_ip, :method
>>>>
>>>>   def method=(s)
>>>>     @method = s.downcase
>>>>   end
>>>>
>>>>   # squid sends client_ip "/" fqdn; keep only the IP part.
>>>>   def client_ip=(s)
>>>>     @client_ip = s.split('/').first
>>>>   end
>>>> end
>>>>
>>>> def read_requests
>>>>   # URL<SP> client_ip "/" fqdn<SP> user<SP> method [<SP> kvpairs]<NL>
>>>>   STDIN.each_line do |ln|
>>>>     r = SquidRequest.new
>>>>     r.url, r.client_ip, r.user, r.method, *dummy =
>>>>       ln.rstrip.split(' ')
>>>>     # Flush after every answer so squid is not left waiting.
>>>>     (STDOUT << "#{yield r}\n").flush
>>>>   end
>>>> end
>>>>
>>>> def log(msg)
>>>>   Syslog.log(Syslog::LOG_ERR, "%s", msg)
>>>> end
>>>>
>>>> def main
>>>>   Syslog.open('nginx.rb', Syslog::LOG_PID)
>>>>   log("Started")
>>>>
>>>>   # One regex per argument, since youtube uses several url structures.
>>>>   idrx = /.*(id\=)([A-Za-z0-9]*).*/
>>>>   itagrx = /.*(itag\=)([0-9]*).*/
>>>>   rangerx = /.*(range\=)([0-9\-]*).*/
>>>>
>>>>   read_requests do |r|
>>>>     id, itag, range =
>>>>       [idrx, itagrx, rangerx].map { |rx| r.url.to_s.match(rx) }
>>>>
>>>>     # Answer with the unchanged url when an argument is missing,
>>>>     # so a failed match cannot crash the helper.
>>>>     next r.url unless id && itag && range
>>>>
>>>>     newurl = "http://video-srv.youtube.com.SQUIDINTERNAL/id_" +
>>>>       id[2] + "_itag_" + itag[2] + "_range_" + range[2]
>>>>
>>>>     log("YouTube Video [#{newurl}].")
>>>>
>>>>     newurl
>>>>   end
>>>> end
>>>>
>>>> main
>>>> ##code end#
>>>>
>>>>
>>>>
>>>> --
>>>> Eliezer Croitoru
>>>> https://www1.ngtech.co.il
>>>> IT consulting for Nonprofit organizations
>>>> eliezer<at> ngtech.co.il

-- 
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il
Received on Sat Apr 28 2012 - 21:14:21 MDT
