Re: [squid-users] Youtube Issue!

From: Ghassan Gharabli <sounarose_at_googlemail.com>
Date: Sun, 27 Nov 2011 05:01:45 +0200

BTW, that is what was happening to me while testing YT, and of course you can't
even think of caching videos after the client has skipped ahead.

Concerning the FLV object: yes, I had noticed before that when you
upload a YouTube video, they split the whole video into frames, which
seems to send different objects with the same video ID. Of course,
Squid should ignore these.

By default, the 302 redirection only showed up for the "240p" FLV, and I
have certainly applied the loop patch so that it does not hit a LOOP.

ACCESS.LOG
-------------------
1322360339.081 88 192.168.10.14 TCP_HIT/200 86436 GET
http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=907605%2C912600%2C915002&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=712F1A94A31D43D03E1DB0F67FF9B7F1A9EDA4EC.029774C29E789ACC1D557E1172163D90F6610205&source=youtube&expire=1322384400&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NTUl9FSkNOMV9LTVZFOkpsV3BkS1RxZXNF&id=283246f338ece5ad
- NONE/- video/x-flv
1322360339.242 445 192.168.10.14 TCP_MISS/204 229 GET
http://clients1.google.com/generate_204 - DIRECT/209.85.148.138
text/html
1322360339.549 453 192.168.10.14 TCP_MISS/204 422 GET
http://s.youtube.com/stream_204?event=streamingerror&erc=1&retry=1&ec=100&fexp=912600,907605,915002&plid=AASyrgMkZZEo1OUT&v=KDJG8zjs5a0&el=detailpage&rt=0.749&fmt=34&shost=o-o.preferred.orange-par1.v3.lscache3.c.youtube.com&scoville=1&fv=WIN%2011,0,1,152
- DIRECT/74.125.39.100 text/html
1322360339.619 434 192.168.10.14 TCP_MISS/204 422 GET
http://s.youtube.com/stream_204?fv=WIN%2011,0,1,152&event=streamingerror&el=detailpage&erc=2&rt=0.873&fexp=912600,907605,915002&fmt=34&v=KDJG8zjs5a0&shost=tc.v3.cache3.c.youtube.com&plid=AASyrgMkZZEo1OUT&scoville=1&ec=100
- DIRECT/74.125.39.101 text/html
1322360340.112 10781 192.168.10.14 TCP_MISS/204 230 GET
http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/generate_204?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=907605%2C912600%2C915002&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=712F1A94A31D43D03E1DB0F67FF9B7F1A9EDA4EC.029774C29E789ACC1D557E1172163D90F6610205&source=youtube&expire=1322384400&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NTUl9FSkNOMV9LTVZFOkpsV3BkS1RxZXNF&id=283246f338ece5ad
- DIRECT/64.15.118.50 text/html
1322360341.351 10833 192.168.10.14 TCP_MISS/204 422 GET
http://s.youtube.com/stream_204?rt=0.460&fmt=34&el=detailpage&shost=o-o.preferred.orange-par1.v3.lscache3.c.youtube.com&scoville=1&ec=100&event=streamingerror&retry=1&erc=1&fv=WIN%2011,0,1,152&plid=AASyrgKgSyateKe8&fexp=912600,907605,915002&v=KDJG8zjs5a0
- DIRECT/74.125.39.102 text/html
1322360341.818 2729 192.168.10.14 TCP_HIT/200 2376087 GET
http://tc.v3.cache3.c.youtube.com/videoplayback?fexp=907605%2C912600%2C915002&key=yt1&ipbits=8&burst=40&sver=3&algorithm=throttle-factor&signature=712F1A94A31D43D03E1DB0F67FF9B7F1A9EDA4EC.029774C29E789ACC1D557E1172163D90F6610205&id=283246f338ece5ad&factor=1.25&expire=1322384400&itag=34&source=youtube&sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&ip=84.0.0.0&cp=U0hRR1NTUl9FSkNOMV9LTVZFOkpsV3BkS1RxZXNF&playretry=1
- NONE/- video/x-flv

As you can see, it moves once, but it then causes an error in the FLV player.

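Each access.log entry above is one line that the mail client wrapped. Assuming Squid's default native log format (time, elapsed ms, client, code/status, bytes, method, URL, rfc931, hierarchy/peer, type), a small Perl sketch like this one can split the fields apart and flag the stream_204 "streamingerror" reports the player sends after a failed FLV fetch. The sub name is made up for illustration, and the two sample entries are copies from the log above, rejoined onto one line each with their query strings shortened.

```perl
use strict;
use warnings;

# Split one native-format access.log line into named fields:
# time elapsed remotehost code/status bytes method URL rfc931 hierarchy/peer type
sub parse_access_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f >= 10;                        # wrapped or truncated entry
    my ($code, $status) = split m{/}, $f[3], 2;    # e.g. TCP_HIT/200
    my ($hier, $peer)   = split m{/}, $f[8], 2;    # e.g. DIRECT/74.125.39.100
    return {
        time      => $f[0], elapsed_ms => $f[1], client => $f[2],
        code      => $code, status     => $status, bytes => $f[4],
        method    => $f[5], url        => $f[6], rfc931 => $f[7],
        hierarchy => $hier, peer       => $peer, type   => $f[9],
    };
}

# Two entries from the log above, rejoined and with shortened query strings.
my @entries = (
    '1322360339.081 88 192.168.10.14 TCP_HIT/200 86436 GET http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/videoplayback?itag=34&id=283246f338ece5ad - NONE/- video/x-flv',
    '1322360339.549 453 192.168.10.14 TCP_MISS/204 422 GET http://s.youtube.com/stream_204?event=streamingerror&v=KDJG8zjs5a0 - DIRECT/74.125.39.100 text/html',
);
for my $line (@entries) {
    my $e = parse_access_line($line) or next;
    my ($host) = $e->{url} =~ m{^http://([^/]+)};
    my $note = $e->{url} =~ /event=streamingerror/ ? '  <- player error report' : '';
    print "$e->{code}/$e->{status} $e->{bytes} bytes from $host$note\n";
}
```

Reading the log this way makes the sequence easier to follow: a small TCP_HIT on videoplayback, then the player's error reports, then the retry with playretry=1.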
I need someone to test this URL:
http://www.youtube.com/watch?v=KDJG8zjs5a0

If anyone is interested, here is my script:

#!/usr/bin/perl
# adjust the path above to your perl location (mine is /bin/perl)
use strict;
use warnings;

$| = 1;    # unbuffered: Squid waits for one reply line per request
while (<>) {
    my @X = split;
    # assumes helper concurrency is on: each line starts with a channel ID,
    # and the reply must be "ID URL", so keep a space after the ID
    my $x = ($X[0] // "") . " ";
    $_ = $X[1] // "";    # the requested URL

    # youtube 1080p HD itag=37, 720p HD itag=22
    if (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=37|itag=22).*?\&(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "&" . $3 . "\n";

    # youtube 360p itag=34, 480p itag=35 and others, itag before id
    } elsif (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com)\/.*?(itag=[0-9]*).*?(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $3 . "\n";

    # the same videos when id comes before itag
    } elsif (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com)\/.*?(id=[a-zA-Z0-9]*).*?(itag=[0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "\n";

    } else {
        # not a video URL: hand it back unchanged
        print $x . $_ . "\n";
    }
}

I didn't add "\&" in the last two patterns because sometimes "itag" comes
right after the "?", like "videoplayback?itag=34"; the same goes for "id".

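On the "\&" point: anchoring each parameter name with [?&] matches it whether it follows "?" or "&". Dropping the anchor entirely has a side effect, because "id=" can then match inside another parameter name, such as the "plid=" that appears in the stream_204 URLs in the log above. A minimal Perl sketch, assuming only the itag and id values are wanted; the sub name and the plid= test URL are hypothetical.

```perl
use strict;
use warnings;

# Return (itag, id) from a videoplayback URL, in whatever order they appear.
# [?&] accepts "videoplayback?itag=34" as well as "...&itag=34", but does not
# let "id=" match inside another parameter name such as "plid=".
sub itag_and_id {
    my ($url) = @_;
    my ($itag) = $url =~ /[?&]itag=(\d+)/;
    my ($id)   = $url =~ /[?&]id=([A-Za-z0-9]+)/;
    return ($itag, $id);
}

for my $url (
    'http://tc.v3.cache3.c.youtube.com/videoplayback?itag=34&id=283246f338ece5ad',
    # hypothetical URL with a plid= parameter placed before id=
    'http://example.youtube.com/videoplayback?plid=AASyrgMkZZEo1OUT&id=283246f338ece5ad&itag=34',
) {
    my ($itag, $id) = itag_and_id($url);
    my ($naive)     = $url =~ /id=([A-Za-z0-9]+)/;    # no anchor at all
    print "anchored: itag=$itag id=$id  unanchored id=$naive\n";
}
```

On the second URL the unanchored pattern returns the plid value instead of the video ID, while the anchored one still finds id=283246f338ece5ad.
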
Now I'm only getting errors on the videos with a 302 redirection. The
loop patch was applied successfully before compiling Squid, and
access.log shows that the request normally moves to the location of the
video URL, but both URLs are being cached, since we are caching
"/videoplayback\?" and both produce FLV videos.

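One way to see why the 302 loops, as Amos explains further down: if the redirect's Location maps to the same store key as the original request, the cached 302 is stored under the very key it points to, so following it leads straight back to the same cached 302. A hypothetical Perl check, assuming the id-only key that the script's lower branches produce; it is not the loop patch itself, just a way to spot URL pairs that would be affected.

```perl
use strict;
use warnings;

# Same idea as the helper's lower branches: key videoplayback URLs on id only.
sub store_key {
    my ($url) = @_;
    if ($url =~ m{/videoplayback\?} and $url =~ /[?&](id=[A-Za-z0-9]+)/) {
        return "http://video-srv.youtube.com.SQUIDINTERNAL/$1";
    }
    return $url;    # anything else is stored under its own URL
}

# True when a cached 302 would be stored under the key its Location maps to.
sub redirect_loops {
    my ($request_url, $location) = @_;
    return store_key($request_url) eq store_key($location);
}

# Shortened versions of the request and Location from the quoted 302 example.
my $req = 'http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/videoplayback?itag=34&id=283246f338ece5ad';
my $loc = 'http://r9.orange-par2.c.youtube.com/videoplayback?itag=34&id=283246f338ece5ad&ir=1';
print redirect_loops($req, $loc) ? "302 would loop\n" : "302 is safe to follow\n";
```

The host change and the extra &ir=1 do not help, because neither is part of the key.
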
When somebody skips to a timestamp that hasn't been downloaded yet,
YT adds something like &begin=[0-9] to the URL. I have denied caching
those URLs because they would make your cache directory grow bigger and
bigger in a short time.

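The message does not show how the begin= URLs are denied (presumably with an ACL and "cache deny"). A complementary, helper-side sketch, assuming the same store-URL helper: check for begin= or range= before mapping a URL onto the shared video key, and hand such URLs back unchanged, so a partial reply is never stored as, or served in place of, the full video. The sub name and the begin=1200 value are hypothetical.

```perl
use strict;
use warnings;

# True for a plain full-video request; false when the URL carries a seek
# (begin=) or partial (range=) parameter and should not get the shared key.
sub is_full_video_request {
    my ($url) = @_;
    return $url !~ /[?&](?:begin|range)=/;
}

for my $url (
    'http://tc.v3.cache3.c.youtube.com/videoplayback?itag=34&id=283246f338ece5ad',
    'http://tc.v3.cache3.c.youtube.com/videoplayback?itag=34&id=283246f338ece5ad&begin=1200',
    'http://tc.v3.cache3.c.youtube.com/videoplayback?itag=34&id=e120643085f56831&range=13-2375679',
) {
    printf "%-14s %s\n", is_full_video_request($url) ? 'map to key' : 'pass through', $url;
}
```
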
Ghassan

On Sun, Nov 27, 2011 at 4:02 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 27/11/2011 5:32 a.m., Ghassan Gharabli wrote:
>>
>> Hello Amos,
>>
>>
>> Finally, I have managed to cache most YouTube videos, except for one
>> thing I would like some assistance with.
>>
>>
>> I have tested this and tried it many times: Chudy's script is
>> outdated.
>>
>> After testing and logging YouTube videos, I finally found something
>> that is not being fully cached. If you still remember, I said in my
>> old messages that the ID isn't being captured in all places, but that's
>> OK, I have done this. I will post my details after I completely finish
>> them.
>>
>> Could you please explain to me what's happening here?
>>
>> If &range=13-2375679 is found in a URL, Squid doesn't understand how
>> to cache the full video; it only caches the first 13 seconds, I
>> guess, and then it stops. If I try to download this finished cached
>> movie, its size is about 2.2 MB. If I try to remove it from the
>> cache, Squid can't even find it: it claims it is not cached, yet
>> access.log shows TCP_HIT. STRANGE!
>
> (NP: by "remove" do you mean a PURGE request? HIT just means cached data was
> found to service the request, which is right, since purging the data involves
> locating it (HITing) before erasing the cached entry. Follow-up requests
> after the purge should not be HITs.)
>
> I took a look at these "range" replies generated by YT a while back.
>
> What I found was that a request for a video URL would send back an FLV object
> with bytes, e.g. "[SWF...]ABCDEFGH". All fine and good; this is the cacheable
> video.
>
> If the user skips around in the video, the player generates a range= request
> stating what timestamp or byte offset they want to start at. It's not clear
> which, because the reply that comes back has a *different* byte sequence from
> the video at the same URL. For example, on the "[SWF...]ABCDEFGH" video it
> would produce "[SWF...]EFGH" or something similar.
>
> Under the HTTP rules, a range object that is to be combined must be an exact
> slice of the base object (range 4-999 should have been just "DEFGH"). By
> adding the SWF headers to each reply, YT makes them unique, different
> objects. Combining them in the middle (i.e. by a caching app) will cause
> errors in the binary object and crash the Flash player, or cause it to
> display an error message instead of the video.
>
> This range request only seems to happen if the user skips into a portion of
> the video the player has not yet downloaded. So sending them the whole video,
> which is what we try to do with Squid, will cause a display lag for the user
> but will not cause problems in their player.
>
>
>>
>> Now look into this URL:
>> -------------------------------
>>
>>
>> "http://o-o.preferred.orange-par1.v4.lscache7.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=907605%2C912600%2C915002&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=8223490C23E48CB708E04666E4A550422757CEC6.9D8D78E66DD14FEFC4B5F960F493ED4CDFD7C51C&source=youtube&expire=1322348400&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPVl9FSkNOMV9LSVpFOkpsV3BkS1B1ZXNF&id=e120643085f56831&range=13-2375679"
>>
>> HTTP/1.0 200 OK
>> Last-Modified: Fri, 27 Nov 2009 12:44:54 GMT
>> Content-Type: video/x-flv
>> Date: Sat, 26 Nov 2011 16:06:29 GMT
>> Expires: Sat, 26 Nov 2011 16:06:29 GMT
>> Cache-Control: private, max-age=24511
>> Accept-Ranges: bytes
>> Content-Length: 2375667
>> X-Content-Type-Options: nosniff
>> Server: gvs 1.0
>> X-Cache: MISS from Peer6
>> X-Cache-Lookup: MISS from Peer6:3128
>> Connection: close
>>
>> What's the job of "Accept-Ranges: bytes" here?
>
> Accept-* means the software producing that reply or request supports a
> certain HTTP feature. In this case it is Squid, and maybe the server as well,
> supporting HTTP range requests. Not related to YT particularly.
>
>>
>> And, confusingly again, you can see another similar URL with the
>> same "/videoplayback?.*(id)", where the ID comes at the end of the
>> URL, and then it is moved temporarily. I must mention that this URL
>> redirects to the FLV URL (Squid logs it in access.log) and then adds
>> &ir=1&playretry=1 or pr=1&playretry, which means Squid would be
>> confused into caching it 2 times (FLV).
>>
>> EXAMPLE:
>> ---------------
>>
>>
>> "http://o-o.preferred.orange-par1.v3.lscache3.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=908525%2C910207%2C916201&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=0489805DCC95F6EADBA9D43C3FD8C107FC768662.73AA6897FE78CF78BE7819E089F1A4FC47534C7D&source=youtube&expire=1322344800&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPUl9FSkNOMV9LSVZJOmdmQWdwWC01dlpn&id=283246f338ece5ad"
>>
>> HTTP/1.0 302 Moved Temporarily
>> Last-Modified: Wed, 02 May 2007 10:26:10 GMT
>> Date: Sat, 26 Nov 2011 15:50:47 GMT
>> Expires: Sat, 26 Nov 2011 15:50:47 GMT
>> Cache-Control: private, max-age=900
>> Location: http://r9.orange-par2.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&fexp=908525%2C910207%2C916201&algorithm=throttle-factor&itag=34&ip=84.0.0.0&burst=40&sver=3&signature=0489805DCC95F6EADBA9D43C3FD8C107FC768662.73AA6897FE78CF78BE7819E089F1A4FC47534C7D&source=youtube&expire=1322344800&key=yt1&ipbits=8&factor=1.25&cp=U0hRR1NPUl9FSkNOMV9LSVZJOmdmQWdwWC01dlpn&id=283246f338ece5ad&ir=1
>> X-Content-Type-Options: nosniff
>> Content-Type: text/html
>> Server: gvs 1.0
>> Age: 2068
>> Content-Length: 0
>> X-Cache: HIT from Peer6
>> X-Cache-Lookup: HIT from Peer6:3128
>> Connection: close
>
> This is the 302 redirect Adrian and Chudy were discussing at the end of the
> wiki page. If you cache it with storeurl_access reductions it will loop
> infinitely back at itself.
>
> Amos
>
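Amos's explanation of why the YT range replies cannot be stitched together comes down to a simple test: a partial body is only usable if it is byte-for-byte equal to the full object at the stated offset. A toy Perl sketch of that test, with short made-up strings standing in for the FLV data and "[SWF]" as a placeholder for the header YT puts on each reply; the sub name is hypothetical.

```perl
use strict;
use warnings;

# A partial reply can only be merged into a cached object if it equals the
# bytes of the full object starting at the given (0-based) offset.
sub is_slice_of {
    my ($full, $offset, $part) = @_;
    return 0 if $offset < 0 || $offset + length($part) > length($full);
    return substr($full, $offset, length $part) eq $part ? 1 : 0;
}

my $header = '[SWF]';                  # placeholder for the per-reply header
my $full   = $header . 'ABCDEFGH';     # stand-in for the whole video
my $offset = length($header) + 3;      # where "DEFGH" starts in $full
my $proper = 'DEFGH';                  # what an HTTP range reply should carry
my $yt     = $header . 'DEFGH';        # header repeated, as Amos describes
print "proper slice merges:   ", (is_slice_of($full, $offset, $proper) ? "yes" : "no"), "\n";
print "YT-style reply merges: ", (is_slice_of($full, $offset, $yt) ? "yes" : "no"), "\n";
```
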
>
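The range=13-2375679 in the quoted URL reads more like an inclusive byte range than a time in seconds: 2375679 - 13 + 1 = 2375667, which is exactly the Content-Length of the 200 reply shown, and in line with the roughly 2.2 MB object that ended up in the cache. Because range= is a query parameter and not an HTTP Range header, Squid just sees an ordinary 200 reply for a distinct URL. A small Perl sketch of that arithmetic; the sub name is hypothetical and the URL is a shortened copy of the quoted one.

```perl
use strict;
use warnings;

# Length of the inclusive byte range START-END carried in a range= parameter,
# or nothing if the URL has no such parameter.
sub range_length {
    my ($url) = @_;
    my ($start, $end) = $url =~ /[?&]range=(\d+)-(\d+)/ or return;
    return $end - $start + 1;
}

# Shortened version of the quoted URL; the reply had Content-Length: 2375667.
my $url = 'http://o-o.preferred.orange-par1.v4.lscache7.c.youtube.com/videoplayback?itag=34&id=e120643085f56831&range=13-2375679';
my $len = range_length($url);
print "range covers $len bytes\n";    # prints "range covers 2375667 bytes"
```
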
Received on Sun Nov 27 2011 - 03:01:52 MST
