Re: xxx-rated & cache traffic management

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Tue, 20 May 1997 17:40:09 +0300 (EETDST)

> > We are approaching a situation here where more than 50% of all Web
> > traffic is porno ;( filling all available resources (disks, links, minds).
>
> Let me have a copy of your analysis script :) Seriously, if this
> really saves bandwidth, this is good...

 I used three ACLs to redirect porn traffic:

acl ipnummer url_regex ://([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)/
 (This matches URLs with a bare IP address instead of a hostname, as these
   are usually porn or warez sites. Of course, some legitimate sites might
   fall in here, but very few.)

acl porno url_regex (pussy|blonde|fuck|porn|sex|babe|girl|xxx|erotic|adult|female|oriental|nude|naked|hardcore|teen|amateur)

acl pornsites url_regex (freepics|playboy|fetish|tittycity|happytime|penthouse|hardporn|netbeauties)
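The three patterns can be tried outside Squid before putting them in squid.conf. The sketch below is a minimal Python model, not part of Squid: it compiles the same regexes with the standard re module and reports the name of the first ACL a URL matches. The helper name classify_url is hypothetical, and Python's re syntax is assumed to behave like Squid's regex library for these simple alternations.

```python
import re

# The three patterns from the ACLs above. re.search() scans the whole URL,
# so each pattern matches anywhere in it (substring match, case-sensitive).
ACL_PATTERNS = {
    "ipnummer": re.compile(r"://([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)/"),
    "porno": re.compile(
        r"(pussy|blonde|fuck|porn|sex|babe|girl|xxx|erotic|adult|female|"
        r"oriental|nude|naked|hardcore|teen|amateur)"
    ),
    "pornsites": re.compile(
        r"(freepics|playboy|fetish|tittycity|happytime|penthouse|hardporn|netbeauties)"
    ),
}


def classify_url(url):
    """Return the name of the first ACL whose pattern matches url, or None.

    Hypothetical helper for testing the patterns; dicts keep insertion
    order, so the ACLs are tried in the order they are listed above.
    """
    for name, pattern in ACL_PATTERNS.items():
        if pattern.search(url):
            return name
    return None
```

Because the patterns match substrings, an innocent host such as www.essex.ac.uk is caught by "sex"; this is the kind of false positive that comes with keyword-based redirection.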

 If you want to analyze your access.logs, I suggest feeding these regexes
 to egrep and summing its output with any access.log analysis script. For
 now, I have only a script that analyzes my cache swaplog, and it shows
 porn at about 25% of cache volume. Most porn sites use a lot of CGI
 scripts and "members-only" access, so those objects are non-cacheable and
 do not show up in the swaplog. Also, my regexes miss lots of URLs that
 give no hint of their content, and I didn't want to do redirection on a
 per-site basis.
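A rough Python equivalent of "egrep plus summing" might look like this. It is a sketch under the assumption that the log uses Squid's native access.log layout (timestamp, elapsed, client, action/code, bytes, method, URL, ...); matching_byte_share is a hypothetical helper, and the compiled pattern passed in plays the role of the egrep expression.

```python
import re


def matching_byte_share(log_lines, pattern):
    """Fraction of logged bytes whose URL matches pattern (a compiled regex).

    Assumes Squid's native access.log layout:
    time elapsed client action/code bytes method URL ident hierarchy/from type
    """
    total = matched = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        size = int(fields[4])      # bytes delivered to the client
        total += size
        if pattern.search(fields[6]):  # field 6 is the URL
            matched += size
    return matched / total if total else 0.0
```

For example, `matching_byte_share(open("access.log"), re.compile(r"(porn|sex|xxx)"))` gives the share of logged traffic whose URL matches. This measures volume served from access.log, which is not the same thing as the swaplog-based share of cache volume.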

 If you'd like a copy of the script, no problem, but it's quick and dirty
 and pretty slow...

> > Also, while the stall time ticks, incoming TCP windows fill up and the next
> > read will give up to 16KB in a row (more than the average object size)
>
> How did you implement the delay without holding up the rest of the scheduled
> events?

    In comm_select(), after the ICP sockets are serviced, I call set_stall
 for every FD whose lifetime is > 1000, and I made stallDelay configurable.
 On subsequent passes, the select loop simply ignores these FDs until
 stallDelay expires. No problems with other activities...
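The stall mechanism can be modeled outside Squid to see why it does not hold up other events. The following Python sketch is only a toy model, not the author's C patch: the class StallTable and its methods are hypothetical, stall_delay stands in for stallDelay, and "lifetime > 1000" is taken at face value as a per-FD number compared against a threshold. Each pass stalls long-lived FDs, and the read set handed to select() simply leaves out any FD whose stall has not yet expired.

```python
class StallTable:
    """Toy model of per-FD stalling in a select() loop (hypothetical helper).

    Any FD whose lifetime exceeds a threshold is marked stalled until
    now + stall_delay; the loop leaves stalled FDs out of the read set.
    """

    def __init__(self, stall_delay, lifetime_threshold=1000):
        self.stall_delay = stall_delay            # plays the role of stallDelay
        self.lifetime_threshold = lifetime_threshold
        self.stalled_until = {}                   # fd -> time the stall expires

    def set_stall(self, fd, now):
        self.stalled_until[fd] = now + self.stall_delay

    def mark_long_lived(self, lifetimes, now):
        """Run after ICP sockets are serviced: stall every long-lived FD.

        An FD that is already stalled is left alone, so its stall is not
        extended forever; it gets re-stalled only after it has been read.
        """
        for fd, lifetime in lifetimes.items():
            if lifetime > self.lifetime_threshold and fd not in self.stalled_until:
                self.set_stall(fd, now)

    def read_set(self, fds, now):
        """FDs to pass to select(): drop those whose stall has not expired."""
        ready = []
        for fd in fds:
            until = self.stalled_until.get(fd)
            if until is not None and now < until:
                continue  # still stalled: ignore this FD on this pass
            self.stalled_until.pop(fd, None)  # stall expired (or never set)
            ready.append(fd)
        return ready
```

In this model a long-lived FD alternates between one read and one stall period, giving its socket buffer time to fill so the next read returns a larger chunk, while FDs that are not stalled are serviced on every pass.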

> > As this is actually more general resource management and is partly in
> > the ToDo list, I wanted to know if anyone has been working in this direction
> > and what the general thoughts are regarding this matter.
>
> Before getting into this, I have a simpler idea. How would you write
> code to compute the effective miss KB/s for certain clients or groups
> of clients? Can the stats in access_log be used?

    Yes, I calculate this with the small awk script I use to analyze
 my logs (http://cache.online.ee:81/cache/stats/).
    I just sum up the time taken by all misses and sum up all bytes
 fetched, then divide the two to get the average transfer rate of a miss;
 or, divide the total byte count by the total running time to get the
 overall rate for this type of traffic. To get these stats for your group
 of interest, just feed the script an access.log grepped by your criteria.
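The same calculation can be sketched in Python for anyone without the awk script. This assumes Squid's native access.log format, where field 1 is the elapsed time in milliseconds and field 4 the byte count. Treating any action code containing MISS as a miss, and taking the running time as the span between the first and last timestamps in the log, are assumptions of this sketch; miss_rates is a hypothetical helper.

```python
def miss_rates(log_lines):
    """Return (avg_rate, overall_rate) in KB/s for MISS entries.

    avg_rate:     total miss bytes / total time spent serving misses
    overall_rate: total miss bytes / span between first and last timestamp
    Assumes Squid native access.log fields; elapsed is in milliseconds.
    """
    total_bytes = 0
    total_ms = 0
    first = last = None
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        stamp = float(fields[0])
        # The running-time span is taken over all entries, not just misses.
        first = stamp if first is None else min(first, stamp)
        last = stamp if last is None else max(last, stamp)
        if "MISS" not in fields[3]:
            continue  # e.g. TCP_HIT/200
        total_ms += int(fields[1])
        total_bytes += int(fields[4])
    avg = (total_bytes / 1024) / (total_ms / 1000) if total_ms else 0.0
    span = (last - first) if first is not None else 0.0
    overall = (total_bytes / 1024) / span if span > 0 else 0.0
    return avg, overall
```

Passing in only the lines that match a client address (the grep step described above) restricts both rates to that client or group of clients.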

-------------------------------------------------------------------
 Andres Kroonmaa Telefon: 6308 909
 Network administrator
 E-mail: andre@ml.ee Phone: (+372) 6308 909
 Organization: MicroLink Online
 EE0001, Estonia, Tallinn, Sakala 19 Fax: (+372) 6308 901
-------------------------------------------------------------------
Received on Tue Jul 29 2003 - 13:15:41 MDT
