Re: Deny access for non standard browsers

From: Benarson Behajaina <Benarson.Behajaina@dont-contact.us>
Date: Wed, 28 Jul 1999 09:48:51 +0200 (MET DST)

Dave J Woolley wrote:
>
> > From: Benarson Behajaina [SMTP:Benarson.Behajaina@swh.sk]
> >
> > I reconfigured my Squid to deny access for non-standard
> > browsers (GetRight, Wget, Lynx, GetSmart, etc.)
> >
> Why? Is your HTML that badly broken? Lynx is probably
> more standard (in the sense of HTML/HTTP, etc.,
> compliance) than Netscape 4.x, and is actually
> recommended on the Squid home page!

Excuse me, sir; of course I didn't want to block Lynx. I installed
Lynx on my Linux box a long time ago and still use it from time to
time. All I wanted was to deny the clients that are able to saturate
our entire bandwidth: the so-called "download managers" and
"crawlers" that some users leave running in the background 24 hours
a day.

So, thank you, Dave Woolley, for your answer, and I'm sorry for
calling Lynx a non-standard browser. That was my mistake; I'm
really not against the use of Lynx.

I'll solve my problem not by matching on the user agent (a browser
ACL), but by using Squid's delay pools.
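
Roughly like this, assuming Squid was built with --enable-delay-pools;
the numbers are only an illustration:

    # One pool of class 2: an aggregate limit plus a per-client limit.
    delay_pools 1
    delay_class 1 2
    delay_access 1 allow all
    # Aggregate left unrestricted (-1/-1); each client IP is refilled
    # at about 8 KB/s with a 64 KB burst bucket.
    delay_parameters 1 -1/-1 8000/64000

That way a download manager can still run, but no single client can
saturate the whole line any more.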

> The only real effect of discriminating against Lynx is
> a lot of mail hostile to you on the lynx-dev mailing
> list and an increase in the number of people who override
> the User Agent. wget users have the same option.
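
(True; overriding the agent string takes no effort at all. If I
remember wget's option correctly, something like

    wget -U "Mozilla/4.0 (compatible)" http://www.example.com/

is enough to make it look like Netscape, so a browser ACL only stops
the users who never think to try.)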
>
> If this is really an attempt to exclude crawlers (and
> IE 4 is a crawler, although I think it changes its user
> agent string when crawling), then I admit Lynx is weak in
> not supporting robots.txt, but wget, when actually crawling,
> is certainly compliant - it is also used as a front for
> other tools. You can force wget to be badly behaved, but
> then you can write your own crawler or modify the source code
> quite easily.
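
(Indeed; if I'm not mistaken, even stock wget will skip robots.txt
when asked, with a line in ~/.wgetrc:

    robots = off

so compliance is only ever voluntary on the client side.)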
>
> From what I've heard of IMDB's attempts to control crawlers
> and pre-fetchers, the main problem is from ones that do
> not identify themselves in the user agent. IMDB analyse the
> log files, presumably to look for the typical access patterns.
> Most of these will not be configurable like the power users'
> tools, Lynx and wget, but will be typical Windows plug-and-play
> shareware.
>
> Also, some people suppress user agent in their proxies for
> privacy reasons.
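
(Squid itself can do that, too. If I recall the 2.2-era directives
correctly (treat this as unverified), something like

    anonymize_headers deny User-Agent
    fake_user_agent privacy-proxy/1.0

strips or replaces the agent string for everybody behind the proxy
(the replacement string can be anything), which would quietly defeat
any browser ACL downstream.)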
>
> The first thing to do if you don't want to be crawled is to
> make sure that you:
>
> - have a policy that makes sense to the users;
>
> - have a robots.txt file that accurately implements that policy
> and is commented to explain the policy;
>
> - explain the policy clearly in a way accessible to interactive
> browsers.
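
(A minimal robots.txt along those lines might look like the
following; the path is only an example:

    # Policy: crawlers are asked to stay out of the bandwidth-heavy
    # archive area; interactive browsing is welcome everywhere.
    User-agent: *
    Disallow: /archive/

with the same policy repeated on a human-readable page.)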
>
> Specifically with Lynx, you should donate code to support
> robots.txt when operating in crawling mode. People may still
> disable it, but those people will try to frustrate any attempt
> you make to shift the balance in favour of your advertisers,
> etc.
>
> (Incidentally, someone is getting quite heavily flamed by
> most of the Lynx developers at the moment for trying to
> defeat IMDB's measures - most of the developers are sympathetic
> to the wishes of content providers.)
>
> Hope I've read correctly between the lines here.

-- 
*-**-**-**-**-**-**-**-**-**-**-**-**-**-*
| Benarson Rodriguez Behajaina  
| Unix System Administrator     
| SWH Siemens Business Services
|------------------------------
| email :  benarson@swh.sk      
| phone :  +421-7-5968 4921     
| fax   :  +421-7-5968 5403     
*-**-**-**-**-**-**-**-**-**-**-**-**-**-*
Received on Wed Jul 28 1999 - 01:51:50 MDT
