On Wed, Aug 21, 2013 at 05:27:55PM +0100, Andrew Wood wrote:
> Hi
> 
> Can someone please help me work out an algorithm to remove overlapping 
> subdomains from a blacklist such as shallalist to prevent errors such as:
> 
>  ERROR: 'interracialcandy.tumblr.com' is a subdomain of '.tumblr.com'
> 2013/08/21 17:18:41| ERROR: because of this '.tumblr.com' is ignored to 
> keep splay tree searching predictable
> 2013/08/21 17:18:41| ERROR: You should remove 
> 'interracialcandy.tumblr.com' from the ACL named 'ProhibitedSitesDomains'
Is it your intention to block tumblr.com and all subdomains?
And are .tumblr.com (non-adult) and interracialcandy.tumblr.com (adult) 
both in the same list?
You can use ufdbguard, a URL filter for Squid.
The ufdbguard software suite has a utility called ufdbGenTable that
converts text files with domains and URLs to a database table. During
this conversion it emits similar errors but behaves differently
from Squid: if both subdomain.example.com and example.com are in a 
list, ufdbGenTable puts example.com in the table, which effectively blocks
example.com and all subdomains.
Marcus
> The problem is that TLDs like .com or .net are easy, but some domains 
> have two 'tlds' such as .co.uk (yes, I know strictly that's not a TLD, but 
> you know what I mean!), and there are so many different country domains, 
> some with two levels and the possibility of more in the future. How can I 
> make it future-proof?
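One way to sidestep the TLD question: an entry only needs to be dropped
when one of its parent names is itself in the list, so the check can be
done purely on dot-separated labels without knowing which suffixes are
TLDs. Below is a minimal Python sketch of that idea. It is not how Squid
or ufdbGenTable is implemented; the function name prune_subdomains is
made up for illustration, and it assumes every entry is meant to block
the domain and all of its subdomains (so a leading dot is ignored when
comparing).

```python
def prune_subdomains(domains):
    """Return the entries that are not covered by a shorter entry.

    Assumption: every entry blocks itself and all of its subdomains,
    so '.tumblr.com' and 'tumblr.com' are treated as the same name.
    """
    # Normalise: drop surrounding whitespace and dots, fold case.
    names = {d.strip().strip(".").lower() for d in domains}
    names.discard("")

    kept = []
    for name in sorted(names):
        labels = name.split(".")
        # Proper parents of a.b.c are b.c and c; no TLD table is needed.
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(parent in names for parent in parents):
            kept.append(name)
    return kept


if __name__ == "__main__":
    sample = [
        "interracialcandy.tumblr.com",
        ".tumblr.com",
        "example.co.uk",
        "www.example.co.uk",
    ]
    for name in prune_subdomains(sample):
        # A leading dot makes a Squid dstdomain entry match subdomains too.
        print("." + name)
```

In this sketch www.example.co.uk is dropped only because example.co.uk is
in the list; co.uk itself never has to be recognised as special. If the
list deliberately mixes exact-host entries with wildcard entries, the
normalisation step would need to keep that distinction.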
> 
> I'm sure I'm not the only one to tear my hair out on this, but I can't find 
> a solution anywhere.
> Perhaps we can collaborate on here to produce a Perl or Python script 
> which anyone can use?
> 
> Thanks
> Andrew
> 
> 
Received on Wed Aug 21 2013 - 19:48:18 MDT