Re: google's safe browsing API and URL canonicalization

From: Henrik Nordstrom <henrik@dont-contact.us>
Date: Sat, 23 Jun 2007 21:04:31 +0200

On Fri, 2007-06-22 at 17:14 +0800, Adrian Chadd wrote:
> I've been toying around with implementing an external_acl module
> to check against phishtank.org's database, but the problem is
> comparing URLs to make sure that minor semantic variations to a
> malware URL (/./, host capitalisation, user:pass@, %-escape, etc)
> are worked around.

Is it a problem? Just do aggressive canonicalization when the URL is
used in access control.
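
As a rough illustration of what "aggressive" could mean here, a minimal
Python sketch follows. It is not the procedure from the Safe Browsing guide
and not anything in Squid; the names canonicalize_for_acl and DEFAULT_PORTS
are made up for the example, and it assumes UTF-8 paths. It handles the
variations listed above: host capitalisation, user:pass@, %-escapes and /./.

```python
from urllib.parse import urlsplit, unquote, quote

# Hypothetical table of default ports, only for this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_for_acl(url: str) -> str:
    """Return an aggressively normalized URL, for ACL comparison only."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # .hostname is lowercased and excludes any user:pass@ prefix.
    host = (parts.hostname or "").rstrip(".")

    # .port raises ValueError on a malformed port; a real helper
    # would have to decide how to treat that case.
    port = parts.port
    netloc = host
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc += ":%d" % port

    # Unescape repeatedly until the path stops changing, so that
    # doubly escaped forms such as %2541 collapse as well.
    path = parts.path
    while True:
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded

    # Drop empty and "." segments, resolve ".." segments.
    segments = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue
        if seg == "..":
            if segments:
                segments.pop()
            continue
        segments.append(seg)
    new_path = "/" + "/".join(segments)
    if segments and path.endswith("/"):
        new_path += "/"

    # Re-escape once so the result has a single, stable escaped form.
    result = "%s://%s%s" % (scheme, netloc, quote(new_path, safe="/"))
    if parts.query:
        result += "?" + parts.query
    return result
```

With this sketch, "http://user:pass@EXAMPLE.com/a/./b/%2e%2e/%63.html"
comes out as "http://example.com/a/c.html". Note that unescaping before
splitting on "/" also turns %2f into a path separator, which is more than
the URL syntax strictly allows but acceptable for a lookup key.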

> I then stumbled across the Google Safe Browsing API which has
> a section on URL canonicalization, which pretty much encompasses
> all the bits I was thinking about.
>
> http://code.google.com/apis/safebrowsing/developers_guide.html

It's a little more aggressive on unescaping than can be technically
justified, but that doesn't really matter much in this context.

It can't be used as a general method as it destroys the actual requested
URL, but it is quite fine for access control.
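
To make the distinction concrete, here is a small Python sketch in which
the canonical form is used only as a lookup key and the requested URL is
passed on untouched. The names simple_key and acl_verdict are hypothetical,
simple_key is only a stand-in for a real canonicalizer, and the OK/ERR
answers assume the usual convention of an external ACL helper where OK
means "the ACL matches".

```python
from urllib.parse import urlsplit


def simple_key(url: str) -> str:
    """Stand-in canonicalizer (hypothetical): lowercased host plus raw path."""
    parts = urlsplit(url)
    return (parts.hostname or "") + (parts.path or "/")


def acl_verdict(requested_url, blocklist, canonicalize=simple_key):
    """Return (verdict, url_to_forward).

    The canonical form is computed only to look the URL up in the
    blocklist. The URL that would be forwarded is the one the client
    actually requested, never the canonicalized one.
    """
    key = canonicalize(requested_url)
    verdict = "OK" if key in blocklist else "ERR"
    return verdict, requested_url
```

For example, acl_verdict("http://user@EVIL.example/login",
{"evil.example/login"}) matches the blocklist entry, yet the second
element of the result is still the URL exactly as requested.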

Also, labelling IP addresses in any form other than dotted-quad decimal
as legal within HTTP is a misnomer, even if many browsers incorrectly
accept them as valid.
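
A strict check for the dotted-quad decimal form could look like the
following Python sketch. The function name is_dotted_quad is made up for
the example, and rejecting leading zeros is an assumption of the sketch
(some clients read them as octal), not something stated above.

```python
import re

# One octet: "0" or a 1-3 digit decimal number without leading zeros.
_OCTET = re.compile(r"0|[1-9][0-9]{0,2}")


def is_dotted_quad(host: str) -> bool:
    """True only for four decimal octets in the range 0-255."""
    octets = host.split(".")
    if len(octets) != 4:
        return False
    return all(_OCTET.fullmatch(o) and int(o) <= 255 for o in octets)
```

Forms such as "2130706433", "0x7f.0.0.1" or "127.1", which some browsers
accept for 127.0.0.1, are all rejected by this check.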

Regards
Henrik

Received on Sat Jun 23 2007 - 13:04:37 MDT
