Google's Safe Browsing API and URL canonicalization

From: Adrian Chadd <adrian@dont-contact.us>
Date: Fri, 22 Jun 2007 17:14:58 +0800

I've been toying with implementing an external_acl module that
checks URLs against phishtank.org's database. The problem is
comparing URLs so that minor variations of a malware URL that
don't change what it points at (/./ path segments, host
capitalisation, user:pass@, %-escapes, etc) still match.

I then stumbled across the Google Safe Browsing API which has
a section on URL canonicalization, which pretty much encompasses
all the bits I was thinking about.

http://code.google.com/apis/safebrowsing/developers_guide.html

Comments?

Adrian
Received on Fri Jun 22 2007 - 03:13:37 MDT
