
WebLog Classes - Web Logfile Parsing and Manipulation in Python
---------------------------------------------------------------

Version 0.9

(c) 1998 Mark Nottingham
<mnot@pobox.com>

This software may be freely distributed, modified and used,
provided that this copyright notice remain intact.

THIS SOFTWARE IS PROVIDED 'AS IS' WITHOUT WARRANTY OF ANY KIND.


These classes allow the user to parse and postprocess Web logfiles. Exactly
one of the Parsing classes must be used, and it must be used first;
Postprocessing classes may then be applied to the resulting instance, if desired.

Thanks to Ben Golding and Jeremy Hylton for their advice.


Parsing Classes
---------------

CommonWebLog - Common (NCSA) Web log parser.
CombinedWebLog - Combined/extended Web log parser (adds referer and agent).
SquidCacheLog - Squid Web Proxy Cache log parser (access.log v1.1)
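
For orientation, a line in the Common (NCSA) format has the fields
host, ident, authuser, [date], "request", status and bytes. The sketch below
shows one way to split such a line with a regular expression. It is not the
CommonWebLog implementation: the names CLF_PATTERN and parse_common_line are
made up for illustration, and the code is written for a current Python
interpreter.

```python
import re

# Common (NCSA) log format: host ident authuser [date] "request" status bytes
# CLF_PATTERN and parse_common_line are hypothetical names, not part of WebLog.
CLF_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$'
)

def parse_common_line(line):
    """Return a dict of fields for one Common-format line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    host, ident, authuser, date, request, status, size = match.groups()
    # The request is normally "METHOD URL PROTOCOL"; tolerate shorter forms.
    parts = request.split()
    return {
        "host": host,
        "ident": ident,
        "authuser": authuser,
        "date": date,
        "method": parts[0] if len(parts) > 0 else None,
        "url": parts[1] if len(parts) > 1 else None,
        "status": int(status),
        # A "-" in the size field means no body was sent.
        "bytes": 0 if size == "-" else int(size),
    }

SAMPLE = '127.0.0.1 - frank [10/Oct/1998:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_common_line(SAMPLE)
```

The Combined format appends a quoted referer and a quoted user agent to the
same line, so a similar pattern with two extra quoted groups would cover it.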

Postprocessing Classes
----------------------

WebLogUrlParse - parses url and referer (if available) into their components.
WebLogClean - normalises attributes of Web Log for more accurate analysis.
WebLogResolve - resolves client address to host name and/or IP address.
WebLogReferType - determines whether a hit is local, offsite, manual, or file.

For full details of what the classes do, and of their interfaces, read
the comments in the individual modules, as well as their __doc__ strings.
Note that several of the postprocessing classes have specific requirements
for their input.
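
The layering described above (one parser first, then postprocessors applied
to its instance) can be pictured with the toy sketch below. ToyParser and
ToyLowercaseUrl are hypothetical stand-ins, not WebLog classes; the only
thing borrowed from the package is the getlogent() loop, and the assumption
that a postprocessor wraps the parser instance is ours. Consult the module
comments for the real constructors and attributes.

```python
# Hypothetical stand-ins, NOT the WebLog classes: they only mimic the
# getlogent() loop to illustrate parsing first and postprocessing second.
class ToyParser:
    """Reads 'host url' pairs from a list of lines, one entry per getlogent() call."""
    def __init__(self, lines):
        self._lines = iter(lines)
        self.host = None
        self.url = None

    def getlogent(self):
        # Skip malformed lines; return 1 for each good entry, 0 at the end.
        for line in self._lines:
            fields = line.split()
            if len(fields) == 2:
                self.host, self.url = fields
                return 1
        return 0


class ToyLowercaseUrl:
    """Postprocessor: wraps a parser instance and lower-cases its url attribute."""
    def __init__(self, log):
        self._log = log
        self.host = None
        self.url = None

    def getlogent(self):
        if not self._log.getlogent():
            return 0
        self.host = self._log.host
        self.url = self._log.url.lower()
        return 1


log = ToyParser(["10.0.0.1 /Index.html", "10.0.0.2 /index.html"])
log = ToyLowercaseUrl(log)   # the postprocessor takes over the parser's instance
hits = {}
while log.getlogent():
    hits[log.url] = hits.get(log.url, 0) + 1
```

Because the postprocessor exposes the same getlogent() loop as the parser,
the counting code does not need to know how many layers sit underneath it.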

Examples
--------

Using a WebLog class can be as simple as this, which prints how many hits
each page on your site gets:

import CommonWebLog, sys
log = CommonWebLog.Parser(sys.stdin)	# read the logfile from standard input
hits = {}				# maps url -> number of hits
while log.getlogent():			# advance to the next log entry
	hits[log.url] = hits.get(log.url, 0) + 1
for (page, hit_num) in hits.items():
	print("%s %s" % (hit_num, page))

Several slightly more complex demo scripts come with the WebLog package:

bad_passwords.py - identifies bad HTTP authentication attempts.
referers.py - shows which referers lead to your pages, by page and referer.
search_terms.py - shows which search terms are used to reach your pages from
                  popular search engines.
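
To give a flavour of the kind of work search_terms.py does, the sketch below
pulls search terms out of a referer URL. It is not the code from
search_terms.py: the function name search_terms, the host search.example and
the assumption that the engine passes the query in a "q" parameter are all
illustrative, and the code is written for a current Python interpreter.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referer, param="q"):
    """Return the lower-cased search terms found in a referer URL.

    Hypothetical helper: assumes the search engine puts the query in
    the given parameter ("q" by default); real engines vary.
    """
    query = urlsplit(referer).query
    values = parse_qs(query).get(param, [])
    terms = []
    for value in values:
        # parse_qs has already decoded "+" and %xx escapes.
        terms.extend(value.lower().split())
    return terms

terms = search_terms("http://search.example/find?q=Python+web+logs&page=2")
```

Counting the returned terms per page uses the same dictionary idiom as the
hit counter in the first example.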
