Re: [squid-users] Searching squid logs for pornographic sites

From: Rob Asher <rasher_at_paragould.k12.ar.us>
Date: Wed, 11 Jun 2008 16:03:13 -0500

Here's something similar to what you're already doing, except that it compares the URLs against a file of "badwords" and then emails you the results.

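#!/bin/sh
# filter.sh
# Collect access.log lines matching any pattern in "badwords", format them
# with wordfilter.gawk, and mail the resulting report.
cd /path/to/filterscript || exit 1

# -i: ignore case; -f: read the patterns from a file, one per line
grep -i -f /path/to/filterscript/badwords /var/log/squid/access.log > hits.out

/path/to/filterscript/wordfilter.gawk hits.out

/bin/mail -s "URL Filter Report" you_at_yourdomain.com < /path/to/filterscript/word-report

rm hits.out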

#!/bin/gawk -f
# wordfilter.gawk
# Writes one entry per matching log line: timestamp -> user, then the URL.

BEGIN {
    report = "/path/to/filterscript/word-report"
    # ">" truncates any old report on first use; the file then stays open,
    # so later prints to the same name are appended.
    print "URL Filter Report:" > report
    print "--------------------------------------" > report
    sp = " -> "
}

{
    # $1 = request time (epoch seconds), $8 = username, $7 = URL
    print strftime("%m-%d-%Y %H:%M:%S", $1), sp, $8 > report
    print $7 > report
    print "" > report
}
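
If you're not sure which column holds what in your own log, a quick sketch like the one below prints every field of one line next to its number. The sample line is made up and assumes Squid's native access.log layout (time, elapsed, client, code/status, bytes, method, URL, user, hierarchy, type); a custom logformat will number things differently, so try it on a real line from your log before trusting the field numbers.

```shell
#!/bin/sh
# Made-up sample line in Squid's native access.log layout (an assumption;
# substitute a real line from your own log).
line='1213218193.123 45 192.168.1.10 TCP_MISS/200 1024 GET http://example.com/page jdoe DIRECT/93.184.216.34 text/html'

# Print each whitespace-separated field together with its awk field number.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }')

printf '%s\n' "$fields"
```

With that layout the client address shows up as $3, the URL as $7 and the username as $8, so reporting by IP would mean swapping $8 for $3 in the awk script.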

You may need to adjust the columns printed in the awk script; they're set to print the username rather than the IP address. You'll also need to create a "/path/to/filterscript/badwords" file containing the words or regexes you want to search for, one per line. Someone with better regex skills could probably eliminate a lot of false hits with more specific patterns in the "badwords" file. I'm using this in addition to squidGuard and blacklists to catch URLs that were missed, so the output isn't nearly as large as what you're getting.
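
As a rough illustration of what "more specific patterns" could look like, here is a small sketch comparing a bare word against a pattern that only matches when the word is not embedded in a longer one. Everything in it is hypothetical: "badword" stands in for a real search term, the URLs are invented, and the tighter pattern uses extended regex syntax, so filter.sh would need grep -E as well if you adopt it.

```shell
#!/bin/sh
# Hypothetical sample data; "badword" is a placeholder for a real term.
tmp=$(mktemp -d)

cat > "$tmp/urls" <<'EOF'
http://example.com/badword/index.html
http://example.com/notbadwordy/page.html
http://badword.example.net/
http://example.org/embadwordment
EOF

# Loose pattern: matches the term anywhere, even inside longer words.
printf '%s\n' 'badword' > "$tmp/loose"

# Tighter pattern: the term must have a non-letter (or the start/end of
# the line) on both sides.
printf '%s\n' '(^|[^a-z])badword([^a-z]|$)' > "$tmp/tight"

# -c counts matching lines; -E enables the extended syntax used above.
loose_hits=$(grep -ic -f "$tmp/loose" "$tmp/urls")
tight_hits=$(grep -Eic -f "$tmp/tight" "$tmp/urls")

echo "loose: $loose_hits  tight: $tight_hits"
rm -r "$tmp"
```

On these four sample URLs the loose pattern hits all four lines, while the tighter one hits only the two where the term stands alone as a path or host component. Real traffic will be messier, so treat this as a starting point rather than a finished filter.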

Rob

-------------------------------------
Rob Asher
Network Systems Technician
Paragould School District
(870)236-7744 Ext. 169

>>> "Steven Engebretson" <sengebretson_at_blakeschool.org> 6/11/2008 1:32 PM >>>
I am looking for a tool that will scan the access.log file for pornographic sites, and will report the specifics back. We do not block access to any Internet sites, but need to monitor for objectionable content.

What I am doing now is just grepping for some keywords and dumping the output into a file. I then go through about 60,000 lines of grep output manually, and 99% of them are false hits. Any help would be appreciated.

Thank you all.

-Steven E.

----------

This message has been scanned for viruses and
dangerous content by the Paragould School District
MailScanner, and is believed to be clean.

Received on Wed Jun 11 2008 - 21:03:55 MDT

This archive was generated by hypermail 2.2.0 : Thu Jun 12 2008 - 12:00:04 MDT