[squid-users] Squid & Spamassassin

From: Chijioke Kalu <kchijioke@dont-contact.us>
Date: Mon, 09 Jun 2003 22:07:59 -0700

Hi Duane, Henrik & Adrian,

I figured I would write to you guys first, because you're on the developers
list and may be able to point me in the right direction and, if possible,
assist in the project.

I have attached a letter I wrote to the SpamAssassin developers list; please
read it first to understand the problem. The second letter, a reply from a
SpamAssassin developer, is what prompted me to write to you. I have cut it
down to just the part that makes reference to Squid.

Basically, it's about how to modify the Squid proxy to pass HTTP POST
requests to SpamAssassin, which will then filter them for spam content.

Hope to hear from any of you soon.

Attached Text:
----------------
Chijioke Kalu said:

>That Subject is what I hope to achieve by joining the SpamAssassin
>Developers Forum and, if possible, to contribute to by programming this
>aspect of SpamAssassin.
>
>I am a Nigerian developer/system administrator and, as many know, a high
>volume of spam mail originates from this country. It is difficult to
>eliminate by manual means, always brings reprimands from the US government
>on the Nigerian ISPs, and in general makes a sysadmin's life a living hell.
>
>The Problem:
>
>These spam mails are sent via HTTP (webmail traffic), so they can't be
>filtered at a mail server and get out even if you don't have a mail server
>running. Secondly, mass-mailing programs relay through proxies, thus
>avoiding the SMTP ports that are blocked and still delivering their payload.
>
>My Question:
>
>Can SpamAssassin be tuned to filter such traffic before it even leaves the
>gateway of the source network?

Yes, this should be possible!

What needs to be done is to make an HTTP proxy server that can detect an
HTTP POST form submission that looks like a mail message. This should be
quite easy, since mail messages will have at least one CGI parameter that
is quite long, over 2 KB or so, e.g.

    POST http://somewebmailservice.com/cgi-bin/submit.cgi HTTP/1.0
    HTTP headers...
    ....

    from=from@address&to=to@address&body=The%20encoded%20text%20of%20
    the%20mail%20message%20lots%20of%20text%20here....

Note also that a mail "body" will normally contain:

  - lots of encoded space characters (%20, or "+" in standard form encoding)
  - several %0a line-feed characters for newlines
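
As a rough illustration of that check, here is a Python sketch that could be
used to prototype the heuristic before writing any C. The thresholds and the
function name find_mail_like_field are assumptions of mine, not anything
Squid or SpamAssassin provides, and the length is measured after decoding:

```python
from urllib.parse import parse_qs

MIN_BODY_CHARS = 2048   # the "over 2 KB or so" rule of thumb; tune as needed
MIN_NEWLINES = 3        # "several" line feeds; an assumed value
SPACE_RATIO = 20        # assume at least one space per 20 characters


def find_mail_like_field(post_body: str):
    """Return (name, text) of the first form field that looks like a
    mail body, or None if no field qualifies."""
    # parse_qs decodes %XX escapes and '+' into plain characters
    fields = parse_qs(post_body, keep_blank_values=True)
    for name, values in fields.items():
        for text in values:
            if (len(text) >= MIN_BODY_CHARS
                    and text.count("\n") >= MIN_NEWLINES
                    and text.count(" ") >= len(text) // SPACE_RATIO):
                return name, text
    return None
```

A short search query such as "q=squid+proxy" fails all three tests, while a
long multi-line "body=" field passes them.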

If such a CGI parameter is found, the proxy then has to create a "fake"
mail message using that as the text, and pass it to SpamAssassin
somehow. Using the "spamc" client is a low-overhead way to do this.

Given the resulting score from "spamc", it can figure out if there is
a likelihood that the body text looks like spam.
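
A minimal sketch of this step in Python, assuming spamc is on the PATH and a
spamd daemon is running. It relies on "spamc -c", which prints a
"score/threshold" line. The helper names and the placeholder Subject header
are my own inventions for illustration:

```python
import subprocess
from email.message import EmailMessage


def build_fake_message(sender: str, recipient: str, body: str) -> bytes:
    """Wrap the form fields in a minimal mail message for SpamAssassin."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "(webmail form submission)"  # placeholder, not from the form
    msg.set_content(body)
    return msg.as_bytes()


def parse_check_output(output: str):
    """Parse the 'score/threshold' line printed by 'spamc -c'."""
    score, threshold = output.strip().split("/", 1)
    return float(score), float(threshold)


def looks_like_spam(message: bytes, spamc_path: str = "spamc") -> bool:
    """Pipe the message to 'spamc -c' and compare score with threshold."""
    proc = subprocess.run([spamc_path, "-c"], input=message,
                          capture_output=True, timeout=30)
    score, threshold = parse_check_output(proc.stdout.decode("ascii", "replace"))
    # spamc reports 0/0 when it could not get a verdict; treat that as
    # "not spam" so a spamd outage does not block all webmail
    return threshold > 0 and score >= threshold
```

Letting requests through when spamc gives no verdict is a design choice; a
stricter site might prefer to refuse them instead.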

If it does, then the proxy server should return a 4xx HTTP error code,
with an explanatory message in the text, indicating that it was
filtered as possible spam.

Training SpamAssassin's Bayesian learner with a lot of examples of 419
scam mails will also help a lot in gaining accuracy.

Then, once this is working, what you need is a "transparent HTTP proxy"
which can be used to ensure all HTTP traffic from the internet cafes passes
through this proxy. In other words, a user opens a connection on
port 80 to a website, and the router transparently connects the TCP/IP
traffic to the proxy server, which proxies the HTTP traffic.

http://www.tldp.org/HOWTO/mini/TransparentProxy.html gives some info
on a way to do this with Squid and Linux.

Given this, I would suggest a good way to implement this would be to:

  - write a patch in C for the Squid proxy server, which looks for HTTP
    POST form submissions using long CGI parameters that may be the
    body of a mail message
  - if one is found, it creates a "mail message" using that long text
    parameter, and runs "spamc"
  - if "spamc" says it may be spam, refuse the HTTP request
  - otherwise let it continue
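
Before touching Squid's C source, the decision flow in those four steps can
be prototyped in a few lines of Python. This is only a sketch: handle_post,
the is_spam callback (which would wrap spamc in practice), the 2048-character
threshold, and the choice of 403 as the 4xx code are all assumptions:

```python
from urllib.parse import parse_qs

MIN_BODY_CHARS = 2048  # assumed cut-off for a "long" CGI parameter


def refusal_response(reason: str) -> bytes:
    """Build a plain-text HTTP 403 response explaining the refusal."""
    body = reason.encode("utf-8")
    head = ("HTTP/1.0 403 Forbidden\r\n"
            "Content-Type: text/plain; charset=utf-8\r\n"
            "Content-Length: %d\r\n\r\n" % len(body))
    return head.encode("ascii") + body


def handle_post(post_body: str, is_spam):
    """Return a refusal response if any long form field is judged spam,
    or None to signal that the request should be forwarded unchanged."""
    fields = parse_qs(post_body, keep_blank_values=True)
    for values in fields.values():
        for text in values:
            # only long parameters are candidates for a mail body
            if len(text) >= MIN_BODY_CHARS and is_spam(text):
                return refusal_response(
                    "Request refused: the submitted text was filtered "
                    "as possible spam.\n")
    return None
```

Passing the classifier in as a callback keeps the flow testable without a
running spamd.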

Regarding blocking proxy abuse; this is not so hard. Here's how to do it.
Institute a policy of filtering outgoing IP traffic at your routers, and
block *outgoing* access to the following TCP ports:

        1080, 8080, 3128, 8081, 8001, 8000, 10080

These are common ports used by proxies, which are almost never used for
other services. (In the old days, ports 8001 and 8000 were occasionally
used for websites, but I haven't seen one of these in about 4 years ;)
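
On a Linux router using iptables, the port list above could be turned into a
single rule. The small Python helper below just builds the command string; it
assumes the multiport match and the REJECT target are available, and the
interface name eth1 for the internal side is hypothetical:

```python
PROXY_PORTS = [1080, 8080, 3128, 8081, 8001, 8000, 10080]


def outgoing_block_rule(ports, lan_iface="eth1"):
    """Build one iptables command that rejects forwarded TCP traffic
    from the internal interface to the given destination ports."""
    port_list = ",".join(str(p) for p in ports)
    return ("iptables -A FORWARD -i %s -p tcp -m multiport --dports %s "
            "-j REJECT --reject-with tcp-reset" % (lan_iface, port_list))
```

Calling outgoing_block_rule(PROXY_PORTS) returns the command to run as root
on the router. Because the rule sits in the FORWARD chain, it only affects
traffic routed through the box, not a proxy running on the router itself.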

Since proxies are commonly used *inside* a private network to share a
connection, but are virtually never used *outside* a private net for
legitimate purposes, this should not have any serious side effects for
"normal" internet use.

Good luck -- this is an interesting idea, and could have major effects
on the 419'ers. Any other help you may need, feel free to contact
this group and we'll be happy to help!

--j.

----------------

Received on Mon Jun 09 2003 - 23:08:13 MDT
