[PATCH] regular expression optimisation patch for squid 3.1.14

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Tue, 05 Jul 2011 16:14:54 -0300

Attached is a patch that optimises regular expressions (REs).
This is the second submission of the patch; it addresses the comments
from Amos' review.

This patch is inspired by the work that I did for ufdbGuard and a few emails with Amos.

The new code optimises lists of regular expressions.
The optimisations are:
* an initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
* repeated -i ... -i options are optimised: the second -i is ignored; the same holds for +i
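A minimal sketch of the three optimisations, written as standalone C++ rather than as the actual RegexData.cc changes. The PatternEntry type, the function names and the REG_EXTENDED | REG_NOSUB flags are assumptions for illustration only; matching is done with the POSIX regcomp()/regexec() calls from <regex.h>.

```cpp
#include <regex.h>

#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one line from an ACL regex list: the
// pattern plus the case flag in effect for it (-i => true, +i => false).
struct PatternEntry {
    std::string pattern;
    bool caseInsensitive;
};

// Optimisation 1: drop a leading ".*". An unanchored POSIX RE already
// matches anywhere in the subject, so the prefix only adds work. A bare
// ".*" is left alone so that it does not become an empty RE.
std::string stripLeadingDotStar(const std::string &re)
{
    if (re.size() > 2 && re.compare(0, 2, ".*") == 0)
        return re.substr(2);
    return re;
}

// Optimisations 2 and 3: join consecutive entries that share the same case
// flag into one RE of the form (RE-1)|(RE-2)|...|(RE-n). A repeated -i (or
// +i) does not change the flag, so it simply extends the current group.
std::vector<std::pair<std::string, bool>>
joinPatterns(const std::vector<PatternEntry> &entries)
{
    std::vector<std::pair<std::string, bool>> groups;
    for (const auto &e : entries) {
        const std::string part = "(" + stripLeadingDotStar(e.pattern) + ")";
        if (!groups.empty() && groups.back().second == e.caseInsensitive)
            groups.back().first += "|" + part;
        else
            groups.emplace_back(part, e.caseInsensitive);
    }
    return groups;
}

// Compile each joined group and report whether any of them matches.
// The compile flags are an assumption about what the real code uses.
bool matchesAny(const std::vector<PatternEntry> &entries, const std::string &subject)
{
    for (const auto &group : joinPatterns(entries)) {
        regex_t re;
        int flags = REG_EXTENDED | REG_NOSUB;
        if (group.second)
            flags |= REG_ICASE;
        if (regcomp(&re, group.first.c_str(), flags) != 0) {
            std::cerr << "ERROR: cannot compile RE: " << group.first << '\n';
            continue;  // regfree() must not be called after a failed regcomp()
        }
        const bool hit = regexec(&re, subject.c_str(), 0, nullptr, 0) == 0;
        regfree(&re);
        if (hit)
            return true;
    }
    return false;
}
```

Recompiling on every lookup is only for brevity; the benefit of joining comes from compiling the joined REs once when the configuration is loaded and then running far fewer regexec() calls per request. If an RE uses back-references, wrapping it in a group changes their numbering, so such REs would have to be kept out of the join.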

The only modified file is src/acl/RegexData.cc

Attached are the patch (RegexData.cc.patch) and the files for a unit test:
squidtest.conf
re.4lines - used in squidtest.conf; contains REs
re.200lines - used in squidtest.conf; contains REs
unittest_re_optim_wget - script with wget commands to trigger squid to evaluate REs

unittest_re_optim_wget contains instructions on how to set up and perform a unit test.

I tried to become a member of the squid-dev mailing list but am not yet
subscribed, so please also send comments to my email address directly.

Marcus Kool

Marcus Kool wrote:
>
> Amos Jeffries wrote:
>> > Amos Jeffries wrote:
>> >> Hi Marcus,
>> >> Did my audit feedback on this make it to you? I've just noticed my
>> >> mailer has not marked the thread as responded.
>> >>
>>
>> On 01/07/11 00:52, Marcus Kool wrote:
>>> No, it did not.
>>
>> Okay. My mailer seems to have screwed up badly. There were a few
>> little minor bits.
>>
>> * the patch being reversed. Just order the files the other way around
>> on next patch.
>>
>> compileOptimisedREs/compileUnoptimisedREs have duplicate code checking
>> for the (RElen > BUFSIZ+1) case on the wordlist key. The keys are already
>> checked for that criterion by aclParseRegexList before being added.
>>
>> debugs() WARNING to the user should be DBG_IMPORTANT in the second
>> parameter.
>>
>> The debugs() for major problems need DBG_CRITICAL in parameter #2 and
>> "ERROR:" instead of the function name.
>>
>> The >100 messages only need to be shown when checking the config for
>> problems. ie.
>> debugs(28, (opt_parse_cfg_only?DBG_IMPORTANT:2), ....
>
> Thanks for the feedback, I will make a new patch. I was not able to
> finish it in time for the next releases, but it will be ready soon.
>
>>
>> No one else has mentioned anything, so with these style tweaks it can go
>> in. The next releases are planned to happen tomorrow. If you want to
>> submit a new patch in the next 12hrs I'll use that.
>>
>>>
>>> I tried to subscribe to the squid-dev mailing list the other day
>>> but got no reply yet. But in the list archives I did not see any
>>> response/feedback either.
>>
>> I saw that arrive. So whoever was moderating this week appears to have
>> okayed you for posting. If you went through the regular ezmail
>> subscription process (mail to squid-dev-subscribe_at_squid-cache.org) you
>> should have been receiving list mail for a few days?
>
> I have not yet received emails from squid-dev. Should I resend
> the application?
>
>> Amos
>
> Marcus
>
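Amos' review points can be sketched in the same standalone style. None of this is Squid code: acceptPattern(), advisoryLevel(), compileFailure() and the Message type are hypothetical, and the level values 0, 1 and 2 are assumed to mirror DBG_CRITICAL, DBG_IMPORTANT and the detail level 2 from the suggested debugs() call.

```cpp
#include <cstddef>
#include <cstdio>  // BUFSIZ
#include <string>
#include <vector>

// Stand-ins for Squid's debug levels (values assumed, see lead-in).
constexpr int kLevelCritical = 0;   // assumed DBG_CRITICAL
constexpr int kLevelImportant = 1;  // assumed DBG_IMPORTANT
constexpr int kLevelDetail = 2;     // the "2" in the suggested debugs() call

// A log line together with the level it would be emitted at.
struct Message {
    int level;
    std::string text;
};

// Hypothetical parse-time filter: over-long patterns are rejected here, once,
// so the later compile steps do not need to repeat the length check.
bool acceptPattern(const std::string &re, std::vector<std::string> &list)
{
    if (re.size() > static_cast<std::size_t>(BUFSIZ) + 1)
        return false;
    list.push_back(re);
    return true;
}

// Advisory messages (such as the ">100" ones) are prominent only while the
// configuration is being checked; otherwise they drop to a detail level.
int advisoryLevel(bool parseConfigOnly)
{
    return parseConfigOnly ? kLevelImportant : kLevelDetail;
}

// Major problems: critical level and an "ERROR:" prefix, not a function name.
Message compileFailure(const std::string &detail)
{
    return Message{kLevelCritical, "ERROR: " + detail};
}
```

Keeping the length check in the parser means every list that reaches the compile step is already known to be valid, which is exactly why the duplicate checks could be dropped.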

Received on Tue Jul 05 2011 - 19:15:06 MDT