[PATCH] regular expression optimisation patch for squid 3.1.12

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Thu, 02 Jun 2011 10:30:14 -0300

This patch is inspired by the work that I did for ufdbGuard and a few emails with Amos.

Attached is a patch for squid 3.1.12 to optimise lists of regular expressions.
The optimisations are:
* initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
* -i ... -i options are optimised: the second one is ignored, same for +i
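
As an illustration only, the three optimisations can be sketched in standalone C++. This is not the code from the patch: the names JoinedRE, stripLeadingWildcards and joinPatterns are hypothetical, the input is assumed to be the already tokenised ACL arguments, and the buffer-size limit that the patch enforces is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one joined pattern plus its case-sensitivity.
struct JoinedRE {
    std::string pattern;  // e.g. "(aaa)|(bbb)"
    bool ignoreCase;      // true when the group follows a "-i" option
};

// Strip any run of leading ".*" pairs. A pattern that consists only of
// wildcards is returned unchanged, as in the patch.
std::string stripLeadingWildcards(const std::string &re)
{
    std::size_t pos = 0;
    while (re.compare(pos, 2, ".*") == 0)
        pos += 2;
    return pos == re.size() ? re : re.substr(pos);
}

// Join consecutive REs that share the same case flag into "(RE-1)|(RE-2)|...".
// A repeated "-i" (or "+i") does not change the flag and is ignored.
std::vector<JoinedRE> joinPatterns(const std::vector<std::string> &tokens)
{
    std::vector<JoinedRE> out;
    std::string joined;
    bool ignoreCase = false;

    // Emit the pattern collected so far (if any) and reset the buffer.
    auto flush = [&]() {
        if (!joined.empty())
            out.push_back({joined, ignoreCase});
        joined.clear();
    };

    for (const std::string &tok : tokens) {
        if (tok == "-i" || tok == "+i") {
            const bool wanted = (tok == "-i");
            if (wanted != ignoreCase) {  // only a real change starts a new group
                flush();
                ignoreCase = wanted;
            }
            continue;
        }
        if (!joined.empty())
            joined += '|';
        joined += '(' + stripLeadingWildcards(tok) + ')';
    }
    flush();
    assert(joined.empty());  // flush() always leaves the working buffer empty
    return out;
}
```

For the tokens -i axx bxx cxx -i dxx exx fxx this sketch yields the single case-insensitive pattern (axx)|(bxx)|(cxx)|(dxx)|(exx)|(fxx), which is the pattern that appears in the expected debug output of the unit test script.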

The only modified file is src/acl/RegexData.cc

Attached are the patch (RegexData.cc.patch) and the files for a unit test:
squidtest.conf
re.4lines - used in squidtest.conf; contains REs
re.200lines - used in squidtest.conf; contains REs
unittest_re_optim_wget - script with wget commands to trigger squid to evaluate REs

unittest_re_optim_wget contains instructions on how to set up and run the unit test.

I am not subscribed to the squid-dev mailing list.
Please reply to my email address also.

Marcus Kool
Marcus.Kool_at_urlfilterdb.com

Amos Jeffries wrote:
> On 01/06/11 09:18, Marcus Kool wrote:
>> Hi,
>>
>> after some emails with Amos I agreed to make a patch for
>> squid to optimise lists of regular expressions. The
>> optimisations are:
>> * initial .* is stripped
>> * RE-1 RE-2 ... RE-n are joined into one large RE:
>> (RE-1)|(RE-2)|...|(RE-n)
>> * -i ... -i options are optimised: the second one is ignored, same for +i
>>
>> The only modified file is src/acl/RegexData.cc
>>
>> My question for submitting the patch:
>> how do you want the patch? Is the output of the following command OK?
>> LC_ALL=C TZ=UTC0 diff -Naur src/acl/RegexData.cc
>> src/acl/RegexData.cc.orig
>
> That should be fine.
>
>>
>> I used a test set: a squid.conf, two files with regular expressions
>> and a file with wget commands to test URLs.
>> Do you want/need these?
>
> That would be helpful for unit-tests. So yes, thank you.
>
>>
>> How to post the patch ?
>
> As attachment please, with [PATCH] subject prefix and a description
> suitable for commit message. From an email you are happy adding
> permanently to the credits records.
>
>>
>> I am not subscribed to the squid-dev mailing list. Please reply
>> to my email address also.
>>
>> Thanks
>>
>> Marcus Kool
>
> Amos

abc.com
urlfilterdb.com/secret
xs4all.nl/verysecret
cnn.com/public

-i
abc.example.com/scripts/cgi-bin/40example.cgi
-i
foo\.example\.com/html/index\.php
-i
foo\.example\.com/html/asfsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
01john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/01example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/skdfhsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
02john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/02example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/234second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
03john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/03example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfsaassecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
04john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/04example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/345nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
05john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/05example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/asfkdhsadsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
06john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/06example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/2345234nnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
07john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/07example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/asd0second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
08john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/08example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdgw1second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
09john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/09example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/safn2nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
10john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/10example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/345n2second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
11john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/11example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfn3nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
12john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/12example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfbdfsbdsf9second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
13john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/13example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/nsdnfds92nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
14john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/14example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/ndsnsdansdasecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
15john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/15example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/dsfn3n3nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
16john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/16example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/nfsdnaosecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
17john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/17example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/oodfmsdjsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
18john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/18example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/dsansdnn3second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
19john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/19example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/n31n1n2nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
20john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/20example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/nsdadndnxnxnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
21john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/21example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/fjfkdkdkdsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
22john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/22example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/ndndnddndndsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
23john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/23example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/mmckcmcmcsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
24john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/24example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/gdgdgdgdsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
25john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/25example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/krkrkrkrsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
26john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/26example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/utututututsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
27john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/27example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kfkfkfkfksecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
28john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/28example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kkkkksecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
29john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/29example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/qqqqnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
30john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/30example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kkkkskskssecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
31john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/31example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/33k3k3second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
32john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/32example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/44k44k4k4second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
33john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/33example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/00d0d0d0second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
34john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/34example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/2k2k2k2second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
35john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/35example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/aaananasecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
36john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/36example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kwkwkwkwkwsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
37john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/37example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/qkqkqkqkqjsdsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
38john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/38example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/oododofofnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
39john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/39example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/oeoeoeoekkfnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
40john.*doe.example.com/.*/index.php
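
The list above repeats one long pattern many times, which appears intended to make the joined RE overflow the optimiser's fixed-size buffer so that a new joined RE has to be started. A rough sketch of that splitting follows. It assumes a caller-supplied buffer size instead of BUFSIZ, and the name chunkPatterns is hypothetical. Unlike the patch, it skips up front any pattern that could not fit even into an empty buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join patterns into "(RE-1)|(RE-2)|..." strings that each fit into a buffer
// of bufSize bytes, starting a new joined RE whenever the next one would not fit.
std::vector<std::string> chunkPatterns(const std::vector<std::string> &patterns,
                                       std::size_t bufSize)
{
    std::vector<std::string> chunks;
    std::string current;

    for (const std::string &re : patterns) {
        // Same limit as the patch's "RElen + largeREindex + 3 < BUFSIZ-1",
        // written as "+ 4 < bufSize" to avoid unsigned underflow.
        if (re.size() + 4 >= bufSize)
            continue;  // cannot fit even into an empty buffer: skip it

        if (re.size() + current.size() + 4 >= bufSize) {
            chunks.push_back(current);  // buffer full: emit it, start a new one
            current.clear();
        }
        if (!current.empty())
            current += '|';
        current += '(' + re + ')';
        assert(current.size() + 1 < bufSize);  // joined RE plus NUL still fits
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

With a deliberately tiny buffer of 16 bytes, the three patterns aaa, bbb and ccc come out as two joined REs, (aaa)|(bbb) and (ccc).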

--- src/acl/RegexData.cc 2011-06-02 13:18:27.000000000 +0000
+++ src/acl/RegexData.cc.orig 2011-05-28 13:54:06.000000000 +0000
@@ -4,12 +4,6 @@
  * DEBUG: section 28 Access Control
  * AUTHOR: Duane Wessels
  *
- * Regular Expression Optimisation added by Marcus Kool, June 2011.
- * optimisations:
- * initial .* is stripped
- * RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
- * -i ... -i options are optimised: the second one is ignored
- *
  * SQUID Web Proxy Cache http://www.squid-cache.org/
  * ----------------------------------------------------------
  *
@@ -119,262 +113,49 @@
     return W;
 }
 
-
-_SQUID_INLINE_ static char * removeUnnecessaryWildcards( char * t )
-{
- char * orig = t;
-
- /* NOTE: an initial '.' might seem unnecessary but is not;
- * it can be a valid requirement that cannot be optimised
- */
- while (*t == '.' && *(t+1) == '*') {
- t += 2;
- }
-
- if (*t == '\0') {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
- debugs(28, 0, "WARNING: regular expression '" << orig << "' has only wildcards and matches all strings." );
- return orig;
- }
-
- if (t != orig) {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
- debugs(28, 0, "WARNING: regular expression '" << orig << "' has unnecessary wildcard(s)" );
- }
-
- return t;
-}
-
-
-static relist ** compileRE( relist **Tail, char * RE, int flags )
-{
- int errcode;
- relist *q;
- regex_t comp;
-
- if (RE == NULL || *RE == '\0')
- return Tail;
-
- if ((errcode = regcomp(&comp, RE, flags)) != 0) {
- char errbuf[256];
- regerror(errcode, &comp, errbuf, sizeof errbuf);
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
- debugs(28, 0, "compileRE: invalid regular expression: '" << RE << "': " << errbuf);
- return NULL;
- }
- debugs(28, 2, "compileRE: compiled '" << RE << "' with flags " << flags );
-
- q = (relist *) memAllocate(MEM_RELIST);
- q->pattern = xstrdup(RE);
- q->regex = comp;
- *(Tail) = q;
- Tail = &q->next;
-
- return Tail;
-}
-
-
-static int compileOptimisedREs( relist **curlist, wordlist * wl )
-{
- relist **Tail;
- relist *newlist;
- relist **newlistp;
- int numREs = 0;
- int totalNumREs = 0;
- int flags = REG_EXTENDED | REG_NOSUB;
- int largeREindex = 0;
- char largeRE[BUFSIZ];
-
- largeRE[0] = '\0';
- newlist = NULL;
- newlistp = &newlist;
-
- while (wl != NULL) {
- int RElen;
- RElen = strlen( wl->key );
-
- if (strcmp(wl->key, "-i") == 0) {
- if (flags & REG_ICASE) {
- /* optimisation of -i ... -i */
- debugs(28, 3, "compileOptimisedREs: optimisation of -i ... -i" );
- }
- else {
- debugs(28, 2, "compileOptimisedREs: -i" );
- newlistp = compileRE( newlistp, largeRE, flags );
- if (newlistp == NULL) {
- aclDestroyRegexList( newlist );
- return 0;
- }
- if (numREs > 1)
- debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
- flags |= REG_ICASE;
- totalNumREs += numREs;
- largeREindex = numREs = 0;
- largeRE[largeREindex] = '\0';
- }
- }
- else if (strcmp(wl->key, "+i") == 0) {
- if ((flags & REG_ICASE) == 0) {
- /* optimisation of +i ... +i */
- debugs(28, 3, "compileOptimisedREs: optimisation of +i ... +i" );
- }
- else {
- debugs(28, 2, "compileOptimisedREs: +i" );
- newlistp = compileRE( newlistp, largeRE, flags );
- if (newlistp == NULL) {
- aclDestroyRegexList( newlist );
- return 0;
- }
- if (numREs > 1)
- debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
- flags &= ~REG_ICASE;
- totalNumREs += numREs;
- largeREindex = numREs = 0;
- largeRE[largeREindex] = '\0';
- }
- }
- else if (RElen > BUFSIZ-1) {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
- debugs(28, 0, "compileOptimisedREs: regular expression is larger than " << BUFSIZ-1 << " characters: " << wl->key );
- debugs(28, 0, "compileOptimisedREs: the above regular expression is skipped" );
- }
- else if (RElen + largeREindex + 3 < BUFSIZ-1) {
- debugs(28, 4, "compileOptimisedREs: adding RE '" << wl->key << "'" );
- if (largeREindex > 0)
- largeRE[largeREindex++] = '|';
- largeRE[largeREindex++] = '(';
- for (char * t = wl->key; *t != '\0'; t++)
- largeRE[largeREindex++] = *t;
- largeRE[largeREindex++] = ')';
- largeRE[largeREindex] = '\0';
- numREs++;
- } else {
- debugs(28, 2, "compileOptimisedREs: buffer full, generating new optimised RE..." );
- newlistp = compileRE( newlistp, largeRE, flags );
- if (newlistp == NULL) {
- aclDestroyRegexList( newlist );
- return 0;
- }
- if (numREs > 1)
- debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
- totalNumREs += numREs;
- largeREindex = numREs = 0;
- largeRE[largeREindex] = '\0';
- continue; /* do the loop again to add the RE to largeRE */
- }
- wl = wl->next;
- }
-
- newlistp = compileRE( newlistp, largeRE, flags );
- if (newlistp == NULL) {
- aclDestroyRegexList( newlist );
- return 0;
- }
-
- if (numREs > 1)
- debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
-
- /* no errors, so put the new list at the tail */
- if (*curlist == NULL) {
- *curlist = newlist;
- }
- else {
- for (Tail = curlist; *Tail != NULL; Tail = &((*Tail)->next))
- ;
- (*Tail) = newlist;
- }
-
- if (totalNumREs > 100) {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line );
- debugs(28, 0, "compileOptimisedREs: there are " << totalNumREs << " regular expressions. "
- "This is considered bad use of REs. "
- "Consider using less REs or use rules without expressions like 'dstdomain'." );
- }
-
- return 1;
-}
-
-
-static void compileUnoptimisedREs( relist **curlist, wordlist * wl )
+static void aclParseRegexList(relist **curlist);
+void
+aclParseRegexList(relist **curlist)
 {
- int totalNumREs = 0;
     relist **Tail;
- relist **newTail;
+ relist *q = NULL;
+ char *t = NULL;
+ regex_t comp;
+ int errcode;
     int flags = REG_EXTENDED | REG_NOSUB;
 
- for (Tail = curlist; *Tail != NULL; Tail = &((*Tail)->next))
- ;
-
- while (wl != NULL) {
- int RElen;
- RElen = strlen( wl->key );
- if (strcmp(wl->key, "-i") == 0) {
+ for (Tail = (relist **)curlist; *Tail; Tail = &((*Tail)->next));
+ while ((t = ConfigParser::strtokFile())) {
+ if (strcmp(t, "-i") == 0) {
             flags |= REG_ICASE;
+ continue;
         }
- else if (strcmp(wl->key, "+i") == 0) {
+
+ if (strcmp(t, "+i") == 0) {
             flags &= ~REG_ICASE;
+ continue;
         }
- else if (RElen > BUFSIZ-1) {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
- debugs(28, 0, "compileUnoptimisedREs: regular expression is larger than " << BUFSIZ-1 << " characters: " << wl->key );
- debugs(28, 0, "compileUnoptimisedREs: the above regular expression is skipped" );
- } else {
- newTail = compileRE( Tail, wl->key , flags );
- totalNumREs++;
- if (newTail == NULL) {
- debugs(28, 0, "compileUnoptimisedREs: the above regular expression is skipped" );
- }
- else
- Tail = newTail;
- }
- wl = wl->next;
- }
-
- if (totalNumREs > 100) {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line );
- debugs(28, 0, "compileUnoptimisedREs: there are " << totalNumREs << " regular expressions. "
- "This is considered bad use of REs. "
- "Consider using less REs or use rules without expressions like 'dstdomain'." );
- }
-}
 
+ if ((errcode = regcomp(&comp, t, flags)) != 0) {
+ char errbuf[256];
+ regerror(errcode, &comp, errbuf, sizeof errbuf);
+ debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
+ debugs(28, 0, "aclParseRegexList: Invalid regular expression '" << t << "': " << errbuf);
+ continue;
+ }
 
-static void aclParseRegexList(relist **curlist)
-{
- char *t;
- wordlist *wl = NULL;
-
- while ((t = ConfigParser::strtokFile()) != NULL) {
- t = removeUnnecessaryWildcards(t);
- if (strlen(t) > BUFSIZ-1) {
- debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line );
- debugs(28, 0, "aclParseRegexList: regular expression is larger than " << BUFSIZ-1 << " characters: '" << t << "'" );
- debugs(28, 0, "aclParseRegexList: the above regular expression is skipped" );
- }
- else {
- debugs(28, 4, "aclParseRegexList: buffering RE '" << t << "'" );
- wordlistAdd(&wl, t);
- }
- }
-
- if (!compileOptimisedREs(curlist, wl)) {
- debugs(28, 0, "aclParseRegexList: optimisation of regular expressions failed; using fallback method without optimisation" );
- compileUnoptimisedREs(curlist, wl);
+ q = (relist *)memAllocate(MEM_RELIST);
+ q->pattern = xstrdup(t);
+ q->regex = comp;
+ *(Tail) = q;
+ Tail = &q->next;
     }
-
- wordlistDestroy(&wl);
 }
 
 void
 ACLRegexData::parse()
 {
     aclParseRegexList(&data);
-
-#ifdef _SQUID_VERY_VERBOSE_DEBUGGING
- for (relist * l = data; l != NULL; l = l->next) {
- debugs( 28, 2, "ACLRegexData::parse result: '" << l->pattern << "'" );
- }
-#endif
 }
 
 bool
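
The optimisation depends on a joined RE matching exactly when at least one of its member REs matches. Below is a small check of that assumption, using the POSIX regcomp()/regexec() calls with the same REG_EXTENDED | REG_NOSUB flags as the patch. The helpers reMatches and anyMatches are hypothetical names and not part of Squid.

```cpp
#include <regex.h>
#include <string>
#include <vector>

// Compile `pattern` as a POSIX extended RE and test it against `text`.
// Returns false for an empty pattern or one that does not compile.
bool reMatches(const std::string &pattern, const std::string &text, bool ignoreCase)
{
    if (pattern.empty())
        return false;

    int flags = REG_EXTENDED | REG_NOSUB;
    if (ignoreCase)
        flags |= REG_ICASE;

    regex_t comp;
    if (regcomp(&comp, pattern.c_str(), flags) != 0)
        return false;  // nothing to free when compilation fails

    const bool hit = regexec(&comp, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&comp);
    return hit;
}

// The unoptimised behaviour: try each RE in turn until one matches.
bool anyMatches(const std::vector<std::string> &patterns, const std::string &text,
                bool ignoreCase)
{
    for (const std::string &p : patterns)
        if (reMatches(p, text, ignoreCase))
            return true;
    return false;
}
```

One consequence worth noting: a single invalid member, such as abc[missingbracket from the error1 ACL in the test configuration, makes the whole joined RE fail to compile. That is presumably why the patch falls back to compiling the REs one by one when the optimised compile fails.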

# section 3 is options parsing
# section 28 is ACL
debug_options ALL,1 3,9 28,9

visible_hostname squidunittest.com
max_filedescriptors 1024

via off
follow_x_forwarded_for deny all
forwarded_for off

http_port 33129

icp_port 0
htcp_port 0

# TAG: hierarchy_stoplist
# A list of words which, if found in a URL, cause the object to
# be handled directly by this cache. In other words, use this
# to not query neighbor caches for certain objects. You may
# list this option multiple times.
#
# We recommend you to use at least the following line.
hierarchy_stoplist cgi-bin ?

# TAG: no_cache
# A list of ACL elements which, if matched, cause the reply to be
# immediately removed from the cache. In other words, use this
# to force certain objects to never be cached.
#
# You must use the word 'DENY' to indicate the ACL names which should
# NOT be cached.
#
#We recommend you to use the following two lines.
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY

acl microsoft1 dstdomain .microsoft.com .windowsupdate.com .windows.com

acl iabc url_regex -i aaa bbb ccc
acl iabc url_regex -i xaaa xbbb xccc x?ddd
acl iabcIde url_regex -i aaaa bbbb cccc +i dddd eeee
acl iabcidef url_regex -i axx bxx cxx -i dxx exx fxx
acl owc1 url_regex -i .*ABCDEF ..*GHIJKLM
acl simple1 url_regex -i abc.com
acl simple2 url_regex .example.com/abc
acl long1 url_regex -i abcabc -i defdef -i defghi kkklll mmmnnn ooo ppp qqq sss hhh uuu zzz
acl error1 url_regex -i err -i erro -i error where-is-the-error-in-the-long-expression abc[missingbracket errors
acl URLREGEX1 url_regex "/local/test/etc/re.4lines"
acl URLREGEX2 url_regex "/local/test/etc/re.200lines"

# OPTIONS WHICH AFFECT THE CACHE SIZE
# -----------------------------------------------------------------------------
cache_mem 8 MB

cache_swap_low 90
cache_swap_high 91

# LOGFILE PATHNAMES AND CACHE DIRECTORIES
# -----------------------------------------------------------------------------

cache_dir aufs /local/test/cache 128 16 128

logformat combha %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %Ss:%Sh %>ha

access_log /local/test/logs/testaccess.log combha
cache_log /local/test/logs/testcache.log
cache_store_log none

pid_filename /local/test/logs/testsquid.pid

refresh_pattern ^ftp: 600 20% 10080
refresh_pattern ^gopher: 600 0% 600
refresh_pattern . 0 20% 4320

shutdown_lifetime 2 seconds

acl nohackers01 dstdomain .xupiter.com

#Recommended minimum configuration:
acl mynet1 src 10.8.0.0/24
acl mynet2 src 10.9.0.0/24
acl mynet3 src 10.0.8.138/32
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl SSL_ports port 443 563
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 322 554 # rtsps, rtsp
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

#Recommended minimum configuration:
#
# Only allow cachemgr access from localhost
http_access allow manager localhost
http_access deny manager

# Deny requests to unknown ports
http_access deny !Safe_ports

# Deny CONNECT to other than SSL ports
http_access deny CONNECT !SSL_ports

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
http_access deny nohackers01
http_access allow localhost
http_access allow mynet1
http_access allow mynet2
http_access allow mynet3
http_access deny all

cache_mgr squidunittest_at_example.com

cache_effective_user squid
cache_effective_group squid

cachemgr_passwd unittest all

# we use these always_direct directives to make sure that Squid evaluates the url_regex rules
always_direct allow microsoft1
always_direct allow iabc
always_direct allow iabcIde
always_direct allow iabcidef
always_direct allow owc1
always_direct allow simple1
always_direct allow simple2
always_direct allow long1
always_direct allow error1
always_direct allow URLREGEX1
always_direct allow URLREGEX2

# We want to see the whole URL:
strip_query_terms off

#!/bin/sh
#
# unittest_re_optim_wget - test the RE optimisation patch
#
# To see all debug output, the squid.conf file should contain
# debug_options ALL,1 28,9
# The squid.conf of this unit test assumes that there is a squid tree in /local/test;
# the configuration file needs to be edited if another directory is used.
# squid.conf has various url_regex directives and 2 references to files with REs:
# acl URLREGEX1 url_regex "/local/test/etc/re.4lines"
# acl URLREGEX2 url_regex "/local/test/etc/re.200lines"
#
# NOTE: use "squid -X -f /local/test/etc/squidtest.conf" to see the debug output during startup

# To support multiple Squid instances, squidtest.conf has
# http_port 33129
http_proxy=localhost:33129
export http_proxy

# squidtest.conf has url_regex ACLs which are used in 'always_direct allow foo' directives.
# and there is a standard acl: acl QUERY urlpath_regex cgi-bin \?
# The following wget commands trigger the RE matching functions.

wget -q -O ttt01 http://www.example.com/abc/def?x=0
# aclRegexData::match: match '(cgi-bin)|(\?)' found in '/abc/def?x=0'
# aclRegexData::match: match '(.example.com/abc)' found in 'http://www.example.com/abc/def?x=0'

wget -q -O ttt02 http://www.example.com/cgi-bin/report.pl?a=9
# aclRegexData::match: match '(cgi-bin)|(\?)' found in '/cgi-bin/report.pl?a=9'

wget -q -O ttt03 http://www.example.com/-1-2-exx-3-a.html
# aclRegexData::match: match '(axx)|(bxx)|(cxx)|(dxx)|(exx)|(fxx)' found in 'http://www.example.com/-1-2-exx-3-a.html'

wget -q -O ttt04 http://40john-doe.example.com/foo/bar/index.php
# aclRegexData::match: match '(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(31john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/31example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/33k3k3second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(32john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/32example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/44k44k4k4second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(33john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/33example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/00d0d0d0second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(34john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/34example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/2k2k2k2second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(35john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/35example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/aaananasecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(36john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/36example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kwkwkwkwkwsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(37john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/37example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/qkqkqkqkqjsdsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(38john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/38example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/oododofofnsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(39john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/39example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/oeoeoeoekkfnsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(40john.*doe.example.com/.*/index.php)' found in 'http://40john-doe.example.com/foo/bar/index.php'

wget -q -O ttt05 http://www.foo.example.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm.php?var=6
# aclRegexData::match: match '(cgi-bin)|(\?)' found in '/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm.php?var=6'

# 2011/06/02 14:10:59.889| aclRegexData::match: match '(abc.example.com/scripts/cgi-bin/40example.cgi)| ... (edited) ... |(foo\.example\.com/html/utututututsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(27john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/27example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kfkfkfkfksecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(28john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/28example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kkkkksecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(29john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/29example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/qqqqnsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(30john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/30example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kkkkskskssecond\.php)' found in 'http://www.foo.example.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm.php?var=6'

wget -q -O ttt06 http://www.foo.example.com/error.html
# aclRegexData::match: match 'err' found in 'http://www.foo.example.com/error.html'

rm -f ./ttt??
Received on Thu Jun 02 2011 - 13:30:21 MDT
