Re: Initial patch for file suffix acl

From: Amos Jeffries <squid3@dont-contact.us>
Date: Tue, 4 Mar 2008 11:02:50 +1300 (NZDT)

> Hi Robert:
>
>>
>> It's probably better to do this as a combined regex:
>>
>> acl file_suffix exts .foo .bar .baz
>>
>> creating a regexp based acl .*(\.foo|\.bar|\.baz)$

The core idea of this suffix check was to get away from regex and make a
faster version.
Users can already configure a uripath regex if they want that.

If you think it's slower this way, please provide numbers :-)

>>
>> any decent regexp engine will be better at this than your linear search.
>
> Do you think compiling a regex (ok, it's done once) and matching it
> against a URL (maybe _huge_ URLs, maybe many hundreds of times per
> second) is cheaper than this file_suffix implementation?
>

Hmm, I didn't look it over carefully earlier.
In light of the above comments, I did and now have these points for
optimization:

* strrchr() is probably the better choice for finding the '.'
     A forward search is likely to be slower on average than a
reverse search, particularly for objects without parameters, which are
in the majority.

* strlen(l->key) is relatively slow and CPU intensive. If possible
it needs to be moved outside the core while-loop, probably by basing the
strncmp length on the tested file's extension (or ACL_FILE_SUFFIX_SZ)
rather than on the key.

* With the above, ACL_FILE_SUFFIX_SZ would then be useful to limit the
scanning time as well as the config parsing.

* ACL_FILE_SUFFIX_SZ should be linked to the appropriate RFC defining
the URI part limits instead of an arbitrary 10. I think it's likely to be
in either RFC2181 or RFC1123.

* If you want to test for speed, test the difference between a wordlist
and a splay tree. The output of strncmp is suitable for the splay
comparison, and a splay tree scales better as the ACL gets longer.
Long-term, squid probably needs a better tree entirely for string
comparisons.
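
A rough sketch of the first three points, not the patch itself. It
assumes a hypothetical suffix_entry list in place of the wordlist, a key
length stored once at config-parse time, a path that already has any
query string stripped, and the patch's current value of 10 for
ACL_FILE_SUFFIX_SZ:

```c
#include <stddef.h>
#include <string.h>

#define ACL_FILE_SUFFIX_SZ 10   /* assumed cap, the patch's current value */

/* Hypothetical list node, standing in for the wordlist used by the patch. */
struct suffix_entry {
    const char *key;            /* configured suffix, e.g. ".foo" */
    size_t len;                 /* strlen(key), computed once at parse time */
    struct suffix_entry *next;
};

/* Returns 1 if the file name at the end of 'path' has a listed suffix. */
int
suffix_match(const char *path, const struct suffix_entry *list)
{
    const char *dot = strrchr(path, '.');   /* reverse search for last '.' */
    size_t ext_len;

    if (dot == NULL)
        return 0;
    if (strchr(dot, '/') != NULL)   /* '.' sits in a directory name */
        return 0;

    ext_len = strlen(dot);          /* measured once, outside the loop */
    if (ext_len > ACL_FILE_SUFFIX_SZ)
        return 0;                   /* too long to match any configured key */

    for (; list != NULL; list = list->next) {
        /* No strlen(list->key) here: the stored length is compared first. */
        if (list->len == ext_len && strncmp(dot, list->key, ext_len) == 0)
            return 1;
    }
    return 0;
}
```

This only looks at the final '.', so a key such as ".tar.gz" would never
match, and the comparison is case-sensitive. Whether either matters
depends on what the ACL is meant to accept.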

Amos
Received on Mon Mar 03 2008 - 15:02:55 MST
