Some Squid 1.NOVM.18 oddities

From: WWW server manager <webadm@dont-contact.us>
Date: Wed, 17 Dec 1997 22:47:48 +0000 (GMT)

Some Squid 1.NOVM.18 oddities (running on a Sun SPARCserver 20/151
with Solaris 2.5, compiled with Sun's cc). This is partly just a commentary on
various points I've noticed, but it includes some apparent bugs and places
where there's room for improvement.

(1) A problem with (apparently) a truncated cached file at a parent server
brought to light an oddity with cachemgr.cgi. I tried using the refresh URL
facility to force reloading of http://www.macfixit.com/ and found that
roughly 80% of the time I got "Premature end of script headers" from the
Apache server running cachemgr.cgi, and the other 20% of the time,
cachemgr.cgi just went into a CPU-bound loop with no system calls (in the
time I watched it with truss, the Solaris 2 system call tracer) - nothing in
the cache server logs, no obviously-related entries in the file descriptor
display as shown by an independent cachemgr request. It works for other
URLs, though, and no problems were seen refreshing the same URL using the
Squid "client" program.

It might be useful to know whether cachemgr behaves like that for other
people, or if maybe it only affects me for some reason. Apart from that, in
the absence of any indication of what's going wrong, I mention it "for the
record", with no expectation of a solution unless someone happens to spot
an explanation.

[I was also surprised that Squid was caching the problem document, since it
lacked last modification and expiry timestamps, and didn't specify content
length, but I haven't looked more closely yet to decide whether I think
there's anything genuinely odd there.]

(2) Prompted by recent discussion reminding me that Squid ACLs can be
defined in files, I had a look at whether that would be a good way to
define potentially long lists of hosts for which requests should be routed
direct (not via any parents) outside the main squid.conf (though it would
still need a HUP to get any changes into service). Several points arose from
that investigation:

(a) tying in with a recent query about dynamic ACLs, would it be possible
(and reasonable) for Squid to record when any configuration files were last
modified and to check for changes periodically (configurable, disable it if
paranoid it might load a partially-edited file one day...), doing a
reconfigure automatically if changes were detected? The tricky bit, I
suppose, is that it would have to track any subsidiary files to which
squid.conf had led it, not just check squid.conf.

(b) Enabling detailed logging showed, unsurprisingly, that with a
substantial number of parents configured and a lot of hosts listed in the
ACL, Squid had to check through the whole list for each parent (for which
I'd used "cache_host acl the.host.name ! always-direct.acl"), for every
request.

That seemed rather inefficient, so I looked around for a better option - I
was looking for an alternative to listing the relevant domains individually
with local_domain directives in squid.conf, to avoid having to edit
squid.conf just to update the list of special-case domains.

(c) I'd noticed when running squid under truss (the Solaris 2 system call
tracer) that it tried opening non-existent files with names corresponding to
some DNS host or domain names appearing in the configuration file.

Investigation showed that this happens for local_domain entries, though the
behaviour is apparently not documented (in particular, it is not mentioned in
the comments in the sample squid.conf). In contrast to acls, where filenames
must be enclosed in quotation marks, every name mentioned in a local_domain
definition is stat()ed and, if it exists as a file, scanned for hostnames.

That seems undesirable (doubly so since it is undocumented), for several
reasons, though the ability to use a file when identified as such (by
quotation marks, as in an acl) would certainly be useful.

One problem is that files may quite plausibly be created now and then with
names corresponding to hosts/domains (e.g. for a log file extract relating
to the particular system...). Having squid process such files unexpectedly
does not seem sensible.

In addition, there is a bug in the way that files named in local_domain
directives are handled. While the directive will accept multiple names, if a
file is found and is not the last item in the list, the remaining names are
ignored. That is a consequence of the way parseLocalDomain() uses strtok()
(relying on it to keep track of its position in the configuration file line)
but then calls parseLocalDomainFile(), which uses strtok() independently to
process the file contents, so the position in the original input line is
lost.

(d) There are, however, some features of the local_domain file handling
which are better than the acl file handling ... lines starting with "#" or
which are completely empty are explicitly ignored when reading a
local_domain file, and whitespace-only lines will be ignored when parsed
using strtok(). In contrast, if acl files contain empty or comment lines,
they end up in the acl, as shown by detailed logging where the target
hostname of a request is compared repeatedly to "#" or the null string, if
the acl file contained any comments or empty lines (duplicates not ignored)!

So, in summary:

 * if you want to specify that a lot of hosts or domains should have
   requests routed direct, it looks as though local_domain should be
   substantially more efficient, bypassing all parents with one test instead
   of checking every entry in an acl separately for every parent. Not
   really surprising, but the documented availability of acl files nearly
   pushed me towards using an acl file as I wanted the names out of the
   main squid.conf.

 * names mentioned in local_domain definitions will be checked to see if
   they exist as files (though that is not documented), with their contents
   interpreted as domains to exclude. Unlike acls, the names do not have
   to be enclosed in quotation marks to be interpreted that way.

 * if a file is specified with local_domain it must be the last or only
   item, as any subsequent items in the line are ignored.

 * local_domain files can safely include comment and empty lines, with
   multiple entries on each line, whereas only the first token of each line
   in an acl file is added to the acl, including "#" from comments
   and null strings for empty lines. Harmless (?) but adds to the amount
   of checking to be done for any request to which the acl applies.

I'd suggest that (if not already fixed in the 1.2 beta - I've not had time
to look at it) the best way to tidy up these inconsistencies would be to:

 * require quotation marks around filenames in local_domain, and fix the
   use of strtok() so items after filenames are not ignored; non-quoted
   items should be used only as host/domain names. Document the (useful!)
   ability to use files for domain names.

 * fix the handling of acl files to match the handling of local_domain,
   ignoring comment and empty lines and possibly allowing multiple names
   on each non-comment line.

Plus there's the separate point about whether it would be feasible for Squid
to detect changes to configuration files and optionally reconfigure itself
automatically when it notices such a change.

Since the use of files with local_domain is not currently documented, I
suppose the question "is use of files with local_domain supported, or
historical and obsolete, liable to be removed in future?" needs to be asked,
as well.

                                John Line

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk
Received on Wed Dec 17 1997 - 15:01:27 MST
