Re: URL syntax checking in Squid

From: David Luyer <luyer@dont-contact.us>
Date: Thu, 07 May 1998 12:27:30 +0800

Duane wrote:
> Maciej wrote:
> > The problem is that underscore _is_allowed_ in DNS names as rfc1033 says.
> > So shouldn't squid parse such an URL properly and fetch the object?

RFC 1033 is a "Status: UNKNOWN" RFC which doesn't update or obsolete
any other RFC. RFC 1035 is a _standard_ covering DNS, and RFC 1101 is
marked as updating it. These documents, written by Paul Mockapetris,
define the DNS. RFC 1033, written by M. Lottor, is an operations guide
which refers to the documents by Paul Mockapetris but unfortunately
gets the underscore wrong.

From section 2.3.1 of RFC 1035 (this passage seems to be phrased as a
recommendation rather than a requirement; the rule is stated more
clearly as an explicit requirement in RFC 1101 and RFC 952, but
since 1035 is marked as the standard...):

---
The labels must follow the rules for ARPANET host names.  They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.  There are also some
restrictions on the length.  Labels must be 63 characters or less.
---
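
As a rough illustration of the label rule quoted above, here is a minimal
Python sketch. The function names and the allow_underscores parameter are
my own (hypothetical), loosely mirroring the compile-time switch mentioned
later in this thread; this is not Squid's code.

```python
import re


def is_valid_label(label: str, allow_underscores: bool = False) -> bool:
    """Check one DNS label against the rule quoted from RFC 1035, 2.3.1."""
    # Interior characters: letters, digits, hyphen (plus "_" if permitted).
    interior = "A-Za-z0-9_-" if allow_underscores else "A-Za-z0-9-"
    # Must start with a letter and end with a letter or digit.
    pattern = rf"[A-Za-z](?:[{interior}]*[A-Za-z0-9])?"
    return len(label) <= 63 and re.fullmatch(pattern, label) is not None


def is_valid_hostname(name: str, allow_underscores: bool = False) -> bool:
    """Apply the label rule to every dot-separated label of a hostname."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    return all(is_valid_label(part, allow_underscores)
               for part in name.split("."))
```

With this sketch, a name such as my_host.example.com is rejected by default
and accepted only when allow_underscores is set. Note that it applies the
quoted wording literally, so a label starting with a digit is also rejected.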
The underscore character has always been considered a 'bad thing'
for DNS and usernames, since many European mail gateway systems
(X.400 I think) could not handle addresses with underscores
in them in the 1980s (this has probably changed since then).
At UWA we don't permit underscores in email account names because of
complaints from European universities in the distant past.
I haven't read RFC 987 fully, but it seems to mention the source of
the problem with underscores (and maybe some solution - it wasn't
widely implemented when we dropped the underscores from our email
account names here, but that was years back):
---
         The 822.3DIGIT in EBNF.ps-encoded-char must have range 0-127
         (Decimal), and is interpreted in decimal as the corresponding
         ASCII character. Special encodings are given for: at sign (@),
         percent (%), exclamation mark/bang (!), double quote ("), and
         underscore (_).  These characters are not included in
         PrintableString, but are common in RFC 822 addresses.  The
         abbreviations will ease specification of RFC 822 addresses from
         an X.400 system.
---
> Hm, RFC's 1101 and 952 do not mention '_' in the allowed characters.
> 
> Anyway, you can compile recent versions of squid with 
> -DALLOW_HOSTNAME_UNDERSCORES=1.
> 
> I'll try to find out more about the troublemaking underscore.
Underscores shouldn't be in hostnames, but they're hard to get rid of.
Perhaps the best behaviour is:
  * test the OS and if you can use underscores, use them
  * if you have a parent cache and can't use underscores, pass the
    underscore-containing name to it
  * if you can't use underscores, or if some compile option is
    #defined to say not to use them, attempt to resolve the
    hostname with underscores mapped to hyphens [1]
  * if you can't resolve that name, print a *friendly* error
    message explaining the problem and maybe with a link to
    an explanatory web page on a central site?
David.
[1] This is the most common fix. If you recommend it to people and
    implement this mapping, the cache keeps old links working
    if/when the underscore name has to go away due to
    a nameserver upgrade at the site.  After all, caches are meant to
    be there for the users as well as the bandwidth...
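
The fallback order suggested above could be sketched roughly as follows.
This is only one reading of the list, written in Python for brevity; the
names handle_hostname, try_resolve, os_allows_underscores and have_parent
are hypothetical and do not correspond to anything in Squid.

```python
import socket
from typing import Callable, Optional, Tuple


def try_resolve(name: str) -> Optional[str]:
    """Return an IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except (OSError, UnicodeError):
        return None


def handle_hostname(
    name: str,
    resolve: Callable[[str], Optional[str]] = try_resolve,
    os_allows_underscores: bool = True,
    have_parent: bool = False,
) -> Tuple[str, str]:
    """Return ("direct", address), ("parent", name) or ("error", message)."""
    has_underscore = "_" in name

    # Step 1: use the name as given if the OS resolver can handle it.
    if not has_underscore or os_allows_underscores:
        addr = resolve(name)
        if addr is not None:
            return ("direct", addr)
        if not has_underscore:
            return ("error", f"Unable to resolve host name '{name}'.")

    # Step 2: let a parent cache deal with the underscore name.
    if have_parent:
        return ("parent", name)

    # Step 3: retry with underscores mapped to hyphens.
    addr = resolve(name.replace("_", "-"))
    if addr is not None:
        return ("direct", addr)

    # Step 4: give the user a friendly explanation.
    return ("error",
            f"The host name '{name}' contains an underscore, which is not "
            "permitted in DNS host names, and it could not be resolved "
            "with hyphens substituted either.")
```

Passing the resolver in as a parameter keeps the decision logic testable
without touching the network. Whether step 1 should fall through to the
parent and the hyphen mapping after a failed lookup is a judgment call in
this sketch.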
Received on Tue Jul 29 2003 - 13:15:49 MDT