Re: URL syntax checking in Squid

From: Dancer <dancer@dont-contact.us>
Date: Thu, 07 May 1998 14:47:37 +1000

David Luyer wrote:
> RFC 1033 is a "Status: UNKNOWN" RFC which doesn't update or obsolete
> any other RFC. RFC 1035 is a _standard_ covering DNS, and RFC 1101 is
> marked as updating it. These documents, written by Paul Mockapetris,
> define the DNS. RFC 1033, written by M. Lottor, is an operations guide
> which refers to documents by Paul Mockapetris but unfortunately makes the
> mistake with the underscore.

Indeed. RFC1033 and I have had some violent arguments before now.

> I haven't read rfc987 fully, but it seems to mention the source of
> the problem with underscores (and maybe some solution - it wasn't

Part of the problem was file-naming, as well as character translation.
'_' represents other characters in some non-English encodings. Also,
some filesystems (probably extinct now) do not permit a path element to
contain an underscore. Since all the ARPA host/domain name elements are
supposed to be representable as file-system path elements, the '/' and
underscore, the dollar sign, and other such nuisances were prejudiced
against.
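
For illustration only, here is a minimal Python sketch of the strict
letters-digits-hyphen check being argued over. This is not Squid's code;
the helper name is made up for the example, and the 63-character label
and 253-character name limits are the usual DNS ones, assumed here
rather than taken from this thread.

```python
import re

# One label: letters, digits and hyphens only, 1 to 63 characters,
# and it may not begin or end with a hyphen. No underscores.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_strict_hostname(name):
    """Hypothetical helper: True if every label obeys the strict rule."""
    if name.endswith("."):
        name = name[:-1]          # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    # fullmatch rejects empty labels, so "a..b" fails as well
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

A checker like this would reject "my_host.example.com" outright, which
is exactly the behaviour under debate.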

> They shouldn't be there, but they're hard to get rid of.
>
> Perhaps the best behaviour is:
>
> * test the OS and if you can use underscores, use them
>
> * if you have a parent cache and can't use underscores, pass the
> underscore-containing name to it

I'd be dubious about this. I'd hate to see a request passed all the way
to a top-level cache that subsequently rejects the URL based on its
possession of an underscore. I figure that if a child can't handle it,
it shouldn't necessarily expect a parent to. Since we don't use
something like the HTTP/1.1 OPTIONS method (IIRC), we're not really able
to query capabilities easily from the hierarchy.

> * if you can't use underscores, or if some compile option is
> #defined to say not to use them, attempt to resolve the
> hostname with underscores mapped to hyphens [1]

Now, that's not a bad idea at all. Its utility might be a trifle
limited, but it's not a bad idea...
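
A rough Python sketch of that fallback, under stated assumptions: the
function name and the allow_underscore flag (standing in for the
compile-time option) are hypothetical, and the resolver is passed in so
the logic can be tried without touching real DNS. It is not how Squid
does or would do it.

```python
import socket


def resolve_with_hyphen_fallback(host, allow_underscore=True,
                                 resolver=socket.gethostbyname):
    """Hypothetical sketch: return (name_used, address), or None on failure."""
    candidates = []
    # Use the name as given unless it has underscores we are told to avoid.
    if allow_underscore or "_" not in host:
        candidates.append(host)
    # Fallback: the same name with underscores mapped to hyphens.
    if "_" in host:
        candidates.append(host.replace("_", "-"))
    for name in candidates:
        try:
            return name, resolver(name)
        except OSError:           # socket.gaierror is a subclass of OSError
            continue
    return None                   # caller can then show the friendly error
```

Returning the name that actually resolved lets the caller tell the user
(or a log) that the hyphenated form was substituted.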

> * if you can't resolve that name, print a *friendly* error
> message explaining the problem and maybe with a link to
> an explanatory web page on a central site?

What? Like: "Some numbskull believed RFC1033 'cause they can't count any
higher, and defined a hostname with illegal characters"? :) :)

> David.
>
> [1] the most common fix, and if you recommend this fix to people
> and implement this modification, the cache is actually making
> old links work if/when the underscore name has to go away due to
> a nameserver upgrade at the site. After all, caches are meant to
> be there for the users as well as the bandwidth...

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GAT d- s++: a C++++$ UL++++B+++S+++C++H++U++V+++$ P+++$ L+++ E-
W+++(--)$ N++ w++$>--- t+ 5++ X+() R+ tv b++++ DI+++ e- h-@ 
------END GEEK CODE BLOCK------
Received on Tue Jul 29 2003 - 13:15:49 MDT