Re: [squid-users] Problem with Dotless IP Address

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Wed, 26 Dec 2001 04:14:13 +0100

On Tuesday 25 December 2001 01.14, Joel Jaeggli wrote:

> wrong. rfc 1738 specifies that url's may contain a fully qualified domain
> name or dotted quads. hex octal or dword representations of ip addresses
> are valid in dns but not in url's...

DNS has nothing to do with it. Host name resolution in the OS does, and DNS
is only one of its components. DNS is usually queried only if the requested
address does NOT look like an IP address (there may also be name resolvers
other than DNS, such as host tables, WINS, NIS, NetBIOS, ...).

Summary:

Long form IP is an unofficial IP notation known by some operating systems
mainly due to historic reasons and overgeneralization.

URLs are meant to be "universally understood", which requires that any
notation used is properly standardized, not one that merely "happens to be
understood" by some due to a historic accident. Because of this the URL
specification is very strict on what is acceptable in Internet URL schemes,
namely quad dotted IP or fully qualified domain name, nothing else.

Negated discussion:

Long IP addresses are NOT allowed in URLs since there is no defined standard
for long IP addresses, and there never will be.

Short host names without domains are NOT allowed in URLs, as such names are
not universally valid; what they resolve to depends on the specific settings
of the user's computer.

Long discussion:

The "long form IP address" representation is an artifact of an old
implementation of inet_addr(), designed to allow ease of use of "class A and
class B networks" where the host component is larger than 8 bits (B has 16
bit host addresses, A has 24 bit host addresses).

The "dotless" form is a side effect of how the support for "class A/B"
networks was implemented. This family of inet_addr() implementations divides
the IP address into two parts: the rightmost component is the host address,
the other part(s) (to the left) form the network number. A dotless IP address
is simply a host address with no network number, i.e. a host that belongs to
no network.
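
The parsing rule described above can be sketched in a few lines. This is only
an illustration of that family of inet_addr() implementations, not the code of
any particular OS; the function names legacy_inet_addr() and to_quad() are
made up for this example, and the exact set of accepted forms (hex, octal,
range checks) varies between systems.

```python
import re

# One component: C-style hex (0x..) or a run of digits (octal if leading 0).
_COMPONENT = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")


def legacy_inet_addr(text):
    """Parse text the way the classic "network parts + host part" rule does.

    Returns the 32-bit address as an int, or None if the text is rejected.
    Hypothetical helper written for illustration only.
    """
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = []
    for part in parts:
        if not _COMPONENT.fullmatch(part):
            return None
        try:
            if part[:2].lower() == "0x":
                values.append(int(part, 16))
            elif len(part) > 1 and part[0] == "0":
                values.append(int(part, 8))  # "08" is not valid octal
            else:
                values.append(int(part, 10))
        except ValueError:
            return None
    # Everything left of the last dot is the network number, one byte each.
    *network, host = values
    if any(v > 0xFF for v in network):
        return None
    # The rightmost component is the host and fills all remaining bits.
    host_bits = 8 * (4 - len(network))
    if host >= 1 << host_bits:
        return None
    address = 0
    for v in network:
        address = (address << 8) | v
    return (address << host_bits) | host


def to_quad(address):
    """Format a 32-bit integer as a quad dotted IP string."""
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(to_quad(legacy_inet_addr("3574594650")))  # dotless: no network number
print(to_quad(legacy_inet_addr("127.1")))       # class A style: 24 bit host
print(to_quad(legacy_inet_addr("10.1.258")))    # class B style: 16 bit host
```

With no dots at all the network list is empty, so the single number is taken
as a 32 bit host address, which is exactly the "dotless" form.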

This "long form IP" (0-2 dots, or host component > 255) has never made it
into any official Internet standard, but exists in many OSes due to the
above historical heritage. As it isn't standard there is no guarantee it will
exist, and there are indeed a number of OSes that do not know about this odd
form of IP address.

As there exist, and must be expected to exist, OSes that do not know about
these long form IP addresses, this form cannot be used in the URL
specification, or in any other form that can be expected to be exchanged with
users using such OSes or implementations.

The fact that these can be used in some browsers on some OSes is because the
browser fully trusts inet_addr() to recognize IP addresses, completely
ignoring what the RFCs say should be read as an IP in a URL. If such a browser
runs on an OS where inet_addr() accepts long form IP addresses then the
browser will do the same.

Users are accustomed to being able to access hosts in their own network by
name alone.

To further complicate matters, URLs (and DNS) allow hostnames consisting of
only digits, and users are accustomed to being able to access local hostnames
without specifying their domain. I.e. you may well have a hostname
3574594650.your.domain if you like, but OSes with buggy resolvers where
inet_aton() accepts 3574594650 as an IP address may incorrectly resolve this
name alone to my company IP address instead of your intended address.

Quiz: If a user sees the host name 3574594650, is this a name or an explicit
IP address?

Moreover, as some of you may know, browsers have also had their own
classification of host names such as "local", "intranet", "internet", etc,
and these classifications have been based on the number of dots in the
hostname component of the URL. For quite a while the classification of
"dotless IP addresses" could end up quite wrong, as the browsers did not think
the dotless IP address could be an IP address, even when running on and
designed for an OS that happens to think 3574594650 is an IP.
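
To illustrate why counting dots goes wrong, here is a deliberately naive
sketch. The function zone_by_dot_count() and its two zone names are invented
for this example; it is not the logic of any specific browser, only the
general kind of rule described above.

```python
def zone_by_dot_count(host):
    """Classify a host purely by dots (hypothetical rule, for illustration)."""
    # Assumption: a name without dots is taken to be a local machine.
    return "intranet" if "." not in host else "internet"


# The same address written two ways ends up in two different zones.
print(zone_by_dot_count("3574594650"))
print(zone_by_dot_count("213.15.252.90"))
```

On an OS whose resolver accepts the dotless form, both strings reach the same
remote host, yet the first one is treated as if it were local.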

The RFC takes the easy and correct path, and makes the specification strict.
Either quad dotted IP, or fully qualified host name. This way there is never
any ambiguity in what a valid address represents. "quad dotted IP" is
guaranteed to be uniquely separate from "fully qualified domain name" by DNS
policy (top level domains must contain letters).
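
A strict check along these lines could look like the sketch below. It is
loosely modeled on the host grammar in RFC 1738 (a top label must start with a
letter) and is not Squid's actual code; classify_host() is a made-up name, and
requiring at least one dot in a name is an assumption that follows the "fully
qualified" reading above.

```python
import re

# Four decimal components; the 0-255 range is checked separately below.
_QUAD = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")

_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# The top level label must start with a letter, so it can never look numeric.
_TOP_LABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# Assumption: at least one dot is required, i.e. the name carries a domain.
_FQDN = re.compile(rf"(?:{_LABEL}\.)+{_TOP_LABEL}")


def classify_host(host):
    """Return "ip", "name", or None for anything that is neither."""
    quad = _QUAD.fullmatch(host)
    if quad:
        if all(int(octet) <= 255 for octet in quad.groups()):
            return "ip"
        return None
    if _FQDN.fullmatch(host):
        return "name"
    return None


for candidate in ("213.15.252.90", "www.example.com",
                  "3574594650.your.domain", "3574594650", "127.1"):
    print(candidate, "->", classify_host(candidate))
```

Because the two patterns can never match the same string, every accepted host
is unambiguously either an IP address or a name, and the dotless and short
forms are simply rejected.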

-- 
MARA Systems AB, Giving you basic free Squid support
Customized solutions, packaged solutions and priority support
available on request
Received on Tue Dec 25 2001 - 20:29:42 MST
