Re: IDN issues

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Wed, 4 Feb 2004 17:57:43 +0100 (CET)

On Wed, 4 Feb 2004, Bjoern Jacke wrote:

> Well, now IDN is standardized but I see a different problem here. Let's
> assume a non-IDN-aware client sends an HTTP request for the site
> björn.j3e.de. Squid could of course resolve björn.j3e.de and return
> the site to the client BUT I think there is no standard that says in
> which encoding clients should send the requests to the proxy.

There is a URL standard which says Internet-style URLs MUST use ASCII
characters only, in all media, including on screen. In addition, the HTTP
protocol defines even stricter rules on which characters are allowed.
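
As an illustration of why a non-ASCII host name is ambiguous on the wire,
here is a minimal Python sketch. The helper name is_ascii_only is
hypothetical and not taken from Squid; the sketch only shows that the same
name produces different bytes under two common encodings, and that a plain
ASCII check rejects both.

```python
def is_ascii_only(raw: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in raw)


host = "björn.j3e.de"
latin1_bytes = host.encode("latin-1")  # ö becomes the single byte 0xF6
utf8_bytes = host.encode("utf-8")      # ö becomes the two bytes 0xC3 0xB6

# The receiver sees different byte strings for the same name and has no
# in-band hint about which encoding the sender used.
print(latin1_bytes)
print(utf8_bytes)
print(is_ascii_only(latin1_bytes), is_ascii_only(utf8_bytes))
print(is_ascii_only(b"j3e.de"))
```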

There is a general consensus that over time the Internet protocols should
all use UTF-8 for strings, but almost none of the existing protocols has
been formally revised to add UTF-8 support.

> So squid could just see that there comes a non-ASCII request but it
> doesn't know what to do with it because the encoding is unknown. Is it
> possible that IDN *just* makes sense to be implemented on client side,
> not on proxy side?

IDN specifies that it MUST be implemented on the client side unless the
protocol in use allows for unencoded transmission.

A proxy is both a client and a server. To its clients it is a server, but
to whatever server it forwards the clients' requests to, it is a client.

> In that case the above section in the FAQ should be updated that IDN is
> useless in squid.

It is not useless. How it is to be used is just not yet defined. The
quite obvious application is UTF-8 encoding in HTTP, with the proxy
translating this to IDN for the host lookup when the resolver is not
UTF-8 capable.
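
A minimal sketch of that translation step, assuming the host name arrives
as UTF-8 bytes. The function name to_ace is hypothetical. It relies on
Python's built-in "idna" codec, which implements the 2003 IDNA rules
(RFC 3490); it is not how Squid does or would do it.

```python
def to_ace(host_utf8: bytes) -> str:
    """Decode a UTF-8 host name and return its ASCII-compatible (IDN) form."""
    host = host_utf8.decode("utf-8")
    # The "idna" codec converts each non-ASCII label to its "xn--" form
    # and leaves plain ASCII labels unchanged.
    return host.encode("idna").decode("ascii")


print(to_ace("björn.j3e.de".encode("utf-8")))  # xn--bjrn-6qa.j3e.de
print(to_ace(b"j3e.de"))                       # j3e.de
```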

IDN as such is just a transition standard on how to encode information at
the borders between full symbol support and older ASCII based protocols.

So what this means is that

a) A client talking to a server (including proxies) not known to support
UTF-8 must translate host names in the protocol according to the specified
IDN rules.

b) Any application looking up the IP of a host name known to contain
non-ASCII characters must apply the rules of IDN before the DNS lookup,
unless it is known the DNS resolver is UTF-8 capable. If the DNS server is
known to be UTF-8 capable then it is the resolver's responsibility to apply
IDN where required when talking to other DNS servers.
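
Rule b) can be sketched as a thin wrapper around a resolver. Everything
named here is an assumption made for illustration: the lookup function, the
resolver_supports_utf8 flag, and the fake resolver with its documentation
address 192.0.2.1. A real application would pass in its actual resolver
call instead.

```python
from typing import Callable, List

asked_for: List[str] = []  # records the names the fake resolver was given


def fake_resolve(name: str) -> List[str]:
    """Stand-in resolver (hypothetical): knows only the ASCII form."""
    asked_for.append(name)
    zone = {"xn--bjrn-6qa.j3e.de": ["192.0.2.1"]}
    return zone.get(name, [])


def lookup(host: str,
           resolve: Callable[[str], List[str]],
           resolver_supports_utf8: bool = False) -> List[str]:
    """Apply IDN before the lookup unless the resolver handles UTF-8."""
    if not resolver_supports_utf8 and not host.isascii():
        host = host.encode("idna").decode("ascii")
    return resolve(host)


print(lookup("björn.j3e.de", fake_resolve))
print(asked_for)
```

With resolver_supports_utf8=True the name is handed over unchanged, and
applying IDN further downstream becomes the resolver's job.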

Both rules above apply recursively. Extending a protocol to support
UTF-8 moves the boundary where IDN needs to be applied, with the
ultimate goal that some day UTF-8 may be used end-to-end across most
of the Internet.
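
The moving boundary can be pictured with a small sketch. The function name
forms_along_path and the list of per-hop capability flags are hypothetical.
It also assumes that once a name has been converted to its ASCII form it is
not converted back at later hops.

```python
from typing import List, Sequence


def forms_along_path(host: str, hops_support_utf8: Sequence[bool]) -> List[str]:
    """Return the form of the host name as sent to each hop, in order."""
    forms = []
    current = host
    for supports_utf8 in hops_support_utf8:
        # The first hop not known to support UTF-8 is the boundary
        # where IDN has to be applied.
        if not supports_utf8 and not current.isascii():
            current = current.encode("idna").decode("ascii")
        forms.append(current)
    return forms


# Client -> proxy (UTF-8 capable) -> resolver (not capable) -> DNS server
print(forms_along_path("björn.j3e.de", [True, False, True]))
```

Making the second hop UTF-8 capable as well would push the conversion
further along the path, or remove it entirely.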

Some browsers can already be told to use UTF-8, which makes it
interesting to implement IDN in proxies. But it should be noted that the
HTTP standard has not yet been officially revised to add UTF-8 support.
There is only a general guideline from the IETF and IAB that this should
be done in a future revision of the protocol.

Regards
Henrik
Received on Wed Feb 04 2004 - 09:57:47 MST
