Re: Squid-2.5 changes

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Tue, 15 Feb 2005 02:05:07 +0100 (CET)

On Tue, 15 Feb 2005, Evgeny Kotsuba wrote:

>> Not considered a bug, merely a shortcoming.
>
> Well, suppose you saw Chinese letters instead of English whenever you went
> to any FTP site. Would that be a shortcoming?

To me, yes. You still get the data; the directory listings just look a
little odd on non-ASCII names.

> Again, there was no such bug in 2.5s5 and earlier

True, but then you also received corrupt HTML in certain cases making it
impossible to browse some directory listings at all if the FTP server had
files with odd names.

> Maybe I am missing something, but I see all "human readable" parts of
> the generated HTML page properly encoded, as you can see in the screenshot
> http://www.laser.ru/evgen/soft/Squid2/ftp_8bitCoding.png for the reported FTP site.

You are missing the reason why html_quote was added, which is to ensure
the resulting HTML is valid even if the file names contain reserved
characters such as < > & etc.

The reason why it also encodes "high" characters is that some stupid HTTP
clients can not handle "high" characters at all and simply strip the
8th bit, making them misread '<' + 128 as '<'.
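
As an illustration only, here is a minimal Python sketch of that kind of quoting. It is not Squid's actual html_quote (which is C code); the name `html_quote_sketch` and the `quote_high` flag are made up for this example, and the numeric references assume the bytes are iso-8859-1.

```python
def html_quote_sketch(name: bytes, quote_high: bool = True) -> str:
    """Hypothetical illustration of HTML quoting; not Squid's html_quote()."""
    # Characters that would break the HTML markup if emitted raw.
    reserved = {
        ord("<"): "&lt;",
        ord(">"): "&gt;",
        ord("&"): "&amp;",
        ord('"'): "&quot;",
    }
    out = []
    for b in name:
        if b in reserved:
            out.append(reserved[b])
        elif quote_high and b >= 0x80:
            # Numeric character reference, so no 8-bit byte reaches the client.
            # Assumption: the byte value is read as an iso-8859-1 code point.
            out.append(f"&#{b};")
        else:
            out.append(chr(b))
    return "".join(out)


def strip_high_bit(b: int) -> int:
    """What a 7-bit-only client effectively does to each byte."""
    return b & 0x7F
```

With this sketch, `strip_high_bit(ord("<") + 128)` gives back `ord("<")`, which is the misreading described above, while the quoted form of that byte contains only ASCII and survives such a client.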

If you want an example try

   ftp://ftp.henriknordstrom.net/test/

>> But I am very doubtful about the unescaping of the requested URL.
>
> The URL should not be unescaped. But all text except the URL should be.

Why then did you add the convertUrlToHumanReadable() call? This call
unescapes the URL the client sent to Squid, before presenting it back to
the client who originally sent it escaped to Squid. Yes, in many cases the
escaped URL was originally given to the client by Squid in a directory
listing, but you can not be certain this is the case.

Also, if you do this unescaping of the presented URL you should do it
per element, not as one big chunk. There is a very big difference between
%2f and / in FTP URLs and by unescaping the URL in one chunk you lose
this difference, screwing up the HTML presentation somewhat. Also, you
need to take care to not unescape CTL characters or other unsafe
characters. You can find some directories with such "odd" names on the
test server linked above.
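
A rough Python sketch of per-element unescaping, under stated assumptions: the helper name `display_path_sketch` is hypothetical, the choice to leave `/`, `%`, and CTL bytes escaped is one possible policy rather than anything Squid implements, and bytes are shown as iso-8859-1.

```python
from urllib.parse import unquote_to_bytes


def display_path_sketch(escaped_path: str) -> list:
    """Split on '/' first, then unescape each path element on its own.

    Hypothetical helper for illustration; not part of Squid.
    """
    elements = []
    for element in escaped_path.split("/"):
        raw = unquote_to_bytes(element)
        shown = []
        for b in raw:
            ch = chr(b)
            if b < 0x20 or b == 0x7F or ch in "/%":
                # Keep CTL bytes escaped, and keep an escaped '/' or '%'
                # escaped so it can not be confused with a real separator
                # or with the start of another escape sequence.
                shown.append(f"%{b:02X}")
            else:
                shown.append(ch)  # assumption: byte shown as iso-8859-1
        elements.append("".join(shown))
    return elements
```

Because the split happens before any unescaping, `a%2fb` stays a single element and is displayed with its `%2F` intact, whereas unescaping the whole URL in one chunk would turn it into two apparent elements `a` and `b`.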

> In any case, converting all 8-bit characters in the text part of the generated HTML
> to iso-8859-1 is neither "politically correct" nor mathematically right

It's mathematically correct, but not politically correct. With no charset
information available Squid assumes iso-8859-1, which is the default
charset for HTTP/1.0, but not the whole world is using iso-8859-1.
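
To illustrate both halves of that statement, a small Python sketch. The helper name `decode_as_latin1` is made up, and KOI8-R is only an assumed example of a server-side charset; nothing here says which charset the reported FTP server actually uses.

```python
def decode_as_latin1(raw: bytes) -> str:
    """Interpret every byte as the iso-8859-1 character with the same number."""
    return raw.decode("iso-8859-1")


# "Mathematically correct": all 256 byte values decode, and the mapping
# is reversible, so no information is lost.
all_bytes = bytes(range(256))
round_trip = decode_as_latin1(all_bytes).encode("iso-8859-1")

# "Not politically correct": a name stored in another charset (KOI8-R is
# an assumption for this example) decodes without error but reads wrong.
original_name = "файл"
as_stored = original_name.encode("koi8_r")
as_displayed = decode_as_latin1(as_stored)
```

Here `round_trip` equals `all_bytes`, while `as_displayed` has the same length as `original_name` but different characters.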

Anyway, from you pestering me about this I have now dropped the HTML
quoting of CTL or high characters leaving it entirely up to the user agent
to deduce what to do. This quoting of high characters should not be
required for any user-agents in use today, and the quoting of CTL
characters was broken anyway (the entity code got malformed).

I have not done anything about the presentation of the requested URL, and
do not intend to before 2.5.STABLE9. But you are welcome to propose a sane
patch for inclusion in Squid-3.

Regards
Henrik
Received on Mon Feb 14 2005 - 18:05:09 MST
