Re: Content-Encoding and storage format from Jon Kay on 2004-02-28 (squid-dev)

From: Jon Kay <jkay@dont-contact.us>
Date: Sun, 29 Feb 2004 00:30:45 -0600

> Please note that messing with Content-Encoding in a proxy is somewhat
> outside the HTTP specifications and some care is needed to do it correctly
> or you risk causing major headaches for the HTTP/1.1 content negotiation
> and equality criteria and even risk causing object corruption at the
> clients. By applying Content-Encoding you technically create a new entity
> (which is supposed to only be done by origin servers). This new entity
> must use a different ETag to differentiate from the original to not mess
> up equality conditions at the clients, and any cache revalidations etc
> must be based on the original not your recoded version to make sure there
> is no confusion between the recoding proxy and the origin server as to the
> cached object's freshness.

This brings up an important observation. RFC2616 says:

   The Transfer-Encoding general-header field indicates what (if any)
   type of transformation has been applied to the message body in order
   to safely transfer it between the sender and the recipient. This
   differs from the content-coding in that the transfer-coding is a
   property of the message, not of the entity.

...regardless of that last sentence, virtually all browsers appear to
use Content-Encoding to indicate message encoding rather than entity
encoding. Virtually none support Transfer-Encoding of any sort.
Take a look at the following MS Explorer / slashdot.org conversation,
gathered via Ethereal. It's utterly typical: note the lack of TE in
the request, the fact that it's transferring a text object in gzip
format, and that Apache's response has a different (minimal)
transfer-encoding.

  GET / HTTP/1.1
  Accept: */*
  Accept-Language: en-us
  Accept-Encoding: gzip, deflate
  User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)
  Host: slashdot.org
  Connection: Keep-Alive

  HTTP/1.1 200 OK
  Date: Sat, 28 Feb 2004 23:30:02 GMT
  Server: Apache/1.3.29 (Unix) mod_gzip/1.3.26.1a mod_perl/1.29
  SLASH_LOG_DATA: shtml
  X-Powered-By: Slash 2.003000
  X-Bender: Aw, this bends!
  Cache-Control: private
  Pragma: private
  Vary: Accept-Encoding,User-Agent
  Connection: close
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=iso-8859-1
  Content-Encoding: gzip

  3377
  <
  < . . .
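
To make the layering in that capture concrete: the "3377" line is the
hexadecimal size of the first chunk (13175 bytes), so the receiver first
undoes the chunked transfer-coding and only then the gzip content-coding.
Below is a minimal Python sketch of that order. The helper names (dechunk,
chunk) and the sample entity are made up for illustration, and trailer
headers are ignored.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Remove the chunked transfer-coding (RFC 2616 section 3.6.1)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The chunk-size line is hex, optionally followed by ";extension".
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk; any trailer headers are ignored here
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def chunk(payload: bytes, size: int = 16) -> bytes:
    """Apply the chunked transfer-coding (hypothetical helper for the demo)."""
    parts = []
    for i in range(0, len(payload), size):
        piece = payload[i:i + size]
        parts.append(b"%x\r\n" % len(piece) + piece + b"\r\n")
    parts.append(b"0\r\n\r\n")
    return b"".join(parts)


# Made-up entity standing in for the page body.
entity = b"<html>News for nerds</html>"

# Sender: content-coding (gzip) first, then transfer-coding (chunked).
wire = chunk(gzip.compress(entity))

# Receiver: undo the transfer-coding first, then the content-coding.
decoded = gzip.decompress(dechunk(wire))
```

The point of the sketch is only that the two codings nest: chunked is
stripped per hop, while the gzip layer is what the headers call the entity.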

This means that to do anything of much use (especially since very few
sites use gzip coding), we will have to bend standards. This goes
beyond simply replacing the strings "TE" and "Transfer-Encoding" with
"Accept-Encoding" and "Content-Encoding". RFC2616 notes in 13.5.1 that
Transfer-Encoding is a hop-by-hop header, and Content-Encoding is
not, and you point out the ETag consequences.

My proposal is to treat Content-Encoding and Accept-Encoding as though
they were Transfer-Encoding and TE. That is, a string replacement
at the standards level instead of at the input/output text level.

Under this proposal, we would consider content-encoding to be
hop-by-hop, and content-encoded objects to be the same entities
as unencoded ones.
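
A rough Python sketch of what this could look like inside a cache, to make
the proposal easier to argue about. It is not Squid code: the store() and
serve() functions, the dict-based headers and the simplistic Accept-Encoding
check are all assumptions for illustration. The cache keeps one identity copy
per entity, recodes per hop according to the client's Accept-Encoding request
header, and leaves the ETag untouched.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Rough Accept-Encoding check; ignores "*" and "identity" rules."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        q = params.strip().lower()
        if q.startswith("q="):
            try:
                return float(q[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def store(headers: dict, body: bytes) -> tuple:
    """Normalize an upstream reply to its identity form before caching."""
    headers = dict(headers)
    if headers.get("Content-Encoding", "").lower() == "gzip":
        del headers["Content-Encoding"]
        body = gzip.decompress(body)
    headers.pop("Content-Length", None)  # recomputed when serving
    return headers, body


def serve(cached: tuple, accept_encoding: str) -> tuple:
    """Build a reply for one hop, coding the body only if the client allows."""
    headers, body = cached
    headers = dict(headers)
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body  # the ETag is deliberately left unchanged


# Hypothetical upstream reply that arrived gzip-coded.
cached = store({"ETag": '"abc"', "Content-Encoding": "gzip"},
               gzip.compress(b"hello"))
gz_headers, gz_body = serve(cached, "gzip, deflate")
id_headers, id_body = serve(cached, "")
```

Both replies carry the same ETag even though their bodies differ on the wire,
which is exactly the part of the proposal that departs from the standard.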

Thoughts?

Jon
Received on Sun Feb 29 2004 - 00:04:54 MST
