Re: Summary of "Features" thread on squid-users (fwd)

From: Dancer <dancer@dont-contact.us>
Date: Tue, 25 Nov 1997 12:06:14 +1000

I'd definitely recommend doing the compression HTTP/1.1 style, using content
codings (excerpt from the draft dropped in below, for those unfamiliar with
it). If the cache looks at Accept-Encoding then we're pretty much laughing all
the way home. We implement any content-coding we like for Squid (say LZO; I
have no personal experience with that one, but I'm taking it on faith), and if
the client says it accepts the LZO content-coding, then fine: pass the object
through compressed and unaltered. Otherwise send the object uncompressed (or
perhaps later we can arrange conversion to other formats).
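
A rough sketch of that pass-through-or-decode decision, in Python rather than
anything resembling Squid's own code. The function names are made up for
illustration, gzip stands in for LZO only because Python ships a gzip module,
and q-values other than 0 and the "*" wildcard are ignored to keep it short:

```python
import gzip


def accepted_codings(accept_encoding):
    """Parse an Accept-Encoding value into a set of lowercase coding names.

    A coding listed with q=0 is treated as refused; other q-values are
    ignored in this sketch.
    """
    accepted = set()
    for element in accept_encoding.split(","):
        parts = [p.strip() for p in element.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        refused = False
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    refused = float(value) == 0.0
                except ValueError:
                    refused = True  # malformed q-value: be conservative
        if not refused:
            accepted.add(coding)
    return accepted


def respond(stored_body, stored_coding, accept_encoding):
    """Return (headers, body) for a cached object held in stored_coding."""
    if stored_coding in accepted_codings(accept_encoding):
        # Client accepts the coding: pass through compressed and unaltered.
        return {"Content-Encoding": stored_coding}, stored_body
    # Otherwise send the object uncompressed.
    if stored_coding == "gzip":
        return {}, gzip.decompress(stored_body)
    raise ValueError("no decoder for " + stored_coding)
```

Calling respond(stored, "gzip", "gzip, compress") hands back the stored bytes
untouched, while an empty Accept-Encoding gets the decompressed object.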

Whether you compress the object in storage is a separate issue, IMO.

D
-- Excerpt from draft-ietf-http-v11-spec-rev-01 follows --

       3.5 Content Codings

       Content coding values indicate an encoding transformation that has been
       or can be applied to an entity. Content codings are primarily used to
       allow a document to be compressed or otherwise usefully transformed
       without losing the identity of its underlying media type and without
       loss of information. Frequently, the entity is stored in coded form,
       transmitted directly, and only decoded by the recipient.

              content-coding = token

       All content-coding values are case-insensitive. HTTP/1.1 uses
       content-coding values in the Accept-Encoding (section 14.3) and
       Content-Encoding (section 14.12) header fields. Although the value
       describes the content-coding, what is more important is that it
       indicates what decoding mechanism will be required to remove the
       encoding.

       The Internet Assigned Numbers Authority (IANA) acts as a registry for
       content-coding value tokens. Initially, the registry contains the
       following tokens:

       gzip An encoding format produced by the file compression program "gzip"
            (GNU zip) as described in RFC 1952 [25]. This format is a Lempel-
            Ziv coding (LZ77) with a 32 bit CRC.

       compress
            The encoding format produced by the common UNIX file compression
            program "compress". This format is an adaptive Lempel-Ziv-Welch
            coding (LZW).

       Fielding, et al [Page 22]

       INTERNET-DRAFT HTTP/1.1 Friday, November 21, 1997

         Note: Use of program names for the identification of encoding
         formats is not desirable and should be discouraged for future
         encodings. Their use here is representative of historical practice,
         not good design. For compatibility with previous implementations of
         HTTP, applications should consider "x-gzip" and "x-compress" to be
         equivalent to "gzip" and "compress" respectively.

       deflate The "zlib" format defined in RFC 1950 [31] in combination with
            the "deflate" compression mechanism described in RFC 1951 [29].

       identity
            The default (identity) encoding; the use of no transformation
            whatsoever. This content-coding is used only in the
            Accept-Encoding header, and SHOULD NOT be used in
            Content-Encoding header.
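
(Not part of the draft.) To make the token list concrete, here is a minimal
Python sketch of how a recipient might fold the case-insensitive names and the
"x-" aliases and then remove one coding. The function names are illustrative
assumptions; "compress" (LZW) is left out because the Python standard library
has no decoder for it, and "identity" is tolerated even though the draft says
it should not appear in Content-Encoding:

```python
import gzip
import zlib

# Historical aliases the draft says to treat as equivalent.
ALIASES = {"x-gzip": "gzip", "x-compress": "compress"}


def normalize_coding(token):
    """Lowercase a content-coding token and fold the x- aliases."""
    token = token.strip().lower()
    return ALIASES.get(token, token)


def decode_body(body, content_encoding):
    """Remove a single content-coding from body."""
    coding = normalize_coding(content_encoding)
    if coding == "identity":
        return body                   # no transformation at all
    if coding == "gzip":
        return gzip.decompress(body)  # RFC 1952 container
    if coding == "deflate":
        return zlib.decompress(body)  # RFC 1950 wrapper around RFC 1951 data
    raise ValueError("unsupported content-coding: " + coding)
```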

       New content-coding value tokens should be registered; to allow
       interoperability between clients and servers, specifications of the
       content coding algorithms needed to implement a new value should be
       publicly available and adequate for independent implementation, and
       conform to the purpose of content coding defined in this section.

       3.6 Transfer Codings

       Transfer coding values are used to indicate an encoding transformation
       that has been, can be, or may need to be applied to an entity-body in
       order to ensure "safe transport" through the network. This differs from
       a content coding in that the transfer coding is a property of the
       message, not of the original entity.

              transfer-coding = "chunked" | transfer-extension
              transfer-extension = token *( ";" parameter )

       Parameters may be in the form of attribute/value pairs.

              parameter = attribute "=" value
              attribute = token
              value = token | quoted-string

       All transfer-coding values are case-insensitive. HTTP/1.1 uses
       transfer coding values in the TE header field (section 14.Y) and in
       the Transfer-Encoding header field (section 14.40).
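
(Not part of the draft.) The grammar above is small enough to parse with a
couple of regular expressions. The following Python sketch handles one
transfer-coding element; parse_transfer_coding is a hypothetical helper name,
and splitting a full header value on commas is left out:

```python
import re

# token: any character except controls and the HTTP separator characters.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'

NAME = re.compile(r"\s*(" + TOKEN + r")")
PARAM = re.compile(
    r"\s*;\s*(" + TOKEN + r")\s*=\s*(" + TOKEN + "|" + QUOTED + ")"
)


def parse_transfer_coding(text):
    """Parse 'token *( ";" attribute "=" value )' into (name, params)."""
    match = NAME.match(text)
    if not match:
        raise ValueError("missing transfer-coding token")
    name = match.group(1).lower()  # coding values are case-insensitive
    pos = match.end()
    params = {}
    while True:
        pm = PARAM.match(text, pos)
        if not pm:
            break
        value = pm.group(2)
        if value.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params[pm.group(1)] = value
        pos = pm.end()
    if text[pos:].strip():
        raise ValueError("unparsed text: " + text[pos:])
    return name, params
```

Because the quoted-string pattern is matched as a unit, a ";" inside quotes
does not split a parameter.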

       Transfer codings are analogous to the Content-Transfer-Encoding values
       of MIME [7], which were designed to enable safe transport of binary data
       over a 7-bit transport service. However, safe transport has a different
       focus for an 8bit-clean transfer protocol. In HTTP, the only unsafe
       characteristic of message-bodies is the difficulty in determining the
       exact body length (section 7.2.2), or the desire to encrypt data over a
       shared transport.

       The Internet Assigned Numbers Authority (IANA) acts as a registry for
       transfer-coding value tokens. Initially, the registry contains the
       following tokens: "chunked" (section 3.6.1), "identity" (section 3.6.2),
       "gzip" (section 3.5), "compress" (section 3.5), and "deflate" (section
       3.5).

       New transfer-coding value tokens should be registered in the same way as
       new content-coding value tokens (section 3.5).

       A server which receives an entity-body with a transfer-coding it does
       not understand SHOULD return 501 (Unimplemented), and close the
       connection. A server MUST NOT send transfer-codings to an HTTP/1.0
       client.
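
(Not part of the draft.) The two rules in that paragraph boil down to a pair
of checks. A minimal Python sketch, assuming a server that only implements
"chunked" and "identity" (the SUPPORTED set and both function names are
hypothetical) and that HTTP versions are carried as (major, minor) tuples:

```python
# Hypothetical: the transfer-codings this particular server can decode.
SUPPORTED = frozenset({"chunked", "identity"})


def status_for_request(transfer_codings, supported=SUPPORTED):
    """Return 501 if the request body uses a coding we cannot decode."""
    for coding in transfer_codings:
        if coding.strip().lower() not in supported:
            return 501  # the caller should also close the connection
    return None


def response_transfer_codings(client_version, wanted):
    """Drop every transfer-coding when the client is older than HTTP/1.1."""
    if client_version < (1, 1):
        return []
    return list(wanted)
```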

       3.6.1 Chunked Transfer Coding

       The chunked encoding modifies the body of a message in order to transfer
       it as a series of chunks, each with its own size indicator, followed by
       an optional trailer containing entity-header fields. This allows
       dynamically-produced content to be transferred along with the
       information necessary for the recipient to verify that it has received
       the full message.

              Chunked-Body = *chunk
                               last-chunk
                               trailer
                               CRLF
              chunk = chunk-size [ chunk-extension ] CRLF
                               chunk-data CRLF
              chunk-size = 1*HEX
              last-chunk = 1*("0") [ chunk-extension ] CRLF

              chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
              chunk-ext-name = token
              chunk-ext-val = token | quoted-string
              chunk-data = chunk-size(OCTET)
              trailer = *entity-header

       The chunk-size field is a string of hex digits indicating the size of
       the chunk. The chunked encoding is ended by any chunk whose size is
       zero, followed by the trailer, which is terminated by an empty line.

       The trailer allows the sender to include additional HTTP header fields
       at the end of the message. The Trailer header field can be used to
       indicate which header fields are included in a trailer (see section
       14.49).
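
(Not part of the draft.) The sending side of the Chunked-Body grammar can be
sketched in a few lines of Python. encode_chunked is an illustrative name, the
whole body is built in memory for simplicity, and no chunk-extensions are
emitted:

```python
def encode_chunked(pieces, trailer=()):
    """Encode an iterable of byte strings as a Chunked-Body.

    trailer is an iterable of (name, value) header pairs sent after the
    last chunk.
    """
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-size chunk would end the body early
        out += b"%x\r\n" % len(piece)  # chunk-size in hex, then CRLF
        out += piece + b"\r\n"         # chunk-data, then CRLF
    out += b"0\r\n"                    # last-chunk
    for name, value in trailer:
        out += name.encode("ascii") + b": " + value.encode("ascii") + b"\r\n"
    out += b"\r\n"                     # empty line ends the trailer
    return bytes(out)
```

For example, encode_chunked([b"Wiki", b"pedia"]) yields
b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".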

       A server using chunked transfer-coding in a response MUST NOT use the
       trailer for other header fields than Content-MD5 and Authentication-Info
       unless the "chunked" transfer-coding is present in the request as an
       accepted transfer-coding in the TE field (section 14.48). The
       Authentication-Info header is defined by RFC 2069 [32] or its successor
       [43].

       An example process for decoding a Chunked-Body is presented in appendix
       19.4.6.

       All HTTP/1.1 applications MUST be able to receive and decode the
       "chunked" transfer coding, and MUST ignore chunk-extension extensions
       they do not understand.
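
(Not part of the draft, and not the appendix 19.4.6 procedure itself.) A
receiver that already holds the complete Chunked-Body in memory could decode
it roughly as follows. decode_chunked is an illustrative name; a real proxy
would work incrementally on a stream, and chunk-extensions are simply
discarded here:

```python
def decode_chunked(data):
    """Decode a complete Chunked-Body.

    Returns (body, trailer), where trailer is a list of (name, value) pairs.
    Raises ValueError on truncated or malformed input.
    """
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Everything after ";" is a chunk-extension we do not understand.
        size_field = data[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk: any run of zeros ends the chunks
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk-data not followed by CRLF")
        pos += size + 2
    trailer = []
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break  # empty line terminates the trailer
        name, _, value = line.partition(b":")
        trailer.append(
            (name.strip().decode("ascii"), value.strip().decode("ascii"))
        )
    return bytes(body), trailer
```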

--
Note to evil sorcerers and mad scientists: don't ever, ever summon powerful
demons or rip holes in the fabric of space and time. It's never a good idea.
ICQ UIN: 3225440
Received on Mon Nov 24 1997 - 18:25:21 MST
