Re: next version of content-encoding / gzip design doc

From: Jon Kay <jkay@dont-contact.us>
Date: Tue, 09 Mar 2004 03:15:06 -0600

Here's yet another design version, following many helpful suggestions
from Henrik.

      Gzip Content-Encoding in Squid Design

Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding Protocol

The content-encoding protocol is described in …

Header field cases from client:

    If Accept-Encoding field is present in client request

        If there is a cached response already available, and it
        contains a Content-Encoding field with encodings that are a
        subset of what the client accepts

            Then forward response to client unchanged

        Else (no cached response with right content-encoding)

            If uncoded response isn't available

                Then forward client request to server/cache

                If server/cache response contains Content-Encoding field

                    Then forward new response to client

                Else (server/cache response doesn't have Content-Encoding)

                    Then encode the server/cache response
                    Send encoded response to client

            Else (uncoded server response already available)

                Then encode uncoded response
                Send encoded response to client

    Else (no Accept-Encoding in client request)

        If uncoded server response already available

            Then forward it unchanged to client

        Else if coded server response already available

            Then decode server response
            Send decoded response to client

        Else (no response available yet)

            Then forward request to server or cache, and otherwise
            behave as Squid does now, unaffected by this protocol.
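The client-side decision tree above can be sketched as a pure function. This is a minimal C++ sketch under stated assumptions, not Squid code: CacheState, Action, chooseAction() and codingsAcceptable() are hypothetical names, and the booleans stand in for the real store lookups. codingsAcceptable() shows the "subset of what the client accepts" test.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// True if every coding applied to a cached reply is one the client accepts.
// std::set keeps both ranges sorted, as std::includes requires.
bool codingsAcceptable(const std::set<std::string>& replyCodings,
                       const std::set<std::string>& clientAccepts) {
    return std::includes(clientAccepts.begin(), clientAccepts.end(),
                         replyCodings.begin(), replyCodings.end());
}

// Hypothetical summary of what the cache holds for the requested URL.
struct CacheState {
    bool acceptableCoded;  // cached coded reply whose codings the client accepts
    bool uncoded;          // cached uncoded reply
    bool anyCoded;         // cached coded reply of any coding
};

enum class Action {
    ForwardCachedCoded,    // send the cached coded reply unchanged
    EncodeCachedUncoded,   // encode the cached uncoded reply, then send it
    FetchThenEncode,       // forward the request; encode the reply unless already coded
    ForwardCachedUncoded,  // send the cached uncoded reply unchanged
    DecodeCachedCoded,     // decode the cached coded reply, then send it
    FetchUnchanged         // forward the request; normal Squid behaviour
};

Action chooseAction(bool clientSentAcceptEncoding, const CacheState& c) {
    if (clientSentAcceptEncoding) {
        if (c.acceptableCoded) return Action::ForwardCachedCoded;
        if (c.uncoded) return Action::EncodeCachedUncoded;
        return Action::FetchThenEncode;
    }
    if (c.uncoded) return Action::ForwardCachedUncoded;
    if (c.anyCoded) return Action::DecodeCachedCoded;
    return Action::FetchUnchanged;
}
```

Keeping the decision separate from the I/O makes each branch of the protocol easy to unit-test on its own.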

There will be no explicit links between objects that are different
codings of the same original. Instead, the StoreKey of each coded
object will be chosen specifically as
MD5(OriginalStoreKey,Content-Encoding). This allows one to derive the
StoreKeys of all possible encodings, including the original, knowing
only the original StoreKey and not the requested URL.

Searching for an uncoded version of an object is done by generating the
uncoded StoreKey and looking for an object with that key. This lookup is
needed upon a cache miss for the requested encoding (see protocol above).
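A sketch of the key derivation and the uncoded fallback, assuming a 64-bit FNV-1a hash as a stand-in for Squid's 16-byte MD5 StoreKey (it is not MD5). StoreKey, variantKey() and lookup() are hypothetical; the std::map plays the role of the store, and the uncoded variant is assumed to keep the original key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for Squid's MD5 StoreKey: 64-bit FNV-1a. This is NOT MD5.
using StoreKey = std::uint64_t;

StoreKey fnv1a(const std::string& bytes) {
    StoreKey h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char ch : bytes) {
        h ^= ch;
        h *= 1099511628211ULL;             // FNV-1a 64-bit prime
    }
    return h;
}

// hash(OriginalStoreKey, Content-Encoding). The uncoded variant keeps the
// original key, so every variant is derivable from the original key alone.
StoreKey variantKey(StoreKey original, const std::string& encoding) {
    if (encoding.empty() || encoding == "identity") return original;
    return fnv1a(std::to_string(original) + '\0' + encoding);
}

// Look up the requested encoding; on a miss, fall back to the uncoded copy
// (stored under the original key) and report that it still needs encoding.
const std::string* lookup(const std::map<StoreKey, std::string>& store,
                          StoreKey original, const std::string& encoding,
                          bool& needsEncoding) {
    needsEncoding = false;
    auto hit = store.find(variantKey(original, encoding));
    if (hit != store.end()) return &hit->second;
    auto plain = store.find(original);
    if (plain == store.end()) return nullptr;
    needsEncoding = true;
    return &plain->second;
}
```

Because the coded key depends only on the original key and the encoding name, any variant can be located without the URL. The mapping cannot be inverted, though: finding the uncoded copy of a coded object still requires its original key.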

Upon update or PURGE of the original or an encoded object, delete all
the possible encoding variants. Because the encodings are applied
locally, the possible combinations are known and finite, so purging
them all at once is straightforward. If the number of encodings grows
nontrivially, we may need an additional mechanism to keep the cost of
that check under control.

Original-update deletion will be triggered on swapout of a new
original object (when it gets a public key).
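Purging every variant reduces to erasing one derived key per locally supported encoding. In this sketch the body given to storeDeleteCodedCopies() is hypothetical, std::hash stands in for MD5, and the encoding list {gzip, deflate} is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

using StoreKey = std::size_t;  // stand-in for Squid's MD5 StoreKey

// Encodings applied locally: a finite, known list (assumed here).
const std::vector<std::string> kLocalEncodings = {"gzip", "deflate"};

// Stand-in for MD5(OriginalStoreKey, Content-Encoding); std::hash is NOT MD5.
StoreKey variantKey(StoreKey original, const std::string& encoding) {
    return std::hash<std::string>{}(std::to_string(original) + '\0' + encoding);
}

// Hypothetical body for storeDeleteCodedCopies(). On an original update the
// new original must survive (includeOriginal = false); on PURGE everything goes.
// Returns the number of store entries removed.
std::size_t storeDeleteCodedCopies(std::map<StoreKey, std::string>& store,
                                   StoreKey original, bool includeOriginal) {
    std::size_t removed = includeOriginal ? store.erase(original) : 0;
    for (const std::string& enc : kLocalEncodings)
        removed += store.erase(variantKey(original, enc));
    return removed;
}
```

The cost is one erase per supported encoding, which is why a growing encoding list is the case the design flags for extra control.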

ETags: Encoded objects will be given new, unique entity tags.

There will be a configuration option to turn off content-encoding.

Content-Encoding Implementation

A new HttpHdrContCode module parses the related HTTP headers and
arranges for encoding or decoding as appropriate. It includes the
following functions:

  codeParseRequest(): Called from client_side:parseHttpRequest()
  after the clientStreamInit() call. Checks for and parses
  Accept-Encoding headers, instantiates content_coding appropriately,
  and calls codeClientStreamInit().

  codeClientStreamInit(): Adds a new node to the clientStream with
  codeStreamRead(), codeStreamCallback(), and codeStreamStatus() functions.

  codeStreamCallback(): Sets up encoding/decoding state depending on the
  combination of Content-Encoding and Accept-Encoding fields seen.

  codeStreamRead(): Calls the HttpContentCoder transformation functions
  appropriately.

  codeStreamStatus(): Reports status to the stream.
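codeParseRequest() has to decide whether the client accepts a given coding. This sketch assumes the RFC 2616 Accept-Encoding rules: an explicitly listed coding wins over "*", a missing q-value means q=1, and q=0 means "not acceptable". acceptsCoding() and qValueFor() are hypothetical helpers meant for real codings such as gzip (identity has extra rules not handled here), and the parser ignores whitespace around "=".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// q-value the header gives `coding`: the explicit entry if present, else the
// "*" entry, else -1 (not mentioned at all).
double qValueFor(const std::string& header, const std::string& coding) {
    double exactQ = -1.0, starQ = -1.0;
    const std::string want = lower(coding);
    std::stringstream items(header);
    std::string item;
    while (std::getline(items, item, ',')) {
        std::stringstream parts(item);
        std::string name, param;
        std::getline(parts, name, ';');
        name = lower(trim(name));
        if (name.empty()) continue;
        double q = 1.0;  // default when no q parameter is given
        while (std::getline(parts, param, ';')) {
            param = trim(param);
            if (param.size() > 2 && (param[0] == 'q' || param[0] == 'Q') && param[1] == '=')
                q = std::atof(param.c_str() + 2);
        }
        if (name == want) exactQ = q;
        else if (name == "*") starQ = q;
    }
    return exactQ >= 0 ? exactQ : starQ;
}

bool acceptsCoding(const std::string& header, const std::string& coding) {
    return qValueFor(header, coding) > 0.0;
}
```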

New HttpContentCoder abstract type, with functions:

  encodeStart()
  encodeEnd()
  encodeChunk()

  decodeStart()
  decodeEnd()
  decodeChunk()

New per-coded-object ContentCoderState, to handle coding state. It'll
be referenced from the clientStream, and include fields:

  HttpContentCoder *coder
  off_t codedOffset
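One possible C++ shape for HttpContentCoder and ContentCoderState. The design lists only function names, so the signatures here (each call appends its output to a std::string) are assumptions, and IdentityCoder and encodeAndAdvance() are hypothetical helpers used only to show how codedOffset advances.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>  // off_t

// Hypothetical abstract coder: each call appends the bytes it produces to `out`.
class HttpContentCoder {
public:
    virtual ~HttpContentCoder() = default;

    virtual void encodeStart(std::string& out) = 0;
    virtual void encodeChunk(const std::string& in, std::string& out) = 0;
    virtual void encodeEnd(std::string& out) = 0;

    virtual void decodeStart(std::string& out) = 0;
    virtual void decodeChunk(const std::string& in, std::string& out) = 0;
    virtual void decodeEnd(std::string& out) = 0;
};

// Trivial pass-through coder, useful only to exercise the plumbing.
class IdentityCoder : public HttpContentCoder {
public:
    void encodeStart(std::string&) override {}
    void encodeChunk(const std::string& in, std::string& out) override { out += in; }
    void encodeEnd(std::string&) override {}
    void decodeStart(std::string&) override {}
    void decodeChunk(const std::string& in, std::string& out) override { out += in; }
    void decodeEnd(std::string&) override {}
};

// Per-coded-object state referenced from the clientStream node.
struct ContentCoderState {
    HttpContentCoder* coder;
    off_t codedOffset;  // bytes of coded output produced so far
};

// Encode one chunk and advance codedOffset by the bytes it produced.
void encodeAndAdvance(ContentCoderState& st, const std::string& in, std::string& out) {
    const std::string::size_type before = out.size();
    st.coder->encodeChunk(in, out);
    st.codedOffset += static_cast<off_t>(out.size() - before);
}
```

codedOffset tracks positions in the coded stream, which differ from offsets in the original object once a real compressor is plugged in.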

Objects will be stored in both unencoded and encoded formats. An
object will stay in the format in which Squid receives it until a
client requests a different Content-Encoding that Squid supports
(this could be immediate). Once this happens, the object will be
streamed, coded, into a different StoreEntry and on to the client.

Other changes needed:

Add new content_coding field to HttpReply.

New httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.

A new configuration flag to turn content-encoding off, if desired.

A new object flag, "encoded". Whenever Squid locally creates an encoded
or decoded object, it tags that object as "encoded", so a locally
re-decoded object is obviously identifiable as such.

A new store.cc function, storeDeleteCodedCopies(), will do the
deletion of all (un)coded copies described above.

Gzip

A new GzipContentCoder module, which will be a concrete implementation
of HttpContentCoder.

Data encoding will be handled by the gzip.org zlib library
(http://www.gzip.org/zlib/).

Functions:
  gzEncodeStart: call deflateInit2(), write header
  gzEncodeEnd: call deflate() with Z_FINISH, then deflateEnd(); write trailer
  gzEncodeChunk: call deflate()

  gzDecodeStart: call inflateInit2(), read and verify header
  gzDecodeEnd: call inflateEnd(), verify trailer
  gzDecodeChunk: call inflate()

  gzDoSaveEncoded(): true
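When deflateInit2() is given a negative windowBits, zlib emits raw deflate data with no wrapper, so the coder must write the gzip header and trailer itself, as the function list above suggests. (zlib can instead write a simple gzip wrapper itself when 16 is added to windowBits.) The sketch below shows only the RFC 1952 framing: a minimal 10-byte header, a bitwise CRC-32, and the little-endian CRC/ISIZE trailer. The helper names are hypothetical, and the compressed body between header and trailer is assumed to come from zlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip uses.
// Pass 0 to start; pass a previous result to continue over the next chunk.
std::uint32_t crc32Update(std::uint32_t crc, const std::string& data) {
    crc = ~crc;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Minimal member header: magic 1f 8b, CM=8 (deflate), FLG=0, MTIME=0,
// XFL=0, OS=255 (unknown).
std::string gzipHeader() {
    const unsigned char h[10] = {0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 255};
    return std::string(reinterpret_cast<const char*>(h), sizeof h);
}

// Checks magic and compression method only. A real gzDecodeStart must also
// skip FEXTRA/FNAME/FCOMMENT/FHCRC fields when their FLG bits are set.
bool gzipHeaderOk(const std::string& h) {
    return h.size() >= 10 && static_cast<unsigned char>(h[0]) == 0x1f &&
           static_cast<unsigned char>(h[1]) == 0x8b && h[2] == 8;
}

static void putLE32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
}

static std::uint32_t getLE32(const std::string& s, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(s[pos + i])) << (8 * i);
    return v;
}

// Trailer: CRC-32 of the uncompressed data, then its length modulo 2^32.
std::string gzipTrailer(std::uint32_t crc, std::uint32_t isize) {
    std::string t;
    putLE32(t, crc);
    putLE32(t, isize);
    return t;
}

bool gzipTrailerOk(const std::string& t, std::uint32_t crc, std::uint32_t isize) {
    return t.size() == 8 && getLE32(t, 0) == crc && getLE32(t, 4) == isize;
}
```

Because crc32Update() can be chained, gzEncodeChunk and gzDecodeChunk can update the checksum one chunk at a time and the End functions only need the final value and byte count.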

Test Strategy

Must pass the test suite.

Must add appropriate tests, including sending gzipped content to
oneself successfully.

Will also test against the Apache mod_gzip implementation, and maybe
even gunzip.

Received on Tue Mar 09 2004 - 02:16:24 MST
