Re: [squid-users] Compression

From: Robert Collins <robert.collins@dont-contact.us>
Date: 24 Aug 2001 09:07:53 +1000

On 23 Aug 2001 11:58:41 +0200, Mike Kiernan wrote:
> > Checkout the te-modules branch on sourceforge. YMMV. I'm really not sure
> > how I left the source... I'm pretty sure it builds and works but no
> > guarantees.
>
> Sorry if this is a bit of a lame question (I've never used cvs) but; which
> source tree
> should I apply the te-patch to :~?

From memory, I think it applies to the te branch - see the projects
page, where there is a "based on" entry.

You can also just use CVS to get the branch as-is - see the
http://www.sf.net/projects/squid page, click on CVS, for example usage.

You would want to change the bit that reads
co squid
to
co -r te-modules -d te-modules squid
 
> > I'm happy to discuss what you can do with it, but it's really not ready
> > for use until the squid internals shuffle around - and then it will
> > still need work to bring it up to production quality.
>
> understood - thanks for the info. would you mind writing a few lines on
> the methodology you've used/are planning to use for this functionality, and
> what needs to be done to bring it up to speed?

I picked up some previous work done on squid which had client-side
transfer-encoding (TE - see RFC 2616), and generalised it to allow both
server-side and client-side TE.

The basic concept is that squid has an access list that controls which
requests squid is willing to compress, and with which compression
algorithm. For example, you might make sure that squid only ever
compresses html or text files. The result is an ordered list of
encodings for a given request - e.g. chunked, gzip, gz.
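
To make that concrete, here is a rough Python sketch of such a lookup.
It is not code from the te-modules branch: the rule table, its format
and the name allowed_encodings are all made up for illustration, and
real squid ACLs can match on far more than the content type.

```python
# Hypothetical rule table: (content-type prefix, ordered encodings).
# The first matching rule wins. Nothing here is squid's real syntax.
RULES = [
    ("text/html", ["gzip", "chunked"]),
    ("text/", ["gzip", "chunked"]),
    ("", ["chunked"]),  # everything else: never compress, only chunk
]


def allowed_encodings(content_type):
    """Return the ordered list of encodings permitted for this type."""
    for prefix, encodings in RULES:
        if content_type.startswith(prefix):
            return list(encodings)
    return []
```

With this table, allowed_encodings("image/png") falls through to the
last rule and yields only chunked, so images are never compressed.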

On the client side this list is compared to what the client browser
indicated it was capable of receiving, to decide what compression to
apply. (AFAIK no end-user browser supports compressed transfer
encoding.)
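
A minimal sketch of that comparison, in Python rather than squid's C.
The function names are hypothetical. It assumes the client states its
capabilities in a TE request header as RFC 2616 section 14.39 describes,
where chunked is always acceptable and "trailers" is a keyword rather
than a coding.

```python
def parse_te(header_value):
    """Parse a TE request header value into {coding: qvalue}."""
    accepted = {"chunked": 1.0}  # always acceptable per RFC 2616 14.39
    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name or name == "trailers":
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed qvalue as "not accepted"
        accepted[name] = q
    return accepted


def choose_encoding(allowed, te_header):
    """Pick the first locally allowed coding that the client accepts."""
    accepted = parse_te(te_header)
    for coding in allowed:
        if accepted.get(coding, 0.0) > 0.0:
            return coding
    return None
```

Here the local list sets the preference order and the client's qvalues
only act as a yes/no filter. Ranking by qvalue instead would be an
equally reasonable design; I am not claiming either is what the branch
does.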

For the server side this list becomes a header indicating to the
upstream what squid can accept.
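
As a sketch of what that header could look like (Python, hypothetical
function name): RFC 2616 section 14.39 makes TE a hop-by-hop header, so
it has to be named in a Connection header as well, and chunked need not
be listed because it is always acceptable.

```python
def te_request_headers(allowed):
    """Build the header lines advertising which codings we accept."""
    # chunked is implied by HTTP/1.1, so only list the other codings.
    codings = [c for c in allowed if c != "chunked"]
    if not codings:
        return []
    return ["TE: " + ", ".join(codings), "Connection: TE"]
```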

Then as data flows into squid, all transfer encodings are unwrapped,
giving squid the native body (which might itself be content-encoded).
This native body gets saved to the store if it is cacheable - and the
vary support (not yet in that branch IIRC) allows squid to concurrently
store multiple content-encoded entities.
Finally the client side encodes the body with the requested client
encoding as the last step before sending it to the socket.
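
The unwrapping step can be sketched in Python as below. This is a
whole-buffer illustration only, with made-up helper names; squid works
on streams, and chunk trailers are ignored here. The point it shows is
the ordering: codings listed in Transfer-Encoding were applied left to
right, so they are removed right to left.

```python
import gzip


def enchunk(data, size=8):
    """Wrap data in chunked framing (small chunks, for demonstration)."""
    out = b""
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"


def dechunk(data):
    """Remove chunked framing; chunk extensions are skipped."""
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].split(b";")[0], 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its CRLF


DECODERS = {"chunked": dechunk, "gzip": gzip.decompress}


def unwrap(wire_body, transfer_encoding):
    """Undo every transfer coding, giving back the native body."""
    codings = [c.strip().lower() for c in transfer_encoding.split(",")]
    for coding in reversed([c for c in codings if c]):
        wire_body = DECODERS[coding](wire_body)
    return wire_body
```

For example, unwrap(enchunk(gzip.compress(body)), "gzip, chunked")
returns body again, and that native body is what would go to the store.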

Encoding is achieved by a chain of data filters, which share a standard
API, and the chain is set up before the data flows. This chain allows
filters that need to buffer data to do so, and also allows some filters
- such as chunking - to avoid memcpying data around.
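
Roughly, the shape is something like the Python sketch below. The class
names and the write()/close() API are assumptions for illustration, not
the interface in the branch. The gzip filter buffers inside zlib and
emits output when it has some, while the chunk filter passes the payload
object through untouched and only adds framing around it.

```python
import zlib


class Sink:
    """End of the chain; stands in for the socket."""
    def __init__(self):
        self.parts = []

    def write(self, data):
        self.parts.append(data)

    def close(self):
        pass


class ChunkFilter:
    def __init__(self, downstream):
        self.downstream = downstream

    def write(self, data):
        if data:  # an empty chunk would terminate the body early
            self.downstream.write(b"%x\r\n" % len(data))
            self.downstream.write(data)  # payload passed on, not copied
            self.downstream.write(b"\r\n")

    def close(self):
        self.downstream.write(b"0\r\n\r\n")
        self.downstream.close()


class GzipFilter:
    def __init__(self, downstream):
        self.downstream = downstream
        # wbits of 16 + MAX_WBITS selects the gzip container format
        self.z = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)

    def write(self, data):
        out = self.z.compress(data)  # may return b"" while buffering
        if out:
            self.downstream.write(out)

    def close(self):
        self.downstream.write(self.z.flush())
        self.downstream.close()


FILTERS = {"gzip": GzipFilter, "chunked": ChunkFilter}


def build_chain(codings, sink):
    """Set up the chain before any data flows; first coding runs first."""
    head = sink
    for coding in reversed(codings):
        head = FILTERS[coding](head)
    return head
```

Usage would be build_chain(["gzip", "chunked"], Sink()), then write()
for each block of body data and a final close() to flush the filters.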

Performance-wise, transfer encoding has the following issues:
1) Upstream content is always decoded - CPU overhead.
2) Downstream content is always re-encoded - there is no caching of the
compressed body.

Other than the internal rework already mentioned (which makes the store
sit adjacent to the data flow rather than in the middle of the data
flow), some further conceptual enhancements to squid-te include caching
the compressed body to reduce CPU overhead. (This is accomplished by
including a transfer-encoding header in the vary header for the cached
entity, with the chunking removed.) The client side should be able to
opportunistically create the compressed cache objects when an entity is
requested for which there is no compressed entry, but there is a
non-compressed entry.
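
As a rough Python sketch of that idea (none of this exists in the
branch; the class, the dict-based store and the "identity" key for the
uncompressed body are all assumptions): entries are keyed by URL plus
coding, and a miss on the compressed variant is filled from the
uncompressed one.

```python
import gzip


class VariantStore:
    """Toy cache keyed by (url, coding); "identity" means uncompressed."""

    def __init__(self):
        self.entries = {}

    def put(self, url, body, coding="identity"):
        self.entries[(url, coding)] = body

    def get(self, url, coding):
        key = (url, coding)
        if key in self.entries:
            return self.entries[key]  # hit: no CPU spent re-encoding
        plain = self.entries.get((url, "identity"))
        if plain is None:
            return None  # nothing cached at all for this URL
        if coding == "gzip":
            # Opportunistically create and keep the compressed variant.
            encoded = gzip.compress(plain)
            self.entries[key] = encoded
            return encoded
        return None  # a coding this sketch does not know how to produce
```

The first gzip request pays the compression cost once; later requests
for the same URL and coding are served straight from the store.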

Does that about cover your questions? Any particular area I could
enlarge on for you?

Rob
Received on Thu Aug 23 2001 - 17:08:52 MDT
