Re: On the fly compression

From: Jesus Cea Avion <jcea@dont-contact.us>
Date: Tue, 23 Mar 1999 21:13:35 -0100

Regarding "on the fly compression" in Squid caches, here are several
replies in a single message:

Bertrand Petit wrote:

> This is a great idea, but this would require quite a lot of
> CPU on busy caches.

Yes and no. You could, for example, have a separate process scanning
the cache directories and gzipping new entries in the background. Of
course, compression would be a configuration option. In Spain bandwidth
is scarce and expensive.

On my three-year-old 143 MHz UltraSPARC, I can gzip about
730 Kbytes/second and gunzip about 5 Mbytes/second (measured on a
20 Mbyte file that packs down to 4.6 Mbytes). That is orders of
magnitude above my actual bandwidth.

> > Though if one was paying extortionate rates for bandwidth, it may be
> > cunning.
>
> Yes, this would please any French leased line subscriber...

European lines in general. In Spain, a 64 kbps frame relay line costs
about 2,500-3,000 dollars... per month! If I can cut bandwidth usage by
10%, I'll be really happy!

Bertold Kolics wrote:

> This is an ever-green topic, I think. Look at the archives, you will
> find several threads about this.

The archives have no search capability :-(, and AltaVista shows no
relevant pages either.

> Furthermore, lower level compression on a dial-up connection is
> usually done. :-| So, it is also possible to get 100 kbps throughput
> for a given file with a 33.6 kbps modem as well.

Yes, but if your modem pool is far away, every bit you can save is
valuable, at least in Europe.

Robert Federle wrote:

> What about a two way strategy? Storing two versions (one compressed
> and one uncompressed) of the same object requires some additional disk

That is difficult to implement if the hash tables are computed from the URL alone.

I'm thinking of a background process that packs files in a "lazy"
manner :-)... Maybe an afternoon of coding and testing would be
worthwhile...

Bertold Kolics wrote:

> First, examine the content of a cache. Typically, 70% of its content is
> GIF and JPG images and about 10% is textual files (HTML files, ftp
> directory listings, etc.).

Let me see...

Out of 37377 objects stored on disk:

24200 image/gif
7138 text/html
5189 image/jpeg
168 text/plain
157 application/octet-stream
97 application/x-javascript
72 application/zip
59 text/css
...

About 20% of the cache is "compressible", that is, of the objects stored
on disk. But what matters is objects transferred, not objects stored.
Let me look again...

Object types transferred:

Graphics: 68.6%
HTML: 12.3%
Various: 18.8%
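The 20% figure can be rechecked from the listing. This sketch counts only the rows shown (the "..." remainder is left out) and assumes that text/html, text/plain, text/css and application/x-javascript are the "compressible" types:

```python
# Object counts copied from the listing above; the "..." rows are omitted.
STORED_COUNTS = {
    "image/gif": 24200,
    "text/html": 7138,
    "image/jpeg": 5189,
    "text/plain": 168,
    "application/octet-stream": 157,
    "application/x-javascript": 97,
    "application/zip": 72,
    "text/css": 59,
}
TOTAL_OBJECTS = 37377

# Assumption: which MIME types are worth gzipping.
COMPRESSIBLE = {"text/html", "text/plain", "text/css", "application/x-javascript"}


def compressible_fraction(counts, total, compressible):
    """Share of all stored objects whose type is in `compressible`."""
    hits = sum(n for mime, n in counts.items() if mime in compressible)
    return hits / total


if __name__ == "__main__":
    share = compressible_fraction(STORED_COUNTS, TOTAL_OBJECTS, COMPRESSIBLE)
    print(f"{share:.1%} of stored objects are compressible")  # 20.0%
```

This counts objects, not bytes, so a byte-weighted figure could differ.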

> Is it worth implementing this feature?

Uhmmm... if it comes for free... :-)

David J Woolley wrote:

> The correct place to compress is the origin server before putting the
> file onto the disk.

Yes, I know about the project to implement "Content-Encoding" in the
Apache server:

http://www.apacheweek.com/features/negotiation
http://www.apache.org/docs/content-negotiation.html

Nevertheless, nowadays content negotiation disables caching :-(

http://www.flora.org/lynx-dev/html/month0397/msg00724.html
http://theory.uwinnipeg.ca/CPAN/data/Apache-GzipChain/GzipChain.html
http://theory.uwinnipeg.ca/CPAN/by-name/Apache-GzipChain.html

> The only issue for Squid should be whether to force Accept-Encoding
> and decode if not in the request. Decoding is designed to be much
> faster than encoding for the deflate (aka gzip) method.

Since 99.9% of servers send data uncompressed even when clients send
"Accept-Encoding: gzip", we never run into the following problem:

Currently, Squid seems to ignore the "Accept-Encoding" (client) and
"Content-Encoding" (server) headers. It simply copies them, without
interpreting them.

So, suppose the following:

1. Client A requests a page. It sends "Accept-Encoding: gzip".

2. Cache miss. Squid forwards the request, including headers.

3. Since the client accepts "gzip", the server sends the page using
   "Content-Encoding: gzip".

4. Squid stores the page.

5. Client B requests the same page, but without "Accept-Encoding: gzip".

6. Squid sees a HIT and returns the stored page, still gzip-encoded.

Do you see the problem?
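To make steps 1-6 concrete, here is a toy Python model. It is not Squid code; `origin_server` and `UrlKeyedCache` are hypothetical names for a server that gzips whenever it may, and a cache that keys stored objects on the URL alone and forwards request headers untouched:

```python
import gzip
from typing import Dict, Tuple

Response = Tuple[Dict[str, str], bytes]
PAGE = b"<html><body>Hola</body></html>"


def origin_server(url: str, request_headers: Dict[str, str]) -> Response:
    """Hypothetical origin that gzips the page whenever the client accepts it."""
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        return {"Content-Encoding": "gzip"}, gzip.compress(PAGE)
    return {}, PAGE


class UrlKeyedCache:
    """Cache whose only key is the URL, so request headers are never compared."""

    def __init__(self, origin):
        self.origin = origin
        self.store: Dict[str, Response] = {}

    def get(self, url: str, request_headers: Dict[str, str]) -> Response:
        if url in self.store:                          # step 6: HIT
            return self.store[url]
        response = self.origin(url, request_headers)   # step 2: MISS, forward headers
        self.store[url] = response                     # step 4: store it
        return response


def reproduce_problem() -> Response:
    cache = UrlKeyedCache(origin_server)
    cache.get("http://example.com/", {"Accept-Encoding": "gzip"})  # client A
    return cache.get("http://example.com/", {})                    # client B


if __name__ == "__main__":
    headers, body = reproduce_problem()
    print(headers)  # client B gets Content-Encoding: gzip it never asked for
```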

Currently this issue has no practical importance, since 99.9% of HTTP
servers ignore "Accept-Encoding", but in the future...

HTTP/1.1 (RFC-2068) has an entire chapter titled "Content Negotiation"
and a subsection called "Caching Negotiated Responses". Interesting
reading.

Does Squid support the "Vary" header?

For example, the University of Vigo (http://www.uvigo.es/) serves a
different homepage depending on the language selected in the browser:

>>>>>
$ telnet www.uvigo.es 80
Trying 193.146.32.67...
Connected to seteibis.uvigo.es.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 23 Mar 1999 19:31:53 GMT
Server: Apache/1.2.0
Vary: accept-language <- LOOK!
Connection: close
Content-Type: text/html
Content-Language: gl <- LOOK!
Expires: Tue, 23 Mar 1999 19:31:53 GMT

Connection closed by foreign host.
$
<<<<<

Although this page expires immediately, it shows how content-negotiated
pages can be implemented.
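A minimal Python sketch of the idea behind the "Caching Negotiated Responses" subsection: a stored response may only be reused for a request whose headers named in "Vary" match, so the cache key becomes the URL plus those header values. The name `variant_key` is hypothetical; the sketch assumes the Vary list comes from the stored response, normalizes only header-name case and surrounding whitespace, and treats "Vary: *" as uncacheable:

```python
from typing import Mapping, Optional, Tuple


def variant_key(url: str, vary: Optional[str],
                request_headers: Mapping[str, str]) -> Optional[Tuple]:
    """Cache key for `url` given the stored response's Vary value.

    Returns None when the response should not be served from cache.
    """
    if vary is None:                      # no negotiation: the URL alone is enough
        return (url,)
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:                      # selection depends on unknown factors
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A missing header (None) is kept distinct from an empty one ("").
    return (url,) + tuple((n, lowered.get(n)) for n in names)
```

With the uvigo.es response above, a browser set to Galician and one set to Spanish get different keys, so each language variant is stored separately. Keying on the raw header value is conservative: two clients sending slightly different "Accept-Encoding" strings would get separate entries.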

So, if Squid supports "Vary", a possible implementation would be fairly
trivial.

So, my questions are:

Does SQUID support "Vary"?
>>>>>
$ grep Vary *
HttpHeader.c: {"Vary", HDR_VARY, ftStr}, /* for now */
http.c: * We don't properly deal with Vary features yet, so we
            * can't cache these
<<<<<

:-(

Will SQUID support "Vary" soon?

Once SQUID can handle "Vary" headers, a fairly simple "on the fly
compression" scheme becomes possible:

1. Scan the cache directories looking for new entries. If the entry is
   "image/gif", "image/jpeg", etc., ignore it. If it is "text/html",
   "text/plain", etc., go to step 2.

2. If the object already has a "Content-Encoding" header, skip it.

3. Compress the object. Set "Content-Encoding: gzip".

4. Change SQUID to do "on the fly decompression" when the stored object
   has "Content-Encoding: gzip" but the client does not send
   "Accept-Encoding: gzip".
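Steps 1-3 could be sketched in Python as below. `CachedObject`, `maybe_compress` and `compress_new_entries` are hypothetical names; the object is an in-memory stand-in (Squid's on-disk swap format is not modeled), the set of compressible types is an assumption, and skipping objects that do not shrink is an extra check not in the steps above. Chunked objects are left alone, as suggested further down in this message:

```python
import gzip
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class CachedObject:
    """Hypothetical in-memory stand-in for a stored cache entry."""
    headers: Dict[str, str]
    body: bytes


# Assumed list of "compressible" types (step 1); images are skipped.
COMPRESSIBLE_TYPES = {"text/html", "text/plain", "text/css", "application/x-javascript"}


def _get(headers: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def _set(headers: Dict[str, str], name: str, value: str) -> None:
    """Replace any existing header with this name (in any case)."""
    for key in [k for k in headers if k.lower() == name.lower()]:
        del headers[key]
    headers[name] = value


def maybe_compress(obj: CachedObject) -> bool:
    """Apply steps 1-3 to one object; return True if it was gzipped."""
    ctype = (_get(obj.headers, "Content-Type") or "").split(";")[0].strip().lower()
    if ctype not in COMPRESSIBLE_TYPES:                     # step 1: skip GIF, JPEG, ...
        return False
    if _get(obj.headers, "Content-Encoding") is not None:   # step 2: already encoded
        return False
    if _get(obj.headers, "Transfer-Encoding") is not None:  # leave chunked objects alone
        return False
    packed = gzip.compress(obj.body)
    if len(packed) >= len(obj.body):                        # assumption: skip if no gain
        return False
    obj.body = packed                                       # step 3
    _set(obj.headers, "Content-Encoding", "gzip")
    _set(obj.headers, "Content-Length", str(len(packed)))
    # Not handled here: a "Vary: Accept-Encoding" header for downstream caches.
    return True


def compress_new_entries(objects: Iterable[CachedObject]) -> int:
    """Background pass over newly stored objects; returns how many were packed."""
    return sum(1 for obj in objects if maybe_compress(obj))
```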

"Andreas J. Koenig" wrote:

> It's a bit of a paradox that web servers that decide to actually
> implement On-The-Fly-Compression are out of the business wrt cache
> servers, because they have to set the Vary header.

Since SQUID doesn't cache objects with "Vary" headers, the first step
would be to implement "Vary" handling. You are entirely correct.

Dancer wrote:

> However, this automatically means supporting transfer-chunking, which
> (if you've looked at the spec with a programmer's eye) is not all that
> appealing a concept in many ways.

Disable on the fly compression when "Transfer-Encoding: chunked".

> * If squid supports a transfer-encoding that the server does, and the
> client does not, should it use it? Or should it use the
> lowest-common-denominator (perhaps none).

Either is appropriate. I'd prefer Squid to decompress "gzip" objects when
a client does not include "Accept-Encoding: gzip". Unpacking is fast, and
most of my users (all?) send "Accept-Encoding: gzip".
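That decompress-on-HIT behaviour might look like this minimal Python sketch. `accepts_gzip` and `serve` are hypothetical names; the "Accept-Encoding" parsing is deliberately simple (a ";q=0" parameter is treated as a refusal and "*" is ignored), and "x-gzip" is accepted as a synonym for "gzip":

```python
import gzip
from typing import Dict, Mapping, Tuple


def accepts_gzip(request_headers: Mapping[str, str]) -> bool:
    """True if the client's Accept-Encoding lists gzip (or x-gzip) with q > 0."""
    value = next((v for k, v in request_headers.items()
                  if k.lower() == "accept-encoding"), "")
    for item in value.split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] not in ("gzip", "x-gzip"):
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0                  # unparsable weight: treat as refusal
        if q > 0:
            return True
    return False


def serve(stored_headers: Dict[str, str], stored_body: bytes,
          request_headers: Mapping[str, str]) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for this client, unpacking gzip if it is not accepted."""
    encoding = next((v for k, v in stored_headers.items()
                     if k.lower() == "content-encoding"), "")
    if encoding.strip().lower() != "gzip" or accepts_gzip(request_headers):
        return dict(stored_headers), stored_body      # send as stored
    body = gzip.decompress(stored_body)               # on-the-fly decompression
    headers = {k: v for k, v in stored_headers.items()
               if k.lower() not in ("content-encoding", "content-length")}
    headers["Content-Length"] = str(len(body))
    return headers, body
```

Only one packed copy is kept on disk; clients that cannot handle gzip pay the (cheap) unpacking cost on each HIT.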

> * What about the conversion of chunked transfers to non-chunked
> transfers (if squid is speaking transfer-encoding to the server, and
> not to the client)?

Something to think about. The easy first step would be not to convert such objects.
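For reference, converting a chunked transfer into a non-chunked one means decoding the chunk framing (a hexadecimal size line, the data, a CRLF, ending with a zero-size chunk) before the body can be relayed with a "Content-Length". A minimal Python sketch, assuming the whole body is already in memory; the name `dechunk` is hypothetical, chunk extensions are ignored and trailer headers are discarded:

```python
def dechunk(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body that is fully held in memory."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)                          # ValueError if line incomplete
        size_field = raw[pos:eol].split(b";", 1)[0].strip()   # drop chunk extensions
        size = int(size_field, 16)                             # chunk size is hexadecimal
        pos = eol + 2
        if size == 0:                                          # last chunk; trailers ignored
            return bytes(body)
        chunk = raw[pos:pos + size]
        if len(chunk) != size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2
```

The relayed response would then need "Transfer-Encoding" removed and "Content-Length" set to the decoded size; having to buffer the whole body first is what makes "do not convert" the easier first step.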

-- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea@argo.es http://www.argo.es/~jcea/ _/_/    _/_/  _/_/    _/_/  _/_/
                                      _/_/    _/_/          _/_/_/_/_/
PGP Key Available at KeyServ   _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
Received on Tue Mar 23 1999 - 13:12:40 MST
