Summary of "Features" thread on squid-users (fwd)

From: Gregory Maxwell <nullc@dont-contact.us>
Date: Mon, 24 Nov 1997 20:31:30 -0500 (EST)

Here's what I just posted to squid-dev.. any additions or comments?

---------- Forwarded message ----------
Date: Mon, 24 Nov 1997 20:25:51 -0500 (EST)
From: Gregory Maxwell <nullc@nightshade.z.ml.org>
To: squid-dev@nlanr.net
Subject: Summary of "Features" thread on squid-users

Last night I posted a message on squid-users about my ideas for features
for 1.2.. This caused quite a message storm.. I would have posted it here
first but I thought the list was named squid-devel..

Here follows a summary and rehashing of my post and the replies I
received.

I suggested five items: reload_into_ims improvements, HTTP/1.1
cache-control settings, improvements for object purging, object
blacklisting, and compression. The compression was the only thing I got
real responses on.. However, I feel it's the most controversial and would
probably cause the most difficulties and have fewer benefits than my other
thoughts.. Because of this, I talk about it last.

Reload_into_ims:

        * Current system difficulties:
                1. Can return stale pages because of servers that don't do
                   IMS requests right.
                2. Can return stale pages because of the cache hierarchy.
        * An improvement:
                1. Have Squid keep track of IMSes, so that if there are
                   more than a certain number of client-issued reloads
                   per minute it does a real reload. If there are more
                   than that, it bypasses the hierarchy and goes direct
                   (or no-caches through the hierarchy).. A rough sketch
                   of this bookkeeping follows below.
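
A rough sketch (not Squid code; names and thresholds are made up) of the
per-URL bookkeeping this would need:

    #include <time.h>

    typedef struct {
        time_t window_start;  /* start of the current one-minute window */
        int reloads;          /* client-issued reloads seen in the window */
    } ReloadStat;             /* one of these per URL */

    enum ReloadAction {
        DO_IMS,      /* few reloads: quietly turn the reload into an IMS */
        DO_RELOAD,   /* over the first threshold: honour the reload      */
        GO_DIRECT    /* over the second: bypass the hierarchy / no-cache */
    };

    #define RELOAD_THRESHOLD 3    /* hypothetical reloads/min limits */
    #define DIRECT_THRESHOLD 10

    static enum ReloadAction
    classify_reload(ReloadStat *s, time_t now)
    {
        if (now - s->window_start >= 60) {   /* new one-minute window */
            s->window_start = now;
            s->reloads = 0;
        }
        s->reloads++;
        if (s->reloads > DIRECT_THRESHOLD)
            return GO_DIRECT;
        if (s->reloads > RELOAD_THRESHOLD)
            return DO_RELOAD;
        return DO_IMS;
    }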

HTTP/1.1 Cache-Control:

        * Allows sites to control how their pages will be cached.
        * A strictly implemented cache will not allow tuning of this.
                * Many sites may abuse this to prevent all caching in
                  order to get better hit stats.
                * Many users (home, slow links, maybe ISPs) do not care
                  if ads on webpages are fresh.
                * Some users (home, very slow links, myself) don't care
                  if anything is fresh and are quite happy to hit reload
                  to IMS objects..
        * Squid should have a regex-based config to let admins multiply
          certain cache-control headers by variables (see the sketch
          after this list).
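
A sketch of the multiplier idea (the config structure and function names
are hypothetical, not real squid.conf directives):

    #include <regex.h>
    #include <time.h>

    typedef struct {
        regex_t url_pattern;    /* e.g. compiled from "\.gif$"        */
        double  maxage_factor;  /* e.g. 10.0 to stretch ad lifetimes  */
    } CcOverride;

    /* Return the max-age Squid should actually use for this object. */
    static time_t
    tuned_max_age(const CcOverride *ov, int n_ov,
                  const char *url, time_t server_max_age)
    {
        int i;
        for (i = 0; i < n_ov; i++) {
            if (regexec(&ov[i].url_pattern, url, 0, NULL, 0) == 0)
                return (time_t)(server_max_age * ov[i].maxage_factor);
        }
        return server_max_age;  /* no override: obey the origin server */
    }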

Purging Improvements:

        * Squid currently purges the objects that were loaded longest
          ago first.
        * An object that gets many hits and always IMSes as fresh will
          be considered older than an object that IMSes as stale every
          time and gets reloaded often.
        * Squid should toss the objects with the oldest (and fewest) HITS
          first. A sketch of such an ordering follows this list.
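
For example, the replacement order could come from a comparison like this
(field names are hypothetical):

    #include <time.h>

    typedef struct {
        time_t last_hit;    /* when the object was last served */
        int    hit_count;   /* how many times it has been served */
    } PurgeInfo;

    /* Returns <0 if a should be purged before b. */
    static int
    purge_order(const PurgeInfo *a, const PurgeInfo *b)
    {
        if (a->last_hit != b->last_hit)      /* oldest hit goes first */
            return (a->last_hit < b->last_hit) ? -1 : 1;
        if (a->hit_count != b->hit_count)    /* then fewest hits */
            return (a->hit_count < b->hit_count) ? -1 : 1;
        return 0;
    }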

Blacklisting:

        * While working on design concepts for the Mnemonic WWW browser's
          cache, I had the problem of a small cache with a small user base.
        * In order to prevent objects that are always stale from flushing
          good objects out of the cache, I thought up a 'black list'.
        * Objects in the cache should be tagged with how many times they've
          been found STALE by a reload or IMS.. after an object has
          been in the cache for (Avg LRU Age/4), if its hits/stale ratio
          is 1 or very close, its URL gets added to the blacklist.
        * The blacklist should be a configurable fixed size.
        * New objects get added to the top; objects that have not been hit
          recently get removed when the blacklist fills..
        * Blacklisted objects are treated as proxy-only..
        * This is of little benefit to adequately sized caches,
          but it could be a great benefit to undersized caches..
        * The cost of implementing and maintaining this feature would be
          small; sites that don't want it could just set the list size to
          zero.. A rough sketch of the list follows below.
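
A rough sketch of the list itself (a real version would hash the URLs;
all names here are made up, this just shows the intended behaviour):

    #include <string.h>

    #define BLACKLIST_SIZE 256   /* would be a run-time config option */

    typedef struct {
        char url[1024];
        unsigned long last_hit;  /* counter acting as an LRU clock */
    } BlEntry;

    static BlEntry blacklist[BLACKLIST_SIZE];
    static int bl_used = 0;
    static unsigned long bl_clock = 0;

    /* Returns 1 if this URL should be treated as proxy-only. */
    static int
    bl_lookup(const char *url)
    {
        int i;
        for (i = 0; i < bl_used; i++) {
            if (strcmp(blacklist[i].url, url) == 0) {
                blacklist[i].last_hit = ++bl_clock;
                return 1;
            }
        }
        return 0;
    }

    /* Add a URL, evicting the least-recently-hit entry when full. */
    static void
    bl_add(const char *url)
    {
        int victim = 0, i;
        if (bl_used < BLACKLIST_SIZE) {
            victim = bl_used++;
        } else {
            for (i = 1; i < BLACKLIST_SIZE; i++)
                if (blacklist[i].last_hit < blacklist[victim].last_hit)
                    victim = i;
        }
        strncpy(blacklist[victim].url, url, sizeof(blacklist[victim].url) - 1);
        blacklist[victim].url[sizeof(blacklist[victim].url) - 1] = '\0';
        blacklist[victim].last_hit = ++bl_clock;
    }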

Compression:

 This is the most 'heated' part of my discussion.. Basically the idea is
like this: CPU is cheap, bandwidth isn't...

 When objects which match a regex are sent between capable caches,
compress them..

After receiving many emails and list mails, and conducting some tests, I
think the best implementation would work something like this.

Squid should include LZO compression. It's patent-free and freely
available. Its decompression is lightning fast (around 28 MBytes/s on a
single CPU of a dual 166/MMX). Its compression is pretty quick, and you
can trade compression speed for level of compression (42 MBytes in 6 sec
in fast mode)... LZO has very low memory requirements for both compression
and decompression..

For this message, please keep in mind that decompression is so fast that
in many circumstances it would actually improve speed (if you have a RAID
array capable of producing 28 MBytes/s, your CPU would probably be an
Alpha 600 and capable of decompressing faster than 28 MBytes/s)..
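
A minimal sketch of the LZO calls involved, using the lzo1x interface the
library ships with (the function names are from the LZO distribution;
buffer handling here is simplified and this is not proposed Squid code):

    #include <lzo1x.h>
    #include <stdlib.h>

    /* Compress an object body; caller frees *out.  Returns 0 on success. */
    int
    compress_object(unsigned char *in, lzo_uint in_len,
                    unsigned char **out, lzo_uint *out_len)
    {
        /* worst-case expansion suggested by the LZO docs */
        unsigned char *buf = malloc(in_len + in_len / 16 + 64 + 3);
        unsigned char *wrk = malloc(LZO1X_1_MEM_COMPRESS);
        int rc;

        if (!buf || !wrk) {
            free(buf);
            free(wrk);
            return -1;
        }
        rc = lzo1x_1_compress(in, in_len, buf, out_len, wrk);
        free(wrk);
        if (rc != LZO_E_OK) {
            free(buf);
            return -1;
        }
        *out = buf;
        return 0;
    }

    /* Decompress; *out_len must hold the original size on entry. */
    int
    decompress_object(unsigned char *in, lzo_uint in_len,
                      unsigned char *out, lzo_uint *out_len)
    {
        return lzo1x_decompress(in, in_len, out, out_len, NULL) == LZO_E_OK
            ? 0 : -1;
    }

lzo_init() would be called once at Squid startup, and the original object
size would have to be stored with the compressed copy so the decompression
buffer can be sized.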
 
Squid should include three compression modes: idle, on-the-fly, and none.
The mode should be selectable by client-regex and object-regex.

When an object is requested from the cache that hits the on-the-fly rule,
and the requester is something that understands the LZO compression, Squid
should compress the object (with the compression level decided by current
system load).. The compressed object should be sent to the remote cache
(or Squid-savvy WWW browser), and should replace the object on disk.
As long as this object is in the cache it will never need to be compressed
again.

When an object is requested that hits the idle rule, it is sent compressed
if it is already compressed, and uncompressed if it isn't..

When an object is received from another cache, it is stored as it is
received.. compressed or not..

While the cache computer is mostly idle (as determined by a set of rules),
the cache searches the swap for objects that match the idle rule and
compresses the objects with many hits first.. The compression level would
be settable, or determined by how many objects matching the idle rule
still need to be compressed..

When a client that is not LZO-savvy requests an object, the cache
decompresses it on the fly.. (And possibly the cache could keep a small
RAM buffer of decompressed objects)..

Big caches that are tight on CPU could keep their rules to idle only..
while smaller caches could make more use of on-the-fly..
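
A sketch of how the mode might be picked per request (the rule structure,
regexes and load threshold are all hypothetical):

    #include <regex.h>

    enum CompressMode { COMP_NONE, COMP_IDLE, COMP_ON_THE_FLY };

    typedef struct {
        regex_t client_pattern;   /* which clients/peers the rule covers */
        regex_t object_pattern;   /* which URLs/types the rule covers    */
        enum CompressMode mode;
    } CompressRule;

    static enum CompressMode
    compression_mode(const CompressRule *rules, int n_rules,
                     const char *client, const char *url)
    {
        int i;
        for (i = 0; i < n_rules; i++) {
            if (regexec(&rules[i].client_pattern, client, 0, NULL, 0) == 0 &&
                regexec(&rules[i].object_pattern, url, 0, NULL, 0) == 0)
                return rules[i].mode;     /* first matching rule wins */
        }
        return COMP_NONE;                 /* default: leave objects alone */
    }

    /* Pick a compression level from system load: lighter when busy. */
    static int
    compression_level(double load_avg)
    {
        return load_avg > 1.0 ? 1 : 9;    /* made-up thresholds */
    }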

Obviously the rules would be set to only compress .txt .html .htm and
friends..

With this system, an object only needs to be compressed once (at the top
of the hierarchy, or maybe even by an LZO-savvy Apache) and only gets
decompressed when leaving the compression-friendly cache world.. :)

This is much like the intended use of the HTTP/1.1 x-compressed: header..
We add a compression method, LZO. (As I recall, adding a method is not
against the spec.) And we add the ability to detect clients that understand
our compression method (someone check the specs), and the ability to
decompress for clients that aren't smart enough.. A rough capability
check is sketched below.
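
A sketch of the capability check (the "x-lzo" coding token is made up for
illustration; a real check would also parse q-values):

    #include <string.h>
    #include <ctype.h>

    /* Does the request's Accept-Encoding header list our LZO coding? */
    static int
    client_accepts_lzo(const char *accept_encoding_hdr)
    {
        const char *p;

        if (!accept_encoding_hdr)
            return 0;
        for (p = accept_encoding_hdr; (p = strstr(p, "x-lzo")) != NULL; p += 5) {
            /* make sure it's a whole token, not part of another word */
            int start_ok = (p == accept_encoding_hdr) ||
                           (!isalnum((unsigned char)p[-1]) && p[-1] != '-');
            int end_ok = !isalnum((unsigned char)p[5]) && p[5] != '-';
            if (start_ok && end_ok)
                return 1;
        }
        return 0;
    }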

The reason we use LZO over gzip is that gzip's decompression is about
10x slower (at least on my hardware); this speed difference would limit
the usefulness of compression on caches that handle many end clients that
don't understand compression.... Also, LZO is very light on memory..

Although my entire reasoning for compression is to save bandwidth, it also
has the advantage of saving disk space...

I've gone on long enough.. I'd appreciate hearing any opinions on this, and
I wouldn't mind lending a hand in an actual implementation of this, should
the main developers think this is a good idea..