Re: thoughts on memory usage...

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Thu, 21 Aug 1997 14:02:46 +0200 (EETDST)

On 21 Aug 97 at 11:03, squid-dev@nlanr.net wrote:

> On Thu, 21 Aug 1997, Brian Denehy wrote:
> >I need to dig out and rerun some work that Martin Hamilton and I both ran
> >about fifteen months ago which looked at duplicate objects in the cache. At
> >that stage I concluded it was not worth the savings to remove the duplicate
> >objects.
>
> Calculating MD5 checksums on the contents (ignoring headers) of my 6Gb
> cache, I find only 342,773 of the 382,947 objects are unique - about 10%
> is duplication. Is this worth saving? Note that we would still have to
> keep the headers, which are a significant part of the common small
> objects.

    I fully agree that there is no point in having squid detect duplicates.
 Still, there are cases where duplicates become really annoying, e.g. the
 RealAudio player. Their web page constructs a different URL every time
 a download is registered, so every time a user downloads the file,
 bandwidth is wasted. We have about 20 copies of raplayer in our
 cache, each 1MB. Of course, MD5 is of no use here, because
 to detect the duplicate you first need to download it, so a redirector
 is a good choice here. In general, it is better to redirect many
 URLs to a common source than to download them all and find out that
 1% of them are duplicates. Redirecting is not going to save
 much disk space either; the point is that downloading a big file takes
 time and fails to complete far too easily. Popular files
 like Netscape and IE can consume a lot of bandwidth while every user
 retries their downloads...
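
 A rough sketch of such a redirector, in Python. It assumes the usual
 redirector convention as I understand it: squid writes one request per
 line with the URL as the first field, and the helper answers with one
 line holding the replacement URL, or an empty line to leave the URL
 alone. The host name, path pattern and canonical URL below are made-up
 placeholders, not the real RealAudio download URLs.

```python
import re
import sys

# Hypothetical rule table: (pattern, canonical URL). Both the pattern and
# the target are placeholders chosen for illustration only.
RULES = [
    (re.compile(r"^http://download\.example\.com/[^/]+/raplayer\.exe$"),
     "http://download.example.com/raplayer.exe"),
]


def rewrite(url):
    """Return the canonical URL for a known per-download URL, else ''."""
    for pattern, target in RULES:
        if pattern.match(url):
            return target
    return ""  # an empty reply is assumed to mean "leave the URL unchanged"


def replies(lines):
    """Yield one reply per request line; the URL is the first field."""
    for line in lines:
        fields = line.split()
        yield rewrite(fields[0]) if fields else ""


def serve(inp=sys.stdin, out=sys.stdout):
    """Answer requests one line at a time, flushing after every reply."""
    for reply in replies(inp):
        out.write(reply + "\n")
        out.flush()  # the cache waits for each answer, so do not buffer
```

 Calling serve() at the end of the script would turn it into the helper
 process. All the per-download URLs then map to a single cache object,
 so only the first request has to fetch the file from the origin.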

> There are over two thousand copies of ads from AltaVista Europe, which
> I'm now going to mark uncacheable (they have URLs like
> http://ad.altavista.telia.com/ad_image;time=1997.08.20.10.58.25.938&site=g2-i.altavista.telia.com&spacedesc=/front&country=gb).

    I'd suggest adding '&' to the default "? cgi" hierarchy-stop and
 cache-stop options in squid.conf.
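
 To show why '&' would help, here is a small Python sketch. It assumes
 the stop lists are plain substring matches against the URL, which is my
 reading of how they work; matches_stoplist is a hypothetical helper, not
 squid code. The AltaVista ad URL quoted above contains neither '?' nor
 "cgi", so the default list misses it, while a list with '&' catches it.

```python
def matches_stoplist(url, stoplist):
    """Return True if any stoplist word occurs anywhere in the URL."""
    return any(word in url for word in stoplist)


DEFAULT_STOPLIST = ["?", "cgi"]               # the "? cgi" default
EXTENDED_STOPLIST = DEFAULT_STOPLIST + ["&"]  # with the suggested '&' added

# The ad URL quoted earlier in this message, joined back into one string.
AD_URL = ("http://ad.altavista.telia.com/ad_image;time=1997.08.20.10.58.25.938"
          "&site=g2-i.altavista.telia.com&spacedesc=/front&country=gb")

missed_by_default = not matches_stoplist(AD_URL, DEFAULT_STOPLIST)
caught_by_extended = matches_stoplist(AD_URL, EXTENDED_STOPLIST)
```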

 ----------------------------------------------------------------------
  Andres Kroonmaa mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online Tel: 6308 909
  Tallinn, Sakala 19 Pho: +372 6308 909
  Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
 ----------------------------------------------------------------------

Received on Tue Jul 29 2003 - 13:15:42 MDT