Re: Save 15% on your bandwidth...

From: Gunnar Ingvi Þórisson <gunni@dont-contact.us>
Date: Sun, 15 Sep 1996 13:42:03 +0100

> From: Gudmundur Ragnar <squid@this.is>
> Cc: squid-users@nlanr.net
> Subject: Re: Save 15% on your bandwidth...
> Date: 15. september 1996 02:18
>
> I recommend having a list (or lists) that the proxy can fetch automatically,
> giving information about who is mirroring what.

I don't think that is possible. How would squid figure out which site is
closer (perhaps by domain extension)? We are talking about far too many
mirrors / sites / files for each rule to cover; setting up tables of rules
saying where each file should be fetched from, for each location in the
world, is too much work.

We run into this problem when the same files (mostly big files) are
retrieved from many sites. I think the best solution for now is to run a
checksum (perhaps MD5) over a big FTP file's metadata (not its path):
name / date / time / size / extension (.exe / .zip). If that matches a
file that was previously retrieved from another server and is still in
the cache, return the cached file. There is very little chance that the
file being retrieved is a different one, and the odds shrink as the
files grow bigger.

        Let's say someone retrieves the file
        ftp://ftp.netscape.com/pub/navigator/3.0/windows/file.exe
        (3,423,664 bytes) via proxy.if.is, and the file gets stored in
        the proxy. Another user then finds the same file at
        ftp.funet.fi:/pub/incoming/file.exe. Same name, same size, same
        date, same time = same checksum, so it is definitely the same
        file since it is so big, and the cache should return this file.
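The duplicate check sketched above could look roughly like this (a minimal
sketch in Python rather than squid's C; the function names and the `cache`
dictionary are purely illustrative, not anything squid actually has):

```python
import hashlib

def metadata_fingerprint(name, size, date, time):
    """MD5 over a file's name, size, date, and time, deliberately
    ignoring which host and path it was fetched from."""
    key = f"{name}|{size}|{date}|{time}".encode()
    return hashlib.md5(key).hexdigest()

# The same file offered by two different mirrors yields one fingerprint.
a = metadata_fingerprint("file.exe", 3423664, "1996-08-12", "10:00")
b = metadata_fingerprint("file.exe", 3423664, "1996-08-12", "10:00")

cache = {a: b"...cached file body..."}  # fingerprint -> stored object

# A request via ftp.funet.fi with matching metadata is a cache hit,
# even though the URL differs from the original ftp.netscape.com one.
print(b in cache)  # True: b == a, so the cached body is returned
```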

This should also be an option in squid.conf: run the checksum on one, some,
or all of the possible fields (that is, date / time / file name / extension,
etc.).
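Making the set of checked fields configurable might look like this sketch
(Python; the option name `checksum_fields` is hypothetical, not a real
squid.conf directive):

```python
import hashlib

# Hypothetical value parsed from a squid.conf line such as
#   checksum_fields name size date time
checksum_fields = ["name", "size", "date", "time"]

def fingerprint(meta, fields=checksum_fields):
    """MD5 over only the configured subset of the file's metadata."""
    key = "|".join(str(meta[f]) for f in fields).encode()
    return hashlib.md5(key).hexdigest()

meta = {"name": "file.exe", "size": 3423664,
        "date": "1996-08-12", "time": "10:00", "ext": ".exe"}

full = fingerprint(meta)                     # all configured fields
loose = fingerprint(meta, ["name", "size"])  # name and size only
```

A stricter field list means fewer false matches; a looser one means more
cache hits, which is the trade-off the option would expose.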

It would also be possible to log a file's checksum and count how many times
the file has been retrieved. If a file has been retrieved 10 times over one
month (e.g. the 10 MB file above) and has always expired (been deleted) from
the cache in between, squid should keep it longer than other files (longer
than the default expiry).
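The retention idea could be sketched like this (Python; the threshold and
the expiry values are made up for illustration, and squid's real expiry
handling works differently):

```python
from collections import defaultdict

DEFAULT_EXPIRY_DAYS = 7     # assumed default expiry, for illustration
EXTENDED_EXPIRY_DAYS = 30   # keep popular big files around longer
HOT_THRESHOLD = 10          # retrievals per month that mark a file "hot"

retrievals = defaultdict(int)  # checksum -> fetch count this month

def record_retrieval(checksum):
    retrievals[checksum] += 1

def expiry_days(checksum):
    """Files re-fetched often enough get a longer expiry than default."""
    if retrievals[checksum] >= HOT_THRESHOLD:
        return EXTENDED_EXPIRY_DAYS
    return DEFAULT_EXPIRY_DAYS

for _ in range(10):            # the 10 MB file fetched 10 times
    record_retrieval("abc123")
```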

> What the list must tell us is:
> Does this URL point to a mirror?
> If so:
> What is the actual "home" URL of the info?
> This is the URL that gets used to access the cache.
> If we do not have the file:
> Is there a mirror that I could use in preference to the "home"?

It will be hard to make such a list file common on FTP sites, though it
might succeed some day. For example, many Linux sites include a
filename.lsm file alongside the Linux sources, describing the file in
question, its author, and so on; the .lsm file often lists the mirrors
and the original site. That file format deserves to be more popular and
could be used by squid, but in my opinion this is too much work, and the
checksum should be good enough for a while. Retrieving such an info
file, running the checks, connecting to the mirror site, and checking
whether the file is there all takes too long and would SLOW DOWN the
client a lot. In my opinion, there is no way to do this well enough.

> Gudmundur Ragnar
> ragnar@this.is

Best regards,
Gunnar Ingvi Thorisson
System Administrator and programmer
Iceland Software Inc.
gunni@if.is
Received on Sun Sep 15 1996 - 06:41:42 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:33:01 MST