Re: thoughts on memory usage...

From: David Luyer <luyer@dont-contact.us>
Date: Thu, 21 Aug 1997 15:09:05 +0800 (WST)

On 21 Aug 1997, Michael O'Reilly wrote:
>> This leaves the question of what to do with the URL. Can you just throw
>> it away? Well... it would certainly be nice to have a fixed structure
>> fixed record length "log" file. One (obvious?) problem I can think of tho
>> is the removal of old items from the cache. Unless cache purging is to be
>> done purely on an LRU or similar basis (hmmm, decline page usefulness
>> every X hours by some constant (8? 50?), increase it by 1 every hit... or
>> some non-linear function? the way the page/buffer cache in linux
>> works...). When a new request for the URL is received, it can be decided
>> if the object is out of date or not since you now have the real URL.
>
>Or you can tag the object with the appropriate rule when you build it?
>i.e. You get a request for foobar.gif, you build the object, store it
>on disk, and then run down the refresh_rules to see which one matches,
>and then say "rule 6 is it for this one!", and voila!
>
>Actually I think I can see all sorts of wild advantages to that....

And then the rules change. Rule 6 is no more. The URLs would have to be
checked against the rules on every config file reload... which means
storing them in the log file and re-reading it frequently. I don't know
that we need to look at the refresh rules except when handling a request
and checking whether an object is still valid, though... an LRU-like
technique should be fine for purging cache objects. And then the URL isn't
needed at all.
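
To make the purging idea concrete, here is a minimal sketch of the "decay by a constant every X hours, add 1 per hit" scoring quoted above. It is an illustration only, not how squid does it: the names DecayCache, hit, tick and purge, and the decay value, are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str            # e.g. a hash of the URL; the URL itself is not needed
    score: float = 0.0  # "usefulness" of the object


class DecayCache:
    """Usefulness-score purging: +1 per hit, minus a constant per decay tick."""

    def __init__(self, decay=8.0):
        self.decay = decay  # hypothetical constant, subtracted every "X hours"
        self.entries = {}

    def hit(self, key):
        # Each hit raises the score by one, creating the entry if needed.
        entry = self.entries.setdefault(key, Entry(key))
        entry.score += 1

    def tick(self):
        # Called once per decay interval; scores are floored at zero.
        for entry in self.entries.values():
            entry.score = max(0.0, entry.score - self.decay)

    def purge(self, count):
        # Evict the `count` least useful entries. No URL and no refresh
        # rule is consulted, only the score.
        victims = sorted(self.entries.values(), key=lambda e: e.score)[:count]
        for victim in victims:
            del self.entries[victim.key]
        return [victim.key for victim in victims]
```

The point of the sketch is that purge() never needs the URL: refresh rules would only come into play at request time, when the real URL is in hand anyway.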

>> The idea of storing extra metadata in the objects in the cache is
>> interesting (the putting the url at the beginning). Allowing for a
>> rebuild from data, although slow, after re-arranging the cache or whatever
>> would be nice, but if we check the on-disk URL all the time then it's not
>> so nice because of the performance of ICP queries (or maybe just give a
>> false "yes" and then return a tcp denied and fix up squid in a way that
>> it deals with tcp denied by retrying the request from a different
>> peer/parent?). Basically, I like the idea of keeping the URL on disk
>> unless it's actually used all the time (ie, "it's there, why not use it,
>> MD5/HSA _could_ be wrong you know" is not the right attitude for ICP
>> queries).
>
>In for a penny, in for a pound. If you trust MD5, why bother sending
>the entire URL over in the ICP? that's just a waste of bandwidth and
>CPU time. Just send over the md5 signature.... half a :)

Well, agreed. A new ICP query which talked with MD5/HSA sum instead of
URL to compatible caches would work. But receiving an old ICP query you
have to do the MD5/HSA hash (cheap on small strings) - not a problem. The
point I was making was that I wouldn't want to check an on-disk URL for
every ICP query.
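
A rough sketch of what I mean, assuming an in-memory set of MD5 digests for the cached objects. MemoryIndex, query_by_url and query_by_digest are hypothetical names for illustration, and the digest-carrying query is the proposed extension, not an existing ICP opcode.

```python
import hashlib


def url_key(url):
    # 16-byte MD5 digest of the URL; cheap on short strings.
    return hashlib.md5(url.encode("utf-8")).digest()


class MemoryIndex:
    """In-memory set of digests for cached objects; no disk access on lookup."""

    def __init__(self):
        self._keys = set()

    def add(self, url):
        self._keys.add(url_key(url))

    def query_by_url(self, url):
        # Old-style ICP query: hash the URL on receipt, then look it up.
        return url_key(url) in self._keys

    def query_by_digest(self, digest):
        # Hypothetical new-style query: the peer sends the digest directly.
        return digest in self._keys
```

Either way the answer comes from memory alone; the on-disk URL is never read to answer a query.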

>> * the URLs could be kept on disk as the first line of the cache swap file.
>> it would nuke all existing caches, but, this would only be in the
>> upgrade to 1.2 for most users, and if they are patient enough they
>>   *could* run a script... this would then mean that people wouldn't lose
>> their cache in a loss of "log"; maybe the format of the first line of a
>> swap file should be the format of a full line of the current "log" file.
>> it wouldn't increase real disk usage or I/O in most cases since real
>> hardware talks in typically 512-byte... 4k or more blocks.
>
>Still not sure why you need the URLs that much. why not write them to
>a separate log? if reindexing is an oddity, then you're not really
>fussed about the speed....

Well, because the separate log gets corrupted, runs out of disk space, gets
deleted, .... Keeping them on the first line of the swap files would mean
you can construct whatever you want out of just the cache files, survive
over disk re-arrangement and loss of log files due to running out of disk
space, etc. And because it wouldn't really cost anything in runtime or
disk usage except where it pushes an item into a new block; the only cost
is moving over.
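
As a sketch of the layout, assuming one swap file per object with the URL as its first line (standing in for a full "log" line). The function names and file names here are invented for the example and are not squid's on-disk format.

```python
import os


def write_swap_file(path, url, body):
    # First line is the URL, followed by the object data.
    with open(path, "wb") as f:
        f.write(url.encode("utf-8") + b"\n")
        f.write(body)


def read_swap_url(path):
    # Only the first line is read, so the object body is never touched.
    with open(path, "rb") as f:
        return f.readline().rstrip(b"\n").decode("utf-8")


def rebuild_index(cache_dir):
    # Slow path: walk every swap file and recover URL -> file name.
    index = {}
    for name in sorted(os.listdir(cache_dir)):
        full = os.path.join(cache_dir, name)
        if os.path.isfile(full):
            index[read_swap_url(full)] = name
    return index


def blocks_used(size, block=512):
    # Number of disk blocks an object of `size` bytes occupies.
    return (size + block - 1) // block
```

On the disk usage point: with 512-byte blocks a 1000-byte object takes two blocks, and still takes two with a 20-byte URL line added. Only when the extra line carries it past a block boundary (1040 bytes, say) does it cost a third block.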

David.

Received on Tue Jul 29 2003 - 13:15:42 MDT
