Re: thoughts on memory usage...

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Wed, 20 Aug 1997 11:03:30 +0200 (EETDST)

On 19 Aug 97 at 23:00, squid-dev@nlanr.net wrote:

> From: Duane Wessels <wessels@nlanr.net>

> It's probably not necessary to compress StoreEntry->key. That should
> only be different from StoreEntry->url while a request is in progress.
>
> Initially I was going to suggest that we always leave StoreEntry->url
> compressed and change every reference of entry->url with
> DECODE(entry->url). But then things become complicated if you ever
> need to do:
>
> foo(.., DECODE(e1->url), DECODE(e2->url), ...)

    do we ever need to?

> because you can't just decode into a static array (i.e. like
> inet_ntoa() does).

    I was thinking some time ago about the possibility of excluding
 URL strings from RAM altogether. I'm not at all sure it is possible,
 but I'd like to share my thoughts and hear your comments:

    The URL string is needed only for logging and for finding the actual
 source, thus only while a request is being serviced. I can't find any
 other use for keeping the actual URL string in RAM. Squid is request
 driven, and the rest of the time it doesn't care much what is in its
 cache.
    For URL lookups, all we need is a unique identifier that can be
 calculated from any given URL; thus we'd need an algorithm that always
 gives a unique (hash) ID for any possible URL. It could be 64 bits, or
 whatever size makes it unique enough. This algorithm could well be
 non-reversible.
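    Just to make this concrete, here is a rough sketch of such a routine.
 I'm using 64-bit FNV-1a purely as an example of a fast, non-reversible
 hash; the name url_id() is made up, and any well-mixed hash would do:

    #include <stdint.h>

    static uint64_t
    url_id(const char *url)
    {
        /* 64-bit FNV-1a: start from the offset basis, then
         * xor-and-multiply once per byte of the URL. */
        uint64_t h = 0xcbf29ce484222325ULL;
        const unsigned char *p = (const unsigned char *) url;
        while (*p) {
            h ^= (uint64_t) *p++;
            h *= 0x100000001b3ULL;
        }
        return h;
    }
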
    Then, for logging purposes, we'd need to carry the request URL along
 for the duration of service, but that is not much of a RAM eater and
 happens anyway.
    Upon a request, we'd calculate the unique hash ID from the URL and do
 a lookup. HIT/MISS doesn't change anything in Squid's operation; there is
 no need to uncompress the URL at any stage. The swaplog could also
 contain only the hash ID of the URL (or both).
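    As a sketch of what the lookup might look like (hypothetical names,
 not Squid's real store internals; store_find() and the table layout are
 invented for illustration):

    #include <stdint.h>
    #include <stddef.h>

    uint64_t url_id(const char *url);   /* the hash sketched above */

    typedef struct _StoreEntry StoreEntry;
    struct _StoreEntry {
        uint64_t id;            /* replaces the url/key strings */
        StoreEntry *next;       /* hash chain */
        /* ... swap file number, timestamps, flags ... */
    };

    #define STORE_BUCKETS 7951

    static StoreEntry *store_table[STORE_BUCKETS];

    static StoreEntry *
    store_find(const char *url)
    {
        uint64_t id = url_id(url);
        StoreEntry *e;
        for (e = store_table[id % STORE_BUCKETS]; e != NULL; e = e->next)
            if (e->id == id)
                return e;       /* HIT (barring a collision) */
        return NULL;            /* MISS */
    }
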
    I don't know if it is possible to calculate an ID from any URL in
 such a way that no two different URLs yield the same ID, but I believe
 "collisions" could be made extremely rare.
    ICP could use these cryptic IDs to ask peering caches for hits
 (if they have negotiated to use the same algorithm), reducing ICP
 traffic and remote CPU usage.
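    For illustration, an ICP-style query could then look roughly like
 this (the header fields follow the usual ICP layout, but the struct and
 the opcode name are invented here, a wire format would need proper
 packing/serialization, and such a query only works between peers that
 agreed on the hash):

    #include <stdint.h>

    struct icp_id_query {
        unsigned char opcode;   /* e.g. a negotiated ICP_OP_QUERY_ID */
        unsigned char version;
        uint16_t length;        /* whole message length, network order */
        uint32_t reqnum;
        uint32_t options;
        uint32_t option_data;
        uint32_t sender_addr;
        uint64_t url_id;        /* 8 bytes instead of a full URL string */
    };
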
    Since no place on disk would contain the actual URL from which an ID
 was made, it could be very difficult to change the algorithm if the need
 arose. It would also be hard to detect when collisions occur. As a
 double-check, I'd suggest prepending the URL to every object on disk.
 Then, when servicing an object, it would be easy to strip the URL and
 compare it with the actual request URL. In addition, saving URLs with
 objects gives a way to rebuild all store data from the files spread
 across the disks in case the swaplog gets trashed or corrupted.
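    The double-check itself would be cheap, something along these lines
 (a sketch only; here I assume the URL is stored as a newline-terminated
 first line of the swap file, and store_verify() is a made-up name):

    #include <stdio.h>
    #include <string.h>

    /* Return 1 if the URL stored at the head of the swap file matches
     * the request URL, 0 on mismatch (a hash collision) or read error. */
    static int
    store_verify(FILE *swapfile, const char *request_url)
    {
        char stored[4096];
        if (fgets(stored, sizeof(stored), swapfile) == NULL)
            return 0;
        stored[strcspn(stored, "\n")] = '\0';
        return strcmp(stored, request_url) == 0;
    }
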
    The swaplog could then contain both the ID and the URL to help detect
 errors at startup, but I still don't think Squid needs to keep URLs in
 RAM while running.
    Finally, this ID-calculation routine could use a reversible algorithm
 instead, although I think that would give a much worse compression ratio.

    In conclusion, if this idea is worth anything, Squid's RAM usage
 could drop from an average of 100 bytes per URL to 6-10, freeing RAM and
 speeding up lookups.
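    (Back-of-the-envelope, using the figures above: a cache holding one
 million objects would spend about 100 MB on URL strings at 100 bytes
 each, versus about 8 MB for 8-byte IDs.)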

 I must be missing something here...?

 best regards,

 ----------------------------------------------------------------------
  Andres Kroonmaa                   mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online    Tel:       6308 909
  Tallinn, Sakala 19                Pho: +372  6308 909
  Estonia, EE0001                   Fax: +372  6308 901
  http://www.online.ee
 ----------------------------------------------------------------------
