Re: squid-2.4 release ?

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 01 Feb 2001 23:08:44 +0100

Currently (until metadata updates are available) three timestamps are
required for proper operation:

a) Date header from last IMS reply received
b) Time when the reply was received
c) Fresh until

(a) and (b) can be moved to disk as soon as metadata updates are
available. They are only required when the reply is to be sent to the
client.

(a) is not in place today, and is causing some grief in hierarchies. In
short, Age calculation in downstream caches ignores any IMS validations
done by us, because we only update the Age header and keep the original
Date, while the age calculation uses approximately
real_age = max(now - Date, Age), so the old Date dominates.

(b) is needed for Age header calculations.

(c) only needs to be in memory for speedy ICP and digest generation, and
some optimizations in removal policies.

For most uses except for digest generation, (c) can be derived from (b)
and the URL.

Two of the three can be stored as offsets, with the third as the base.

The third (the base) can be stored as an offset to an internal "around
this year" reference (a couple of times the cache size) if we need to hunt for bits.

And I do agree that swap.state can be managed in many better ways than
today, but that is mostly a store-type-specific detail, and will be
even more so in the future.

However, as I am not really in need of ICP or Digests, my goal is to
have an efficient store without any in-memory index. I have the logical
store design for this ready on paper, but not the infrastructure in Squid
to implement it.

/Henrik

Andres Kroonmaa wrote:
>
> Henrik,
>
> You replied with pointers to archives, but you didn't really comment.
> What do you think about aggressive timestamp reduction in RAM?
> On your compactsentry page you propose to handle it differently, but
> IMHO we don't need "request date (24 bits)" if we have a global
> reference point for "freshness TTL".
>
> We can regenerate freshness TTL values periodically by checkpointing
> swap.state (rereading it with all the timestamps, combining it with
> new.log, and writing out a fresh swap.state, updating in-RAM TTLs on
> the way).
>
> In fact, on another note, I think it might be a good idea not to
> append new objects to swap.state, but to write them to a new.log file,
> keeping swap.state read-only. This would result in something similar
> to logging database systems. Then, during dirty restarts, we could
> read in the old swap.state, which is known good (but old), then the
> latest new.log, which overrides the old swap.state. This way, if we
> keep new.log small enough (say we checkpoint after 10M of log, 2 days,
> or 1M objects, whichever comes first), we can achieve much faster
> dirty restarts...
>
> On 9 Jan 2001, at 23:53, Henrik Nordstrom <hno@hem.passagen.se> wrote:
>
> > Henrik Nordstrom wrote:
> > >
> > > Andres Kroonmaa wrote:
> > >
> > > > My path of thinking goes like this. We take URL at fetch time, find
> > > > those 4 timestamps, from them calculate LM_AGE and FETCH_AGE, and then
> > > > act as specified in refresh_patterns. Basically, in whatever way, we
> > > > still simply determine an exact time when object needs to be revalidated.
> > >
> > > Nice to see that more are thinking along the same lines ;-)
> > >
> > > http://www.squid-cache.org/mail-archive/squid-dev/200003/0017.html
> > >
> > > the older message referred to is attached (outside the archives)
> >
> >
> > Have tried to summarize the relevant parts on
> > http://squid.sourceforge.net/compactsentry/
> >
> > /Henrik
>
> ---------------------------
>
> From: "Andres Kroonmaa" <andre@online.ee>
> Organization: MicroLink Online
> To: Henrik Nordstrom <hno@hem.passagen.se>
> Date sent: Tue, 9 Jan 2001 10:50:46 +0200
> Subject: Re: squid-2.4 release ?
> Copies to: squid-dev@squid-cache.org
> Priority: normal
>
> On 9 Jan 2001, at 2:38, Henrik Nordstrom <hno@hem.passagen.se> wrote:
>
> > Adrian Chadd wrote:
> >
> > > You could start by changing sdirno into a byte instead of an int.
> > > That chops 3 bytes (on 32 bit archs) out per StoreEntry.
> >
> > Done (and moved things around to account for alignment).
> >
> > Now the structure on 32-bit platforms looks like
> > 8 hash_link hash; /* must be first */
> > 4 MemObject *mem_obj;
> > 4 RemovalPolicyNode repl;
> > 4 time_t timestamp;
> > 4 time_t lastref;
> > 4 time_t expires;
> > 4 time_t lastmod;
>
> I've been wondering why we need that much time_t data in RAM all the
> time. All 4 timestamps are used to determine the point in time when
> Squid needs to revalidate object freshness. lastref used to help in the
> LRU logic, but since we have dlinklists and place referenced objects on
> top, it isn't really needed much anymore.
>
> If we partly reverted to TTL-based refresh logic, only to conserve
> RAM and help ICP/digests, and let the current refresh logic handle
> actual HTTP fetches from cache, we might be able to reduce memory use
> further.
>
> My path of thinking goes like this. We take the URL at fetch time, find
> those 4 timestamps, calculate LM_AGE and FETCH_AGE from them, and then
> act as specified in refresh_patterns. Basically, one way or another, we
> still simply determine an exact time when the object needs to be
> revalidated.
>
> We can determine this point in time when the object is stored to the
> FS, by the same refresh_pattern rules. We store all 4 timestamps, but in
> RAM we keep only a single timestamp that specifies the point in time
> when the object can no longer be considered fresh. As Squid currently
> does not purge objects that expire by Expires: headers, there is not
> much difference between Expires, max age, and lm_factor based expiry.
>
> We can use this single timestamp when returning ICP replies and when
> deciding whether to add the object to digests.
>
> During HTTP fetches, we can afford a disk lookup to determine the
> precise freshness, and we may recalculate the expiry stamp for the
> in-memory index.
>
> The only thing we sacrifice is the ability to change refresh_patterns
> for ICP/digests on the fly. For HTTP fetches this isn't a problem.
>
> Furthermore, since we predict the future expiry time at initial fetch
> time, we can use delta stamps instead of precise times. We can pick
> the time of the last clean swap.state as the reference time, and
> define the expiry time in minutes into the future. This way we can
> define a refresh time up to 45 days into the future with only a 16-bit
> u_short. During a clean swap.state rebuild, we can easily rewrite this
> expiry stamp based on the old and new swap.state date/time. We'd need
> to add this timestamp to swap.state to avoid object reads and
> refresh_pattern lookups during clean startups.
>
> If this is a sane idea, then it looks to me like we could drop
> 16 bytes of timestamps down to 2...
>
> ------------------------------------
> Andres Kroonmaa
> Delfi Online
> Tel: 6501 731, Fax: 6501 708
> Pärnu mnt. 158, Tallinn, 11317 Estonia
Received on Thu Feb 01 2001 - 15:12:27 MST
