entry->lastmod in absence of Last-Modified: header

From: Arjan de Vet <Arjan.deVet@dont-contact.us>
Date: Mon, 20 Jul 1998 21:42:36 +0200 (CEST)

I noticed that pages with neither a Last-Modified: nor an Expires: header
seem to hang around in the cache for days (sometimes 2 or 3) without being
refreshed unless a client refresh is issued. I might have found the problem.
(I added some extra debugging code to storeTimestampsSet; the tests were done
using 1.2beta22 with the default refresh_pattern.)

First request:

1998/07/17 23:17:39| refreshCheck: '[null_mem_obj]'
1998/07/17 23:17:39| refreshCheck: Matched '. 0 20% 259200'
1998/07/17 23:17:39| refreshCheck: age = 959
1998/07/17 23:17:39| refreshCheck: entry->timestamp = 900709300
1998/07/17 23:17:39| refreshCheck: entry->lastmod = 900704029
1998/07/17 23:17:39| refreshCheck: factor = 0.181939
1998/07/17 23:17:39| refreshCheck: NO: factor < pct

900710259.526 9 127.0.0.1 TCP_HIT/200 9552 GET http://www.planet.nl/

This is a TCP_HIT. Somewhat later a refresh is needed:

1998/07/17 23:19:46| refreshCheck: '[null_mem_obj]'
1998/07/17 23:19:46| refreshCheck: Matched '. 0 20% 259200'
1998/07/17 23:19:46| refreshCheck: age = 1086
1998/07/17 23:19:46| refreshCheck: entry->timestamp = 900709300
1998/07/17 23:19:46| refreshCheck: entry->lastmod = 900704029
1998/07/17 23:19:46| refreshCheck: factor = 0.206033
1998/07/17 23:19:46| clientProcessExpired: setting lmt = 900704029
1998/07/17 23:19:47| getMaxAge: 'http://www.planet.nl/'
1998/07/17 23:19:47| ctx: enter level 0: 'http://www.planet.nl/'
1998/07/17 23:19:47| storeTime: served_date = 900707629
1998/07/17 23:19:47| storeTime: e->expires = -1
1998/07/17 23:19:47| storeTime: e->timestamp = served_date = 900707629
1998/07/17 23:19:47| ctx: exit level 0
1998/07/17 23:19:47| storeTime: served_date = 900707629
1998/07/17 23:19:47| storeTime: e->expires = -1
1998/07/17 23:19:47| storeTime: e->timestamp = served_date = 900707629

900710387.398 537 127.0.0.1 TCP_REFRESH_HIT/200 9552 GET http://www.planet.nl/

This is a REFRESH_HIT, but entry->lastmod has not been changed, as a new
request shows:

1998/07/17 23:22:05| refreshCheck: '[null_mem_obj]'
1998/07/17 23:22:05| refreshCheck: Matched '. 0 20% 259200'
1998/07/17 23:22:05| refreshCheck: age = 138
1998/07/17 23:22:05| refreshCheck: entry->timestamp = 900710387
1998/07/17 23:22:05| refreshCheck: entry->lastmod = 900704029
1998/07/17 23:22:05| refreshCheck: factor = 0.021705
1998/07/17 23:22:05| refreshCheck: NO: factor < pct

900710525.888 9 127.0.0.1 TCP_HIT/200 9552 GET http://www.planet.nl/

When a client refresh is done, entry->lastmod does get updated: in the
absence of a real Last-Modified: header it is set to served_date.

1998/07/17 23:25:35| getMaxAge: 'http://www.planet.nl/'
1998/07/17 23:25:35| ctx: enter level 0: 'http://www.planet.nl/'
1998/07/17 23:25:35| storeTime: served_date = 900710733
1998/07/17 23:25:35| storeTime: e->expires = -1
1998/07/17 23:25:35| storeTime: e->lastmod = served_date = 900710733
1998/07/17 23:25:35| storeTime: e->timestamp = served_date = 900710733

900710737.568 2891 127.0.0.1 TCP_CLIENT_REFRESH_MISS/200 9548 GET http://www.planet.nl/

As long as only TCP_REFRESH_{MISS,HIT} occur, the timestamp field is adjusted
but the lastmod field isn't. When I started testing this, I had a copy of the
above page from July 16, but entry->lastmod was still at July 8:

1998/07/17 21:44:07| refreshCheck: '[null_mem_obj]'
1998/07/17 21:44:07| refreshCheck: Matched '. 0 20% 259200'
1998/07/17 21:44:07| refreshCheck: age = 85443
1998/07/17 21:44:07| refreshCheck: entry->timestamp = 900619204
1998/07/17 21:44:07| refreshCheck: entry->lastmod = 899927249
1998/07/17 21:44:07| refreshCheck: factor = 0.123481

(899927249 = Wed Jul 8 21:47:29 CEST 1998)

entry->timestamp matches the most recent TCP_REFRESH_MISS:

900619206.278 3061 127.0.0.1 TCP_REFRESH_MISS/200 9693 GET http://www.planet.nl/

(900619206 = Thu Jul 16 22:00:06 CEST 1998)
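
The two epoch conversions above can be re-checked with a small Python
sketch. The helper name epoch_to_cest and the fixed UTC+2 offset are
assumptions made for this illustration (CEST in July 1998); nothing here is
part of Squid.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset, valid for July 1998.
CEST = timezone(timedelta(hours=2), "CEST")


def epoch_to_cest(ts):
    """Format a Unix timestamp the way the dates in this mail are written."""
    return datetime.fromtimestamp(ts, tz=CEST).strftime("%a %b %d %H:%M:%S %Z %Y")


print(epoch_to_cest(899927249))  # Wed Jul 08 21:47:29 CEST 1998 (entry->lastmod)
print(epoch_to_cest(900619206))  # Thu Jul 16 22:00:06 CEST 1998 (TCP_REFRESH_MISS)
```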

This old lastmod value inflates the denominator of the factor calculation,
so the factor grows slowly and the object is considered fresh for longer:

        factor = (now - timestamp) / (timestamp - lastmod)
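
As a rough check, the formula can be replayed against the logged values in
Python. This is only a sketch: the function names are made up for the
example, the 20% threshold is taken from the matched refresh_pattern in the
logs, and the min and max fields of refresh_pattern are ignored.

```python
def refresh_factor(now, timestamp, lastmod):
    """factor = (now - timestamp) / (timestamp - lastmod), as in the formula above."""
    return (now - timestamp) / (timestamp - lastmod)


def seconds_until_stale(timestamp, lastmod, pct=0.20):
    """Age at which the factor reaches pct (hypothetical helper; ignores min/max)."""
    return pct * (timestamp - lastmod)


# First and second refreshCheck in the logs: age = 959 and age = 1086.
print(round(refresh_factor(900709300 + 959, 900709300, 900704029), 6))   # 0.181939
print(round(refresh_factor(900709300 + 1086, 900709300, 900704029), 6))  # 0.206033

# The July 16 copy whose lastmod was still at July 8.
hours = seconds_until_stale(900619204, 899927249) / 3600
print(f"stale after about {hours:.1f} hours")
```

With timestamp - lastmod at roughly 8 days, the 20% threshold is reached
only after about 38 hours, and every TCP_REFRESH_{MISS,HIT} that moves
timestamp forward without touching lastmod stretches that interval further.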

Shouldn't a TCP_REFRESH_{MISS,HIT} also update entry->lastmod and set it to
e.g. served_date/squid_curtime? I tried coding it myself but the refresh
code is not that simple.

Arjan

-- 
Arjan de Vet, Eindhoven, The Netherlands              <Arjan.deVet@adv.iae.nl>
URL: http://www.iae.nl/users/devet/           for PGP key: finger devet@iae.nl
Received on Tue Jul 29 2003 - 13:15:51 MDT