Re: refresh.cc - refreshIsCachable()

From: Doug Dixon <doug.dixon@dont-contact.us>
Date: Fri, 28 Apr 2006 14:14:10 +1200

On 28 Apr 2006, at 05:41, Duane Wessels wrote:

>> The return value of refreshIsCachable() can be calculated without
>> making a call to refreshCheck().
>>
>> I.e. you can remove the call to refreshCheck() from
>> refreshIsCachable(), and refreshIsCachable() will still return the
>> correct result.
>
> Sorry, still not following you. refreshIsCachable() uses the
> 'reason' value
> in an early if statement:
>
> if (reason < 200)
> /* Does not need refresh. This is certainly cachable */
> return 1;
>
> And there are numerous cases where refreshCheck() would return FRESH_*
> (i.e. < 200) values.
>
> DW

Sorry, you're right, we can't do away with the call to refreshCheck()
altogether. I guess I'm finding it difficult to piece together all
the bits that make the algorithm tick.

So... refreshIsCachable() is only ever called if the response from
the origin is one of:

                200 OK
                203 HTTP_NON_AUTHORITATIVE_INFORMATION
                300 MULTIPLE CHOICES
                301 MOVED PERMANENTLY
                410 GONE

The call to refreshCheck() then tells us whether the response would
be FRESH or STALE at Config.minimum_expiry_time, taking into account
the caching headers in both request and response, and the relevant
configuration params.

If refreshCheck() returns a FRESH_ reason, refreshIsCachable()
immediately returns true, and the response is cached. End of story,
so far so good.

The only thing I don't get now is this:

According to the algorithm in refreshIsCachable() there are
circumstances where we would cache a STALE response. In fact, we
would only NOT cache a stale response if (a) it didn't have a Last-
Modified header, or (b) it had a reply but no Content-Length header.

It's not an easy algorithm to follow in there... so I've rewritten it
like this:

int
refreshIsCachable(const StoreEntry * entry)
{
        /* If this response had a Last-Modified header, and either no reply
or a reply with a
         * positive Content-Length, we don't care if it is FRESH or STALE,
it is always cacheable.
         */
        if (entry->lastmod >= 0)
                if (entry->getReply() == NULL)
                        return 1;
                if (entry->getReply()->content_length > 0)
                        return 1;

        /* Otherwise we need to check whether it's FRESH or STALE */
        int reason = refreshCheck(entry, NULL, Config.minimum_expiry_time);
        refreshCounts[rcStore].total++;
        refreshCounts[rcStore].status[reason]++;
                
        if (reason < STALE_MUST_REVALIDATE)
                /* FRESH, so cache */
                return 1;
                
        /* STALE, so don't cache */
        return 0;
}

I *think* this is functionally identical.

So my questions now are:

1) Why is a STALE response cacheable just because it has a Last-
Modified header, and either no reply or a reply with a positive
Content-Length?
2) What's the higher level meaning of the situation that says it
should be cached?
3) Is this the desired behaviour?

If this IS the desired behaviour, then my rewritten version short-
circuits the function a lot of the time (e.g. for normal 200 OKs)
without making a call to refreshCheck(). So it's a bit quicker, but
most of all it's a lot easier to follow.

Basically, I suspect the existing refreshIsCachable() algorithm isn't
quite right, and that what we should be doing is more along the lines
of calling refreshCheck() every time and relying on that.

This is just a suspicion tho :) Am I getting close?

Thanks for your patience... and sorry for barking up the wrong tree
before.

Cheers
Doug
Received on Sat Apr 29 2006 - 16:09:36 MDT

This archive was generated by hypermail pre-2.1.9 : Mon May 01 2006 - 12:00:04 MDT