Re: please help: refresh_pattern question from Dancer on 1998-02-08 (squid-users)

From: Dancer <dancer@dont-contact.us>
Date: Mon, 09 Feb 1998 03:22:52 +1000

RL wrote:

> Thanks for your reply.
>
> A couple of things are still unclear to me:
>
> - when an object becomes stale, how often is it checked?

When it's requested by a client, and only then.

> Is there any
> control over that? I would like it to become stale after x minutes but only
> get checked every y minutes. Or only get checked every x minutes for that
> matter. It would be inefficient for squid to check every time there is a
> request, after the object becomes 'stale'.

Ah..Once it rechecks the object it recalculates a new 'freshness' time for it.
Assuming that the URL was requested constantly (say, every second) squid would
only check it every 60 minutes according to the rule I quoted. Each time it was
checked, it would have a new freshness interval set on it.
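
The "recheck on request, then start a new freshness interval" behaviour can be sketched in a few lines of Python. This is a toy model, not Squid code: the function name and the fixed 60-minute interval are assumptions made for illustration, and it counts the very first fetch as a contact with the origin server.

```python
# Toy model only: a fixed freshness interval, reset on every recheck.
FRESH_SECONDS = 60 * 60  # assumed 60-minute freshness interval


def count_origin_contacts(request_times, fresh_seconds=FRESH_SECONDS):
    """Count the requests that make the cache contact the origin server."""
    contacts = 0
    fresh_until = None  # nothing cached yet
    for t in request_times:
        if fresh_until is None or t >= fresh_until:
            contacts += 1                    # first fetch, or object is stale
            fresh_until = t + fresh_seconds  # a new freshness interval starts
        # otherwise the cached copy is served without asking the server
    return contacts


# One request per second for three hours: only 3 contacts, one per hour.
print(count_origin_contacts(range(3 * 60 * 60)))  # 3
# Two requests a day apart: nothing is checked during the silent day.
print(count_origin_contacts([0, 24 * 60 * 60]))   # 2
```

The second call shows the other half of the answer: a stale object that nobody asks for is never rechecked at all.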

> - what's the /i for? Is there documentation on that somewhere which I missed?

Yes, it's in the comments in the squid.conf file. It means 'ignore the case of
alphabetic characters when matching the pattern'. It treats upper and lower-case
characters as equivalent.
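
A quick way to see what case-insensitive matching does is to try it in Python. Here `re.IGNORECASE` plays the role of /i; this uses Python's regex engine as a stand-in and is not Squid's own matcher, and the input string is made up.

```python
import re

# re.IGNORECASE stands in for /i (an analogy, not Squid's implementation).
with_i = re.compile("abc", re.IGNORECASE)
without_i = re.compile("abc")

text = "123ABC123"  # hypothetical input

print(with_i.search(text) is not None)     # True: case is ignored
print(without_i.search(text) is not None)  # False: 'ABC' is not 'abc'
```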

> - in the regex, does the entire thing have to match or how does it work? The
> default is . which should be one character. That matches everything? So if I
> put in abc as the regex, it would match 123abc123 and abc123 etc?

Correct. The regular expression is a substring match, i.e. does the string
contain something which matches this pattern?

If you specify 'abc' then anything with 'abc' in it is matched. If you specify
'cb32e404.exe$' then it matches that string occurring only at the _end_ of a
string (that's what the '$' at the end means, in regular-expressionese).
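
The substring and end-of-string behaviour can be checked with Python's `re.search`, which also looks for a match anywhere in the string. This is a sketch using Python's engine rather than Squid's; the helper name and the example URLs are invented for illustration.

```python
import re


def contains_match(pattern, text):
    """True if some part of text matches pattern (substring semantics)."""
    return re.search(pattern, text) is not None


print(contains_match("abc", "123abc123"))  # True
print(contains_match("abc", "abc123"))     # True
print(contains_match(".", "anything"))     # True: any one character will do

# '$' pins the match to the end of the string.
print(contains_match("cb32e404.exe$", "http://example.com/pub/cb32e404.exe"))           # True
print(contains_match("cb32e404.exe$", "http://example.com/pub/cb32e404.exe?mirror=1"))  # False

# The unescaped '.' matches any character, not just a literal dot.
print(contains_match("cb32e404.exe$", "cb32e404_exe"))  # True
```

The last line is why a literal dot is better written as '\.' when it matters.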

There's documentation around on regular expressions. The manual entry 'man perlre',
which describes the syntax of perl regular expressions, is quite helpful. A quick
guide:

.         match any single character.
x         match x
x+        match 1 or more x's.
x*        match 0 or more x's.
\.        really match a .
^abc      only match abc at the _beginning_
abc$      only match abc at the _end_
[abc]     match any single character that is a or b or c
[a-zA-Z]  match any single upper or lower-case alphabetic character.
[^abc]    match any single character that isn't a or b or c

So..To match a filename with an extension:
[a-zA-Z][a-zA-Z0-9]*\.[a-zA-Z0-9]+

Which reads: one alphabetic character, followed by zero or more alphanumeric
characters, followed by a real dot, followed by one or more alphanumeric
characters. There are ways of specifying ranges of numbers (like match 1-3 of
them) but I've found them to be tricky and unreliable, and different in different
regular expression libraries.
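
To see what the filename pattern actually picks out, here is a small Python check. It uses Python's regex engine as a stand-in, and the helper name and sample strings are made up for the example.

```python
import re

FILENAME = re.compile(r"[a-zA-Z][a-zA-Z0-9]*\.[a-zA-Z0-9]+")


def first_match(text):
    """Return the first piece of text that matches the pattern, or None."""
    m = FILENAME.search(text)
    return m.group(0) if m else None


print(first_match("index.html"))      # index.html
print(first_match("/pub/demo1.zip"))  # demo1.zip
print(first_match("readme"))          # None: no dot, no extension
print(first_match("1.txt"))           # None: must start with a letter

# Substring matching means a hostname also looks like name.extension.
print(first_match("http://example.com/"))  # example.com
```

The last case is worth keeping in mind: without a '$' anchor, the pattern is satisfied by any part of the URL that has the right shape.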

aa*r in a refresh pattern would match 'aroma' or 'aardvark' (an 'a' followed
by zero or more 'a's, followed by an 'r'). But it would also match 'barracuda',
'carpet', and 'bazaar'.
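
The same words can be run through Python's `re.search` to confirm this, and to show how a leading '^' narrows it down. Python's engine is only a stand-in here, and 'apple' is an extra word added as a non-matching example.

```python
import re

words = ["aroma", "aardvark", "barracuda", "carpet", "bazaar", "apple"]

# Unanchored: the pattern may match anywhere inside the word.
anywhere = [w for w in words if re.search("aa*r", w)]
print(anywhere)  # ['aroma', 'aardvark', 'barracuda', 'carpet', 'bazaar']

# Anchored with '^': the match has to start at the beginning.
at_start = [w for w in words if re.search("^aa*r", w)]
print(at_start)  # ['aroma', 'aardvark']
```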

I use refresh patterns like this:

refresh_pattern/i \.gif$ 10080 50% 40320

A GIF image almost never changes. When it does, someone usually changes the name
of the GIF. This rule looks for '.gif' at the end of a URL. If it finds it, it
takes 50% of the time since the object last changed, and adjusts that figure to
fit between min and max. The min and max values I use above are in minutes: 10080
(one week) and 40320 (four weeks, or roughly one month). I keep a whole bunch of
these rules:

.gif, .jpeg, .jpe, .jpg, .mov, .avi, .exe, .com, .qt, .viv, .zip, .arj, .tar, .gz
and so on...

All of these objects are archives, images, movies or programs. They're _big_ and
they almost never change without someone changing the name (and hence the URL).
These are things you can afford to cache for a long time. They may squeeze out
smaller things (like .htm and .html which I have very short max times on) but
those things change more often, anyway, and it's the bulk traffic where we really
seem to save the bytes. (New game demos, browser releases, and popular but
image-heavy sites). YMMV, but that's how it works for us.
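
To make the min/percent/max arithmetic of the .gif rule concrete, here is a rough Python sketch. It follows the description in this message (take a percentage of the time since the object last changed, then fit it between min and max) and is not taken from Squid's source; the function and parameter names are invented, and real Squid also takes account of things like an explicit Expires header.

```python
MINUTES_PER_DAY = 24 * 60


def freshness_minutes(age_minutes, min_minutes=10080, percent=50, max_minutes=40320):
    """Percentage of the object's age, fitted between min and max (all in minutes)."""
    estimate = age_minutes * percent / 100
    return max(min_minutes, min(estimate, max_minutes))


# Changed 1 day before it was fetched: 50% is 12 hours, raised to the 1-week minimum.
print(freshness_minutes(1 * MINUTES_PER_DAY) / MINUTES_PER_DAY)    # 7.0
# Changed 30 days before: 50% is 15 days, which already fits.
print(freshness_minutes(30 * MINUTES_PER_DAY) / MINUTES_PER_DAY)   # 15.0
# Changed a year before: 50% is about 6 months, cut down to the 4-week maximum.
print(freshness_minutes(365 * MINUTES_PER_DAY) / MINUTES_PER_DAY)  # 28.0
```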

> Thanks again, I really appreciate it!
> R. Laderman

You said thank you. This is good. Most people who write to me don't bother.
That's why you got the comprehensive version. (I hope everyone who uses me as a
reference manual without so much as a thanks is listening.)

Enjoy,

--
Did you read the documentation AND the FAQ?
If not, I'll probably still answer your question, but my patience will
be limited, and you take the risk of sarcasm and ridicule.

Received on Sun Feb 08 1998 - 09:29:36 MST
