Re: [squid-users] faster to not cache large or streamed files?

From: <adam-s@dont-contact.us>
Date: Tue, 29 Apr 2003 21:56:27 -0700

On 30 Apr 2003 12:18:59 +1000, Robert wrote:
>Well, first things first:
>aufs means that squid won't delay processing during disk writes, other
>than for cache hits on other requests.

Robert, thanks again for following up with your suggestions. While I
realize the idea of aufs is that it won't block, the disk heads can
still only read or write one thing at a time. Since I only have the
one disk, I figured it would be best to minimize large writes so it
could handle all the other reads and writes it has to handle. On the
other hand, following a post here (or the FAQ), I dug up the fastest
disk I could find, a 10K RPM drive - hence the external disk pack and
card. But using the fast disk to cache files that no other user will
ever access just seems a waste of disk reads/writes, CPU, etc. I want
to avoid that.
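For reference, here is roughly what the single-disk aufs setup looks
like in squid.conf (the path and sizes below are just illustrative
placeholders, not our actual values):

```
# aufs store on the fast 10K RPM external disk;
# 4096 MB cache, 16 first-level / 256 second-level directories
cache_dir aufs /cache0 4096 16 256
```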

>If you are getting less than your full i/n connection - and you have no
>other traffic, plus the sites you are accessing have enough grunt to
>saturate your link, then you can review squid.

My goal is to serve pages as quickly as possible so users won't
complain. The connection was the bottleneck until we split the
traffic between two T-1s (thanks to tcp_outgoing_address). I still
want to review/tune squid, as the Internet surfing from each "office"
seems to be doubling each year.
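In case it helps anyone else on the list, the traffic split is done
along these lines (the subnets and addresses below are made up for
illustration, not our real ones):

```
# send each office's traffic out over its own T-1
acl office1 src 10.1.0.0/255.255.0.0
acl office2 src 10.2.0.0/255.255.0.0
tcp_outgoing_address 192.168.1.1 office1
tcp_outgoing_address 192.168.2.1 office2
```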

>If you are getting good performance (say 80% of your links nominal speed
>during busy periods), then there is unlikely to be a bottleneck in your
>squid environment.

How are you measuring "good performance" - from squid's stats or from
users complaining that pages are slow? What we do is hit a basketful
of pages, first via the proxy and then direct, and compare the times.
Squid is still slower during peak times (most of midday), even though
some of the test sites should already have cached pages, which should
have made the total page loading faster...

>optimising for cost (greatest byte count hit %), performance (greatest
>request hit % and lowest median service time)...

If I read this correctly, you are referring to caching policies like
LRU vs. GDSF? As I wrote earlier, we want to optimize for more hits -
if that is what you mean by performance, then yes, that's our goal.
By lowest median service time, do you mean an emphasis on tuning the
cache itself to serve up pages more efficiently, or something else?
Sorry, I didn't quite understand these three relative to what I
already know about squid.
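For what it's worth, my understanding is that the replacement policy
is selected with a single directive, something like the following
(this is what we have now; GDSF favors keeping many small, popular
objects for a better request hit ratio, while LFUDA favors large
objects for a better byte hit ratio):

```
# heap GDSF: optimize for request hit ratio (many small objects)
# heap LFUDA would instead optimize for byte hit ratio
cache_replacement_policy heap GDSF
memory_replacement_policy heap GDSF
```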

>As for not caching large / streaming files other than windows update -
>thats something that GDSF will address anyway.

So you are confirming that if we are using GDSF (as we are), then we
probably don't need the "no_cache deny" directive?
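Just so we're talking about the same thing, what I have in place is
along these lines (the ACL name matches what I mentioned below, but
the URL patterns here are just examples of the kind of thing I match,
not the exact list):

```
# skip caching of streamed/large content by URL pattern
acl streamsorlarge url_regex -i \.mp3$ \.avi$ \.mpg$ \.asf$
no_cache deny streamsorlarge
```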

> I would keep the environment as simple as possible. That gives
>you the greatest flexibility.

Sounds reasonable; I don't want to tweak arcana, just emphasize
performance/speed and ease up on caching the large files that will
never be used again.

>I.e. when XYZ vendor sends out an email with a link to a mp3 for a new
>ad campaign, all your marketing folk *will* go to the same .mp3 URL :}.

Fair enough, but that will happen very rarely for us - trust me, I've
been watching the traffic pretty closely to see what kind of usage
there is, and it's mostly one-off connections by users to their
favorite (and separate) radio, movie, etc. sites. Hence the desire
not to cache all that stuff. We have a lot of users listening to the
radio, watching movies, etc. (hopefully delay_pools will mitigate
their impact on the T-1s), and I don't want to waste disk
reads/writes caching some radio station 1000 miles away. There are
enough common denominators (like Micro$oft) that I think I can code
those as exceptions. But if you think GDSF will take care of that,
then I will remove the "no_cache deny streamsorlarge".
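In case it's useful context, the delay_pools setup I have in mind is
roughly the following (squid needs to be built with
--enable-delay-pools for this; the rate numbers and the "streamers"
ACL are just placeholders, not our actual values):

```
# one class-1 (aggregate) pool throttling streaming traffic
delay_pools 1
delay_class 1 1
# refill the shared bucket at 32000 bytes/sec, burst up to 64000 bytes
delay_parameters 1 32000/64000
delay_access 1 allow streamers
delay_access 1 deny all
```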

thanks,

Adam
Received on Tue Apr 29 2003 - 23:02:37 MDT
