Re: [squid-users] faster to not cache large or streamed files?

From: Robert Collins <robertc@dont-contact.us>
Date: 30 Apr 2003 21:31:44 +1000

On Wed, 2003-04-30 at 14:56, adam-s@pacbell.net wrote:
> On 30 Apr 2003 12:18:59 +1000, Robert wrote:
> >Well, first things first:
> >aufs means that squid won't delay processing during disk writes, other
> >than for cache hits on other requests.
>
> Robert, thanks again for following up with your suggestions. While I
> realize the idea of aufs is that it won't block, still the disk heads
> can only write/read one file/bitty at a time.

Well, no :}. See Henrik's email...

> Since I only have the
> one disk I figured it would be best to minimize large writes so it
> could handle all the other reads and writes it has to handle.

There's a maxim in IT: 'premature optimisation is the root of
all evil'. Again, if you are not experiencing a specific performance
issue, there is little benefit in tweaking like you are suggesting. For
instance, if you have high latency on hits for small and large files,
you will get more benefit by adding a new spindle (aka disk) than by
reducing the writes of large files. Large, cacheable files occur
infrequently compared to the thousands of small files - which also need
to be written, and consume head seeks and write time too.

> On the
> other hand, I followed a post here or the FAQ and dug up the fastest
> disk I could, a 10K RPM - hence the external diskpack and card. But
> using the fast disk to cache files that no other user will access just
> seems a waste of disk reads/writes, cpu, etc. I want to avoid that.

Nice goal, but I think it will be less effective than directly
troubleshooting what your actual issue is.

> >If you are getting less than your full i/n connection - and you have no
> >other traffic, plus the sites you are accessing have enough grunt to
> >saturate your link, then you can review squid.
>
> My goal is to serve pages as quickly as possible so users won't
> complain. The connection was the bottleneck until we split the
> traffic between two T-1's (thanks to tcp_outgoing_address). I still
> want to review/tune squid as the Internet surfing from each "office"
> seems to be doubling each year.

Great. So monitor it :}.
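
As a rough sketch of what "monitor it" can mean in practice (assuming
squidclient is installed and your cachemgr ACLs allow localhost; the
paths here are hypothetical), a cron entry can snapshot the 5-minute
counters so you have history to compare against when users complain:

```
# crontab entry: append the 5-minute cachemanager averages every 5 minutes
*/5 * * * * /usr/local/squid/bin/squidclient mgr:5min >> /var/log/squid/5min-stats.log
```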

> >If you are getting good performance (say 80% of your links nominal speed
> >during busy periods), then there is unlikely to be a bottleneck in your
> >squid environment.
>
> How are you measuring "good performance" - from squid's stats or users
> complaining pages are slow?

From total throughput via squid (bytes per second in on the server
side). Also, keep an eye on the median service times - which is what
will likely cause the users to complain about pages being slow.

> What we do is hit a basketful of pages 1st
> via the proxy and then direct and compare the times. Squid is still
> slower during peak times (most of midday) even though some of the
> various test sites should have already cached pages which should have
> made the total page drawing faster...

Ok, now we are getting somewhere: squid's latency is greater than that
of the raw link, by some noticeable amount, and you want to correct that
- right?

> >optimising for cost (greatest byte count hit %), performance (greatest
> >request hit % and lowest median service time)...
>
> If I read this correctly you are referring to caching methods like LRU
> vs. GDSF? Like I wrote earlier, we want to optimize for more hits -
> if that is what you mean by performance then yes that's our goal.

More hits != more performance. More hits == more disk read load, and IF
you have a disk bottleneck, you've just made it worse.

For optimal performance, you need to start with your chosen metric (i.e.
median service time), and then examine what is contributing to that:
* DNS lookups.
* Disk IO limits.
* Network latency (both server->squid and squid->client).
* Cache machine activity (i.e. swapping, other processes...).

> By
> lowest median service time do you mean emphasis on tuning the cache
> itself to more efficiently serve up pages or something else?

Median service time:
The median (middle value - not the mean or average) service time. The
service time is the time taken to deliver the entire object (from the
time accept() occurs, to the time the last block of data is written to
the socket).
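
To see why the median is the more useful indicator than the mean here,
a quick sketch (plain Python, with made-up service times):

```python
# Nine fast requests and one very slow object: the mean is dragged up
# by the single outlier, while the median still reflects the typical
# request - which is what your users actually experience.
service_times = [0.05, 0.06, 0.05, 0.07, 0.05, 0.06, 0.05, 0.06, 0.05, 9.0]

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # average the two middle values for an even-length list
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

mean = sum(service_times) / len(service_times)
print(round(median(service_times), 3))  # typical request, ~0.055s
print(round(mean, 3))                   # skewed by the one slow object, ~0.95s
```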

Median service time is a very good indicator for squid speed. In the
stats pages in cachemanager, it's broken up into hit and miss values as
well, so if the hit median time is large, you need to look into factors
that affect hits (dns, large access lists, overall load, disk IO, client
side network latency). Conversely, if the miss median time is large,
look into factors affecting miss retrieval (the hit factors above, plus
upstream network load/latency).
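
For reference, the counters to watch in the cachemanager 5-minute
averages page (mgr:5min) look like this - field names are real, the
values below are just an example:

```
client_http.all_median_svc_time = 0.30 seconds
client_http.miss_median_svc_time = 0.45 seconds
client_http.hit_median_svc_time = 0.05 seconds
dns.median_svc_time = 0.02 seconds
```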

> So you are confirming my question that if we have GDSF (as we do) then
> we probably don't need the "no_cache deny" directive?

Yes. Note that truly streamed content (e.g. internet radio) won't ever
cache anyway, so you don't need to worry about it at all.
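
For reference, the relevant squid.conf lines would be something like
the following (heap GDSF needs squid built with
--enable-removal-policies=heap; the size cap is illustrative):

```
# Favour small, popular objects in the disk cache
cache_replacement_policy heap GDSF
# Optionally cap cached object size instead of using no_cache deny
maximum_object_size 8192 KB
```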

> Sounds reasonable, I don't want to tweak arcana, just emphasize
> performance/speed and ease up on caching the large files that will
> never be used again.

Seriously, don't worry about the large files. IF you identify (via cache
manager primarily) that hits are suffering due to disk *writes*, then
you can look at the changes you've proposed. I strongly suspect that
you'll be better off with a couple of cheap, 5Gb IDE 10K disks thrown
into the mix (and the cache spread out over all three).
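
Spreading the cache over all three disks is just multiple cache_dir
lines in squid.conf, one per spindle (mount points and sizes here are
made up):

```
# One aufs cache_dir per physical disk: <path> <MB> <L1 dirs> <L2 dirs>
cache_dir aufs /cache1 4000 16 256
cache_dir aufs /cache2 4000 16 256
cache_dir aufs /cache3 4000 16 256
```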

> But if you think GDSF
> will take care of that, then I will remove the "no_deny cache
> streamsorlarge."

I do think that :}. Really, the most important thing is solid analysis,
not picking a single aspect of the system and tuning that.

Bottom line: Identify the bottleneck first, then and only then correct
it. From what you've said about your system and tests, I think you need
to do further analysis.

Rob

-- 
GPG key available at: <http://users.bigpond.net.au/robertc/keys.txt>.

Received on Wed Apr 30 2003 - 05:36:02 MDT
