[squid-users] Re: store_avg_object_size -- default needs updating?

From: Linda W <squid-user_at_tlinx.org>
Date: Mon, 25 Feb 2013 18:19:46 -0800

Alex Rousskov wrote:
> On 02/18/2013 04:01 PM, Linda W wrote:
>> Has anyone looked at their average cached object size
>> lately?
>>
>> At one point, I assume due to measurements, squid
>> set a default to 13KB / item.
>>
>> About 6 or so years ago, I checked mine out:
>> (cd /var/cache/squid;
>> cachedirs=( $(printf "%02X " {0..63}) )
>> echo $(( $(du -sk | cut -f1) / $(find "${cachedirs[@]}" -type f | wc -l) ))
>> )
>> --- got 47K, or over 3x the default.
>>
>> Did it again recently:
>> 310K/item average.
>>
>> Is the average size of web items going up or are these peculiar to
>> my users' browser habits (or auto-update programs from windows
>> going through cache, etc...).
>
> According to stats collected by Google in May 2010, the mean size of a
> GET response was about 7KB:
> https://developers.google.com/speed/articles/web-metrics
>
> Note that the median GET response size was less than 3KB. I doubt things
> have changed that much since then.

---
I'm pretty sure that Google's stats would NOT be representative
of the net as a whole.  Google doesn't serve content -- it serves
indexes of content, and those indexes are going to be significantly smaller
than the content being indexed -- especially when pictures or other non-text
files are included.
> 
> Google stats are biased because they are collected by Googlebot.
> However, if you look at fresh HTTP archive stats, they seem to give a
> picture closer to 2010 Google stats than to yours:
> http://httparchive.org/trends.php#bytesTotal&reqTotal
> 
> (I assume you need to divide bytesTotal by reqTotal to get mean response
> size of about 14KB).
---
	That's how I'd read that data.
	But I'll betcha they don't have any download sites on their
top list.  Add in 'downloads.suse.org' and see how the numbers tally.
Have 2-3 users download that in a day and see if content is being cached...
it IS cacheable.
	Some stuff like some of the ISO images gets into the gigabytes,
though even I cut off caching above 1G.
My maxmem cache size is 512MB, and max disk cache size is 1G...
If you use the default squid settings,
the maximum object size is 512KB in memory and 4M on disk, which would not cache most
of the stuff on download sites -- so when a new release of some software
distribution or package comes out, those transfers won't be included in the
averages.
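
For reference, the limits in question map onto squid.conf roughly as below. This is a sketch, not my literal config: it assumes the 512MB and 1G figures above are per-object limits, and the store_avg_object_size value is just the 310K/item measurement plugged in for illustration.

```
# Squid 3.x defaults referred to above:
#   maximum_object_size_in_memory 512 KB
#   maximum_object_size           4 MB
#   store_avg_object_size         13 KB

# Raised per-object limits (assumption: 512MB / 1G are per-object caps)
maximum_object_size_in_memory 512 MB
maximum_object_size 1 GB

# Illustrative only: taken from the 310K/item average measured earlier
store_avg_object_size 310 KB
```
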
I would say it is hard to get an accurate reading of actual transfer sizes
if you have the cut-offs set at the defaults.  If you go to an image site
like deviantart, or animepaper.net, or most wallpaper sites, you'll find the
average image sizes easily go over the default in-memory object limit, and hi-res
images can easily exceed the default max disk-cache object size.  Losing all the
stats for the larger files would bias any "average GET" or cached-file size.
Seriously -- looking at stats that cut off anything > 4M is going to strongly bias
things.
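
To make the bias concrete, here is a rough sketch in bash. The sizes are made-up sample values in KB (a few small page assets, one large image, one package, one ISO), not measurements, and `mean_kb` is a helper name invented for this example. It computes the integer mean with and without a 4 MB cutoff.

```shell
#!/usr/bin/env bash
# mean_kb CUTOFF_KB: read object sizes in KB (one per line) from stdin and
# print the integer mean of the sizes at or below CUTOFF_KB.
# A cutoff of 0 (the default) means "no cutoff".
mean_kb() {
    awk -v cutoff="${1:-0}" '
        cutoff == 0 || $1 <= cutoff { sum += $1; n++ }
        END { if (n > 0) printf "%d\n", sum / n; else print 0 }
    '
}

# Hypothetical sample, in KB; not real cache data.
sizes_kb="3
7
14
300
5000
700000"

echo "mean, no cutoff:   $(printf '%s\n' "$sizes_kb" | mean_kb 0) KB"
echo "mean, 4 MB cutoff: $(printf '%s\n' "$sizes_kb" | mean_kb 4096) KB"
```

With this sample the mean drops from 117554 KB to 81 KB once everything over 4 MB is discarded, because the two large downloads carry nearly all the bytes.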
Received on Tue Feb 26 2013 - 02:19:55 MST
