Re: [squid-users] questions about what's in my logs... from Amos Jeffries on 2013-07-18 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 19 Jul 2013 00:43:41 +1200

On 19/07/2013 12:10 a.m., Travel Factory S.r.l. wrote:
>
> Yesterday I moved all my users to the 2 new servers for several hours.
>
> Since I want to test SMP / Rock and eventually SMPCarp I went to have
> a look at my logs.
>
> My first goal is to understand which max-size to set on the rock cache_dir.
> So I did this on one server:
> grep SWAPOUT store.log.0 | sed -e 's/  */|/g' | cut -d"|" -f11 | cut -d"/" -f2 > sizes
> wc -l sizes
> 491458
> followed by:
>
> cat sizes | sort -g | uniq -c > result
> wc -l result
> 68900
>
> you can download the file, if you want, from www.bruxx.it/frank/result.
>
> These are SWAPOUT entries and as far as I know they are stored on
> disk... are they ?
>
> You will notice that there are 18902 requests for 43 bytes SWAPOUT.
> 15414 are from http://p.twitter.com/t.gif?
>
> Is it normal that these files are cached ?

Yes. It looks like a web bug to me, and a lot of those are coded up using
"no-cache" as if it prevented caching. It does not: no-cache allows the
response to be stored but requires revalidation before each reuse. That
can be a nice bandwidth saving if it is an icon-sized bug, but for the 1px
ones the IMS (If-Modified-Since) headers can be larger than the original
payload was to begin with, so there is no gain from using no-cache over
no-store.

>
> 1374048878.108 SWAPOUT 00 00008061 FE66CED6D9B9E3E31D39654ED9FE19FA
> 200 1374048878 1328738114 -1 image/gif 43/43 GET
> http://p.twitter.com/t.gif?
>
> I can't find a single HIT in access log, well.... ok, I have 1265
> TCP_REFRESH_UNMODIFIED/304
> On my production server (squid 2.7) I only have TCP_MISS in the logs !
>
>
> So I arrive at the questions:
>
> Is it normal that these queries, with the ?, are cached ?

Yes. They are URLs just like any other. Nothing special there except the
missing query-string portion.

> Is there a list of domains/pages that it is better not to cache since
> they are changing anyway ?

No, that is not possible. There is no such thing as a page in HTTP.
Really. There are only objects, and some of those objects happen to be
indexes of other objects' URLs, with some display markup about how to
format the collection if and when they are all downloaded.

But every response has cache control headers saying whether that
particular response is cacheable or not. Squid obeys those headers
unless you configure it to disobey the protocol somehow.
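
As a rough illustration of what those headers say, the Python sketch below
classifies a Cache-Control response header value from the point of view of
a shared cache. The function name and the three labels are made up for
illustration, and the logic is a deliberate simplification of the HTTP
caching rules. It is not Squid's decision code, which also considers the
status code, Expires, Vary, authentication, refresh_pattern and more.

```python
def classify_cache_control(header_value):
    """Classify a Cache-Control response header for a shared cache.

    Hypothetical helper: returns "not-storable", "revalidate" or "storable".
    Simplification: directive arguments (e.g. private="field") are ignored
    and quoted strings containing commas are not handled.
    """
    directives = set()
    # Directives are comma-separated; some carry "=value" arguments.
    for part in header_value.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)

    # no-store forbids storing; private forbids storing in a shared cache.
    if "no-store" in directives or "private" in directives:
        return "not-storable"
    # no-cache allows storing, but requires revalidation before each reuse.
    if "no-cache" in directives:
        return "revalidate"
    return "storable"


for value in ("no-cache", "no-store, no-cache", "public, max-age=3600"):
    print(value, "->", classify_cache_control(value))
```

The "revalidate" case is the one behind the TCP_REFRESH_UNMODIFIED/304
entries: the object is stored, but the origin is asked each time whether
it has changed.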

>
> After removing the 3213 entries with 0 bytes, I have 302541 entries
> with less than 9000 bytes... they cover 75% of the cached requests...
> is 9000 a good tradeoff ?

Don't remove those entries with 0 bytes. Even responses without a
payload can be cacheable, and they are in the size range where you
benefit from having them in memory or in a rock store.
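
To compare candidate max-size values with the 0-byte entries left in, a
sketch along these lines may help. It assumes the store.log layout seen in
the sample line quoted in this thread (the 11th whitespace-separated field
looks like "43/43", and the number after the slash is taken as the size,
as in the grep/cut pipeline above). The function names are illustrative
only.

```python
from collections import Counter


def swapout_sizes(lines):
    """Yield object sizes from store.log SWAPOUT lines (assumed layout)."""
    for line in lines:
        fields = line.split()
        if len(fields) > 10 and fields[1] == "SWAPOUT":
            try:
                # Field 11 looks like "43/43"; take the part after the slash.
                yield int(fields[10].split("/")[1])
            except (IndexError, ValueError):
                continue  # skip lines that do not match the assumed layout


def size_coverage(sizes, thresholds):
    """Return {threshold: fraction of objects with size <= threshold}."""
    counts = Counter(sizes)
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {
        t: sum(n for size, n in counts.items() if size <= t) / total
        for t in thresholds
    }


# The sample line from this thread, joined onto one line.
sample = ("1374048878.108 SWAPOUT 00 00008061 "
          "FE66CED6D9B9E3E31D39654ED9FE19FA 200 1374048878 1328738114 -1 "
          "image/gif 43/43 GET http://p.twitter.com/t.gif?")
print(list(swapout_sizes([sample])))
```

On a real log, something like
size_coverage(list(swapout_sizes(open("store.log.0"))), [9000, 32768])
would give the share of stored objects under each candidate limit. The two
thresholds here are arbitrary examples, not recommendations.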

>
> My biggest SWAPOUT entry is for 4MB files.... I have this line:
> maximum_object_size 5 GB
> but perhaps GB is not recognized ?????

Squid recognizes units up to PB right now. Perhaps you placed it below
the cache_dir lines in your config file. There is a bug in recent
releases where that directive only works if it is configured above the
cache_dir lines.

Or perhaps the biggest file transferred simply is 4MB (I agree it
is probably the default maximum_object_size limit, but it could be a fluke).
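
The ordering problem described above can be checked mechanically. The
sketch below scans squid.conf text and reports whether
maximum_object_size appears below any cache_dir line. The function name is
hypothetical, the sample directive values are placeholders, and the parser
is simplified: it treats everything after "#" as a comment and does not
follow include files.

```python
def max_object_size_after_cache_dir(conf_text):
    """Return True if maximum_object_size appears below a cache_dir line."""
    seen_cache_dir = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        directive = line.split()[0]
        if directive == "cache_dir":
            seen_cache_dir = True
        elif directive == "maximum_object_size" and seen_cache_dir:
            return True
    return False


# Placeholder configs; only the directive order matters to the check.
affected = "cache_dir rock /cache 1000\nmaximum_object_size 5 GB\n"
safe = "maximum_object_size 5 GB\ncache_dir rock /cache 1000\n"
print(max_object_size_after_cache_dir(affected))
print(max_object_size_after_cache_dir(safe))
```

If the check returns True, moving maximum_object_size above the cache_dir
lines is the workaround.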

Amos
Received on Thu Jul 18 2013 - 12:43:49 MDT