RE: [squid-users] squid: Memory utilization higher than expected since moving from 3.3 to 3.4 and Vary: working

From: Martin Sperl <Martin.Sperl_at_amdocs.com>
Date: Fri, 18 Jul 2014 10:09:37 +0000

Hi!

Some further analysis, also including the latest samples, showed that after
about 8 days of running we had reached:
        Storage Mem size: 4153696 KB
        Storage Mem capacity: 99.0% used, 1.0% free
Unfortunately the decision was then made to restart squid due to application-stability concerns.

The statistics reported at this time were:
ps:
squid 15022 9.4 62.1 10662148 10144652 ? Sl Jul08 1147:22 (squid-1) -f /opt/cw/squid/squid.conf
mgr:info:
Squid Object Cache: Version 3.4.3
Build Info:
Start Time: Tue, 08 Jul 2014 23:42:12 GMT
Current Time: Thu, 17 Jul 2014 10:00:01 GMT
Connection information for squid:
        Number of clients accessing cache: 0
        Number of HTTP requests received: 102316485
        Number of ICP messages received: 0
        Number of ICP messages sent: 0
        Number of queued ICP replies: 0
        Number of HTCP messages received: 0
        Number of HTCP messages sent: 0
        Request failure ratio: 0.00
        Average HTTP requests per minute since start: 8429.6
        Average ICP messages per minute since start: 0.0
        Select loop called: 744710406 times, 0.978 ms avg
Cache information for squid:
        Hits as % of all requests: 5min: 54.4%, 60min: 52.7%
        Hits as % of bytes sent: 5min: 56.4%, 60min: 57.9%
        Memory hits as % of hit requests: 5min: 69.9%, 60min: 68.9%
        Disk hits as % of hit requests: 5min: 1.2%, 60min: 1.3%
        Storage Swap size: 7549744 KB
        Storage Swap capacity: 90.0% used, 10.0% free
        Storage Mem size: 4153612 KB
        Storage Mem capacity: 99.0% used, 1.0% free
        Mean Object Size: 5.12 KB
        Requests given to unlinkd: 0
Median Service Times (seconds) 5 min 60 min:
        HTTP Requests (All): 0.00091 0.00091
        Cache Misses: 0.25890 0.27332
        Cache Hits: 0.00000 0.00000
        Near Hits: 0.00286 0.00286
        Not-Modified Replies: 0.00000 0.00000
        DNS Lookups: 0.00094 0.00094
        ICP Queries: 0.00000 0.00000
Resource usage for squid:
        UP Time: 728269.133 seconds
        CPU Time: 68842.988 seconds
        CPU Usage: 9.45%
        CPU Usage, 5 minute avg: 3.87%
        CPU Usage, 60 minute avg: 3.58%
        Process Data Segment Size via sbrk(): 10548736 KB
        Maximum Resident Size: 40596864 KB
        Page faults with physical i/o: 73693
Memory usage for squid via mallinfo():
        Total space in arena: -2034044 KB
        Ordinary blocks: -2050795 KB 32020 blks
        Small blocks: 0 KB 0 blks
        Holding blocks: 39332 KB 10 blks
        Free Small blocks: 0 KB
        Free Ordinary blocks: 16751 KB
        Total in use: 16751 KB -1%
        Total free: 16751 KB -1%
        Total size: -1994712 KB
Memory accounted for:
        Total accounted: 1362580 KB -68%
        memPool accounted: 9751188 KB -489%
        memPool unaccounted: -11745900 KB -0%
        memPoolAlloc calls: 49821009105
        memPoolFree calls: 49941055558
File descriptor usage for squid:
        Maximum number of file descriptors: 65536
        Largest file desc currently in use: 388
        Number of file desc currently in use: 341
        Files queued for open: 0
        Available number of file descriptors: 65195
        Reserved number of file descriptors: 100
        Store Disk files open: 0
Internal Data Structures:
        1482737 StoreEntries
        845588 StoreEntries with MemObjects
        845582 Hot Object Cache Items
        1474676 on-disk objects

It would have been very nice to see how the system behaved afterwards, but
unfortunately that is no longer possible because of the restart.

Note that the memory footprint I originally reported via ps was similar:
squid 17695 10.9 61.2 10629864 10001520 ? Rl Apr21 12131:59 (squid-1) -f /opt/squid/squid.conf

So the memory footprint stayed fairly stable at around 10 GB for about 2.5 months (or at least for a long portion of that time).

So it is not really a memory leak that grows without bounds - it is just that the cache uses a lot more memory than expected!

Effectively the values stay fairly constant over time for our use case, independent of cache fill rates
(I know this differs from what I had reported earlier):

Avg DiskCache KB/cache_object: 5.12 KB (from mgr:info Storage Swap size / StoreEntries)
Avg MemCache KB/cache_object: 4.78 KB (from mgr:info Storage Mem size / StoreEntries with MemObjects - last time this was taken from du -sk, hence the difference)
Avg PoolTotalSize KB/cache_object: 12.06 KB (from mgr:mem Total / StoreEntries with MemObjects)
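
(As a side note, here is a minimal Python sketch of how the first two averages
can be derived from a saved mgr:info dump, e.g. one captured with
"squidclient mgr:info > report.txt"; the file name and the regex-based parsing
are assumptions for illustration only:)

import re

def kb_counter(text, label):
    # return the integer KB value following "label:" in the mgr:info text
    m = re.search(re.escape(label) + r":\s+(\d+)", text)
    return int(m.group(1)) if m else None

with open("report.txt") as f:      # hypothetical file name
    info = f.read()

swap_kb = kb_counter(info, "Storage Swap size")    # KB in the disk cache
mem_kb  = kb_counter(info, "Storage Mem size")     # KB in the memory cache

entries     = int(re.search(r"(\d+) StoreEntries", info).group(1))
mem_entries = int(re.search(r"(\d+) StoreEntries with MemObjects", info).group(1))

print("Avg DiskCache KB/cache_object: %.2f" % (swap_kb / float(entries)))
print("Avg MemCache  KB/cache_object: %.2f" % (mem_kb / float(mem_entries)))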

Assuming the average mem-cache figure is the average payload size, the 5.12 KB for the disk cache
would be roughly in line with request header + response header + response body + overhead.
But the factor of about 2.5 between process memory per object and payload per object
(12.06 KB / 4.78 KB) is strange.

The question is why?

Nothing in the FAQ indicates that this overhead can be expected to be as high as a factor of 2.5!

Maybe we should estimate the size of a memory-cache object better (by accounting for the allocations
actually done) and use that to calculate the real memory-cache footprint, which could then be compared
to what is configured?
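
As a rough illustration of what that comparison could look like, here is a
back-of-envelope sketch using the snapshot figures above (the ~4 GB configured
cache_mem is an assumption based on Storage Mem size being ~4 GB at 99% used):

# figures taken from the mgr:info / mgr:mem snapshot quoted above
mem_objects    = 845588        # "StoreEntries with MemObjects"
storage_mem_kb = 4153612       # "Storage Mem size" (payload counted against cache_mem)
kb_per_object  = 12.06         # observed mgr:mem "Total" per in-memory object

pool_total_kb   = kb_per_object * mem_objects      # ~10 GB of pooled memory
overhead_factor = pool_total_kb / storage_mem_kb   # ~2.5x the cached payload

configured_cache_mem_kb = 4 * 1024 * 1024          # assumed cache_mem of 4 GB
expected_rss_kb = configured_cache_mem_kb * overhead_factor

print("overhead factor: %.2f" % overhead_factor)
print("expected footprint with a full memory cache: ~%.1f GB"
      % (expected_rss_kb / 1024.0 / 1024.0))

Which roughly matches the ~10 GB we see via ps.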

I believe our workaround will be to reduce the memory cache size (cache_mem) to 2 or 3 GB; the system should then stay stable.

Martin

> -----Original Message-----
> From: Martin Sperl
> Sent: Freitag, 18. Juli 2014 09:14
> To: Alex Rousskov; squid-users_at_squid-cache.org
> Subject: RE: [squid-users] squid: Memory utilization higher than expected
> since moving from 3.3 to 3.4 and Vary: working
>
> Hi Alex!
>
> Well, as for the data: I have posted the raw data as an Excel sheet, including graphs,
> to the ticket: http://bugs.squid-cache.org/show_bug.cgi?id=4084
> In particular, the tab "ChartKBto100%" shows the wavy pattern of the memory
> pools (graphed as % of the last measured value),
> and "ChartKBto100%Differencial" shows the same data graphed as the delta to
> the previous hour.
>
> And finally there is "ChartInternal Data Structures", which graphs the behavior
> of the four basic data structures.
>
> The wavy patterns I mentioned are visible in each of those graphs and
> follow our traffic curve.
>
> Well - as for the theories:
> If it is "underestimating the real memory footprint of a mem_object", then
> reducing the memory cache size to 2 GB should show memory usage leveling off.
> If it is instead wasted memory per request, then decreasing the memory
> settings would not have an impact.
>
> Here are some more stats on request distribution with regard to cache hit/miss
> and status codes over a 20-hour period:
> Count CacheStatus-StatusCode
> 11846509 total requests
> 6425676 TCP_MEM_HIT-200
> 1462630 TCP_MISS-200
> 1460093 TAG_NONE-302
> 1378134 TCP_IMS_HIT-304
> 390021 TCP_MISS-302
> 244197 TCP_REFRESH_MODIFIED-200
> 101942 TCP_NEGATIVE_HIT-404
> 88726 TAG_NONE-400
> 69045 TCP_REFRESH_UNMODIFIED-200
> 33006 TCP_MISS_ABORTED-0
> 30973 TAG_NONE_ABORTED-0
> 26063 TCP_MISS-304
> 23619 TCP_MISS-301
> 20107 TCP_DENIED_REPLY-302
> 17303 TCP_MISS-404
> 17267 TCP_MISS_ABORTED-200
> 16727 TCP_DENIED-302
> 8215 TCP_MEM_HIT-404
> 7921 TCP_HIT-200
> 7478 TCP_MEM_HIT_ABORTED-200
> 4909 TCP_MISS-303
> 3914 TCP_MISS-206
> 2746 TCP_DENIED-403
> 1448 TCP_MEM_HIT-206
> 777 TAG_NONE-405
> 670 TCP_MISS_TIMEDOUT-200
> 448 TCP_MISS_ABORTED-206
> 390 TCP_MISS-500
> 310 TCP_HIT_ABORTED-0
> 240 TCP_DENIED_REPLY-307
> 232 TCP_MEM_HIT_TIMEDOUT-200
> 169 TCP_MISS-502
> 146 TCP_MISS-503
> 139 TCP_DENIED_ABORTED-302
> 132 TCP_REFRESH_MODIFIED-302
> 69 TCP_NEGATIVE_HIT-500
> 62 TCP_REFRESH_UNMODIFIED-206
> 60 TCP_NEGATIVE_HIT-502
> 54 TCP_REFRESH_MODIFIED-206
> 50 TCP_REFRESH_UNMODIFIED_ABORTED-200
> 44 TCP_MISS-403
> 44 TAG_NONE_ABORTED-400
> 42 TCP_REFRESH_FAIL_ERR-500
> 36 TCP_REFRESH_MODIFIED_ABORTED-200
> 35 TCP_DENIED_ABORTED-403
> 33 TCP_NEGATIVE_HIT-403
> 30 TCP_DENIED-307
> 21 TCP_CLIENT_REFRESH_MISS-200
> 19 TCP_MISS_ABORTED-302
> 16 TCP_MISS-400
> 8 TCP_MISS-504
> 8 TCP_IMS_HIT_ABORTED-304
> 8 TCP_HIT_ABORTED-200
> 5 TCP_MISS-416
> 5 TCP_MEM_HIT_ABORTED-206
> 4 TCP_REFRESH_MODIFIED-404
> 4 TCP_REFRESH_FAIL_OLD-200
> 4 TCP_NEGATIVE_HIT_ABORTED-404
> 4 TCP_NEGATIVE_HIT-503
> 4 TCP_MISS_TIMEDOUT-206
> 4 TCP_MISS-204
> 4 TCP_MEM_HIT-500
> 4 TAG_NONE_TIMEDOUT-400
> 3 TCP_NEGATIVE_HIT-400
> 2 TCP_REFRESH_UNMODIFIED_TIMEDOUT-200
> 2 TCP_HIT_TIMEDOUT-200
> 1 TCP_SWAPFAIL_MISS-200
> 1 TCP_REFRESH_MODIFIED_TIMEDOUT-200
> 1 TCP_MISS_ABORTED-404
> 1 TCP_DENIED_REPLY_ABORTED-302
>
> I will try to do some more analysis including the latest data and share the results as
> well - this time also correlating TPS...
>
> Martin
>
> > -----Original Message-----
> > From: Alex Rousskov [mailto:rousskov_at_measurement-factory.com]
> > Sent: Donnerstag, 17. Juli 2014 23:03
> > To: squid-users_at_squid-cache.org
> > Cc: Martin Sperl
> > Subject: Re: [squid-users] squid: Memory utilization higher than expected
> > since moving from 3.3 to 3.4 and Vary: working
> >
> > On 07/17/2014 02:49 AM, Martin Sperl wrote:
> > > This is why I have been mentioning all the pools that show similar (wavy)
> > > memory increase pattern. There must be one of those that is the root of
> > > all the others.
> >
> > Unfortunately, I do not share your certainty. As I see it, at least two
> > theories more-or-less fit your data:
> >
> > A) MemObjects are leaking.
> >
> > B) Your memory cache is growing, explaining some or all of the pools
> > growth and muddying the waters along the way. Something unrelated to
> > MemObjects is leaking. That something may or may not be pooled.
> >
> > My bet is on (B).
> >
> > If you do not mind reminding me, does accounted-for memory pools growth
> > correspond to the actual/total Squid memory growth? Or is the "pooled"
> > vs "total/real" gap widening with time?
> >
> >
> > > the strange thing is that if you look at the
> > > distribution of vm_objects, then you see that they have expired long ago
> > > (16267 days ago to be exact, so with EX:-1 - 42511 exactly).
> > > If these have expired, then why would they get loaded into
> > > memory?
> >
> > because you have free cache space and Squid can still serve that cached
> > data (after revalidation)? In general, there is no benefit from purging
> > something from a non-full cache, especially if that something is reusable.
> >
> > I have not checked the code and responses producing those "16267 days"
> > stats so I treat that number with a grain of salt. :-)
> >
> >
> > > Well: SUM(inmem_hi) is the memory for payload (possibly without headers)
> > > against which we compare cache_mem.
> > >
> > > But if the squid process consumes 9 GB, then there must be more than a
> > > factor of 2 of overhead for us to get to those 9 GB.
> >
> > Yes if (A) above is correct. If (B) above is correct, that "overhead" is
> > a leak of something other than memory cache entries.
> >
> >
> > > Report file (with date+time)   ps memory   MemObjects   KB/MemObject
> > > report-20140708-234225.txt 1795932 123979 6.03
> > ...
> > > report-20140717-000001.txt 10662148 845404 11.37
> >
> > > Which shows that the average size/memObject is increasing constantly!
> >
> > Yes, and that is one of the reasons I bet on (B). (B) is a lot less
> > surprising than a constantly growing MemObject overhead that (A) has to
> > explain :-). Running with a small cache size (i.e., with a full cache)
> > would confirm that.
> >
> >
> >
> > > Another point, more on the reporting/debugging side:
> > > Total space in arena: -2034044 KB
> >
> > Yes, this is a known problem mostly outside of Squid control:
> > http://www.squid-cache.org/mail-archive/squid-users/201402/0296.html
> >
> > Squid should probably stop using that API [when it can detect that it is
> > broken].
> >
> >
> > And yes, I know that my response does not really help you. I just wanted
> > to make sure you consider both (A) and (B) theories in your investigation.
> >
> >
> > Alex.
> > P.S. Another suggestion that you probably cannot use in your environment
> > is to run trunk or a trunk-based branch. Perhaps the leak you are after
> > has been fixed. And if not, somebody may be more motivated to help you
> > find it in trunk.
>
>
