RE: [squid-users] squid: Memory utilization higher than expected since moving from 3.3 to 3.4 and Vary: working

From: Martin Sperl <Martin.Sperl_at_amdocs.com>
Date: Thu, 17 Jul 2014 08:49:24 +0000

Hi Alex!
 
> > So there must be something that links all of those in the last group together.
>
> MemObject structures contain or tie most (possibly all) of the above
> objects. MemObjects are used for current transactions and non-shared
> memory cache storage. The ones used for non-shared memory cache storage
> are called "hot objects". However, some current transactions might
> affect "hot objects" counters as well, I guess. These stats is messy and
> imprecise.
>
> Please note that every MemObject must have a StoreEntry but StoreEntries
> may lack MemObject. When working with large caches, most of the
> StoreEntries without MemObject would correspond to on-disk objects that
> are _not_ also cached in memory.

This is why I have been mentioning all the pools that show a similar (wavy)
memory increase pattern. One of those must be the root of all the others.

>
> The above is more complex for SMP-aware caches which, I think, you are
> not using.
We are not using SMP.

> > But looking at " StoreEntries with MemObjects" the picture is totally
> > different.
> > 20140714-020001: 651560 StoreEntries with MemObjects
>
> OK, your memory cache is filling, possibly from swapped in disk entries
> (so that the total number of keys does not grow much)??
That is one interpretation, but the strange thing is that if you look at the
distribution of vm_objects, you see that they expired long ago
(16267 days ago to be exact, so with EX:-1 - 42511 exactly).
If these have expired, then why would they be loaded into memory?

> FWIW, the "StoreEntries with" part of the label is misleading. These are
> just MemObjects. However, that distinction is only important if
> MemObjects are leaking separately from StoreEntries.
That was essentially my understanding:
* StoreEntries: all cached objects (memory + disk)
* StoreEntries with MemObjects: the number of MemObjects, i.e. objects that
   currently have an in-memory representation
* Hot Object Cache Items: similar to the above - probably the ones that we
   keep in the memory cache and that are not "uncachable" responses based on
   response headers (so excluding transient requests that are happening right now).
* on-disk objects: objects that are in the disk cache
  (but some of them may also be in memory)

> > And these show similar behavior to the pools mentioned above.
> Yes, the "StoreEntries with MemObjects" counter is just the MemObject
> pool counter.
Which was also my interpretation.

> > If I sum up the "inmem_hi:" values I get: 2918369522, so 2.9GB.
> >
> > So it seems as if there must be some major overhead for those inmem
> objects...
>
> How do you calculate the overhead? 2.9GB is useful payload, not
> overhead. Are you comparing 2.9GB with your total Squid memory footprint
> of about 9GB?
> ...
> > So the question is: why do we underestimate memory_object sizes by a
> > factor of approximately 2?
>
> Sorry, you lost me here. What do you mean by "memory_object sizes",
> where do we estimate them, and x2 compared to what?

Well: SUM(inmem_hi) is the memory used for payload (possibly without headers),
and that is what we compare against cache_mem.

But if the squid process consumes 9GB, then there must be more than a factor
of 2 of overhead to get from that payload to those 9GB.

Unfortunately mgr:objects does not dump the size of each object in the disk cache.
But if I do the following calculation with the current values:
7760936KB for 1474397 on-disk objects means about 5.2KB/object on disk.
Extrapolating that objects in memory carry about the same overhead
(say 6KB to be generous), we would need for 845269 memObjects
5071614KB (=845269*6) of memory.
So in total <6GB of memory would be expected for the process.
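
Spelled out, the back-of-the-envelope calculation looks like this (a sketch
with the numbers from above plugged in):

awk 'BEGIN {
  disk_kb   = 7760936                  # du -sk of the cache_dir
  disk_objs = 1474397                  # "on-disk objects" from mgr:info
  mem_objs  = 845269                   # "StoreEntries with MemObjects"
  printf "disk: %.2f KB/object\n", disk_kb / disk_objs
  printf "expected for %d memObjects at ~6 KB each: %d KB (<6GB)\n",
         mem_objs, mem_objs * 6
}'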

If we take http://wiki.squid-cache.org/SquidFaq/SquidMemory, then the major
scaling factors are:
* Index of the on-disk cache (metadata, kept in memory): 14MB*8G/1G=112MB
* In-memory cache with "hot objects":
    SUM(inmem_hi)+14MB*4G/1G=SUM(inmem_hi)+56MB
* the others (DNS, etc.) are not very big in my case.

But we run at about 10.6GB, so even if we assume 1GB of general overhead,
more than 9GB would be left for memObjects, which breaks down to 11.37KB/memObject.
That is a huge discrepancy compared to the disk cache at 5.26KB/diskCacheObject
(=`du -sk /var/spool/squid/`/1474397).
And even that average on-disk size is big compared to AVG(inmem_hi) at 3.6KB/memObject.
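
(This is roughly how I arrive at the SUM/AVG(inmem_hi) figures - a sketch,
assuming cache-manager access via squidclient:)

squidclient mgr:vm_objects | awk '
  $1 == "inmem_hi:" { sum += $2; n++ }          # inmem_hi is in bytes
  END { if (n) printf "MemObjects: %d  SUM(inmem_hi): %.1f MB  AVG: %.2f KB\n",
                      n, sum/1024/1024, sum/n/1024 }'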

Automating this calculation over the gathered hourly reports (each containing "ps uaxwww|grep squid" plus mgr:info and mgr:mem output) shows:
(awk '/squid-1/{M=$5}/StoreEntries with MemObjects/{printf "%s %10i %10i %8.2f\n",FILENAME,M,$1,(M-1024*1024)/$1}' report-*.txt)
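
(For readability, the same one-liner spelled out - the comment on the ps field
is my reading of the "ps uaxwww" output format:)

awk '
  /squid-1/ { M = $5 }                 # field 5 of "ps aux..." output is VSZ in KB
  /StoreEntries with MemObjects/ {
      # subtract ~1GB as a rough allowance for everything that is not a
      # MemObject, then divide by the MemObject count ($1) -> KB/MemObject
      printf "%s %10i %10i %8.2f\n", FILENAME, M, $1, (M - 1024*1024) / $1
  }' report-*.txt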

Report-file (with date+time)  psmemory(KB)  MemObjects  KB/MemObject
report-20140708-234225.txt 1795932 123979 6.03
report-20140709-000001.txt 1803800 124783 6.05
report-20140709-010001.txt 1827868 126834 6.14
report-20140709-020001.txt 1847772 128542 6.22
report-20140709-030001.txt 1877496 131087 6.32
report-20140709-040001.txt 1924920 135272 6.48
report-20140709-050001.txt 1989736 140729 6.69
report-20140709-060001.txt 2069536 147082 6.94
report-20140709-070001.txt 2148564 153608 7.16
report-20140709-080001.txt 2238264 160438 7.42
report-20140709-090001.txt 2317684 167234 7.59
report-20140709-100001.txt 2406616 174782 7.77
report-20140709-110001.txt 2502508 182775 7.95
report-20140709-120001.txt 2598948 190814 8.13
report-20140709-130001.txt 2683420 198143 8.25
report-20140709-140001.txt 2774704 205479 8.40
report-20140709-150001.txt 2866016 213307 8.52
report-20140709-160001.txt 2942164 219550 8.62
report-20140709-170001.txt 3048268 228055 8.77
report-20140709-180001.txt 3147340 236483 8.87
report-20140709-190001.txt 3258056 245971 8.98
report-20140709-200001.txt 3366020 254855 9.09
report-20140709-210001.txt 3460648 262917 9.17
report-20140709-220001.txt 3527092 268797 9.22
report-20140709-230001.txt 3557572 271417 9.24
report-20140710-000001.txt 3581820 273440 9.26
report-20140710-010001.txt 3596816 274875 9.27
report-20140710-020001.txt 3610388 275923 9.28
report-20140710-030001.txt 3630528 277527 9.30
report-20140710-040001.txt 3677760 281124 9.35
report-20140710-050001.txt 3716484 284541 9.38
report-20140710-060001.txt 3761984 287976 9.42
report-20140710-070001.txt 3813820 291871 9.47
report-20140710-080002.txt 3873232 296740 9.52
report-20140710-090001.txt 3942136 302285 9.57
report-20140710-100001.txt 4014464 308074 9.63
report-20140710-110001.txt 4084960 313809 9.68
report-20140710-120001.txt 4154532 319559 9.72
report-20140710-130001.txt 4227840 325722 9.76
report-20140710-140001.txt 4295788 331344 9.80
report-20140710-150001.txt 4358836 336309 9.84
report-20140710-160001.txt 4424052 341664 9.88
report-20140710-170002.txt 4497952 347611 9.92
report-20140710-180001.txt 4584468 354501 9.97
report-20140710-190001.txt 4669424 361814 10.01
report-20140710-200001.txt 4759488 369118 10.05
report-20140710-210001.txt 4846816 376831 10.08
report-20140710-220001.txt 4902160 381787 10.09
report-20140710-230001.txt 4939144 385019 10.10
report-20140711-000001.txt 4950740 386269 10.10
report-20140711-010001.txt 4959816 387054 10.11
report-20140711-020001.txt 4972220 387990 10.11
report-20140711-030001.txt 4987424 389131 10.12
report-20140711-040001.txt 5012504 391082 10.14
report-20140711-050001.txt 5046484 393508 10.16
report-20140711-060001.txt 5099096 397892 10.18
report-20140711-070001.txt 5155600 402025 10.22
report-20140711-080001.txt 5205668 406090 10.24
report-20140711-090001.txt 5259172 410431 10.26
report-20140711-100001.txt 5324916 415564 10.29
report-20140711-110001.txt 5380688 420033 10.31
report-20140711-120001.txt 5443296 424989 10.34
report-20140711-130001.txt 5502928 429889 10.36
report-20140711-140001.txt 5569820 435343 10.39
report-20140711-150001.txt 5626396 439731 10.41
report-20140711-160001.txt 5696052 445478 10.43
report-20140711-170001.txt 5758300 450313 10.46
report-20140711-180002.txt 5828472 456013 10.48
report-20140711-190001.txt 5906400 462536 10.50
report-20140711-200001.txt 5984856 469168 10.52
report-20140711-210001.txt 6067804 476073 10.54
report-20140711-220001.txt 6120712 480890 10.55
report-20140711-230001.txt 6152704 483743 10.55
report-20140712-000001.txt 6175076 485724 10.55
report-20140712-010001.txt 6198796 487672 10.56
report-20140712-020001.txt 6223332 489994 10.56
report-20140712-030001.txt 6238924 491409 10.56
report-20140712-040001.txt 6258252 492911 10.57
report-20140712-050001.txt 6287416 495046 10.58
report-20140712-060001.txt 6329056 498147 10.60
report-20140712-070001.txt 6377000 501648 10.62
report-20140712-080002.txt 6425284 505327 10.64
report-20140712-090002.txt 6474928 509158 10.66
report-20140712-100001.txt 6527596 513483 10.67
report-20140712-110001.txt 6584792 518100 10.69
report-20140712-120001.txt 6647700 523203 10.70
report-20140712-130001.txt 6696960 526999 10.72
report-20140712-140001.txt 6742872 530688 10.73
report-20140712-150001.txt 6785636 534206 10.74
report-20140712-160001.txt 6833264 538028 10.75
report-20140712-170001.txt 6894464 542745 10.77
report-20140712-180001.txt 6940764 546703 10.78
report-20140712-190001.txt 7008360 551699 10.80
report-20140712-200001.txt 7066740 556946 10.81
report-20140712-210001.txt 7126948 561768 10.82
report-20140712-220001.txt 7177524 566223 10.82
report-20140712-230001.txt 7204440 568803 10.82
report-20140713-000001.txt 7219436 570253 10.82
report-20140713-010001.txt 7231676 571205 10.82
report-20140713-020001.txt 7238640 571872 10.82
report-20140713-030001.txt 7254256 573061 10.83
report-20140713-040001.txt 7269624 574264 10.83
report-20140713-050001.txt 7301304 576522 10.85
report-20140713-060002.txt 7335648 579168 10.86
report-20140713-070001.txt 7379256 582375 10.87
report-20140713-080001.txt 7416040 585368 10.88
report-20140713-090001.txt 7459900 588620 10.89
report-20140713-100001.txt 7508240 592739 10.90
report-20140713-110001.txt 7565964 596910 10.92
report-20140713-120001.txt 7612252 601031 10.92
report-20140713-130001.txt 7658660 604466 10.94
report-20140713-140001.txt 7714536 609020 10.95
report-20140713-150001.txt 7764536 613098 10.95
report-20140713-160001.txt 7820024 617678 10.96
report-20140713-170001.txt 7873520 621992 10.97
report-20140713-180001.txt 7949592 627626 11.00
report-20140713-190001.txt 8016368 632717 11.01
report-20140713-200001.txt 8073872 637970 11.01
report-20140713-210001.txt 8129848 643145 11.01
report-20140713-220001.txt 8173344 646848 11.01
report-20140713-230002.txt 8195560 648967 11.01
report-20140714-000001.txt 8206940 650047 11.01
report-20140714-010001.txt 8216668 650846 11.01
report-20140714-020001.txt 8224420 651560 11.01
report-20140714-030001.txt 8236072 652329 11.02
report-20140714-040001.txt 8257000 653875 11.02
report-20140714-050001.txt 8285316 655948 11.03
report-20140714-060001.txt 8315496 658080 11.04
report-20140714-070001.txt 8350656 660772 11.05
report-20140714-080001.txt 8392780 663982 11.06
report-20140714-090001.txt 8433028 667137 11.07
report-20140714-100002.txt 8478708 670540 11.08
report-20140714-110001.txt 8533096 674844 11.09
report-20140714-120001.txt 8580940 678968 11.09
report-20140714-130001.txt 8635156 683062 11.11
report-20140714-140001.txt 8681040 686999 11.11
report-20140714-150002.txt 8724996 689962 11.13
report-20140714-160001.txt 8774520 694465 11.13
report-20140714-170001.txt 8829156 698268 11.14
report-20140714-180002.txt 8877008 702078 11.15
report-20140714-190001.txt 8932712 706823 11.15
report-20140714-200001.txt 8990624 711618 11.16
report-20140714-210001.txt 9043120 716365 11.16
report-20140714-220001.txt 9083160 719605 11.17
report-20140714-230001.txt 9100684 721409 11.16
report-20140715-000001.txt 9108924 722309 11.16
report-20140715-010001.txt 9121892 723587 11.16
report-20140715-020001.txt 9127448 724092 11.16
report-20140715-030001.txt 9138608 724814 11.16
report-20140715-040001.txt 9158992 726345 11.17
report-20140715-050001.txt 9186972 728207 11.18
report-20140715-060001.txt 9218656 730691 11.18
report-20140715-070001.txt 9251448 732940 11.19
report-20140715-080001.txt 9290336 735692 11.20
report-20140715-090001.txt 9327012 738771 11.21
report-20140715-100001.txt 9373568 742214 11.22
report-20140715-110001.txt 9418600 745792 11.22
report-20140715-120001.txt 9465732 749634 11.23
report-20140715-130001.txt 9517832 753570 11.24
report-20140715-140001.txt 9555280 756787 11.24
report-20140715-150002.txt 9599048 760129 11.25
report-20140715-160001.txt 9647892 763470 11.26
report-20140715-170001.txt 9691632 767131 11.27
report-20140715-180001.txt 9726140 770066 11.27
report-20140715-190001.txt 9775816 774087 11.27
report-20140715-200002.txt 9830888 778618 11.28
report-20140715-210001.txt 9877992 782855 11.28
report-20140715-220001.txt 9914968 786192 11.28
report-20140715-230001.txt 9942920 788592 11.28
report-20140716-000002.txt 9956396 789737 11.28
report-20140716-010001.txt 9964112 790565 11.28
report-20140716-020001.txt 9971256 791085 11.28
report-20140716-030002.txt 9984892 792030 11.28
report-20140716-040001.txt 9999352 793141 11.29
report-20140716-050001.txt 10025332 795000 11.29
report-20140716-060001.txt 10052044 796868 11.30
report-20140716-070001.txt 10085256 799342 11.31
report-20140716-080001.txt 10121104 801816 11.31
report-20140716-090001.txt 10163464 804565 11.33
report-20140716-100001.txt 10196812 807610 11.33
report-20140716-110001.txt 10239336 810665 11.34
report-20140716-120001.txt 10282892 814532 11.34
report-20140716-130001.txt 10329112 818136 11.34
report-20140716-140001.txt 10367268 821156 11.35
report-20140716-150001.txt 10407856 824425 11.35
report-20140716-160001.txt 10444860 827677 11.35
report-20140716-170001.txt 10491136 830783 11.37
report-20140716-180001.txt 10530964 834200 11.37
report-20140716-190001.txt 10580356 838466 11.37
report-20140716-200001.txt 10632816 842692 11.37
report-20140716-210002.txt 10662148 844872 11.38
report-20140716-220001.txt 10662148 845126 11.38
report-20140716-230001.txt 10662148 845269 11.37
report-20140717-000001.txt 10662148 845404 11.37

This shows that the average size per memObject is increasing constantly!
That does not sound right and would indicate a memory leak in the sense of
memory not being returned to the pool when it is no longer needed/referenced
(but still being accounted for in the pools).
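
Another way to look at it with the same reports is the marginal growth, i.e.
how much the process grows per newly created MemObject (a sketch, same
assumptions about the report contents as above):

awk '
  /squid-1/ { M = $5 }
  /StoreEntries with MemObjects/ {
      if (pM && $1 > pN)
          printf "%s %8.2f KB per new MemObject\n", FILENAME, (M - pM) / ($1 - pN)
      pM = M; pN = $1
  }' report-*.txt

Spot-checking a few consecutive hours in the table above, this lands mostly in
the 10-13 KB range per new MemObject, i.e. well above AVG(inmem_hi).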

You see my point now?

> Are you willing to run Squid with a tiny memory cache (e.g., 100MB) for
> a while? This would remove the natural memory cache growth as a variable...

That was one of my ideas - restarting the cluster of 10 servers with only one
of them configured for 2GB and observing the difference, to see if it really "hits a ceiling".

We still need the memory cache, but we probably could do with less...
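
(i.e. on the test instance something along these lines - the exact value is
still to be decided:)

# squid.conf on the one test instance (sketch)
cache_mem 2048 MB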

Still: this requires a lot of administrative overhead and may not get executed
on short notice. Also, from the observation of the ever-increasing KB/memObject
behaviour, I would guess that it would not make a real difference...

Also, as a reminder: we have not seen this memory behavior before with 3.3.
Since our main reason for switching to 3.4 was working "Vary:" support,
I guess the issue is related to that (and this may be the reason why "Vary"
support got disabled at some point).

Another point, more on the reporting/debugging side:
mgr:info seems to use int32 for some of its reporting calculations,
which results in things like this:
Memory usage for squid via mallinfo():
        Total space in arena: -2034044 KB
        Ordinary blocks: -2053314 KB 31255 blks
        Small blocks: 0 KB 0 blks
        Holding blocks: 39332 KB 10 blks
        Free Small blocks: 0 KB
        Free Ordinary blocks: 19270 KB
        Total in use: 19270 KB -1%
        Total free: 19270 KB -1%
        Total size: -1994712 KB
Memory accounted for:
        Total accounted: 1360231 KB -68%
        memPool accounted: 9748839 KB -489%
        memPool unaccounted: -11743551 KB -0%
        memPoolAlloc calls: 49519225161
        memPoolFree calls: 49637977911

See the negative numbers?

Maybe some of the accounting is only done with 32-bit and not 64-bit counters,
and this might also be part of the reason why cleanup does not happen...
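
(As far as I know, glibc's struct mallinfo uses plain int fields, so anything
above 2GB wraps. Just to illustrate what a 32-bit signed field does with a
multi-GB value - the number below is the process size from ps, purely for
illustration, not what mallinfo actually measures:)

awk 'BEGIN {
  bytes = 10662148 * 1024              # ~10.2GB process size in bytes
  low32 = bytes % (2^32)               # keep only the low 32 bits
  if (low32 >= 2^31) low32 -= 2^32     # reinterpret as signed 32-bit
  printf "reported as: %d KB\n", low32 / 1024
}'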

Ciao,
        Martin

One idea I have is that for a cacheable result with "Vary: ..." we essentially need to have 2 objects:
* one with the information about the Vary headers to use for looking up the variant in the cache
* one with the real data.
Maybe the first one still contains a copy of the full request and response, but the pointers get set to
NULL while the memory does not get freed.

Here is an example from vm_objects that shows the effect:
KEY 32B3D9453D9CF34D09343ED9096B7E6C
        STORE_OK IN_MEMORY SWAPOUT_DONE PING_NONE
        CACHABLE,VALIDATED
        LV:1405287550 LU:1405287768 LM:-1 EX:1405387550
        0 locks, 0 clients, 2 refs
        Swap Dir 0, File 0X1FA8D9
        GET http://whatever/whatever/framework.css
        inmem_lo: 0
        inmem_hi: 224
        swapout: 224 bytes queued
KEY 070067B9BA7DB56CA827B6CE6DF55823
        STORE_OK IN_MEMORY SWAPOUT_DONE PING_DONE
        CACHABLE,DISPATCHED,VALIDATED
        LV:1405287549 LU:1405287768 LM:1400600135 EX:1405301949
        0 locks, 0 clients, 3 refs
        Swap Dir 0, File 0X1FA8DB
        GET http://whatever/whatever/framework.css
        vary_headers: accept-encoding, x-mibinstance="whatever"
        inmem_lo: 0
        inmem_hi: 20557
        swapout: 20557 bytes queued

Note that the "x-mibinstance" request-header gets added during the icap request-modification phase.

So my guess would be that for such a message there are 2 instances of the same body in memory,
and only one of them is referenced.

And in the case above this would obviously mean 20K of memory "wasted".
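
A quick way to gauge how many such marker/variant pairs are currently held in
memory (a rough sketch, again assuming cache-manager access via squidclient):

squidclient mgr:vm_objects | awk '
  $1 == "GET" { seen[$2]++ }           # one request line per in-memory object
  END {
      for (u in seen) if (seen[u] > 1) dup++
      printf "URLs with more than one in-memory object (marker + variants): %d\n", dup
  }'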

I might play with that idea and create a core-dump of squid after loading a very specific HTML page...
