RE: [squid-users] SNMP MIB updates? from Gregori Parker on 2009-04-15 (squid-users)

From: Gregori Parker <Gregori.Parker_at_theplatform.com>
Date: Wed, 15 Apr 2009 10:17:22 -0700

Thanks for the reply, Amos. I agree with your statements and am glad that this might get placed on the someday-roadmap for Squid. I may not have permission to send to squid-dev, so please forward this if it doesn't find its way.

I have been working off of the squid/share/mib.txt MIB that came with the 3.0-STABLE13 build I'm currently running on most systems.

cachePeerTable should be constructed using a standard integer index, initialized on first run and adjusted as the configuration changes and gets reloaded, with one of the OIDs returning the IP as a label. For example: I build and configure Squid and run it for the first time with 3 cache peers configured, and they get indexed as 1,2,3 in the table. I then reconfigure Squid and remove all 3 peers (peers == parents and/or siblings, something that needs to be decided as well), replacing them with new ones. At this point you can either rebuild the table using the new peers, or append them as 4,5,6 and blank out 1,2,3. Cisco switches build their ifIndex table using the latter method, which works well when linecards are added or removed (granted, switchports in general are a bit more static than an application-level configuration).
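
A minimal sketch of the append-and-blank-out indexing described above, in Python rather than Squid's own code. The class name, method names and peer names are hypothetical, and peers are keyed by name on the assumption that two peers may share an IP:

```python
class PeerIndexTable:
    """Hypothetical append-only index allocator for cache peers."""

    def __init__(self):
        self._next_index = 1
        self._rows = {}  # index -> peer name, or None once blanked out

    def reconfigure(self, peer_names):
        """Blank out peers that were removed and append any new ones."""
        wanted = list(peer_names)
        for index, name in list(self._rows.items()):
            if name is not None and name not in wanted:
                self._rows[index] = None  # keep the slot, drop the label
        current = {name for name in self._rows.values() if name is not None}
        for name in wanted:
            if name not in current:
                self._rows[self._next_index] = name
                current.add(name)
                self._next_index += 1

    def rows(self):
        """Return a copy of the index -> peer mapping."""
        return dict(self._rows)


table = PeerIndexTable()
table.reconfigure(["parent-a", "parent-b", "sibling-c"])  # indexed 1,2,3
table.reconfigure(["parent-d", "parent-e", "sibling-f"])  # appended as 4,5,6
print(table.rows())
```

Indexes are never reused here, so a poller that remembers index 2 never silently starts graphing a different peer.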

Also, I have tried the -Cc option when snmpwalk-ing, and one big problem I run into is that I have two parents configured with the same IP (different hostnames)...this causes snmpwalk to get stuck endlessly grabbing the same OID. Something like Cacti won't even begin to handle this table gracefully, so it's essentially unusable.
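
To illustrate why a table indexed by IP breaks a walk when two rows share an address, here is a rough Python sketch of the ordering check a walking client performs. The OID prefix and addresses are made up for illustration and are not the real Squid MIB values:

```python
def oid_key(oid):
    """Turn a dotted OID string into a tuple of integers for comparison."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def first_non_increasing(oids):
    """Return the first OID that does not sort after its predecessor, or None."""
    previous = None
    for oid in oids:
        key = oid_key(oid)
        if previous is not None and key <= previous:
            return oid
        previous = key
    return None


# Hypothetical table prefix and peer addresses, for illustration only.
PREFIX = "1.3.6.1.4.1.99999.1.1"
walk = [
    PREFIX + ".10.0.0.1",
    PREFIX + ".10.0.0.2",
    PREFIX + ".10.0.0.2",  # second parent sharing the same IP
]
print(first_non_increasing(walk))
```

With an integer index per row, every OID in the walk would be strictly larger than the one before it and the check would never fire.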

cacheHtcp* is great...but that would just make me want a cacheCarp as well. Perhaps you could just abstract whatever is being used under something like cacheSiblingProto?

In regards to adding a cacheHttpMisses (and pending, and negative) - I noticed that the cacheIpCache table has OIDs for misses, pending hits and negative hits, so why can't cacheProtoAggregateStats have these as well for HTTP? I've run into Cacti templates that get this elusive metric by subtracting cacheHttpHits from cacheProtoClientHttpRequests.
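
The subtraction those templates do amounts to the following (a trivial Python sketch; the function name is hypothetical and the inputs are the per-second figures quoted elsewhere in this message):

```python
def http_misses(client_http_requests, http_hits):
    """Derive misses as total client HTTP requests minus HTTP hits."""
    return client_http_requests - http_hits


# 350 requests/s and 310 hits/s, as quoted for cacheProtoAggregateStats.
print(http_misses(350, 310))
```

A dedicated OID would save every template author from redoing this, and it would also allow pending and negative results to be split out, which the subtraction cannot do.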

In regards to cacheMemUsage, I'm just interested in seeing a cacheMemCacheUsage added. This would be especially useful for diskless caches...there's a cacheSysVMsize that tells me how much total memory can be used for caching, but nothing that tells me how much is actually used. Seeing these metrics graphed over time would help determine optimal high/low swap values. MemUsage is currently an integer OID counting in KB - that should be changed to a Counter32 and represented in bits.

In regards to bits vs KB, everything everywhere is represented in bits, except for Squid...which is no big deal, except that it requires Cacti users to build in some extra math (result = value * 1024 * 8). This is very low hanging fruit IMO.
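
For reference, the extra math is just this (a small Python sketch; the function name is hypothetical, and it assumes KB here means 1024 bytes, as in the formula above):

```python
BYTES_PER_KB = 1024
BITS_PER_BYTE = 8


def kilobytes_to_bits(value_kb):
    """Convert a value reported in KB into bits."""
    return value_kb * BYTES_PER_KB * BITS_PER_BYTE


print(kilobytes_to_bits(1))
```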

Not sure what to say about the CPU usage metric, perhaps it's not refreshing often enough (if it's meant to be a gauge). Perhaps it could be indexed into time-averages similar to the service timers, i.e. 1 min, 5 min and 60 min averages. Shouldn't be too difficult to do.
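
One possible shape for the time-averaged variant, sketched in Python under the assumption that CPU usage is sampled periodically. The class name and window labels are hypothetical, and this is not how Squid computes its service timers:

```python
from collections import deque


class WindowedAverage:
    """Average of the samples seen within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record a sample and drop any that have aged out of the window."""
        self._samples.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def average(self):
        """Return the mean of the retained samples, or None if there are none."""
        if not self._samples:
            return None
        return sum(value for _, value in self._samples) / len(self._samples)


# One averager per window: 1 min, 5 min and 60 min.
windows = {"1min": WindowedAverage(60),
           "5min": WindowedAverage(300),
           "60min": WindowedAverage(3600)}

# Illustrative samples only: (seconds since start, CPU percent).
for timestamp, cpu_percent in [(0, 10.0), (30, 20.0), (60, 30.0)]:
    for averager in windows.values():
        averager.add(timestamp, cpu_percent)

print({label: averager.average() for label, averager in windows.items()})
```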

Regarding the differences between the cacheProtoAggregateStats and cacheIpCache tables: I can share graphs with you offline, but the curves graph out to be exactly the same; the numbers are just way off. For example, graphing HTTP requests per second using data from the cacheProtoAggregateStats table, I see a current rate of 350 rps (and about 310 hits per second); graphing IP requests per second using data from the cacheIpCache table, I see a current rate of 1190 rps (and about 1150 hits per second). Notice that the gap between requests and hits matches up perfectly (about 40 per second in both cases), and the deltas are always the same; the IP table just counts a LOT more hits and requests over time than the HTTP/ProtoAggStats table does. I can't account for the difference, so a detailed definition would help me a lot. I'm going to try turning off ICP/HTCP and seeing if there is any difference. If you want to see my graphs for a better idea of what I'm saying, I can attach them and send off-list.
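
For anyone reproducing these numbers, a per-second rate is normally derived from two polls of a counter. A rough Python sketch, assuming the counters are 32-bit and wrap; the poll values and interval are invented, only the 350/310 and 1190/1150 rates come from this message:

```python
COUNTER32_MODULUS = 2 ** 32


def per_second_rate(previous, current, interval_seconds):
    """Rate between two polls of a wrapping 32-bit counter."""
    delta = (current - previous) % COUNTER32_MODULUS  # tolerates one wrap
    return delta / interval_seconds


# Invented poll values 300 seconds apart that yield 350 requests/s.
print(per_second_rate(1000, 106000, 300))

# The requests-minus-hits gap is the same in both tables.
http_gap = 350 - 310    # cacheProtoAggregateStats
ip_gap = 1190 - 1150    # cacheIpCache
print(http_gap, ip_gap)
```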

Thanks guys,
Gregori

-----Original Message-----
From: Amos Jeffries [mailto:squid3_at_treenet.co.nz]
Sent: Wednesday, April 15, 2009 5:23 AM
To: Gregori Parker
Cc: squid-users_at_squid-cache.org; Squid Developers
Subject: Re: [squid-users] SNMP MIB updates?

Gregori Parker wrote:
> I was creating a fresh batch of cacti graph templates for Squid the other day (focused on reverse proxy setups, I will release them soon), and while crawling the Squid MIB I noticed that HTCP metrics don't register anywhere. Furthermore, the entire MIB seems to be in need of updating - here's a list of things I would like to understand or see updated at some point...
>

Excellent to see someone working on that update and the squid SNMP stuff
too. Thank you.

My answers to your points are below. Please keep any further follow-up on
these on the squid-dev mailing list (cc'd).

Firstly, which of the _3_ Squid MIBs are you trying to get updated?
The Squid-2.x, 3.0, or 3.x MIB?

> * cachePeerTable should be re-created so that it doesnt index by ip address (results in OID not increasing error when walking!)

While we do see this as a minor issue in need of cleanup one day, it's not
a major problem (the -Cc option of snmpwalk was created for such cases), but
it needs major alterations to fix.
If you want to spend the time please discuss ideas on how it can be
solved with us first. There have been many discussions and attempts in
the past which can be leveraged to reduce dead-end work.

> * update cacheIcp* to register HTCP now that it is built in by default

Good idea, but I would rather see a cacheHtcp* added instead of
cacheIcp* extended with a new protocol.
If it does make more sense to detail them together then a better name
than cacheIcp needs to be chosen for the joint tables.

> * add a cacheHttpMisses (and pending, and negative) to cacheProtoAggregateStats

okay, details on what you are thinking though please.

> * more detailed memory counters - the current cacheMemUsage doesnt seem to measure how much memory is being used for caching (in my diskless cache setups, the counter flatlines around 600MB when I know there is much more than that being used)

The thing to look at here is the SNMP data type the counter is being forced
into. We hit a similar issue just a short while ago that turned out to
be a too-small field. I don't know of SNMP being updated for sizes since
the 64-bit stuff went default.

Otherwise it might be explained by this: not all memory is accounted for by
Squid MemPools; certain objects and such use stack or unaccounted heap
space. These are all design choices within the code itself, not an SNMP
solvable issue.

> * cacheCpuUsage is constant at 8% across a variety of squid servers at all times - I can see that this doesnt match up with what I see locally via top or in my normal unix cpu graphs.

Does sound like trouble. At least it needs to be investigated and any
results documented about what's actually going on.

> * throughput should be measured in bits instead of kilobytes throughout the MIB

Ah, nice, but AFAIK the output reflects the level of detail kept in
the counters. I agree an upgrade of that is needed. Just be careful not to
get into too much work there.

>
> Btw, I've been trying to understand the differences between the cacheProtoAggregateStats and cacheIpCache tables - I get very different numbers in terms of requests, hits, etc and I cant account for it.
>

Anyone have info on this?

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
   Current Beta Squid 3.1.0.7

Received on Wed Apr 15 2009 - 17:17:31 MDT
