Re: [squid-users] Ideas for Squid statistics Web UI development

From: George Machitidze <giomac_at_gmail.com>
Date: Mon, 19 Nov 2012 06:50:02 +0400

Amos, Eliezer,

Thanks for the response.

I will add these notes to my project notebook :)

Also, I want to clarify: I'm not doing this alone. Some friends are
partially involved and interested (for now) - experienced coders, risk
managers, DBAs, system administrators (my own group) and web designers.
They help with organizing, ideas, HTML/CSS, the database, major testing,
integration and providing logs. I'm sure I could do this alone too, and
that's why I've started - my own skills are enough - but they will help
do the job as well. I'm not planning to turn this into something huge -
I will just maintain it - but if someone brings new directions and big
changes later, no problem, I can delegate.

On 19.11.2012 5:34, Amos Jeffries wrote:
> On 19.11.2012 13:05, George Machitidze wrote:
>> Hello
>>
>> I've started development of an open-source Web UI for gathering stats
>> for the Squid proxy server and need your help to clarify the needs and
>> resources.
>>
>> Where it came from:
>> Enterprises require auditing, reporting, configuration
>> check/visibility and statistics. I can say that most of these things
>> are easy to implement and provide in different ways, except reporting
>> and stats. Additionally, there are some requirements for functionality
>> and a nice interface that are not met by the currently available
>> solutions I've found. Also, their state of maintenance, future
>> development etc. is very unclear, and they are ineffective but still
>> acceptable or enough for _some_ installations. If you know something
>> that can do all this - please let me know.
>> So, I've decided to write everything from scratch, maybe taking some
>> public-licensed parts from other projects.
>>
>
> Did you not consider joining any existing FOSS project and providing
> the productivity boost to close the gaps you noticed?
>
> The core problem with any FOSS project is its volunteer nature: people
> work towards something that will work fine for their own needs and omit
> the minor details others need for the product to be portable between
> installations.
> I mention this as something you should look at seriously because
> 'from scratch' is a multi-year project with a long initial period
> where your own product is just another partially-baked piece of code.
> You can save yourself a lot of time (and marketing hassle) by
> improving something already written and promoting that.
>
Sure, I do consider that. After polishing the results and the working
code, there is no problem merging with others. I don't like separatism,
and I personally support the idea of centralizing all the small pieces
around the major project (Squid itself in this case). A few of the
existing options:
1. SquidAnalyzer is a good project and is maintained, yes. It is
incremental, analyzes the default log files via a daily job and is not
realtime, but as mentioned, that is enough for most situations. For me
it became a headache when I started dealing with ca. 100 very active
users generating logs (not very large, btw) and the security analysts
told me "man, we want more: realtime, a fancy GUI, a small footprint on
disk, no load at night from running jobs, details on every visit" etc.
2. The mentioned MySQL Perl daemon is a good thing, but it does not look
well maintained or expanded; still, it does the job if you want it.

I think we can do something together, but first I must analyze the
requirements and architectural needs.

>
> I suspect many of the problems existing reporters have are also due to
> unreported Squid API limitations. But again we (the Squid Project) need
> feedback, patches and developer assistance improving those in Squid so
> other projects can report the data efficiently.
>
>
>> Architecture:
>> The starting point is gathering stats; then we need to manipulate and
>> store them, then we can add some regular jobs (which I will try to
>> avoid) and then we need to view the data.
>>
>> Gathering data
>> Available sources:
>> 1. Logs, available via files or logging daemon (traffic, errors)
>> 2. Stats available via SNMP (status/counters/config)
>> 3. Cache Manager (status/counters/config)
>> 4. OS-level things (footprint, processes, disk, cpu etc)
>> [anything else?]
>
> (2) and (3) are *supposed* to present the same information in
> alternative machine and human readable formats. BUT .. uhm there are
> holes.
> I am interested in patches sent to squid-dev improving either (2) or
> (3) outputs (http://wiki.squid-cache.org/MergeProcedure).
>
> NP: The actually important errors are not logged to the daemon. They
> are logged to cache.log instead. You will need *2* forms of log
> processing to retrieve administrative error reports, one for
> access.log traffic issues and one for cache.log systemic issues.
I don't want to have something like a "full informational super duper
control panel" at the initial stage; I will start with logs, since
everything else is much easier to do. For error reporting there are
plenty of ways to implement monitoring by watching cache.log - that's
enough.
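
Just to illustrate what I mean, a rough sketch of such a cache.log
watcher (the log path and the severity keywords are my assumptions,
adjust for your install):

#!/usr/bin/env python3
# Rough sketch: follow cache.log and surface administrative errors.
# Assumptions: the log lives at /var/log/squid/cache.log and the
# interesting lines contain WARNING, ERROR or FATAL.
import time

CACHE_LOG = "/var/log/squid/cache.log"   # assumed default location
KEYWORDS = ("WARNING", "ERROR", "FATAL")

def follow(path):
    """Yield lines appended to the file, tail -f style."""
    # (log rotation handling is left out of the sketch)
    with open(path) as log:
        log.seek(0, 2)                   # jump to the end of the file
        while True:
            line = log.readline()
            if not line:
                time.sleep(1.0)          # nothing new yet, wait a bit
                continue
            yield line

if __name__ == "__main__":
    for line in follow(CACHE_LOG):
        if any(word in line for word in KEYWORDS):
            # A real tool would store or alert on this event;
            # the sketch just prints it.
            print("cache.log alert: " + line.strip())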

>
>>
>> This part will be done by local logging daemon, I won't use file
>> logging for known reasons.
>> BTW, good starting point is log_mysql_daemon by marcello, available in
>> GPL, written in perl. Effective enough to start and load any data to
>> DB - it's simple enough and took for me 10-15 minutes to analyze the
>> code, setup and configure.
>>
>> Data storage
>> File-based logging is very ineffective and has several huge
>> disadvantages:
>> - Ineffective use of disk resources
>> - Poor/no indexing
>> - Log rotation/DWH/archiving
>> - Not human readable, some parts need calculations anyway
>> - etc
>>
>> Optimized storage and then viewing of the data really requires a DB.
>> As a first step I'll use MySQL, then migrate the code to support PgSQL
>> (and maybe others too) through a DB abstraction layer.
>>
>> We can store all of the access logs and also keep some dynamically
>> updated counters, because periodic summary jobs are very intensive and
>> take time too.
>>
>> I don't want to put the counter-updating code in the logging daemon; I
>> will try to do that on the DB side, as it's done in log_mysql_daemon.
>
> NP: this daemon is actually database agnostic. The early release was
> erroneously called 'mysql' because it was implemented on that
> database. Since 3.2 it is called log_db_daemon.
>
> To use PgSQL or any other database, just alter the provided .sql
> template files and use pgsql in the squid.conf access_log DSN parameter.
>
> If there are any database-specific schema changes that would improve
> efficiency and reporting of this tool ... again I am interested in
> patches sent to squid-dev (http://wiki.squid-cache.org/MergeProcedure).
Sure, I will create a list of needs first, and if any of them require
changes in the core we can add them to the wishlist/planned items; the
other needs will be satisfied by the current functionality.
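
To make those needs concrete for myself, here is a minimal sketch of the
daemon side, using SQLite only so the example is self-contained - the
real log_db_daemon is Perl/DBI, and the table layout, file names and
helper name below are my own assumptions, not its actual schema. As far
as I understand the logging daemon protocol, Squid prefixes every record
written to the helper's stdin with a one-letter command, 'L' being a
line to log and 'R' a rotate request:

#!/usr/bin/env python3
# Minimal sketch of a Squid logging daemon that stores access log lines
# in a database. SQLite keeps the example self-contained; the table
# layout and paths are assumptions, not log_db_daemon's real schema.
#
# squid.conf would point at a helper like this with something like:
#   logfile_daemon /usr/local/bin/log_sqlite_daemon.py
#   access_log daemon:/var/log/squid/access.log squid
import sqlite3
import sys

DB_PATH = "squid_access.db"   # placeholder path for the sketch

SCHEMA = """
CREATE TABLE IF NOT EXISTS access_log (
    id        INTEGER PRIMARY KEY,
    logged_at TEXT NOT NULL,
    raw_line  TEXT NOT NULL
)
"""

def main():
    db = sqlite3.connect(DB_PATH)
    db.execute(SCHEMA)
    for record in sys.stdin:
        # Each record from Squid starts with a one-letter command:
        # 'L' carries a formatted log line, 'R' asks for rotation, etc.
        code, payload = record[0], record[1:].rstrip("\n")
        if code == "L":
            db.execute(
                "INSERT INTO access_log (logged_at, raw_line) "
                "VALUES (datetime('now'), ?)",
                (payload,),
            )
            db.commit()   # a real daemon would batch commits
        elif code == "R":
            pass          # nothing to rotate when logging to a database
    db.close()

if __name__ == "__main__":
    main()

The dynamically updated counters could then live in a summary table
maintained on the DB side (for example with a trigger on access_log)
rather than in the daemon itself, which is the direction mentioned
above.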
>
>>
>> If someone needs this data for monitoring purposes not available via
>> SNMP/OS through Nagios/Cacti/Zabbix/whatever - I see no problem doing
>> that too.
>
> AFAIK the needs in this area are centered around useful templates or
> plugins for polling the Squid OID with those tools. There is a lot of
> very useful OID data which can already be pulled out of Squid, but
> nothing easily available in the FOSS area displays it.
>
> Cacti has a few old templates available (if one is willing to hunt
> them down and fix a few bugs) for HIT ratio and overall traffic/disk
> usage, but client info and error reporting are very noticeably absent.
> I'm not sure about the other tools.
Exactly, and it's the same with the others. From my POV, having this web
utility pull that data from Squid through SNMP/cachemgr is a strange
fit; I will think about it later...
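
If I do go that way later, pulling a report out of the cache manager
over HTTP is simple enough. A rough sketch, assuming Squid 3.2+ (with
its squid-internal-mgr URLs) and that the manager ACLs allow the
request; the host, port and report name are placeholders:

#!/usr/bin/env python3
# Rough sketch: fetch a cache manager report over HTTP and print it.
# Assumes Squid 3.2+ and that squid.conf allows manager access from
# this host; adjust the proxy address and report name as needed.
from urllib.request import urlopen

PROXY = "127.0.0.1:3128"    # assumed Squid host:port
REPORT = "info"             # other reports: "counters", "5min", ...

def fetch_report(report):
    url = "http://%s/squid-internal-mgr/%s" % (PROXY, report)
    with urlopen(url) as response:
        return response.read().decode("utf-8", "replace")

if __name__ == "__main__":
    print(fetch_report(REPORT))

The same counters are supposed to be reachable via SNMP as well, on
snmp_port (default 3401) under Squid's enterprise OID .1.3.6.1.4.1.3495,
when SNMP support is compiled in and enabled.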
>
>>
>> Web UI
>> Technologies: PHP/CSS/JS/Ajax etc
>> PHP will select data from DB and generate pages accordingly.
>>
>> TODO:
>> 1. Collect information about UI requirements - what users want to see
>> and control
>> 2. Define all the counters, logging variables for daemon part required
>> for implementing first needs, according to P1
>> 3. Define DB-side counters, sources
>> 4. Check data types and lengths for the DB, for optimization
>> 5. Continuous improvement
>>
>> Any involvement: information about user needs, suggestions,
>> recommendations, coding, ideas are appreciated :)
>>
>> I chose GitHub for hosting the project and will write the project docs
>> and plans there. Currently I am collecting very detailed information
>> on user needs.
>>
>> Thanks
>>
>> Best regards,
>> George Machitidze
>
>
> Wonderful to hear about more progress in the administration sphere.
>
> We have an ongoing project by Francesco Chemolli (kinkie) to improve
> the cachemgr and SNMP information feeds. Would you be interested in
> collaborating on the Squid internal upgrades needed to support our
> three administration interfaces?
>
>
> The prime objectives of our feature project, in no particular order,
> are:
> * to upgrade the cachemgr report output such that it can be used as
> an Open Web API for managing Squid via a plugin Web UI.
> * to create an HTML + XHR alternative to cachemgr.cgi.
> * to synchronize the cachemgr and SNMP reporting such that all data
> is equally available through either - as alternative APIs rather than
> supplementary ones.
I'm not so familiar with the Squid internals/code and I'm not a coder -
I am an administrator and have experience only with small and light
projects - but I will be glad to stay informed about your progress and
will update you with the information I am collecting now.
>
> Amos Jeffries
> Treehouse Networks Ltd.

-- 
Best regards,
George Machitidze