[squid-users] [patch]: Apache combined log can be produced via 'pr' and 'awk' if the referer log records direct access as "-"

From: Che Dong <chedong@dont-contact.us>
Date: Sun, 11 May 2003 04:21:36 +0800

Hi All:
Today I read the last three years of the squid mailing list archive on the Apache combined log format support issue, including the 2.5 STABLE1 patch, squid2combined.pl, etc. I think most end users are administrators rather than coders, and a sysadmin might find that the combined log problem can be solved with GNU textutils such as pr, by merging the separate log files.

A combined-like log can be produced as follows:

1. Enable the referer log at compile time and in the config as follows:
    emulate_httpd_log on
    referer_log /usr/local/squid/var/logs/referer.log

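2. Build the combined log with 'pr' in merge mode and use awk to rearrange the output fields ($14 is the referer taken from referer.log, $11 is the squid result code taken from access.log):
    % pr -mJt access.log referer.log | awk '{print $1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10" \x22"$14"\x22 \x22"$11"\x22"}'
    -m  merge the files line by line, side by side
    -J  join full lines (no truncation or column alignment)
    -t  omit headers and footers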

The reason for not using useragent.log is that it contains many unescaped user agent strings. Instead, the result code (e.g. "TCP_IMS_HIT:NONE") can stand in for the user agent field, which is useful for cache hit ratio statistics.
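As a rough illustration of such a hit ratio statistic, here is a minimal sketch that reads combined-like lines on stdin and counts a line as a hit when its last field contains "HIT". The function name hit_ratio is hypothetical, and treating every result code that contains "HIT" as a cache hit is an assumption that may not match every site's definition.

```shell
# hit_ratio: read combined-like log lines on stdin and print
# "hits total percent". A line counts as a hit when its last field
# (the squid result code, quoted or not) contains "HIT".
# Hypothetical helper for illustration; not part of squid.
hit_ratio() {
    awk '
        NF > 0 { total++; if ($NF ~ /HIT/) hits++ }
        END {
            if (total > 0)
                printf "%d %d %.1f\n", hits, total, 100 * hits / total
            else
                print "0 0 0.0"
        }
    '
}
```

It could be appended to the pr/awk pipeline, or run later on a saved copy of the merged log.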

The merged output of the pr/awk pipeline looks like this:
...
192.168.0.10 - - [11/May/2003:01:13:21 +0800] "GET http://ant.chedong.com/images/jw_ec_logo_winner2002.gif HTTP/1.1" 304 206 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/projects.html HTTP/1.1" 304 208 "http://ant.chedong.com/projects.html" TCP_IMS_HIT:NONE
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/jakarta-logo.gif HTTP/1.1" 304 207 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/sdm_productivity_award.gif HTTP/1.1" 304 207 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/jw_ec_logo_winner2002.gif HTTP/1.1" 304 206 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:02 +0800] "GET http://ant.chedong.com/images/ant_logo_large.gif HTTP/1.1" 304 207 "http://ant.chedong.com/projects.html" TCP_MISS:DIRECT
192.168.0.10 - - [11/May/2003:01:14:03 +0800] "GET http://ant.chedong.com/projects.html HTTP/1.1" 304 208 "" TCP_IMS_HIT:NONE
192.168.0.10 - - [11/May/2003:01:14:03 +0800] "GET http://ant.chedong.com/images/jakarta-logo.gif HTTP/1.1" 304 207 "" TCP_MISS:DIRECT
...

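PLEASE NOTE that the last few lines have lost their referer: the referer log omits direct accesses instead of logging them as "-". Because pr pairs lines purely by position, every omitted line also shifts the following referers onto the wrong requests. I checked, and access.log and useragent.log do not have the same number of lines as referer.log:
% wc -l access.log useragent.log referer.log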
     44 access.log
     44 useragent.log
     38 referer.log <== lost direct access
    126 total
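
A check like this could be run before each merge. A minimal sketch, assuming a POSIX shell; the function name same_line_count is made up for illustration:

```shell
# same_line_count FILE1 FILE2
# Exit 0 if both files have the same number of lines, 1 if they
# differ, 2 if either file is not readable.
# Hypothetical helper for illustration; not part of squid or textutils.
same_line_count() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # tr strips the padding some wc implementations add
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}

# Possible use:
#   same_line_count access.log referer.log || echo "line counts differ" >&2
```

Equal line counts do not prove that the lines correspond one to one, but unequal counts are enough to show that a positional merge will be misaligned.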

I think that if the referer log module logged direct access as "-", the above problem would be corrected.

I checked the source code in client_side.c and found that httpHeaderGetStr may return NULL, so when a browser accesses a page directly, without a referer, nothing is written to the referer log.

The patch adds logging of "-" as the default case:

@@ -980,11 +980,16 @@
 #if USE_USERAGENT_LOG
     if ((str = httpHeaderGetStr(req_hdr, HDR_USER_AGENT)))
        logUserAgent(fqdnFromAddr(http->conn->log_addr), str);
+    else
+       logUserAgent(fqdnFromAddr(http->conn->log_addr), "-");
 #endif
 #if USE_REFERER_LOG
     if ((str = httpHeaderGetStr(req_hdr, HDR_REFERER)))
        logReferer(fqdnFromAddr(http->conn->log_addr), str,
            http->log_uri);
+    else
+       logReferer(fqdnFromAddr(http->conn->log_addr), "-",
+           http->log_uri);
 #endif
 #if FORW_VIA_DB
     if (httpHeaderHas(req_hdr, HDR_X_FORWARDED_FOR)) {

It recompiled OK, and referer.log now has the same number of lines as access.log:
% wc -l access.log referer.log
      5 access.log
      5 referer.log
     10 total

If Dan Reif's combined log patch can't be merged into the 2.5 release, please consider this referer log patch and add a tip to the documentation on how to produce a combined log.

Regards

Che, Dong
http://www.chedong.com
Received on Sat May 10 2003 - 14:22:03 MDT
