Log web page TITLE to access.log

From: bsl <bsl_sw_at_tut.by>
Date: Wed, 28 Dec 2011 13:02:14 +0300

Hello.

I want to add the page title to the Squid access log so that I can view users' browsing history.
Thanks to Henrik Nordstrom for his 2006 reply about this :-)
http://www2.tr.squid-cache.org/mail-archive/squid-dev/200603/0009.html

Following his idea, I parse the web page content in the function sendMoreData
of the client-side reply code (client_side_reply.cc), find the page title, and
log it to access.log using a new logformat token (for example "<tp").

But I have a problem:
the page title is not always logged.
For example, when I visit www.godaddy.com, I see its page title in the log.
When I visit www.nasa.gov, I don't see a title in the log :(
What am I doing wrong? Maybe not all pages are delivered to the client through
the client_side_reply::sendMoreData function?

Thanks for any ideas.

I made the following changes (Squid 3.1.10, FreeBSD 8.2-STABLE, amd64):

AccessLogEntry.h:
+ added char *title; to the AccessLogEntry class definition (public section, line 54);

access_log.cc:
+ added LFT_REPLY_PAGE_TITLE to the end of the enum logformat_bcode_t definition
+ added the element "<tp" for LFT_REPLY_PAGE_TITLE to logformat_token_table
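+ added a new case to the function accessLogCustom():
      case LFT_REPLY_PAGE_TITLE:
        if (al->title) {
          // copy: with dofree = 1 the formatter frees out after writing it,
          // so al->title itself must not be handed over
          out = xstrdup(al->title);
          quote = 1;
          dofree = 1;
        }
        break;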
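As far as I can tell from Squid 3.1's access_log.cc, dofree = 1 makes accessLogCustom() free out once the token has been written. If out points straight at al->title, the entry's own string is freed, and a second access_log line (or a later cleanup of the entry) then touches freed memory. A minimal standalone sketch of the ownership rule, assuming a stripped-down entry; struct log_entry, dup_string and expand_title_token are hypothetical names, not Squid code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for AccessLogEntry: only the new field is shown. */
struct log_entry {
    char *title; /* owned by the entry; NULL until a title has been found */
};

/* Hypothetical helper playing the role of Squid's xstrdup(). */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Expand the "<tp" token for one log line into dst.
   Mirrors the out/dofree pattern: whatever ends up in out is freed after
   use, so the formatter works on a private copy and never frees e->title. */
static void expand_title_token(const struct log_entry *e, char *dst, size_t n)
{
    char *out = NULL;
    int dofree = 0;

    if (e->title != NULL) {
        out = dup_string(e->title);
        dofree = 1;
    }
    /* "-" as the placeholder for a missing title is an assumption here */
    snprintf(dst, n, "%s", out != NULL ? out : "-");
    if (dofree)
        free(out); /* frees only the copy */
}
```

Calling expand_title_token() twice on the same entry, as two access_log lines would, leaves e->title intact; the entry remains responsible for freeing it.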

client_side_reply.cc:
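   In the function sendMoreData(), at line 2078, I added a block that parses the buffer:
   if (http->al.title == NULL) {
     // search for the TITLE tag
     const char *tag1 = "<title>";
     const char *tag2 = "</title>";
     const int len1 = strlen(tag1); // 7
     const int len2 = strlen(tag2); // 8
     const int buflen = (int)result.length;
     // search for the open tag in buf; only try start positions that
     // leave room for the whole tag
     char *ans1 = strstr(buf, (char *)tag1, buflen - len1 + 1);
     if (ans1) {
       // search for the close tag in the rest of the buffer, again leaving
       // room for the whole tag so the comparison cannot run past the end
       int rest = buflen - (ans1 - buf) - len1;
       char *ans2 = strstr(ans1 + len1, (char *)tag2, rest - len2 + 1);
       if (ans2) {
         int titlelen = ans2 - ans1 - len1; // title length
         http->al.title = (char *)xcalloc(titlelen + 1, 1);
         // xstrncpy() copies at most n - 1 characters and always adds '\0'
         xstrncpy(http->al.title, ans1 + len1, titlelen + 1);
       }
     }
   }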
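One hedged guess about the missing titles: sendMoreData() is presumably called once per chunk of the reply, and the block above searches each buffer on its own, so a title whose "<title>" and "</title>" land in different buffers is never found. A minimal sketch that tolerates such splits by keeping the first TITLE_SCAN_MAX bytes across calls and rescanning them; struct title_scanner, scan_for, title_scanner_feed and the 8 KB cap are all hypothetical, and it does nothing for compressed (e.g. gzip) bodies, where the tag is not visible as text at all:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define TITLE_SCAN_MAX 8192 /* hypothetical cap: scan only the first 8 KB */

/* Hypothetical per-request state; zero it (e.g. with memset) before use. */
struct title_scanner {
    char buf[TITLE_SCAN_MAX]; /* bytes seen so far, not NUL-terminated */
    size_t len;               /* valid bytes in buf */
    char *title;              /* malloc'ed title, NULL until found */
    int done;                 /* 1 once a title was found or the cap reached */
};

/* Bounded, case-insensitive search for a lowercase tag in hay[0..len). */
static const char *scan_for(const char *hay, size_t len, const char *tag)
{
    size_t n = strlen(tag);
    for (size_t i = 0; i + n <= len; i++) {
        size_t j = 0;
        while (j < n && tolower((unsigned char)hay[i + j]) == (unsigned char)tag[j])
            j++;
        if (j == n)
            return hay + i;
    }
    return NULL;
}

/* Feed one chunk; returns the title once both tags have been seen. */
static const char *title_scanner_feed(struct title_scanner *s,
                                      const char *chunk, size_t n)
{
    if (s->done)
        return s->title;

    size_t room = TITLE_SCAN_MAX - s->len;
    if (n > room)
        n = room;
    memcpy(s->buf + s->len, chunk, n);
    s->len += n;

    /* Rescan the whole prefix so tags split across chunks are found. */
    const char *open = scan_for(s->buf, s->len, "<title>");
    if (open != NULL) {
        const char *start = open + strlen("<title>");
        size_t rest = s->len - (size_t)(start - s->buf);
        const char *close = scan_for(start, rest, "</title>");
        if (close != NULL) {
            size_t tlen = (size_t)(close - start);
            s->title = malloc(tlen + 1);
            if (s->title != NULL) {
                memcpy(s->title, start, tlen);
                s->title[tlen] = '\0';
            }
            s->done = 1;
            return s->title;
        }
    }
    if (s->len == TITLE_SCAN_MAX)
        s->done = 1; /* give up: no complete title in the scanned prefix */
    return NULL;
}
```

Rescanning the prefix on every call is quadratic in the worst case, but the cap keeps the cost bounded per request.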

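   Implementation of the strstr function (a case-insensitive variant that takes
   a maximum number of start positions; the needle must be lowercase):
   #include <ctype.h>

   // Tries at most len start positions. The caller must make sure that
   // haystack holds at least len + strlen(needle) - 1 bytes, because the
   // inner loop stops only at the end of the needle.
   char *strstr(char *haystack, char *needle, int len)
   {
     char *start;
     int tmplen = 0;
     for (start = haystack; tmplen < len; start++, tmplen++) {
       char *p = needle;
       char *q = start;
       // cast to unsigned char: tolower() is undefined for negative values
       while (*p != '\0' && *p == tolower((unsigned char)*q)) {
         p++;
         q++;
       }
       if (*p == '\0')
         return start; // reached end of needle without mismatch
     }
     return NULL;
   }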
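The strstr variant relies on every caller computing the right bound, since its inner loop never looks at the end of the buffer. A minimal sketch of a self-bounding alternative, assuming the caller only knows the buffer length; find_ci is a hypothetical name, not a Squid or libc function, and the haystack does not need to be NUL-terminated:

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive search for needle in the first hay_len bytes of hay.
   Only start positions with room for the whole needle are tried, so the
   comparison can never read past hay + hay_len. */
static const char *find_ci(const char *hay, size_t hay_len, const char *needle)
{
    size_t n = 0;
    while (needle[n] != '\0')
        n++;
    if (n == 0)
        return hay;
    if (n > hay_len)
        return NULL;
    for (size_t i = 0; i + n <= hay_len; i++) {
        size_t j = 0;
        while (j < n &&
               tolower((unsigned char)hay[i + j]) == tolower((unsigned char)needle[j]))
            j++;
        if (j == n)
            return hay + i; /* whole needle matched */
    }
    return NULL;
}
```

Because both sides go through tolower(), the needle may be written in any case, which removes the "needle must be lowercase" rule.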

---
Regards,
Sergey
Received on Wed Jan 04 2012 - 19:14:09 MST
