Re: [PATCH] Limit log field width

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 22 Jan 2010 16:12:57 -0700

On 01/21/2010 05:19 AM, Amos Jeffries wrote:
> Alex Rousskov wrote:
>> Hello,
>>
>> The attached patch was motivated by user complaints that “vi”,
>> “cat”, and even some more sophisticated log analysis tools have trouble
>> handling long access.log lines. Those long log lines result from long
>> URLs that do occur in the wild.
>>
>> Squid places a 8192 character limit on the URL length but that limit
>> exceeds (a) some of the tool limits and (b) Squid's access.log buffer
>> limit (if some other fields are logged).
>>
>> My solution was to honor the .precision setting in logformat field
>> specifications. You can use it with %ru or any other text field.
>>
>> For example, the format code below limits logged URI size to the first
>> 1000 characters.
>>
>> logformat xsquid ... %rm %.1000ru %un ...
>>
>> Squid access log line buffer cannot exceed 8192 characters. If you want
>> to preserve fields logged after the URL, your logged URL width limit
>> should be smaller than 8192 to leave space for other fields on the log
>> line.
>>
>> There is no width limit by default.
>>
>> Here is a possible commit message:
>>
>> ---------------------------------
>> Support maximum field width for string access.log fields.
>>
>> Some standard command-line and some log processing tools have trouble
>> handling URLs or other logged fields exceeding 8KB in length. Moreover,
>> Squid violates its own log line format and truncates the entire log line
>> if, for example, the URL is 8KB long. By supporting .precision format
>> argument, we allow the administrator to specify logged URL size and
>> avoid these problems.
>>
>> Limiting logged field width has no effect on traffic on the wire.
>
> Um, not quite true now that we log over UDP. anyways...

Also not quite true if you look at the log via ssh :-)

>> TODO: The name comes from the printf(3) "precision" format part. It may
>> be a good idea to rename our "precision" into max_width or similar,
>> especially if we do not support floating point precision logging.
>
> " [width[.maximum]] " gets my vote. seems clear and concise.

The maximum width field does not require the width field. Should I
change to [width][.maximum] or [width_min].[width_max]? These are just
labels in the .pre -- there is no effect on the code.

>> TODO: Old code used chars to store user-configured field width and
>> precision. That does not work for URLs, headers, and other entries
>> longer than 256 characters. This patch changes the storage type to int.
>> The code should probably be polished further to remove unsigned->signed
>> conversions.
>> ---------------------
>>
>> Please review.
>>
>
> +1. If the TODO can be removed.

You are talking about the "unsigned->signed" TODO? I am happy to make
those changes, but they will make the patch much bigger, there is higher
risk of compilation warnings, and they are not really about the feature
in question. Should I still do it? Perhaps as a separate commit?

Thank you,

Alex.
Received on Fri Jan 22 2010 - 23:12:50 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 23 2010 - 12:00:07 MST