Re: [PATCH] base-64 encoder upgrade from Amos Jeffries on 2011-04-29 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 30 Apr 2011 02:18:46 +1200

On 29/04/11 12:38, Alex Rousskov wrote:
> On 04/27/2011 11:01 PM, Amos Jeffries wrote:
>
>> * also fixes a bug where if the input text or output buffer was too
>> short the coder functions would crop a few bytes off the end of the result.
>
> What do we do now in those cases? Sorry, I cannot tell for sure from the
> patch (not enough context), but it seems like we may still crop a few
> bytes off the end of the result (but possibly fewer bytes), no?

It will truncate the result at the last full "hunk".

Right now we have a static buffer which is a multiple of base64 "hunks".
So guaranteed not to have unusual output length, though if not big
enough it may truncate a whole count of "hunks".

It is not a problem now with static buffers, but becomes obvious during
debugging when the caller could give any size of buffer space.

Markus may be able to fill in better on this but I believe it was hit
when debugging Kerberos with its dynamic length tokens.

>
>> * optimized to short-circuit on several more variations of empty input
>> and nil result buffer.
>
> Whether short-circuiting is an optimization or expense depends on how
> often we get those special cases. I can speculate that zero result
> buffer length, for example, is very rare so we should not check for it
> as an optimization, unless the check is needed for other reasons.
>

Okay.
The result length will also avoid buffer overwrites when we start using
this for zero-copy header coding.
The other NULL and input-'\0' byte checks are paranoia.

>> + extern int base64_decode(char *result, const char *p, unsigned int result_size);
>
> I find it rather confusing when the output buffer parameter is called
> result and the buffer size is called result size: It feels like I am
> being asked to provide the result of the function before calling the
> function, but perhaps it is just me.

Yeah. I'm making it result_max_size to be a bit clearer.

Are you aware of any convention about src,dst or dst,src pairing and
which goes first?
All I can think of is the snprintf(out, out_max, ...) example.

I'm tempted to alter the API to be (outBuf, outBufsz, inBuf, inBufSz)
before we get too many more callers for the new functions.

>
>> + // Ensures a nul-terminated result. Will always return non-NULL.
> ...
>> +old_base64_decode(const char *p)
>> {
> ...
>> if (!p)
>> return NULL;
> ...
>
> Please sync the docs with the code. FWIW, it feels like it would be
> safer to just return a pointer to a buffer with nothing but "\0" there,
> but I did not check how the callers are using the result.
>
> Please check other non-NULL claims.

Oops cut-n-paste bug. Fixed. The other claims are checked and correct.
In going through the callers of old decode I've upgraded them all to
the new API and killed this wrapper entirely now.

>
>> + /// \return number of bytes filled in result.
>> + extern int base64_decode(char *result, const char *p, unsigned int result_size);
>
> Do we ever use the exact returned value? If not, perhaps it is better to
> assert that the result fits and return nothing or result itself? A few
> use cases I saw in the patch would produce wrong/weird results if the
> result does not fit. Same for the encoding function.

We do. NTLM helpers use it to verify packet length matches the packet
apparently received.
This is a now visible with the removal of old_base64_decode().

>> + * AUTHOR: Unknown
>
> No objection, but I am pretty sure this will raise red flags during the
> next Debian or similar thorough audit. If we do not know who the author
> is, we cannot be sure about the license of that code (there cannot be a
> valid license without the copyright owner). I know that you may not
> consider "AUTHOR" equivalent to the "copyright owner" but others may
> make that assumption. I would remove that line or both of them.
>

Hmm, Okay. Dropped. The mention of GNU sources covers the logic origins.
I'm just not sure who wrote "with adjustments" on it.

>
>> +int
>> +base64_decode_len(const char *data)
>> +{
>> + int i, j;
>> +
>> + if (!data || !*data)
>> + return 0;
>> +
>> + j = 0;
>> + for (i = strlen(data) - 1; i>= 0; i--) {
>
> The "j" variable can be declared when initialized.
> The "i" variable can be made local to the loop.

Can it? this being one of the rare old-C code sections I took it as
having the declare before any code style all the rest does. Trying the
change now.

>> + if (j+4<= result_size) {
>> + // Speed optimization: plenty of space, avoid some per-byte checks.
>> + result[j++] = (val>> 16)& 0xff; /* High 8 bits */
>> + result[j++] = (val>> 8)& 0xff; /* Mid 8 bits */
>> + result[j++] = val& 0xff; /* Low 8 bits */
>> + } else {
>> + // part-quantum goes a bit slower with per-byte checks
>> + result[j++] = (val>> 16)& 0xff; /* High 8 bits */
>> + if (j == result_size)
>> + return j;
>> + result[j++] = (val>> 8)& 0xff; /* Mid 8 bits */
>> + if (j == result_size)
>> + return j;
>> + result[j++] = val& 0xff; /* Low 8 bits */
>> + }
>> + if (j == result_size)
>> + return j;
>
> Again, if the callers do not really care, then return or assert as soon
> as you see that the result is not going to fit. No need to carefully
> produce a malformed result in that case.
>

They do care. And the result is not completely needless, the helper
debugging uses it to diagnose the exact point of failure.

>
>> + int alen = base64_encode_len(blen);
>
> alen can be declared const here.
>

I assume you mean in tools.cahcemgr,cc? after fixing the issues above we
no longer have any "int alen" in the sources, and there are many
base64_decode_len().

Which reminds me, you notice how I've made a point of mentioning the
altered source file where each audit comment related to? Please make a
habit to doing that too. It's much easier to find and fix the problems.

>
>> + /// Base-64 encode a string into a given buffer.
>> + /// Will not terminate the resulting string.
>> + /// \return the number of bytes filled in result.
>> + extern int base64_encode(char *result, const char *data, int result_size, int data_size);
>> +
>> + /// Base-64 encode a string into a given buffer.
>> + /// Will terminate the resulting string.
>> + /// \return the number of bytes filled in result. Including the terminator.
>> + extern int base64_encode_str(char *result, const char *data, int result_size, int data_size);
>
> Consider s/base64_encode_str/base64_encode_and_terminate/

I considered and picked _str to represent the nil-terminated C-string
nature of the output.

>
>> + /// Calculate the buffer size required to hold the encoded form of a string of length 'len'.
>> + extern int base64_encode_len(int len);
>
> Please document whether base64_encode_len() is suitable for
> base64_encode_str() or just for base64_encode().

It is big enough to hold both terminated and non-terminated forms.

The magic numbers still need to be tracked down and documented why they
exist.

>
>> + // Old decoder. Now a wrapper for the new.
>> + // Old decoder. Now a wrapper for the new.
>
> One of them is an encoder, but I would actually remove both of those
> lines because after your commit, there will be no "old en/decoder" left.
>

As submitted earlier the "old" API of receiving back a status array was
still present with the "old_" prefix as documented.

New patch attached to fix all these issues.
  * Old decoder is now dead and its callers converted.
  * There is still two old API encoders (just no longer easily
identified as such per your patch redux request).
  * The *_bin() encoder is de-duped as well now.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.7 and 3.1.12.1

text/x-patch attachment: base64_dedup_mk2.patch

Received on Fri Apr 29 2011 - 14:18:56 MDT

This archive was generated by hypermail 2.2.0 : Fri Apr 29 2011 - 12:00:07 MDT