Re: [PATCH] base-64 encoder upgrade from Alex Rousskov on 2011-04-28 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 28 Apr 2011 18:38:27 -0600

On 04/27/2011 11:01 PM, Amos Jeffries wrote:

> * also fixes a bug where if the input text or output buffer was too
> short the coder functions would crop a few bytes off the end of the result.

What do we do now in those cases? Sorry, I cannot tell for sure from the
patch (not enough context), but it seems like we may still crop a few
bytes off the end of the result (but possibly fewer bytes), no?

> * optimized to short-circuit on several more variations of empty input
> and nil result buffer.

Whether short-circuiting is an optimization or expense depends on how
often we get those special cases. I can speculate that zero result
buffer length, for example, is very rare so we should not check for it
as an optimization, unless the check is needed for other reasons.

> - buf = xstrdup(base64_decode(req->pub_auth));
> + int alen = base64_decode_len(req->pub_auth);
> + buf = (char*)xmalloc(alen);
> + base64_decode(buf, req->pub_auth, sizeof(req->pub_auth));

The size of the result buffer is not sizeof(req->pub_auth) but alen, right?

> + int blen = snprintf(buf, sizeof(buf), "%s:%s",
> + req->user_name ? req->user_name : "",
> + req->passwd);
>
> - str64 = base64_encode(buf);
> + int alen = base64_encode_len(blen);
> + char *str64 = static_cast<char*>(xmalloc(alen));
> + if (base64_encode_str(str64, buf, alen, blen) == 0)
> + return "";

Would be nice to have more meaningful names than a and b here (in alen
and blen). Perhaps decodedLen and encodedLen?

> + extern int base64_decode_len(const char *data);
> + extern int base64_encode_len(int len);

Not a big deal, but perhaps base64_decoded_len() and
base64_encoded_len() would be clearer?

And s/int len/int decodedLen/?

> + extern int base64_decode(char *result, const char *p, unsigned int result_size);

I find it rather confusing when the output buffer parameter is called
result and the buffer size is called result size: It feels like I am
being asked to provide the result of the function before calling the
function, but perhaps it is just me.

> + // Ensures a nul-terminated result. Will always return non-NULL.
...
> +old_base64_decode(const char *p)
> {
...
> if (!p)
> return NULL;
...

Please sync the docs with the code. FWIW, it feels like it would be
safer to just return a pointer to a buffer with nothing but "\0" there,
but I did not check how the callers are using the result.

Please check other non-NULL claims.

> + /// \return number of bytes filled in result.
> + extern int base64_decode(char *result, const char *p, unsigned int result_size);

Do we ever use the exact returned value? If not, perhaps it is better to
assert that the result fits and return nothing or result itself? A few
use cases I saw in the patch would produce wrong/weird results if the
result does not fit. Same for the encoding function.

> + * AUTHOR: Unknown

No objection, but I am pretty sure this will raise red flags during the
next Debian or similar thorough audit. If we do not know who the author
is, we cannot be sure about the license of that code (there cannot be a
valid license without the copyright owner). I know that you may not
consider "AUTHOR" equivalent to the "copyright owner" but others may
make that assumption. I would remove that line or both of them.

> +int
> +base64_decode_len(const char *data)
> +{
> + int i, j;
> +
> + if (!data || !*data)
> + return 0;
> +
> + j = 0;
> + for (i = strlen(data) - 1; i >= 0; i--) {

The "j" variable can be declared when initialized.
The "i" variable can be made local to the loop.
It would be nice to give "j" a more meaningful name.

> + for (i = strlen(data) - 1; i >= 0; i--) {
> + if (data[i] == '=')
> + j++;
> + if (data[i] != '=')
> + break;
> + }
> + return strlen(data) / 4 * 3 - j;

This computes strlen(data) twice, which is quite expensive.

> +old_base64_decode(const char *p)
> {
> static char result[BASE64_RESULT_SZ];
> - int j;
> - int c;
> - long val;
> + memset(result, '\0', sizeof(result));
> if (!p)
> return NULL;
> + base64_decode(result, p, sizeof(result));
> + result[BASE64_RESULT_SZ-1] = '\0';
> + return result;
> +}

The expensive memset() call can probably be removed if we are a little
bit more careful about result termination and/or if we trust
base64_decode(). And I hope we trust it!

> + if (j+4 <= result_size) {
> + // Speed optimization: plenty of space, avoid some per-byte checks.
> + result[j++] = (val >> 16) & 0xff; /* High 8 bits */
> + result[j++] = (val >> 8) & 0xff; /* Mid 8 bits */
> + result[j++] = val & 0xff; /* Low 8 bits */
> + } else {
> + // part-quantum goes a bit slower with per-byte checks
> + result[j++] = (val >> 16) & 0xff; /* High 8 bits */
> + if (j == result_size)
> + return j;
> + result[j++] = (val >> 8) & 0xff; /* Mid 8 bits */
> + if (j == result_size)
> + return j;
> + result[j++] = val & 0xff; /* Low 8 bits */
> + }
> + if (j == result_size)
> + return j;

Again, if the callers do not really care, then return or assert as soon
as you see that the result is not going to fit. No need to carefully
produce a malformed result in that case.

> + int alen = base64_encode_len(blen);

alen can be declared const here.

> + extern char *old_base64_decode(const char *coded);
> + extern const char *old_base64_encode(const char *decoded);

If we do not take addresses of these functions, consider removing the
"old_" prefix. This will minimize the non-changes caused by the patch.

> + /// Base-64 encode a string into a given buffer.
> + /// Will not terminate the resulting string.
> + /// \return the number of bytes filled in result.
> + extern int base64_encode(char *result, const char *data, int result_size, int data_size);
> +
> + /// Base-64 encode a string into a given buffer.
> + /// Will terminate the resulting string.
> + /// \return the number of bytes filled in result. Including the terminator.
> + extern int base64_encode_str(char *result, const char *data, int result_size, int data_size);

Consider s/base64_encode_str/base64_encode_and_terminate/

> + /// Calculate the buffer size required to hold the encoded form of a string of length 'len'.
> + extern int base64_encode_len(int len);

Please document whether base64_encode_len() is suitable for
base64_encode_str() or just for base64_encode().

> + // Old decoder. Now a wrapper for the new.
> + // Old decoder. Now a wrapper for the new.

One of them is an encoder, but I would actually remove both of those
lines because after your commit, there will be no "old en/decoder" left.

Thank you,

Alex.
Received on Fri Apr 29 2011 - 00:38:58 MDT

This archive was generated by hypermail 2.2.0 : Fri Apr 29 2011 - 12:00:07 MDT