Re: [PATCH] Bug 1961 pt2: store URI userinfo (aka login) in class URL

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 12 May 2014 18:24:11 -0600

On 05/09/2014 01:31 PM, Amos Jeffries wrote:

> + const SBuf &userInfo(void) const {return userInfo_;}

Please avoid "(void)" in new code and use "()" for consistency with most
C++ code.

> + if (checklist->request->url.userInfo().isEmpty()) {
> + debugs(28, 5, "URL has no user-info details. cannot match");
> + return 0; // nothing can match
> + }

RFC 3986 BNF implies that empty "userinfo" is valid:

> authority = [ userinfo "@" ] host [ ":" port ]
> userinfo = *( unreserved / pct-encoded / sub-delims / ":" )

AFAICT, Squid does not include the "@" character in userinfo, right? If
the old code could not fully handle a URL with empty but present
userinfo (e.g., could not match it using a urllogin ACL that searches
for an empty userinfo), then this is not a problem, although adding an
XXX comment would be nice.

> + SBuf::size_type len = checklist->request->url.userInfo().copy(str, sizeof(str)-1);
> + str[len] = '\0';
> + rfc1738_unescape(str);
> + int result = data->match(str);

Please make "len" and "result" constant if possible.

> + /* If there was a username part.
> + * Ignore 0-length usernames, retain what we have already
> + */
> + if (colonPos != 0) {

The code is too smart even though it is correct. I suggest adding an
explicit npos condition, even though it is not technically needed:

  // If there was a non-empty username part, retain it.
  if (colonPos == SBuf::npos || colonPos > 0) {
  ...
  }

> + // URI userinfo/login is deprecated by IETF so we dont care if it crops accidentally
> + len = request->url.userInfo().copy(loginbuf, min(len, sizeof(loginbuf)-2));

I do not think that comment is a valid statement. With all due respect
for IETF, official deprecation is not enough to make us not care about
mishandling real-world traffic. If Squid can crop the deprecated
userinfo when forwarding or logging it, then it is a problem we should
care about. However, you do not have to fix that problem if it was
already present in the code. Documenting it with XXX is fine, especially
if it is difficult to fix!

The above changes can be done during commit IMO. I am unable to fully
audit the parsing changes, but if you are happy with them, that is
enough for me.

Thank you,

Alex.

> On 28/04/2014 2:39 p.m., Amos Jeffries wrote:
>> On 28/04/2014 3:51 a.m., Kinkie wrote:
>>> On Sun, Apr 27, 2014 at 4:52 PM, Amos Jeffries wrote:
>>>> This continues the bug 1961 redesign of URL handling.
>>>
>>> Hi,
>>>
>>> + SBuf tmp = checklist->request->url.userInfo();
>>> + // NOTE: unescape works in-place and may decrease the c-string
>>> length, never increase.
>>> + rfc1738_unescape(const_cast<char*>(tmp.c_str()));
>>>
>>> is troublesome: you are most likely trampling the userinfo data as
>>> rfc1738_unescape is acting on a shared MemBlob (shared by tmp and
>>> userInfo at least) and c_str() doesn't guarantee a COW. The best way
>>> to do this is by defining a SBuf rfc1738_unescape (const SBuf &) which
>>> copies the param if needed.
>>>
>>>
>>> + if (colonPos != 0) {
>>>
>>> shouldn't this be if (colonPos != SBuf::npos) ?
>>
>> No. The edge cases of user-info is that the colon may be the first
>> character to signal missing username, and if ':' is not found at all the
>> entire string is username.
>> So if colonPos==0 then we dont extract one, but all other values
>> regardless including npos we do.
>>
>> The edges case for password is that if colon is missing entirely we have
>> none, all other cases have a password. So the password extraction only
>> happens on colonPos!=npos.
>>
>> Special case is that 0-length username is invalid (unlike absent
>> password). The colonPos==0 case also helps omit replacing any existing
>> username with an empty string - whereas the empty password case *does*
>> replace the pasword with empty string.
>>
>
> To clarify the cases are:
>
> case u:p@...
> - gives new username 'u' and new password 'p'
>
> case u:@...
> - gives new username 'u' and clears previous password (sets new 0-length)
>
> case u@...
> - gives new username 'u' and retains previous password
>
> case :p@...
> - retains previous username and gives new password 'p'
>
> case :@...
> - retains previous username and clears previous password (sets new
> 0-length)
>
> case @...
> - invalid. retain previous username and password.
>
>>
>>>
>>> + memcpy(user, userName.rawContent(), userName.length());
>>> + user[userName.length()] = '\0';
>>>
>>> you could use
>>> SBuf::size_type upTo = min(userName.length(), sizeof(user));
>>> userName.copy(user, upto);
>>> user[upto]='\0';
>>>
>>> Same for
>>> + memcpy(loginbuf,
>>> request->url.userInfo().rawContent(), len);
>>>
>>
>> Thank you for tha pointer. Adding a slight variant, copy() does min()
>> internally and returs the size used so we can drop another line.
>>
>>>
>>>
>>> + static char result[MAX_URL*2]; // should be big enough
>>> for a single URI segment
>>> That conditional is worrisome. Is there any risk of the result not
>>> being big enough?
>>>
>>
>> Oops, small bug there should have been " < sizeof(result)-1" instead of
>> "> 0".
>>
>
> I have a better answer for this now...
>
> The URL parser used a MAX_URL length array to parse the originally
> received user-info/login field so the raw data cannot exceed MAX_URL.
> The *2 is because base64 may double the length on us.
>
> Patch attached that covers all the above.
>
> Amos
>
Received on Tue May 13 2014 - 00:24:23 MDT

This archive was generated by hypermail 2.2.0 : Tue May 13 2014 - 12:00:13 MDT