Re: [MERGE] Rework urlAbsolute to be a little more streamlined.

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 2 Sep 2008 13:44:52 +1200 (NZST)

> On Mon, 2008-09-01 at 12:42 +1000, Benno Rice wrote:
>> Resubmit this patch, including changes based on comments by various
>> people.
>>
>> - Mention RFC text in relation to changing the default behaviour in
>> relation
>> to unknown HTTP methods.
>> - Use safe_free instead of xfree.
>> - Rework urlAbsolute to use snprintf in a slightly better way. Snprintf
>> is now
>> used to construct the initial portion of the url and the rest is added
>> on
>> using POSIX string routines.
>
>
>> + const char *url, *absUrl;
>> +
>> + if ((url = rep->header.getStr(hdr)) != NULL) {
>
> The above is better written as
>
> if (const char *url = rep->header.getStr(hdr)) {
>
> It would also be better to name this URL "hdrUrl". You already have
> "reqUrl" and "absUrl" and it is difficult to keep track.
>
>> + const char *url, *absUrl;
> ...
>> + if (absUrl != NULL) {
>> + url = absUrl;
>> + }
>> + if (absUrl != NULL) { // if the URL was relative, it is by nature
>> the same host
>> + purgeEntriesByUrl(url);
>> + } else if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS, per
>> RFC 2616 13.10, second last paragraph
>> + purgeEntriesByUrl(url);
>> + }
>> + if (absUrl != NULL) {
>> + safe_free(absUrl);
>> + }
>
> The above looks strange but since urlAbsolute is still not documented, I
> may be misinterpreting this code. I would expect something like the code
> below, which uses cleanup names:
>
> if (const char *absUrl = urlRelativeToAbsolute(req, hdrUrl)) {
> // hdrUrl was relative so absUrl points to the request host
> purgeEntriesByUrl(absUrl);
> safe_free(absUrl);
> } else
> if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS; RFC 2616 13.10
> purgeEntriesByUrl(url);
> }

Thanks for picking that up Alex.

Benno the safe_free() does the NULL case handling itself, sorry for not
being clear.

>
> If you want to make it even more clear, split urlAbsolute in two:
> urlIsRelative and urlMakeAbsolute, arriving at something like
>
> if (isRelative(hdrUrl)) {
> const char *absUrl = urlMakeAbsolute(req, hdrUrl);
> // hdrUrl was relative so absUrl points to the request host
> purgeEntriesByUrl(absUrl);
> safe_free(absUrl);
> } else
> if (sameUrlHosts(reqUrl, url)) { // prevent purging DoS: RFC 2616 13.10
> purgeEntriesByUrl(url);
> }
>
> The above are polishing comments.
>
> I would still insist that urlAbsolute() is documented as its
> complicated, multi-purpose interface is impossible to guess by its name.
> I think it returns NULL when the URL we feed is already absolute, but I
> am not sure. If my understanding if the intent is correct, then
> splitting urlAbsolute in two functions as shown above would be
> preferred.
>
>
>> + if (strchr(relUrl, ':') != NULL) {
>> + return (NULL);
>> + }
>
> According to RFC 2396, Section 3.3 "Path Component", a URL path may
> include a colon (':') character. This would make the above check wrong.
> Did I misread the RFC? Are colons banned from URL paths?
>
>
> I found this call in urlCanonical():
>
>> if (request->protocol == PROTO_URN) {
>> snprintf(urlbuf, MAX_URL, "urn:%s", request->urlpath.buf());
>> } else {
>> /// \todo AYJ: this could use "if..else and method == METHOD_CONNECT"
>> easier.
>> switch (request->method.id()) {
>>
>> case METHOD_CONNECT:
>> snprintf(urlbuf, MAX_URL, "%s:%d", request->GetHost(),
>> request->port);
>> break;
>>
>> default:
>> portbuf[0] = '\0';
>>
>> if (request->port != urlDefaultPort(request->protocol))
>> snprintf(portbuf, 32, ":%d", request->port);
>>
>> snprintf(urlbuf, MAX_URL, "%s://%s%s%s%s%s",
>> ProtocolStr[request->protocol],
>> request->login,
>> *request->login ? "@" : null_string,
>> request->GetHost(),
>> portbuf,
>> request->urlpath.buf());
>>
>
> this code in urlCanonicalClean():
>
>> if (request->protocol == PROTO_URN) {
>> snprintf(buf, MAX_URL, "urn:%s", request->urlpath.buf());
>> } else {
>> /// \todo AYJ: this could use "if..else and method == METHOD_CONNECT"
>> easier.
>> switch (request->method.id()) {
>>
>> case METHOD_CONNECT:
>> snprintf(buf, MAX_URL, "%s:%d",
>> request->GetHost(),
>> request->port);
>> break;
>>
>> default:
>> portbuf[0] = '\0';
>>
>> if (request->port != urlDefaultPort(request->protocol))
>> snprintf(portbuf, 32, ":%d", request->port);
>>
>> loginbuf[0] = '\0';
>>
>> if ((int) strlen(request->login) > 0) {
>> strcpy(loginbuf, request->login);
>>
>> if ((t = strchr(loginbuf, ':')))
>> *t = '\0';
>>
>> strcat(loginbuf, "@");
>> }
>>
>> snprintf(buf, MAX_URL, "%s://%s%s%s%s",
>> ProtocolStr[request->protocol],
>> loginbuf,
>> request->GetHost(),
>> portbuf,
>> request->urlpath.buf());
>
> and now there is this code in urlAbsolute (which handles the CONNECT
> case earlier so the code below is guaranteed not to get a CONNECT
> method):
>
>> + if (req->protocol == PROTO_URN) {
>> + snprintf(urlbuf, MAX_URL, "urn:%s", req->urlpath.buf());
>> + } else {
>> + if (req->port != urlDefaultPort(req->protocol)) {
>> + urllen = snprintf(urlbuf, MAX_URL, "%s://%s%s%s:%d",
>> + ProtocolStr[req->protocol],
>> + req->login,
>> + *req->login ? "@" : null_string,
>> + req->GetHost(),
>> + req->port
>> + );
>> + } else {
>> + urllen = snprintf(urlbuf, MAX_URL, "%s://%s%s%s",
>> + ProtocolStr[req->protocol],
>> + req->login,
>> + *req->login ? "@" : null_string,
>> + req->GetHost()
>> + );
>> + }
>
>
> There got to be some way to remove this code duplication (and related
> inconsistencies)!
>

That duplication and a lot more URL mess is all covered by Bug 1961.
http://www.squid-cache.org/bugs/show_bug.cgi?id=1961

Since the original request from Benno was for Vary invalidation. I think
we can sign off on that merge when the last memory tweaks of urlAbsolute
are there.

But ignore the duplication until URL cleanup is done, which will be easier
after the string buffer Kinkie is working on now goes in.

>
> I hesitate voting against because I do not know whether that would
> affect future re-submissions. I do not want to block them. Does anybody
> know whether a negative vote is sticky and needs to be "cleared" by the
> voter?

Good question.

Amos
Received on Tue Sep 02 2008 - 01:44:55 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 02 2008 - 12:00:02 MDT