Re: [PATCH] Helper callback params from Amos Jeffries on 2012-07-22 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 23 Jul 2012 09:30:30 +1200

On 6/07/2012 4:23 a.m., Alex Rousskov wrote:
> On 07/05/2012 12:31 AM, Amos Jeffries wrote:
>> This patch updates the helper reponse callback API from using char*
>> buffer to a HelperReply object.
>>
>> * the helper I/O read handler is updated to parse the result code off
>> the start of the helper response as is currently done for channel-ID.
>> The callback handlers are altered to use the HelperReply::status instead
>> of parsing it off themselves individually.
>>
>> * the remaining I/O read buffer is stored in a MemBuf and callbacks are
>> updated to use it via the method other().
>>
>> * the responding helper-server is stored into the HelperReply object and
>> stateful helper callbacks are combined into the same API as stateless.
>> The callback handlers are updated to use HelperReply::lastserver instead
>> of function parameter.
>>
>> After this patch the helper response format is: [channel-ID] [result]
>> [blob] <terminator>
>>
>> The behavour changes expected from this is that all helpers are now able
>> to send OK/ERR/BH states. Although the handlers for some helpers will
>> deal with the new states as unknown response. None of the bundled
>> helpers have been altered to make use of this changed potential.
>>
>> TODO:
>> * implement key=value parser for the blob area of the format, and update
>> handlers to use the HelperReply API to retrieve them.
>> + HelperReply(); // not defined
> Do not declare it if you do not need it. However, I think it would be
> best to actually define it and then move parsing from a constructor to a
> method so that helper-specific reply code can be called during parsing
> (via virtual methods).

Made private.

>> + HelperReply(const HelperReply &); // not defined
> Either
>
> * Make private so that its usage is caught at compile time rather than
> at linking time. Please also add an undefined assignment operator.

Done.

There is something strange about the redirect helper callback wrapper in
client_side.cc.
The compiler implicitly converts from pass-by-reference to a copy
constructor instance if this is left undefined/default.

>> +class HelperReply
>> +{
>> +public:
> ...
>> + CBDATA_CLASS2(HelperReply);
>> +};
> Why do we need to add cbdata here? I do not see dynamically allocated
> HelperReply objects that can be passed as pointers to async calls or
> some such. Keep it simple?

Er, put that way we don't. Sorry, habit of making callback params all
CBDATA_CLASS2().
Dropped.

>> + enum Result {
>> + Unknown,
>> + OK,
>> + ERR,
>> + BrokenHelper,
> ...
>> + TT,
>> + AF,
>> + NA,
>> + LD
> It may be a good idea to make these names more unique to avoid clashes
> with poorly written code that places names like "OK" and "ERR" in the
> global namespace. Adding an "hrr" (helper reply result) prefix would help.
>
> If BrokenHelper is spelled out, perhaps "TT", "AF", and similar other
> results names should be too? Or, vice versa, BrokenHelper should match
> its BH string.
>
> It would be nice to spell out those acronyms(?) in comments if we use
> them. For example:
>
> BH, // Broken Helper
>

As mentioned in the TODO, the next-stage patch will be dropping
TT/AF/NA/LD from the enum when it upgrades the handlers to use key-pair
input and translates these old formats into OK/ERR/BH status.

>> + enum Result {
> ...
>> + } status;
> Both should be named either [rResult] or [sS]tatus to stay consistent IMO.
>
> The "status" member is also missing a Doxygen description.

Oops. Added documentation note.

Going with result/Result_ to match the documentation and with the plan
of having members named after registered keys.

>> + /// for stateful replies the last server needs to be preserved across callbacks
>> + CbcPointer<helper_stateful_server> lastserver;
> Why is this the "last" server? From the helper reply point of view, it
> is the one and only server. Please consider renaming to "server" or
> "helper" or "source".

It used to be the "lastserver" parameter sent to stateful callbacks. No
reason to keep it that name. I just didn't seen any clear need to change
it either.

Renamed 'whichServer' and documentation updated a bit.

>> + // some result codes for backward compatibility with NTLM/Negotiate
>> + // TODO: migrate these into variants of the above results with key-pair parameters
>> + TT,
>> + AF,
>> + NA,
>> + LD
> and
>
>> + case HelperReply::Unknown: // Squid 3.2 and older the digest helper only returns a HA1 hash (no "OK")
>> + case HelperReply::OK:
> and
>
>> + // The reply passed to us is constant, so we need to duplicate it and relay the duplicate onward.
>> + // The helper is using old format (result 'Unknown') so everything we need is inside the 'res' variable
>> + HelperReply newRes(res,(res?strlen(res):0));
> The code above already warrants addition of helper-specific reply
> classes that can hide helper specifics from the general helper code and
> hide backward compatibility conversions from the helper-specific code
> that should focus on other things.

Not in this patch. Later when we are settling the key-pair parser design
into place maybe.

>> + if (reply.other().hasContent())
>> + digest_request->setDenyMessage(reply.other().content());
>> + }
>> + break;
>> +
> Something funny is going on here with formatting: The "break" position
> looks odd.

I didn't source-format these patches to clarify the changes. There are a
few switch cases where I had to add brackets around local variable
definitions and the indenting is a bit off.

>> + case HelperReply::Unknown:
>> + if (reply.status == HelperReply::Unknown) {
>> + debugs(29, DBG_IMPORTANT, "ERROR: NTLM Authentication Helper '" << reply.lastserver << "' crashed!.");
>> + blob = "Internal error";
>> + }
>> +
> and
>
>> + case HelperReply::Unknown:
>> + debugs(29, DBG_IMPORTANT, "ERROR: Negotiate Authentication Helper '" << reply.lastserver << "' crashed!.");
>> + blob = "Internal error";
>> +
>> + case HelperReply::BrokenHelper:
> Please add something like /* continue to the next case */ to emphasize
> that "break" is not missing here.

Ack. Added.

>> - arg = strchr(blob + 1, ' ');
>> - } else {
>> - arg = NULL;
>> + const char *blob = reply.other().content();
>> + char *arg = NULL;
>> + if (blob && *blob != '\0') {
>> + const char *temp = strchr(blob + 1, ' ');
>> + arg = xstrdup(++temp);
>> }
> and
>
>> + if (reply.other().hasContent()) {
>> + char *temp = xstrdup(reply.other().content());
>> + char *token = strwordtok(temp, &t);
> Are these performance regressions? We did not do an xstrdup() before,
> did we? If these are minor performance regressions desirable for
> non-performance reasons, please disclose in commit message. Just like
> XXXs in the code, disclosing is "good practice" and might help
> identify/explain regressions later during performance testing or
> investigations.

Was trying to avoid const_cast<>. Resolved this with use of the
reply.modifiableOther().

>> + debugs(29, 8, HERE << "helper: '" << reply.lastserver << "' sent us {" << reply << "}");
> HelperReply already adds curly braces when dumped to ostream.
>

Dropped and made this identical display syntax for NTLM and Negotiate
handlers.

>> - if (reply_message.parse(reply, strlen(reply)) != Ssl::CrtdMessage::OK) {
>> + if (reply_message.parse(reply.other().content(), reply.other().contentSize()) != Ssl::CrtdMessage::OK) {
>> debugs(33, 5, HERE << "Reply from ssl_crtd for " << sslHostName << " is incorrect");
>> } else {
>> - if (reply_message.getCode() != "OK") {
>> + if (reply.status != HelperReply::OK) {
> This looks strange to me. We used to extract and analyze the code from
> "reply". Now we are extracting that code from reply.other() but then
> analyzing reply.status which is not a part of reply.other().

It used to also extract the result code from the char* (aka other()).
Now that is done already by the shared parser and stored in reply.status.

> If the old "reply" is different from the new "reply.other()", then we
> are parsing the wrong thing now. And if the old "reply" is the same as
> the new "reply.other()", then we are looking at the wrong status/code now.

The reply.other() is the response string without the "OK " characters.

reply_message.getCode() used return the result/status field of the
response (as string). That part of the line is now held in reply.status
and unknown to the reply_message parser.

> This confusion (even if the new code is correct!) is one more reason why
> helper-specific parsing code should be moved to helper-specific
> HelperReply kids as a part of this change IMO.

There is multiple uses in the ssl_crtd interface for
Ssl::CrtdMessage::OK. This caught me during the updates too, went down
the rabbit hole sideways and had to revert lots of work :-(.

The important point(s) to note:
* that ssl_crtd uses the same state tree to generate request lines and
to parse result lines.
* that ssl_crtd interface currently uses Ssl::CrtdMessage::OK both to
represent the "OK" result code token on responses and to report the
parser success/incomplete/garbage parsing state.

What my patch does is detach the "OK" response line parsing from
Ssl::CrtdMessage::OK, since the squid-input message "OK" is now parsed
by HelperReply object.
This leaves Ssl::CrtdMessage::OK as the state enumerator for
reply_message.parse() [which is ssl_crtd custom parser] and as the token
representing the helpers internel code for "OK" on helper-output messages.

I'm relatively confident that this operates identical to how ssl_crtd
operates right now in trunk. If you and/or Christos are able to verify
that it would be great. I have now used this patch successfully for
several days on production servers running basic auth, log daemon, and
external ACL helpers.

>> + // NULL-terminate so the old helper callback handlers do not buffer-overrun
>> + other_.terminate();
> Null-terminations is required for most helpers callbacks I saw in the
> patch. Perhaps we should remove "old "?

Done.

>> + if (reply.other().hasContent()) {
>> + /* 2012-06-28: This cast is due to urlParse() truncating too-long URLs itself.
>> + * At this point altering the helper buffer in that way is not harmful, but annoying.
>> + * When Bug 1961 is resolved and urlParse has a const API, this needs to die.
>> + */
>> + char * result = const_cast<char*>(reply.other().content());
> Please add the following method and use it here instead of the
> const_cast<>. It would make tracking the "At this point [modifying blob]
> is not harmful" assumption much easier/safer.
>
> /// access to modifiable blob, required by redirectHandleReply()
> /// and by urlParse() in ClientRequestContext::clientRedirectDone()
> MemBuf &HelperReply::modifiableOther() const { return other_; }
>
> The method implementation may need casting itself, but try without it
> first (and it is not a problem if it does need it).

Done. It did require casting, the underlying HelperReply object, and
thus the other_ member is const at the point modifiedOther() is needed.

> You can drop "modifiable" prefix from the method name, but it is easier
> to track this that way, and since it is an exception and not a commonly
> used method, I think it is better that way. Your call though.

Okay. Using to also revert those xstrdup regressions in Negotiate and
external ACL handlers.

>> + // BACKWARD COMPATIBILITY 2012-06-15:
>> + // Some nasty old helpers send back the entire input line including extra format keys.
>> + // This is especially bad for simple perl search-replace filter scripts.
>> + //
>> + // * trim all but the first word off the response.
>> + // * warn once every 50 responses that this will stop being fixed-up soon.
>> + //
>> + if (char * res = const_cast<char*>(reply.other().content())) {
> Same comments apply here.
>
> BTW, most other helper callbacks just dupe the the other().content()
> string. Why do these two one prefer to avoid that and cast instead?

Re-written with modifiedOther(). Changing the memBuf in-place has
dropped all the need for a duplicate HelperReply.

As for the Q. These ones were not duping to start with. I was trying to
keep the changes logically identical. The other handlers duping are
probably not needing to in most cases, or doing it prematurely. But that
is outside this patch scope.

>> + const MemBuf &other() const { return other_; };
> Extra semicolon at the end.

Done.

>> - http->redirect.status = status;
>> - http->redirect.location = xstrdup(t + 1);
>> + if (!strcmp(http->redirect.location, http->uri)) {
>> + // prevent redirector setting up an infinite loop of 302.
>> + xfree(http->redirect.location);
>> + debugs(85, DBG_CRITICAL, "ERROR: URL-rewriter redirects the client back to " << http->uri);
>> + } else {
> Looks like new functionality unrelated to your patch scope. Please
> commit separately if possible. BTW, have you seen redirect-to-same-URI
> cause infinite 302 loops? I thought browsers can detect these (and we
> may misdetect something if the two URLs are not actually the same from
> client point of view). Could be another policing problem that will
> require workarounds. If there are real cases were this is needed, let's
> add it, of course (but separately).

Arg. Yes. Reverted for now.

And yes I have seen some browsers hitting infinite loops IE7 is
particularly bad. The others let it loop up to a counter limit then
abort. I figure we might as well catch it at the source and notify an
admin who can probably fix it rather than leaving it up to an end-user
to figure out.

>> + debugs(85, DBG_CRITICAL, "ERROR: URL-rewriter redirects the client back to " << http->uri);
> ...
>> + debugs(85, (!(temp++%50)?DBG_CRITICAL:2), "ERROR: URL-rewriter redirects the client back to the same URL. Ignoring reply " << reply);
> Can you make these two cases identical in behavior,
> too-many-console-messages avoidance logic, and ERROR message text?
>

Can do. Reverted from this project patching though. Along with the next
few bits. Keeping your comments in mind for that other bug fix patch.

>> if (cbdataReferenceValid(state->def)) {
>> - if (reply)
>> + // only cache OK and ERR results.
>> + if (reply.status == HelperReply::OK || reply.status == HelperReply::ERR)
>> entry = external_acl_cache_add(state->def, state->key, entryData);
>> else {
>> external_acl_entry *oldentry = (external_acl_entry *)hash_lookup(state->def->cache, state->key);
> An undisclosed functionality change unrelated to this patch scope?
> Please commit separately.

No its part of this patch. There is always a reply now. Just with
varying status.

The lines received from this helper up to now are:
  "OK ..."
  "ERR ..."
  NULL (helper crash, aborted lookup, etc).

So the above change seeks to retain those semantics by bundling
everything other than OK/ERR into the NULL case. If there are new
semantics needed to handle Unknown/BrokenHelper that is part of the next
round of changes. I dont really think we should cache the BrokenHelper
and unknown-response states though.

>> + HelperReply nulReply(NULL,0);
> s/nul/null/ or s/nul/nil/
>
> There are several such objects; search for nulReply. This is minor, of
> course, but keep things more consistent and might make it easier for
> some to comprehend the intent.

Fixed.

Amos

text/plain attachment: HelperReply_stage1_mk3.patch

Received on Sun Jul 22 2012 - 21:30:45 MDT

This archive was generated by hypermail 2.2.0 : Mon Jul 23 2012 - 12:00:04 MDT