Re: [PATCH] Native FTP Relay

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 09 Aug 2014 20:44:21 -0600

On 08/08/2014 11:13 AM, Amos Jeffries wrote:
> On 9/08/2014 4:57 a.m., Alex Rousskov wrote:
>> On 08/08/2014 09:48 AM, Amos Jeffries wrote:
>>> Audit results (part 1):
>>> * Please apply CharacterSet updates separately.
>>> * Please apply Tokenizer API updates separately.
>>> * please apply src/HttpHdrCc.h changes separately.

>> Why? If the branch is merged, they will be applied with their own/
>> existing commit messages.

> They are updates on previous feature changesets unrelated to FTP which
> will block future parser alterations (also unrelated to FTP) being
> backported if this project takes long to stabilize.
>
> If they have their own commit messages then cherrypicking out of the
> branch should be relatively easy.

Thank you for explaining your rationale. To minimize overheads, let's
come back to this issue if the branch is not merged soon.

>>> * is FTP/1.1 really the protocol version of FTP parsed by Squid? (the
>>> AnyP::ProtocolVersion parameters)

> Okay, leave as-is but please document that as the reason for using that
> version number

Done. It was actually documented in the corresponding commit message. To
provide _source code_ documentation in one place, I added
Ftp::ProtocolVersion() and used that everywhere instead of hard-coded
"1,1". It will be easier to track down all "1,1" users that way if
somebody wants to change or polish this further later. See ftp/Elements.*

The resulting code required more changes than one may expect and is
still a little awkward because Http::ProtocolVersion should have been
just a function (not a class/type), and because PortCfg::setTransport
internals did not belong to AnyP where they were placed. I did my best
to avoid cascading changes required to fix all that by leaving
Http::ProtocolVersion as is and moving PortCfg::setTransport() internals
into cache_cf.cc where they can access FTP-specific code.

> and a TODO about cleaning up the message processing to
> stop assuming HTTP versions.

Done in one of the many client_side.cc places where we still assume
HTTP. Here is the new TODO:

> /* RFC 2616 section 10.5.6 : handle unsupported HTTP major versions cleanly. */
> /* We currently only support 0.9, 1.0, 1.1 properly */
> /* TODO: move HTTP-specific processing into servers/HttpServer and such */
> if ( (http_ver.major == 0 && http_ver.minor != 9) ||
> (http_ver.major > 1) ) {

This part of the problem will be gone when HTTP-specific code will be
moved to HttpServer, but the ICAP compatibility part will not go away
regardless of any cleanup.

>>> in src/cf.data.pre:
>>> * please document tproxy, name=, protocol=FTP as supported on ftp_port

>> We do not know whether it is supported and do not plan on testing/fixing
>> that support right now. Do you still want us to document it as supported?

> Yes. You have not changed the TcpAcceptor code for TPROXY lookups or the
> ACL code using port name. So they are supported.

Done. Please note that protocol= requires accel mode, and I am not sure
what that mode means for a native FTP relay. I added protocol=FTP anyway
in hope to shorten the discussion.

> protocol= you explicitly added support in the parser.

This is probably a misunderstanding caused by poor official code
quality. We did not add explicit support for the protocol= option. The
parsing code we added was required for the _default_ ftp_port
configuration (no protocol=... option) to work because of the way the
general port parsing (and other) official code is written.

The official code uses CfgPort::transport for at least two rather
different purposes (for determining foo in foo_port and for the
protocol= feature). Cleaning this up requires a serious effort well
beyond the FTP project scope (the FTP code does not really need the
protocol= feature to work).

If, after reading the above clarification, you would rather see
protocol=FTP gone from squid.conf.documented, I would be more than happy
to remove it.

> in src/FwdState.cc:
> * at line 817 calling request->hier.note() twice in a row with different
> values is both needless change and wasting CPU.

Fixed? This was my merge bug. The fixed patch uses the more recent line
version (and both lines were yours, thank you).

> in src/HttpHeader.cc:
> * please enact the new TODO about stripping FTP-Arguments "header" on
> PASS command.

Done to shorten the discussion.

> This information leak could earn us a CVE if merged as-is.

I do not think it could because that code is currently unused by/for FTP
relays (as the added source code comment tried to explain). Native FTP
relay does not send Squid "error pages" to the user and those error
pages are the only context (that I know of) where the masking code is
actually used today.

> in src/HttpHeaderTools.cc:
> * httpHeaderQuoteString() could be optimized with needInnerQuote as
> SBuf::size_type to append in blocks between characters needing quote.

Agreed. Added a corresponding TODO. I do not want to volunteer to
provide the corresponding optimization at this time and consider it
rather minor, especially given relative rarity of FTP traffic and its
focus on large/long transfers. The now-documented optimization would
become more useful if HTTP code starts using this function, of course.

> in src/Server.cc:
> * this change looks semantically wrong:
> - if (!doneWithServer()) {
> + if (mayReadVirginReplyBody()) {
> closeServer();

The change itself does not make the code look semantically wrong. On the
semantics level, the code looks about the same before and after the change:

Was: If we are not done with the server, close it.
Now: If we are still reading from the server, close it.

> ... because the action resulting is to close the server connection.
> We may still need to write to this server, or to read some Trailers
> after the reply body has finished.

The old comment above that code explains why we have to do what looks wrong:

> // We stop reading origin response because we have no place to put it and
> // cannot use it. If some origin servers do not like that or if we want to
> // reuse more pconns, we can add code to discard unneeded origin responses.

The situation discussed in the above comment has not changed and is
unrelated to FTP. When the adaptation is done, Squid cannot continue to
read the virgin response. In most cases, this rude closure should be OK
because nobody cares about that response.

According to the commit log, the FTP-related code change from
!doneWithServer() to mayReadVirginReplyBody() was necessary to minimize
pointless closures of FTP server control connections. Here is the
explanation from the commit log:

> revno: 12761
> committer: Alex Rousskov <rousskov_at_measurement-factory.com>
> branch nick: ftp-gw
> timestamp: Sun 2013-08-25 13:50:44 -0600
> message:
> Avoid some unnecessary FTP control connection closures; ...
>
> Do not close FTP server control connection just because FTP response
> adaptation is done. We still close FTP connections if we are receiving the
> virgin response (because we have no place to store it), but if we are done
> receiving, there is no need to terminate FTP server connections. The former
> happens when an ICAP service responds before receiving the entire virgin FTP
> response. The latter, when the ICAP service responds after the FTP data
> connection is closed but before the control 226 response comes in.

Note that the FTP 226 response arrives on the control connection while
the response data is received on the data connection. closeServer()
closes both connections to the server and doneWithServer() tests both
connections.

HTTP does not have to deal with this two-connection complication, of
course -- when it is done with the only connection it can immediately
close or pool it.

> in src/Server.h:
> * I'm not sure the call order of doneWithServer() and
> mayReadVirginReplyBody() is the right way around.

Sorry, I am not sure what you mean by "call order" in src/Server.h
context. That file does not determine the order of those two calls.

> It seems like "being
> done completely with a server" includes "whether more reply is coming"
> as one sub-condition of several (see above), not the reverse.

Correct. As the commit message quoted above tries to explain, FTP needed
to distinguish two cases that the old code merged together:

* The transaction may still be receiving the virgin response (the new
mayReadVirginReplyBody method). This is when FTP and HTTP code has to
end communication with the server. No other solution here right now (as
the source code comment explains).

* The transaction stopped receiving the virgin response but may still be
able to communicate with the origin server. In this case, the FTP code
wants to keep the [pinned] _control_ connection to the origin server
open and, hence, does not want to call closeServer().

The code remains the same for HTTP because the new
mayReadVirginReplyBody() method just calls the old doneWithServer()
method in HTTP case.

> in src/anyp/PortCfg.cc:
> * at line 199 please s/http(s)_port/*_port/ for the fatalf() message so
> it no longer mentions http(s)_port when parsing ftp_port.

Changed. That fatalf() call (moved to cache_cf.cc) now displays the
offending directive name. I also changed all other similar messages that
I could find.

> * please make the protocol=FTP case sensitive upper case "FTP".
> - the variants with lower case http/https and without version numbers
> are only accepted for backward compatibility. Which this new feature
> does not suffer from.

Done. This change required a bit of rewriting due to the same
misunderstanding of where the protocol=... parsing code is used as above.

> in src/cache_cf.cc:
> * squid.conf on/off parameters for directives generally do not
> explicitly state =on/=off in their naming unless there are other values
> to differentiate in the =value part.
> - "ftp-track-dirs" is sufficient with a default of false.

Changed to shorten discussion (despite my dislike for this asymmetrical
configuration approach).

> in src/cf.data.pre:
> * ... "client_idle_pconn_timeout s/used/uses/ for incoming"...

Not changed. Here is the context:

> NAME: ftp_client_idle_timeout
> How long to wait for an FTP request on a connection to Squid ftp_port.
> Many FTP clients do not deal with idle connection closures well,
> necessitating a longer default timeout than client_idle_pconn_timeout
> used for incoming HTTP requests.

The patch version sounds correct to me while the requested change would
lead to a garbled sentence AFAICT:

  - Longer [timeout] than what?
  - Longer than client_idle_pconn_timeout.
  - What client_idle_pconn_timeout?
  - client_idle_pconn_timeout used for incoming HTTP requests.

> in src/clientStreamForward.h:
> * why is enums.h included? please document.

Done. It is used for clientStream_status_t returned by CSS(). I do not
see other enum.h #includers justifying their enums.h use. A new policy?

> * please remove the "\ingroup ClientStreamAPI" since you are shuffling
> their related code

Done.

> - these can be replaced with a single TODO on creating a ClientStream
> namespace instead.

Did not add a TODO because I doubt we need that namespace. I suspect we
should get rid of that whole "pluggable C functions" design instead.

> in src/client_side.h:
> * please document finishDechunkingRequest() properly now that it's
> shifted scope.

Moved it back into protected scope instead. We had to expose some
methods during code reshuffling, but that exposure apparently became
unnecessary after additional code polishing got rid of old
function-based callbacks.

> * do not need to add protected: twice.
> - or was the second one meant to be private: since it is now exposing
> the entire set of private members and variables?

Fixed. Sorry I missed what was supposed to be a temporary change. After
all the polishing, only one member had to be elevated into protected
status (bodyPipe).

> - if the privates are now supposed to be protected please TODO about
> renaming them all from private_ style.

FWIW, the trailing underscore can be used for protected data members as
well. The underscore is there primarily to avoid conflicts with access
method names. As such, it does not make sense for public data members,
but can be used for both protected and private ones (discouraging their
use in lieu of those access methods).

> in src/clients/FtpClient.cc:
> * needless empty line after squid.h include. same in FtpRelay.cc

Fixed both. Checked for, but did not find other occurrences.

> * missing empty line between "" includes block and <> ones.

Fixed. Checked for, but did not find other occurrences.

> * please remove all "\ingroup ServerProtocolFTPInternal" this grouping
> is replaced by the namespace already.

Done reluctantly as this change caused lots of comment-only changes in
the old FTP code.

The adjusted compressed patch is attached (ftp-native-r12815.patch).
This patch corresponds to r12815 of the ftp-native branch[1]. I also
attached a smaller compressed patch showing just the changes made for
this review iteration (ftp-native-c12815.patch).

Thank you,

Alex.
[1] https://code.launchpad.net/~measurement-factory/squid/ftp-native

Received on Sun Aug 10 2014 - 02:44:41 MDT

This archive was generated by hypermail 2.2.0 : Sun Aug 10 2014 - 12:00:11 MDT