Re: [PATCH] Native FTP Relay

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sun, 10 Aug 2014 17:30:43 -0600

On 08/10/2014 06:11 AM, Amos Jeffries wrote:
> On 10/08/2014 2:44 p.m., Alex Rousskov wrote:
>> This part of the problem will be gone when HTTP-specific code will be
>> moved to HttpServer, but the ICAP compatibility part will not go away
>> regardless of any cleanup.
>
> The ICAP compatibility issue is a bug in ICAP or specific ICAP servers
> that needs fixing there.

Yes, of course, but fixing those ICAP services is often beyond Squid
admin control, and there is no pressing need to break compatibility with
those ICAP services _now_ (by using a different FTP "version" than 1.0
or 1.1).

> HTTP itself can already (or shortly will) have
> present "HTTP/2.0", "h2" or "h2-NN" message versions to be encapsulated.

I suspect we will receive requests to optionally mask those new versions
when sent over ICAP because some broken service that the admin cannot
fix cannot handle much more than /1.0 and /1.1.

>>> in src/Server.cc:
>>> * this change looks semantically wrong:
>>> - if (!doneWithServer()) {
>>> + if (mayReadVirginReplyBody()) {
>>> closeServer();
>>
>> The change itself does not make the code look semantically wrong. On the
>> semantics level, the code looks about the same before and after the change:
>>
>> Was: If we are not done with the server, close it.
>> Now: If we are still reading from the server, close it.
>
> It is now *apparently* skipping all the possible non-reply-reading
> reasons for closing the server connection. That is semantic change...

If you think the above change affects something on the semantics level,
then I do not have any new arguments to convince you otherwise.
Hopefully, the diverging opinions on the code looks are not important in
this case, because we seem to agree that the patch code is correct (at
least in a sense that it does not make the known problem any worse and
actually solves one sub-problem).

> Okay, I think I understand the cases now. So the old HTTP code was
> broken by not considering the request and Trailers cases as okay even
> when ICAP closed. You are just not fixing that bug.

Yes, I am not fixing any old bugs here. As for what exactly is broken
and by how much, I prefer not to discuss those details further (here and
now) because the patch does not change them.

> Would you mind adding a TODO to the new HTTP mayReadVirginReplyBody()
> that it should be only considering reply body reads, not overall server use?

Done. There was no HTTP mayReadVirginReplyBody(), but I added one.

> in src/cache_cf.cc:
> * the new parsePortProtocol() FTP case should return
> Ftp::ProtocolVersion().

Done.

> in src/cf.data.pre:
> * mentions of "ftp-track-dirs=on|off" need updating

Fixed.

> in src/ftp/Elements.cc:
> * Ftp::HttpReplyWrapper() has unnecessary:
> httpVersion = Http::ProtocolVersion(Ftp::ProtocolVersion().major,
> Ftp::ProtocolVersion().minor);
>
> - the current default Http::ProtocolVersion() constructor sets the
> right values for HTTP messages generated by Squid. The old code using
> Http::ProtocolVersion(1, 1) was incorrect/outdated.
>
> - same for Ftp::Server::parseOneRequest() in src/servers/FtpServer.cc

Left as is because the Http::ProtocolVersion default is irrelevant in
those two places.

We should set the wrapping message version to what FTP code wants to use
for HTTP wrapping. That version is defined by Ftp::ProtocolVersion().
The fact that the HTTP default currently matches the version used for
FTP wrapping is a coincidence that the code should not rely on. Imagine
changing the HTTP default to 2.0 tomorrow and then spending hours trying
to find all the places where FTP was _implicitly_ relying on that
default being 1.1...

> in src/clients/Makefile.am:
> * no need for forward.h in the SOURCES list to be separated from the
> other files.
> - same in src/servers/Makefile.am

Removed.

> * please include $(top_srcdir)/src/TestHeaders.am to run the .h header
> unit tests
> - same in src/ftp/Makefile.am and src/servers/Makefile.am

Done.

> in src/servers/forward.h:
> * missing pre-define for class ConnStateData
> - this would have been caught by the .h unit tests mentioned above.

Fixed.

> in src/servers/FtpServer.cc:
>
> + Ftp::Server::parseOneRequest() for all the following until "++"
>
> * please reference RFC 959 section 3.1.1.5.2 for the unusual definition
> of WhiteSpace CharacterSet.

Done. It was actually based on isspace(3) so not that "unusual" :-)

> * please use existing CharacterSet::LF instead of re-defining as local
> NewLine.

Done.

> * please use tok.prefix(CharacterSet::ALPHA) to parse the FTP command
> instead of BlackSpace. Then explicit delimiter check to validate the input.
> - RFC 959 is clear:
> "The command codes themselves are alphabetic characters terminated
> by the character <SP> (Space) if parameters follow and Telnet-EOL
> otherwise."

Not changed. I do not see a compelling reason for a general purpose
relay to police traffic by default. Squid itself does not care if there
is some other character present in some command name. Why increase the
chances of breaking what was working without Squid by rejecting commands
by default?

Yes, it is possible that some command that Squid does care about would
have invalid trailing characters in it, preventing Squid from
recognizing it, but, with your approach, Squid would have to reject that
command anyway, so we would not break something that would work in a
policing Squid. In the worst case, we would just make triage more difficult.

If you insist on Squid rejecting RFC-invalid command names, I will
change the code, but I suggest leaving it permissive for now.

> - NP: if the input has LF terminated lines instead of CRLF the current
> parser will consume multiple \n into the "Blackspace" cmd string then
> demand more.

Fixed. Good catch!

> * the right-trimming of params only trims "\r\t " instead of the defined
> WhiteSpace special set.
> - the left-trim crops all custom Whitespace characters.

Changed to right-trim WhiteSpace characters.

> - since you are altering the other string parser API significantly
> anyway, may as well add SBuf::trim version using CharacterSet and use it
> here with FTP custom Whitespace variable. That also gets rid of the TODO.

As that TODO meant to indicate, I have considered doing those
enhancements to Tokenizer/SBuf and decided not to volunteer them at this
time. I suspect the best changes to support right-trimming will not be
simple and would like to see more actual use cases to narrow down the
best solution.

> * regarding "// interception cases do not need USER to calculate the uri"
> - can you explain that a bit. Where does the URI detail come from
> instead of USER ?

AFAIK, it comes from the intercepted connection address. Please see
Ftp::Server::start() where the URI is formed for the interception case.
I am pretty sure interception will require more work in the future, but
I believe it worked at least in some environments when Dmitry wrote and
tested it.

> * please use () to make this line easier to read:
> const SBuf *path = params.length() && CommandHasPathParameter(cmd) ?
> &params : NULL;

Done.

> * please omit StoreIOBuffer "data" parameter when its unused in methods

Done? I hope I did not miss any cases.

> * Ftp::Server::wroteReplyData() please avoid wrap on debugs()

Done.

> * Ftp::Server::handleEpsvReply() comment "we combine remote server
> address with" is wrong/stale
> - looks like a cut-n-paste error from what handlePasvReply() does
> differently.

Fixed by removing the "remote server address" part.

> * debug section 11 is HTTP protocol.
> - FTP protocol info debugs are supposed to be section 9.

Fixed.

> * Ftp::Server::createDataConnection() is not using the new
> Comm::Connection::setAddrs() you had me add a few days back.

Fixed.

> in src/servers/FtpServer.h:
>
> * please make Ftp::MasterState RefCountable and use a Pointer for
> Ftp::Server::master
> - this will make is easily linked to MasterXaction later (or sooner)
> without altering lots of FTP code in a non-FTP patch.

Done to shorten the discussion, but I do not think it will help with the
transition.

> * please fix Ftp::MasterState definitions ordering (constructor).
> - coding guidelines specify methods before variables.

Done.

> * please rename private member variables with _ suffix as per coding
> guidelines.

Not done. The lack of underscore complies with the guidelines. We should
change the corresponding MAY into MUST if we want to use underscores for
all non-public data members, but I do not think we should do that.

An underscore tells the developer that the data member should be
accessed via a get/set method instead of directly. The rest is fluff
growing on top of that idea. If we start using underscores for all
non-public data members, then we will kill the original idea. Perhaps
that has already happened in Squid, but I do not think it is a good
thing, and I do not want to make it worse.

> - the 1-word members in particular are easily confused with local
> variables in the .cc method code.

If _that_ is the problem you want to solve, we should use underscores
for all data members, not just the non-public ones. I do not think that
would be overall better than using underscores for data members with
access methods, but I am happy to debate that and volunteer to rename
FtpServer data members if the results of that debate warrant a change.

IMO, a better solution (if there is one) for the confusion problem you
outlined above is to keep methods small enough for local variables to be
easily visible/remembered when one looks at the method code.

A cumulative patch and a patch with just this iteration (r12816) changes
are attached, compressed as usual.

Thank you,

Alex.
> PS. you know how we have been discussing reasons for patches
> encountering audit problems and wasting peoples time ...

Yes.

> 1) earlier I highlighted the parse classes API changes which were not
> particularly FTP related. They could have been audited separately and
> merged earlier. With possible benefits to the ongoing parser project.

I do not understand why are you mentioning the above in the "wasting
peoples time" context:

* "audited separately": The API changes could indeed be audited
separately -- it is auditor's decision what to review and those changes
are confined to a few well-known source files. I did not really make a
separate audit much harder IMHO; in fact, one could argue that I made a
quality audit easier overall by providing working usage examples!

* "merged earlier": A separate merge would waste time compared to a
merge of the whole branch. This is the primary reason I am not
volunteering to do that right now.

> 2) this patch is also explicitly forcing the ConnStateData inheritence
> model for Http::Server and Ftp::Server

Within the scope of the FTP project, the three "Server" classes is the
only reasonable inheritance model for ConnStateData code: Shared code
stays in the Server class (old ConnStateData) while protocol-specific
code goes into either HttpServer or FtpServer. There is really not much
room for design or debate here IMO!

There are certainly complex and still unresolved issues surrounding
client_side* classes, but this project _avoids_ them. This is not an
accident. Furthermore, it is quite possible that ConnStateData and its
kid classes will significantly change once we resolve other issues, but
I bet the basic protocol-based split is not going away regardless of any
future reorganization, so the patch is a step forward for any future
design direction.

> despite previous weeks of
> argument reaching a deadlock about separation of duties between
> ConnStateData or ClientSocketContext for that class hierarchy - and
> indeed whether it should be a hierarchy at all.

I think the FTP patch avoids those issues by keeping ClientSocketContext
design as is.

> *without any discussion to resolve that deadlock*.

As you said, there have been weeks of arguments. If weeks of arguments
did not resolve a deadlock, another month of discussions is not very
likely to resolve it either (but we should keep trying if there is no
better alternative).

Please note that the origin-facing design has been approved in January
2014 and that design is exactly the same as the user-facing design in
the posted patch. That January RFC did not detail the user-facing side
classes beyond the introduction of FtpServer class because I was not
sure how much of a restructuring I could commit to doing at the time.
The only addition to that design is the HttpServer class which should
not come as a surprise if FtpServer was approved/expected!

> Luckily I have learnt more of the client-side functionality since then
> and am okay with this change. But since we have not communicated that a
> lack of discussion here could very easily have resulted in more
> days/weeks wasted.

Very true, but I do not really see how that risk could have been
mitigated more efficiently in this case since the weeks of discussions
have not worked! If you are happy with the patch, we just saved weeks of
additional arguments. If you think the design is wrong, there is working
code (needed regardless of any discussions) that may help shape future
arguments.

Received on Sun Aug 10 2014 - 23:31:04 MDT

This archive was generated by hypermail 2.2.0 : Mon Aug 11 2014 - 12:00:16 MDT