Re: [PATCH] Native FTP Relay

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 08 Aug 2014 10:57:42 -0600

On 08/08/2014 09:48 AM, Amos Jeffries wrote:
> On 8/08/2014 3:29 p.m., Alex Rousskov wrote:
>> Hello,
>>
>> Native FTP Relay support is finally ready: Squid receives native FTP
>> commands on ftp_port and relays FTP messages between FTP clients and FTP
>> origin servers through the existing Squid access control, adaptation,
>> and logging layers. A compressed 70KB patch attached to this email is
>> also temporary available at [1]. I would like to merge the corresponding
>> lp branch[2] into Squid trunk.
>>
>> This is a large, complex, experimental feature. I am sure there are
>> cases where it does not work well or does not support some existing
>> features. I am not aware of any regressions specific to old FTP gateway
>> code, and I hope there are no regressions specific to HTTP code, but I
>> do not recommend deployment without testing.
>>
>> The branch code worked quite well in limited production deployment, but
>> the final version (with code restructuring and polishing for the
>> official submission) has only seen simple lab testing. AFAIK, the FTP
>> interception path has not seen any production deployment at all.
>>
>> This code represents more than a year worth of development, started by
>> Dmitry Kurochkin in the beginning of 2013. Christos Tsantilas and I
>> finished Dmitry's work. I bear responsibility for the bugs, but there
>> are probably areas of Dmitry's code that appear to work well and that
>> neither Christos nor I have studied.
>>
>> Overall, we tried to focus on the new FTP code and leave the old
>> FTP/HTTP code alone, except where restructuring was necessary to avoid
>> code duplication. For example, the server-facing side reuses a lot of
>> the old FTP code, while the relayed FTP commands are passed around in
>> HttpMsg structures using the old ClientStreams, doCallout(), and Store
>> interfaces.
>>
>> If you review the patch, please note that bzr is incapable of tracking
>> code across file splits (e.g., old ftp.cc becoming ftp/Parsing.cc plus
>> three clients/Ftp*.cc files). Many of the old XXXs and TODOs will appear
>> as newly added code in the patch. Usually, one can figure it out by
>> searching for the similar but deleted code in the patch.
>>
>> Please see the above referenced lp branch for a detailed change log.
>>
>>
>> Some Native FTP Relay highlights:
>>
>> * Added ftp_port directive telling Squid to relay native FTP commands.
>> * Active and passive FTP support on the user-facing side; require
>> passive connections to come from the control connection src IP.
>> * IPv6 support (EPSV and, on the user-facing side, EPRT).
>> * Intelligent adaptation of relayed FTP FEAT responses.
>> * Relaying of multi-line FTP control responses using various formats.
>> * Support relaying of FTP MLSD and MLST commands (RFC 3659).
>> * Several Microsoft FTP server compatibility features.
>> * ICAP/eCAP support (at individual FTP command/response level).
>> * Optional "current FTP directory" tracking (cannot be 100% reliable due
>> to symbolic links and such, but is helpful in some common use cases).
>> * FTP origin control connection is pinned to the FTP user connection.
>> * No caching support -- no reliable Request URIs for that (see above).
>> * Significant FTP code restructuring on the server-facing side.
>> * Initial steps towards HTTP code restructuring on the client-facing
>> side.
>>
>>
>> Changes to the general code used by the Native FTP Relay code:
>>
>>
>>> * The user- and origin-facing code restructured as agreed previously on
>>> this mailing list. I will email some thoughts about the results separately,
>>> but here is the executive summary:
>>>
>>> src/servers/FtpServer.* # new FTP server, relaying FTP
>>> src/servers/HttpServer.* # old ConnStateData parts conflicting with FtpServer
>>> src/clients/FtpClient.* # code shared by old and new FTP clients
>>> src/clients/FtpGateway.* # old FTP client, translating back to HTTP
>>> src/clients/FtpRelay.* # new FTP client, relaying FTP
>>> src/ftp/* # FTP stuff shared by clients and servers
>>>
>>>
>>> * Fixed HttpHdr::Private/NoCache(v) implementations and optimized their API.
>>>
>>> * Tokenizer fixes and API improvements:
>>>
>>> Taught Tokenizer to keep track of the number of parsed bytes. Many callers
>>> need to know that because they need to adjust/consume I/O offsets/buffers.
>>>
>>> Adjusted unused Parser::Tokenizer::token() to not treat NUL delimiter
>>> specially. Besides the fact that not all grammars can treat NUL that way, the
>>> special NUL treatment resulted in some token() calls returning true for empty
>>> tokens, which was confusing parsers. Callers that do not require trailing
>>> delimiters, should use prefix() instead. This change is based on experience
>>> writing Tokenizer-based FTP parser, although the final parser code uses
>>> prefix() instead of token(), for unrelated reasons.
>>>
>>> Split Parser::Tokenizer::skip(set) into skipOne() and skipAll(). All other
>>> skip() methods skip exactly one thing (either a given character or a given
>>> token) but the old skip(set) method was skipping multiple things. That
>>> confused callers. We now force the caller to make a choice.
>>>
>>> Fixed Parser::Tokenizer::skip(char) to avoid out of bound access.
>>>
>>>
>>> * CharacterSet enhancements:
>>>
>>> Added CharacterSet::complement() to create "all except those in that set" sets
>>> handy for parsing (e.g., "get all characters until the end of line").
>>>
>>> Added CharacterSet::rename() to label sets. Handy in const declarations that
>>> use expressions. For example: const CharacterSet AB = (A+B).renamed("AB").
>>>
>>>
>>> * SBuf fixes:
>>>
>>> Do not allow SBuf::toLower/toUpper() return a value.
>>>
>>> Some callers think toLower() method changes the string and some think it
>>> returns a changed value without changing the string. Without unportable hacks
>>> like __attribute__((warn_unused_result)), some of the callers will guess wrong
>>> and not know about it. It is easier/safer to change the object itself and
>>> return nothing.
>>>
>>> Provided ToUpper/ToLower() functions (not methods!) that preserve their
>>> paramter and return a new object for callers that need the old functionality.
>>> These functions are a good candidate for __attribute__ hacks.
>>
>>
>>
>> Thank you,
>>
>> Alex.
>>> [1] http://www.measurement-factory/tmp/attachments/ftp-native-r12814.patch.gz
>>> [2] https://code.launchpad.net/~measurement-factory/squid/ftp-native
>>
>
> Audit results (part 1):
>
> * Please apply CharacterSet updates separately.
>
> * Please apply Tokenizer API updates separately.
>
> * please apply src/HttpHdrCc.h changes separately.

Why? If the branch is merged, they will be applied with their own/
existing commit messages.

> * is FTP/1.1 really the protocol version of FTP parsed by Squid? (the
> AnyP::ProtocolVersion parameters)
...
> - NP: 0,0 version should be usable and probably best since there does
> not appear to be any actual official version numbering for FTP.

Since there is no "any actual official version numbering for FTP", we
are using what works best. Version 1.1 works best because it causes
fewer FTP exceptions in the general code because FTP control connections
are persistent by default, just like HTTP/1.1 connections. I think there
is a comment about that.

Using 0.0 would probably create more exceptions. 0.0 will most likely
also screw up some ICAP services that expect /1.0 or /1.1 when parsing
headers.

After reading the above arguments, do you still want us to switch to 0.0?

> in src/cf.data.pre:
> * please document tproxy, name=, protocol=FTP as supported on ftp_port

We do not know whether it is supported and do not plan on testing/fixing
that support right now. Do you still want us to document it as supported?

> in src/client_side.h:
> * what is "waiting for future HttpsServer home" meant to mean on
> postHttpsAccept()

Unlike HTTP and FTP, there is currently no object dedicated to serving
HTTPS connections. HttpServer is misused for that on one occasion, but a
lot more work is needed to properly move HTTPS code from ConnStateData
into severs/HttpsServer.* That work is unrelated to FTP.

I will be working on other comments/requests.

Thank you,

Alex.
Received on Fri Aug 08 2014 - 16:57:58 MDT

This archive was generated by hypermail 2.2.0 : Fri Aug 08 2014 - 12:00:12 MDT