Re: Ip::Address performance from Amos Jeffries on 2010-11-19 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 19 Nov 2010 20:06:31 +1300

On 19/11/10 09:27, Alex Rousskov wrote:
> Hello,
>
> This email summarizes my questions and suggestions related to optimizing
> Ip::Address implementation.
>
> Disclaimer: I am writing from general performance and API principles
> point of view. I am not an IP expert and may miss some critical
> portability concerns, among other things. I am counting on Amos and
> others to audit any ideas worth considering.
>
>
> Ip::Address is actually a socket address. Besides the IP address itself,
> the class maintains the socket address family and port (at least).
>
> Currently, Ip::Address is designed around a single sockaddr_in6 data
> member that holds the address family, socket port, the IP address
> itself, and some extra info. This is 28 bytes of storage, most of which
> are used in case of an IPv6 address.
>
> If we were to deal with IPv4 addresses only, the corresponding "native"
> IPv4 sockaddr_in structure takes 16 bytes, and only the first 8 of which
> are normally used.
>
> Currently, Ip::Address implementation strategy can be summarized as
> "convert any input into a full-blown IPv6 address as needed, store that
> IPv6 address, and then convert that IPv6 address into whatever the code
> actually needs". This is a simple, solid approach that probably helped
> eliminate many early IPv6 implementation bugs.
>
> Unfortunately, it is rather expensive performance-wise because the input
> is usually not an IPv6 address and the output is usually not an IPv6
> address either. While each single conversion (with the exception of
> conversion from and to textual representation) is relatively cheap,
> these conversions add up. Besides, the conversions themselves and even
> the "what am I really?" tests are often written rather inefficiently.
>
> For example, to go from an IPv4 socket address to an a.b.c.d textual
> representation of that address, the current code may allocate,
> initialize, use, and discard several sockaddr_in6 structures, do a dozen
> of sockaddr_in6 structure scans for certain values, and copy parts of
> sockaddr_in and sockaddr_in6 structures a few times a long the way to
> the ntoa() call. After the ntoa() call, scan the resulting string to
> find its end.
>
> The old code equivalent of the above? Initialize one IPv4 sockaddr_in
> structure on stack and pass it to ntoa()!
>
>
> I see three ways to optimize this without breaking correctness:
>
> * one-struct: Keep using a single sockaddr_in6 data member and convert
> everything into an IPv6 socket address as we do now. Optimize each
> conversion step, remove repeated "who am I?" checks during each step,
> and optimize each check itself.

All of the storage methods require them to identify the type accessed.
The only way to completely avoid them is to revert to v4-only or v6-only
compile modes.

Yes they seriously need some optimization and reduction.

>
> * union: Replace sockaddr_in6 storage with a union of sockaddr_in and
> sockaddr_in6 structures (or their wrappers). The Ip::Address class will
> check the address family part (that must overlap in all socket
> addresses, by design) to know what the address is and then use the right
> structure. This will avoid most conversions for the currently dominating
> IPv4-to-IPv4 path. If the caller does need a different version of the
> address than the one we stored, the "up" or "down" conversion is
> unavoidable and is still be handled by the Ip::Address.

Otherwise known as the sockaddr_storage.
http://www.retran.com/beej/sockaddr_inman.html

FWIW: To convert to this the stages are:
* change the type to sockaddr_storage
* re-add maintenance stages to keep the sa_family field accurate
(these were originally intended but dropped to avoid bugs caused by and
hidden by premature-optimization).

* add casts as required to make the correct sockaddr_* type fields
visible.

This makes the IsIP*() tests faster single-byte comparisons and the
IPv4() output faster.

>
> * two-struct: Similar to union, but keeping two separate structures, not
> merged in a union. This costs more memory and whole-object copying
> cycles, but allows us to keep both native and converted versions of the
> addresses. I do not know whether that is useful.

IMO: two-struct is just a waste of memory. We would have to do the same
or less maintenance for other methods. We have no situations where the
one object has to hold both types.

For the record there is a fourth option that could be considered:

* addrinfo: This is a descriptor to a dynamically allocated
sockaddr_in or sockaddri_in6 with all the what-do-i-hold? details we
would need to maintain for a union or two-struct designs anyway.

Overall I think this is slightly worse than union though. The size of
addrinfo in addition to the sockaddr it points to has little gain over
two-struct. With extra maintenance to keep the addrinfo type info
accurate in duplicate to the start of sockaddr.

>
> The "one struct" approach is simpler to implement given the current code
> state, but the "union" approach should be faster as long as IPv4-to-IPv4
> paths are common. The union approach may also yield overall simpler code
> as there will be very few "what am I know?" checks and concerns. The
> two-struct approach attempts to minimize back-and-forth conversion costs
> by giving access to both v4 and v6 versions at the same time.
>
> FWIW, on a conceptual level, boost.asio library uses a union approach
> (only one address version is valid per Address object) while their
> implementation uses the two-struct approach (there is a place to store
> both IPv4 and IPv6 versions). This does not surprise me much because
> this Boost library does not seem to care about RAM/copying overheads.
>
> Interestingly enough, boost.asio does not support transparent v4-v6
> conversions (AFAICT): To do a conversion, the calling code needs to know
> what IP version the generic address is storing, get that
> version-specific address, and then ask it to convert to another version.
> Perhaps that means that nobody needs a general "give me IPv6 regardless
> of what you really have" API?
> https://svn.boost.org/trac/boost/browser/trunk/boost/asio/ip/address.hpp
>
>
> There are some secondary optimizations and API changes that I would like
> to discuss, but let's first see if there is a consensus which overall
> design approach is the way to go. There may be more options than the
> three outlined above.
>

Phew, I was worried you had uncovered some major fatal flaw in the
design. So it seems just the expected culprits which were not optimized
for testing and development simplicity are the ones to die.

I think we go with union. The sockaddr_storage type and its usage is
relatively standard and can be passed directly to the stack, or on some
API via an addrinfo wrapper pointing at it.

PS: If you are still wanting to unify the sockaddr_ip* with sockaddr_un
addresses we can do it easily with union {sockaddr_storage + padding;
sockaddr_un;} and keep all the existing v4/v6 code unchanged due to the
way it uses the initial sockaddr* common fields.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.9
   Beta testers wanted for 3.2.0.3

Received on Fri Nov 19 2010 - 07:06:42 MST

This archive was generated by hypermail 2.2.0 : Fri Nov 19 2010 - 12:00:05 MST