Re: Ip::Address performance

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 19 Nov 2010 10:31:00 -0700

On 11/19/2010 12:06 AM, Amos Jeffries wrote:
> On 19/11/10 09:27, Alex Rousskov wrote:
>> Hello,
>>
>> This email summarizes my questions and suggestions related to optimizing
>> Ip::Address implementation.
>>
>> Disclaimer: I am writing from general performance and API principles
>> point of view. I am not an IP expert and may miss some critical
>> portability concerns, among other things. I am counting on Amos and
>> others to audit any ideas worth considering.
>>
>>
>> Ip::Address is actually a socket address. Besides the IP address itself,
>> the class maintains the socket address family and port (at least).
>>
>> Currently, Ip::Address is designed around a single sockaddr_in6 data
>> member that holds the address family, socket port, the IP address
>> itself, and some extra info. This is 28 bytes of storage, most of which
>> are used in case of an IPv6 address.
>>
>> If we were to deal with IPv4 addresses only, the corresponding "native"
>> IPv4 sockaddr_in structure takes 16 bytes, and only the first 8 of which
>> are normally used.
>>
>> Currently, Ip::Address implementation strategy can be summarized as
>> "convert any input into a full-blown IPv6 address as needed, store that
>> IPv6 address, and then convert that IPv6 address into whatever the code
>> actually needs". This is a simple, solid approach that probably helped
>> eliminate many early IPv6 implementation bugs.
>>
>> Unfortunately, it is rather expensive performance-wise because the input
>> is usually not an IPv6 address and the output is usually not an IPv6
>> address either. While each single conversion (with the exception of
>> conversion from and to textual representation) is relatively cheap,
>> these conversions add up. Besides, the conversions themselves and even
>> the "what am I really?" tests are often written rather inefficiently.
>>
>> For example, to go from an IPv4 socket address to an a.b.c.d textual
>> representation of that address, the current code may allocate,
>> initialize, use, and discard several sockaddr_in6 structures, do a dozen
>> of sockaddr_in6 structure scans for certain values, and copy parts of
>> sockaddr_in and sockaddr_in6 structures a few times a long the way to
>> the ntoa() call. After the ntoa() call, scan the resulting string to
>> find its end.
>>
>> The old code equivalent of the above? Initialize one IPv4 sockaddr_in
>> structure on stack and pass it to ntoa()!
>>
>>
>> I see three ways to optimize this without breaking correctness:
>>
>> * one-struct: Keep using a single sockaddr_in6 data member and convert
>> everything into an IPv6 socket address as we do now. Optimize each
>> conversion step, remove repeated "who am I?" checks during each step,
>> and optimize each check itself.
>
> All of the storage methods require them to identify the type accessed.
> The only way to completely avoid them is to revert to v4-only or v6-only
> compile modes.
>
> Yes they seriously need some optimization and reduction.

Yes, some checks are unavoidable, of course. What I was referring to is
replacing the current

     if (I am vX) ... do a little ...
     if (I am vX) ... do a little ...
     if (I am vX) ... do a little ...
     if (I am vX) ... do a little ...

code with something closer to this:

     if (I am vX) ... do a lot ...

to the extent possible.

>> * union: Replace sockaddr_in6 storage with a union of sockaddr_in and
>> sockaddr_in6 structures (or their wrappers). The Ip::Address class will
>> check the address family part (that must overlap in all socket
>> addresses, by design) to know what the address is and then use the right
>> structure. This will avoid most conversions for the currently dominating
>> IPv4-to-IPv4 path. If the caller does need a different version of the
>> address than the one we stored, the "up" or "down" conversion is
>> unavoidable and is still be handled by the Ip::Address.
>
> Otherwise known as the sockaddr_storage.
> http://www.retran.com/beej/sockaddr_inman.html

Good point. I thought sockaddr_storage is deprecated or unpopular since
Squid code did not use it :-/. Apparently, that is not the case.

Sockaddr_storage is the right abstraction for this option but it comes
with a performance penalty:

     sizeof(sockaddr_storage) == 128
     sizeof(sockaddr_in6) == 28
     sizeof(sockaddr_in) == 16

I do not care much about pure RAM overhead those 100 extra bytes bring,
but initializing and copying them is one of those thousand cuts, IMO. We
may be able to minimize initialization and copying overhead if we use
custom constructors and assignment operators that only initialize and
copy what is actually used. We will try to test that theory.

I also wanted to wrap version-specific sockaddr_in* union members into
C++ classes that provide version-specific and version-optimized handling
of their sockaddr_in* structure. This will, in part, minimize repeated
"what am I?" checks. It may be possible to do that with a single
sockaddr_storage member, but it would require placement-new which may
have performance overheads. We will test that as well.

If placement-new has overheads, we can have a union of three structs:
sockaddr_storage, sockaddr_in, sockaddr_in6.

There is another reason we may have to deal with large sockaddr
structures as discussed below.

> FWIW: To convert to this the stages are:
> * change the type to sockaddr_storage
> * re-add maintenance stages to keep the sa_family field accurate (these
> were originally intended but dropped to avoid bugs caused by and hidden
> by premature-optimization).
> * add casts as required to make the correct sockaddr_* type fields visible.
>
> This makes the IsIP*() tests faster single-byte comparisons and the
> IPv4() output faster.

Looking at the current code, I am not sure single-byte family
comparisons will be always possible. This is one of the secondary
questions I had. Currently, we check for "any address", "no address",
and "mapped address" to answer the "What kind of address am I?" question.

* Can we use zero family for "any address" and FF family for "no
address"? We can add a data member to store this info if using family is
wrong, but I hope it is not necessary.

* Do we need to check for mapped addresses? Or do we change the code to
convert as soon as we are given a mapped address, to avoid mapping
checks? Or do we add a data member to cache the mapping information for
faster checks?

>> * two-struct: Similar to union, but keeping two separate structures, not
>> merged in a union. This costs more memory and whole-object copying
>> cycles, but allows us to keep both native and converted versions of the
>> addresses. I do not know whether that is useful.
>
> IMO: two-struct is just a waste of memory. We would have to do the same
> or less maintenance for other methods. We have no situations where the
> one object has to hold both types.
>
>
> For the record there is a fourth option that could be considered:
>
> * addrinfo: This is a descriptor to a dynamically allocated sockaddr_in
> or sockaddri_in6 with all the what-do-i-hold? details we would need to
> maintain for a union or two-struct designs anyway.
>
> Overall I think this is slightly worse than union though. The size of
> addrinfo in addition to the sockaddr it points to has little gain over
> two-struct. With extra maintenance to keep the addrinfo type info
> accurate in duplicate to the start of sockaddr.

IMO, dynamic allocation and deallocation of addrinfo members is one of
those thousand cuts. I may be wrong, but I think the current code
misuses addrinfo and getaddrinfo() function, with measurable performance
penalties. I hope we can restrict if not eliminate their use.

>> The "one struct" approach is simpler to implement given the current code
>> state, but the "union" approach should be faster as long as IPv4-to-IPv4
>> paths are common. The union approach may also yield overall simpler code
>> as there will be very few "what am I know?" checks and concerns. The
>> two-struct approach attempts to minimize back-and-forth conversion costs
>> by giving access to both v4 and v6 versions at the same time.
>>
>> FWIW, on a conceptual level, boost.asio library uses a union approach
>> (only one address version is valid per Address object) while their
>> implementation uses the two-struct approach (there is a place to store
>> both IPv4 and IPv6 versions). This does not surprise me much because
>> this Boost library does not seem to care about RAM/copying overheads.
>>
>> Interestingly enough, boost.asio does not support transparent v4-v6
>> conversions (AFAICT): To do a conversion, the calling code needs to know
>> what IP version the generic address is storing, get that
>> version-specific address, and then ask it to convert to another version.
>> Perhaps that means that nobody needs a general "give me IPv6 regardless
>> of what you really have" API?
>> https://svn.boost.org/trac/boost/browser/trunk/boost/asio/ip/address.hpp
>>
>>
>> There are some secondary optimizations and API changes that I would like
>> to discuss, but let's first see if there is a consensus which overall
>> design approach is the way to go. There may be more options than the
>> three outlined above.
>>
>
> Phew, I was worried you had uncovered some major fatal flaw in the
> design. So it seems just the expected culprits which were not optimized
> for testing and development simplicity are the ones to die.

I am glad you do not consider the implied changes significant, although
I am afraid you underestimate the total amount of the work required to
optimize the implementation and polish the API. :-)

> I think we go with union. The sockaddr_storage type and its usage is
> relatively standard and can be passed directly to the stack, or on some
> API via an addrinfo wrapper pointing at it.

I too favor the union or sockaddr_storage approach. Let's wait for
others to chime in though as this is an important decision.

> PS: If you are still wanting to unify the sockaddr_ip* with sockaddr_un
> addresses we can do it easily with union {sockaddr_storage + padding;
> sockaddr_un;} and keep all the existing v4/v6 code unchanged due to the
> way it uses the initial sockaddr* common fields.

This is a very good question; thank you for reminding me of Unix Domain
Socket needs!

We can easily add sockaddr_un to the proposed union. No extra
padding/etc is necessary as far as I can tell. However, sockaddr_un is
110 bytes long :-(. Is that kind of overhead worth the cleaner code?
Perhaps micro-level performance tests of optimized
constructors/assignments will help answer that question.

If we do end up adding sockaddr_un to the union, then we may want to add
sockaddr_storage as well because it is just a few bytes longer (or just
use sockaddr_storage alone).

Thank you,

Alex.
Received on Fri Nov 19 2010 - 17:31:10 MST

This archive was generated by hypermail 2.2.0 : Sat Nov 20 2010 - 12:00:05 MST