Ip::Address performance from Alex Rousskov on 2010-11-18 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 18 Nov 2010 13:27:33 -0700

Hello,

This email summarizes my questions and suggestions related to
optimizing Ip::Address implementation.

Disclaimer: I am writing from general performance and API principles
point of view. I am not an IP expert and may miss some critical
portability concerns, among other things. I am counting on Amos and
others to audit any ideas worth considering.

Ip::Address is actually a socket address. Besides the IP address itself,
the class maintains the socket address family and port (at least).

Currently, Ip::Address is designed around a single sockaddr_in6 data
member that holds the address family, socket port, the IP address
itself, and some extra info. This is 28 bytes of storage, most of which
are used in case of an IPv6 address.

If we were to deal with IPv4 addresses only, the corresponding "native"
IPv4 sockaddr_in structure takes 16 bytes, and only the first 8 of which
are normally used.

Currently, Ip::Address implementation strategy can be summarized as
"convert any input into a full-blown IPv6 address as needed, store that
IPv6 address, and then convert that IPv6 address into whatever the code
actually needs". This is a simple, solid approach that probably helped
eliminate many early IPv6 implementation bugs.

Unfortunately, it is rather expensive performance-wise because the input
is usually not an IPv6 address and the output is usually not an IPv6
address either. While each single conversion (with the exception of
conversion from and to textual representation) is relatively cheap,
these conversions add up. Besides, the conversions themselves and even
the "what am I really?" tests are often written rather inefficiently.

For example, to go from an IPv4 socket address to an a.b.c.d textual
representation of that address, the current code may allocate,
initialize, use, and discard several sockaddr_in6 structures, do a dozen
of sockaddr_in6 structure scans for certain values, and copy parts of
sockaddr_in and sockaddr_in6 structures a few times a long the way to
the ntoa() call. After the ntoa() call, scan the resulting string to
find its end.

The old code equivalent of the above? Initialize one IPv4 sockaddr_in
structure on stack and pass it to ntoa()!

I see three ways to optimize this without breaking correctness:

* one-struct: Keep using a single sockaddr_in6 data member and convert
everything into an IPv6 socket address as we do now. Optimize each
conversion step, remove repeated "who am I?" checks during each step,
and optimize each check itself.

* union: Replace sockaddr_in6 storage with a union of sockaddr_in and
sockaddr_in6 structures (or their wrappers). The Ip::Address class will
check the address family part (that must overlap in all socket
addresses, by design) to know what the address is and then use the right
structure. This will avoid most conversions for the currently dominating
IPv4-to-IPv4 path. If the caller does need a different version of the
address than the one we stored, the "up" or "down" conversion is
unavoidable and is still be handled by the Ip::Address.

* two-struct: Similar to union, but keeping two separate structures, not
merged in a union. This costs more memory and whole-object copying
cycles, but allows us to keep both native and converted versions of the
addresses. I do not know whether that is useful.

The "one struct" approach is simpler to implement given the current code
state, but the "union" approach should be faster as long as IPv4-to-IPv4
paths are common. The union approach may also yield overall simpler code
as there will be very few "what am I know?" checks and concerns. The
two-struct approach attempts to minimize back-and-forth conversion costs
by giving access to both v4 and v6 versions at the same time.

FWIW, on a conceptual level, boost.asio library uses a union approach
(only one address version is valid per Address object) while their
implementation uses the two-struct approach (there is a place to store
both IPv4 and IPv6 versions). This does not surprise me much because
this Boost library does not seem to care about RAM/copying overheads.

Interestingly enough, boost.asio does not support transparent v4-v6
conversions (AFAICT): To do a conversion, the calling code needs to know
what IP version the generic address is storing, get that
version-specific address, and then ask it to convert to another version.
Perhaps that means that nobody needs a general "give me IPv6 regardless
of what you really have" API?
https://svn.boost.org/trac/boost/browser/trunk/boost/asio/ip/address.hpp

There are some secondary optimizations and API changes that I would like
to discuss, but let's first see if there is a consensus which overall
design approach is the way to go. There may be more options than the
three outlined above.

Thank you,

Alex.
Received on Thu Nov 18 2010 - 20:27:42 MST

This archive was generated by hypermail 2.2.0 : Fri Nov 19 2010 - 12:00:05 MST