Re: Ip::Address performance from Pieter De Wit on 2010-11-19 (squid-dev)

From: Pieter De Wit <pieter_at_insync.za.net>
Date: Sat, 20 Nov 2010 09:04:44 +1300

Hey Guys,

I am joining the thread a bit late and I might miss a few things, but
wouldn't this be a simple solution:

(From memory, this is pretty close to the "union" one Alex suggested)

NB : There is a very very small chance that an IPv6 address can be
converted into an IPv4 address, from memory, it's the 2001:: addresses,
but we can google that to confirm.

* We have a struct that contains both the IPv4 and IPv6 struct
* When an IPv4 address is needed, it's stored in the IPv4 struct (same
with IPv6)
* If we have an IPv4 address and we need to return an IPv6 address the
conversion is done once, then the result is stored in the struct (IMO, a
simple 'if (struct.we_have_ipv6==1) ...' should do the trick

The other side of this is that we have 3 methods for returning the address:

* Get_IPv6 ();
* Get_IPv4 ();
* Get_IP ();

The purpose of 'Get_IP ()' is so that we can retrieve what we need, with
a simple command and not really care about what is stored. As an example:

* if we only have an IPv6 address for someone, Get_IP will return that,
otherwise IPv4, or an error if we have nothing.
* if we have both IPv4 and 6, and we can use both, then for the time, we
"prefer" IPv4, and we return that
* Get_IPv4 will return an error if the IPv6 address can't be converted
(but really, I can't see the need for this, just covering bases)

IPv6 is gaining speed, but it's not as well connected as we would have
like to see. The point is that squid, imho, should still default to the
IPv4 address. There can be factors that change this (config file/checks
at startup - like no ipv4 address ?/etc), but these will only effect
'Get_IP' and the checks are 'light' or limited startup/config re-read code.

As a side note, the two struct can be populated by the resolver.

Hope I am not way off track here :)

Cheers,

Pieter

On 20/11/2010 06:31, Alex Rousskov wrote:
> On 11/19/2010 12:06 AM, Amos Jeffries wrote:
>> On 19/11/10 09:27, Alex Rousskov wrote:
>>> Hello,
>>>
>>> This email summarizes my questions and suggestions related to
>>> optimizing
>>> Ip::Address implementation.
>>>
>>> Disclaimer: I am writing from general performance and API principles
>>> point of view. I am not an IP expert and may miss some critical
>>> portability concerns, among other things. I am counting on Amos and
>>> others to audit any ideas worth considering.
>>>
>>>
>>> Ip::Address is actually a socket address. Besides the IP address
>>> itself,
>>> the class maintains the socket address family and port (at least).
>>>
>>> Currently, Ip::Address is designed around a single sockaddr_in6 data
>>> member that holds the address family, socket port, the IP address
>>> itself, and some extra info. This is 28 bytes of storage, most of which
>>> are used in case of an IPv6 address.
>>>
>>> If we were to deal with IPv4 addresses only, the corresponding "native"
>>> IPv4 sockaddr_in structure takes 16 bytes, and only the first 8 of
>>> which
>>> are normally used.
>>>
>>> Currently, Ip::Address implementation strategy can be summarized as
>>> "convert any input into a full-blown IPv6 address as needed, store that
>>> IPv6 address, and then convert that IPv6 address into whatever the code
>>> actually needs". This is a simple, solid approach that probably helped
>>> eliminate many early IPv6 implementation bugs.
>>>
>>> Unfortunately, it is rather expensive performance-wise because the
>>> input
>>> is usually not an IPv6 address and the output is usually not an IPv6
>>> address either. While each single conversion (with the exception of
>>> conversion from and to textual representation) is relatively cheap,
>>> these conversions add up. Besides, the conversions themselves and even
>>> the "what am I really?" tests are often written rather inefficiently.
>>>
>>> For example, to go from an IPv4 socket address to an a.b.c.d textual
>>> representation of that address, the current code may allocate,
>>> initialize, use, and discard several sockaddr_in6 structures, do a
>>> dozen
>>> of sockaddr_in6 structure scans for certain values, and copy parts of
>>> sockaddr_in and sockaddr_in6 structures a few times a long the way to
>>> the ntoa() call. After the ntoa() call, scan the resulting string to
>>> find its end.
>>>
>>> The old code equivalent of the above? Initialize one IPv4 sockaddr_in
>>> structure on stack and pass it to ntoa()!
>>>
>>>
>>> I see three ways to optimize this without breaking correctness:
>>>
>>> * one-struct: Keep using a single sockaddr_in6 data member and convert
>>> everything into an IPv6 socket address as we do now. Optimize each
>>> conversion step, remove repeated "who am I?" checks during each step,
>>> and optimize each check itself.
>>
>> All of the storage methods require them to identify the type accessed.
>> The only way to completely avoid them is to revert to v4-only or v6-only
>> compile modes.
>>
>> Yes they seriously need some optimization and reduction.
>
> Yes, some checks are unavoidable, of course. What I was referring to
> is replacing the current
>
> if (I am vX) ... do a little ...
> if (I am vX) ... do a little ...
> if (I am vX) ... do a little ...
> if (I am vX) ... do a little ...
>
> code with something closer to this:
>
> if (I am vX) ... do a lot ...
>
> to the extent possible.
>
>>> * union: Replace sockaddr_in6 storage with a union of sockaddr_in and
>>> sockaddr_in6 structures (or their wrappers). The Ip::Address class will
>>> check the address family part (that must overlap in all socket
>>> addresses, by design) to know what the address is and then use the
>>> right
>>> structure. This will avoid most conversions for the currently
>>> dominating
>>> IPv4-to-IPv4 path. If the caller does need a different version of the
>>> address than the one we stored, the "up" or "down" conversion is
>>> unavoidable and is still be handled by the Ip::Address.
>>
>> Otherwise known as the sockaddr_storage.
>> http://www.retran.com/beej/sockaddr_inman.html
>
> Good point. I thought sockaddr_storage is deprecated or unpopular
> since Squid code did not use it :-/. Apparently, that is not the case.
>
> Sockaddr_storage is the right abstraction for this option but it comes
> with a performance penalty:
>
> sizeof(sockaddr_storage) == 128
> sizeof(sockaddr_in6) == 28
> sizeof(sockaddr_in) == 16
>
> I do not care much about pure RAM overhead those 100 extra bytes
> bring, but initializing and copying them is one of those thousand
> cuts, IMO. We may be able to minimize initialization and copying
> overhead if we use custom constructors and assignment operators that
> only initialize and copy what is actually used. We will try to test
> that theory.
>
> I also wanted to wrap version-specific sockaddr_in* union members into
> C++ classes that provide version-specific and version-optimized
> handling of their sockaddr_in* structure. This will, in part, minimize
> repeated "what am I?" checks. It may be possible to do that with a
> single sockaddr_storage member, but it would require placement-new
> which may have performance overheads. We will test that as well.
>
> If placement-new has overheads, we can have a union of three structs:
> sockaddr_storage, sockaddr_in, sockaddr_in6.
>
> There is another reason we may have to deal with large sockaddr
> structures as discussed below.
>
>
>> FWIW: To convert to this the stages are:
>> * change the type to sockaddr_storage
>> * re-add maintenance stages to keep the sa_family field accurate (these
>> were originally intended but dropped to avoid bugs caused by and hidden
>> by premature-optimization).
>> * add casts as required to make the correct sockaddr_* type fields
>> visible.
>>
>> This makes the IsIP*() tests faster single-byte comparisons and the
>> IPv4() output faster.
>
> Looking at the current code, I am not sure single-byte family
> comparisons will be always possible. This is one of the secondary
> questions I had. Currently, we check for "any address", "no address",
> and "mapped address" to answer the "What kind of address am I?" question.
>
> * Can we use zero family for "any address" and FF family for "no
> address"? We can add a data member to store this info if using family
> is wrong, but I hope it is not necessary.
>
> * Do we need to check for mapped addresses? Or do we change the code
> to convert as soon as we are given a mapped address, to avoid mapping
> checks? Or do we add a data member to cache the mapping information
> for faster checks?
>
>
>>> * two-struct: Similar to union, but keeping two separate structures,
>>> not
>>> merged in a union. This costs more memory and whole-object copying
>>> cycles, but allows us to keep both native and converted versions of the
>>> addresses. I do not know whether that is useful.
>>
>> IMO: two-struct is just a waste of memory. We would have to do the same
>> or less maintenance for other methods. We have no situations where the
>> one object has to hold both types.
>>
>>
>> For the record there is a fourth option that could be considered:
>>
>> * addrinfo: This is a descriptor to a dynamically allocated sockaddr_in
>> or sockaddri_in6 with all the what-do-i-hold? details we would need to
>> maintain for a union or two-struct designs anyway.
>>
>> Overall I think this is slightly worse than union though. The size of
>> addrinfo in addition to the sockaddr it points to has little gain over
>> two-struct. With extra maintenance to keep the addrinfo type info
>> accurate in duplicate to the start of sockaddr.
>
> IMO, dynamic allocation and deallocation of addrinfo members is one of
> those thousand cuts. I may be wrong, but I think the current code
> misuses addrinfo and getaddrinfo() function, with measurable
> performance penalties. I hope we can restrict if not eliminate their use.
>
>
>>> The "one struct" approach is simpler to implement given the current
>>> code
>>> state, but the "union" approach should be faster as long as
>>> IPv4-to-IPv4
>>> paths are common. The union approach may also yield overall simpler
>>> code
>>> as there will be very few "what am I know?" checks and concerns. The
>>> two-struct approach attempts to minimize back-and-forth conversion
>>> costs
>>> by giving access to both v4 and v6 versions at the same time.
>>>
>>> FWIW, on a conceptual level, boost.asio library uses a union approach
>>> (only one address version is valid per Address object) while their
>>> implementation uses the two-struct approach (there is a place to store
>>> both IPv4 and IPv6 versions). This does not surprise me much because
>>> this Boost library does not seem to care about RAM/copying overheads.
>>>
>>> Interestingly enough, boost.asio does not support transparent v4-v6
>>> conversions (AFAICT): To do a conversion, the calling code needs to
>>> know
>>> what IP version the generic address is storing, get that
>>> version-specific address, and then ask it to convert to another
>>> version.
>>> Perhaps that means that nobody needs a general "give me IPv6 regardless
>>> of what you really have" API?
>>> https://svn.boost.org/trac/boost/browser/trunk/boost/asio/ip/address.hpp
>>>
>>>
>>>
>>> There are some secondary optimizations and API changes that I would
>>> like
>>> to discuss, but let's first see if there is a consensus which overall
>>> design approach is the way to go. There may be more options than the
>>> three outlined above.
>>>
>>
>> Phew, I was worried you had uncovered some major fatal flaw in the
>> design. So it seems just the expected culprits which were not optimized
>> for testing and development simplicity are the ones to die.
>
> I am glad you do not consider the implied changes significant,
> although I am afraid you underestimate the total amount of the work
> required to optimize the implementation and polish the API. :-)
>
>
>> I think we go with union. The sockaddr_storage type and its usage is
>> relatively standard and can be passed directly to the stack, or on some
>> API via an addrinfo wrapper pointing at it.
>
> I too favor the union or sockaddr_storage approach. Let's wait for
> others to chime in though as this is an important decision.
>
>
>> PS: If you are still wanting to unify the sockaddr_ip* with sockaddr_un
>> addresses we can do it easily with union {sockaddr_storage + padding;
>> sockaddr_un;} and keep all the existing v4/v6 code unchanged due to the
>> way it uses the initial sockaddr* common fields.
>
> This is a very good question; thank you for reminding me of Unix
> Domain Socket needs!
>
> We can easily add sockaddr_un to the proposed union. No extra
> padding/etc is necessary as far as I can tell. However, sockaddr_un is
> 110 bytes long :-(. Is that kind of overhead worth the cleaner code?
> Perhaps micro-level performance tests of optimized
> constructors/assignments will help answer that question.
>
> If we do end up adding sockaddr_un to the union, then we may want to
> add sockaddr_storage as well because it is just a few bytes longer (or
> just use sockaddr_storage alone).
>
>
> Thank you,
>
> Alex.
>
Received on Fri Nov 19 2010 - 20:04:56 MST

This archive was generated by hypermail 2.2.0 : Sat Nov 20 2010 - 12:00:05 MST