Re: boolean bit fields

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 24 Jan 2013 08:44:16 -0700

On 01/24/2013 02:43 AM, Amos Jeffries wrote:
> On 24/01/2013 7:51 p.m., Alex Rousskov wrote:
>> On 01/23/2013 07:05 PM, Amos Jeffries wrote:
>>> On 24/01/2013 7:20 a.m., Kinkie wrote:
>>>> the attached patch turns the unsigned int:1 flags in CachePeer to
>>>> bools.
>>
>>> Please retain the :1 bitmasking. My microbench is showing a consistent
>>> ~50ms speed gain on bitmasks over full bool, particularly when there are
>>> multiple bools in the structure. We also get some useful object size
>>> gains.
>> Hello,
>>
>> FYI: With g++ -O3, there is no measurable performance difference
>> between bool and bool:1 in my primitive tests (sources attached). I do
>> see that non-bool bit fields are consistently slower though ("foo:0"
>> below means type "foo" without bit fields; bool tests are repeated to
>> show result variance):

...

>> To me, it looks like bit fields in general may hurt performance where
>> memory composition is not important (as expected, I guess), and that
>> some compilers remove any difference between full and bit boolean with
>> -O3 (that surprised me).
>>
>> G++ assembly source comparison seems to confirm that -- boolean-based
>> full and bit assembly sources are virtually identical with -O3 and newer
>> g++ versions, while bit fields show a lot more assembly operations with
>> -O0 (both diffs attached). Assembly is well beyond my expertise though.

> At -O3 G++ is optimizing for speed at expense of code size.
> -O2 is probably a better comparison level and AFAIK the preferred level
> for high-performance and small code size build.

Recent g++ versions optimize bitfield booleans to work as fast as full
booleans starting with -O1. There is no .asm difference among -O1, -O2,
and -O3 optimization levels. -O0 is the g++ default, so your tests were
probably built without optimization. If so, your results contradict
mine: all my tests show consistently worse bitfield performance when
optimization is disabled (-O0).

Squid defaults to -O2 in most cases, right? If my test code is
representative, there will be no difference in compiled full and
bitfield code at that optimization level.
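
For readers without the attachments, here is a minimal sketch of the
kind of full-vs-bitfield comparison discussed above. It is not the
attached test code: the FullFlags and BitFlags structures and the run()
helper are invented for illustration. Compiling it with "g++ -O2 -S" and
comparing the two instantiations of run() is one way to check whether
the generated code differs on a given compiler.

```cpp
#include <cstdint>

// Hypothetical structure: an int member plus flags stored as full bools.
struct FullFlags {
    int counter;
    bool a;
    bool b;
    FullFlags() : counter(0), a(false), b(false) {}
};

// Same members, but the flags are one-bit bool bit-fields.
struct BitFlags {
    int counter;
    bool a:1;
    bool b:1;
    BitFlags() : counter(0), a(false), b(false) {}
};

// Toggles the flags in a loop and folds them into a checksum.
// The checksum must not depend on how the flags are stored.
template <class Flags>
std::uint64_t run(const int iterations)
{
    Flags f;
    std::uint64_t sum = 0;
    for (int i = 0; i < iterations; ++i) {
        f.a = !f.a;
        f.b = f.a && (i % 3 == 0);
        f.counter += f.b ? 2 : 1;
        sum += static_cast<std::uint64_t>(f.counter) + (f.a ? 1 : 0);
    }
    return sum;
}
```

Calling run<FullFlags>(n) and run<BitFlags>(n) with the same n should
return the same checksum; only the timing and the generated assembly are
expected to vary with the optimization level.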

>>
>> Am I testing this wrong or is it a case of YMMV? If it is "YMMV", should
>> we err on the side of simplicity and use simple bool where memory
>> savings are not important or nonexistent?
>
> I think YMMV with the run-time measurement. I had to run the tests many
> times to get an average variance range on the speed even at 100M loops.
> Some runs the speed was 100ms out in the other direction, but only some,
> most were 50ms towards bool:1. And the results differed between flag
> position and struct with 1-byte length and struct with enough flags for
> 2-bytes.

FWIW, my test structure includes both int and bitfields. I have not
experimented with bitfield-only structures because they are not common
in Squid.

I cannot explain your results (especially since I do not know what your
test code is). The only thing that seems clear is that you are using
-O0 (the default). That level is not very relevant when comparing the
performance of a typical Squid installation: g++ applies bool
optimizations starting with -O1, and most Squids are built with -O2.

> I did not have time to look at the ASM, thank you for the details there.
> If -O2 shows the same level of cycles reduction I think I will change my
> tack...

It does in my tests. I do not know how representative my test code is
though.

> we should be letting it handle the bitfields. BUT, we should still take
> care to arrange the flags and members such that -O has an easy job
> reducing them.
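
A minimal sketch of what "arranging the flags and members" can mean for
object size. The Interleaved and Grouped layouts below are hypothetical
and not taken from Squid sources. Bit-field layout is
implementation-defined, so the exact sizes depend on the compiler and
ABI, but on common platforms the grouped layout is not larger than the
interleaved one and is usually smaller.

```cpp
#include <cstddef>

// Hypothetical layout: each one-bit flag sits between int members,
// so every flag tends to drag alignment padding along with it.
struct Interleaved {
    bool a:1;
    int x;
    bool b:1;
    int y;
    bool c:1;
};

// Same members with the flags declared next to each other,
// which lets the compiler pack them into one storage unit.
struct Grouped {
    int x;
    int y;
    bool a:1;
    bool b:1;
    bool c:1;
};

// Helpers so that the two sizes can be compared or printed by a caller.
constexpr std::size_t interleavedSize() { return sizeof(Interleaved); }
constexpr std::size_t groupedSize() { return sizeof(Grouped); }
```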

If we leave any :1 flags, we should also make sure that all of them are
using unsigned integers (or bool) as the base type. Using signed
integers leads to bugs as the difference in "final" checksum below
demonstrates (the final value should not change when bitfields are
enabled, but does change for signed integer types):

> int32_t:0  size: 20  final: -1085972333.443588956  iterations: 100M  time: 0.784s
> int32_t:1  size: 12  final: 140980519.610307440    iterations: 100M  time: 2.111s
> int8_t:0   size: 12  final: -1085972333.443588956  iterations: 100M  time: 0.892s
> int8_t:1   size: 12  final: 140980519.610307440    iterations: 100M  time: 2.111s
> bool:0     size: 12  final: -1085972333.443588956  iterations: 100M  time: 0.848s
> bool:1     size: 12  final: -1085972333.443588956  iterations: 100M  time: 0.848s
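
A minimal sketch of the signed bit-field pitfall, assuming this is the
mechanism behind the differing checksums (the test sources are not shown
in this message). A one-bit signed field can hold only 0 and -1, so a
flag set to 1 no longer compares equal to 1 when read back. The
structure names and the survivesSet() helper are hypothetical.

```cpp
// Hypothetical flag holders that differ only in the bit-field base type.
struct SignedFlag { signed int on:1; };
struct UnsignedFlag { unsigned int on:1; };
struct BoolFlag { bool on:1; };

// Stores value in the one-bit flag and reports whether the flag
// still compares equal to 1 when read back.
template <class Flag>
bool survivesSet(const int value = 1)
{
    Flag f;
    f.on = value;      // a signed one-bit field cannot represent 1
    return f.on == 1;  // reads back as -1 with g++ for SignedFlag
}
```

With g++, survivesSet<SignedFlag>() returns false, while the
UnsignedFlag and BoolFlag variants return true. Code that tests such a
flag with "if (f.on)" still works, but comparisons against 1 and any
arithmetic on the flag value do not.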

Thank you,

Alex.
Received on Thu Jan 24 2013 - 15:44:28 MST
