Much like the trailing and leading zero count operations, the population
count (popcount) operation has a built-in function within gcc. Unlike
those two operations, which the gcc backend implements in terms of the
built-ins, popcount is implemented with a custom routine that is
entirely separate from the built-ins gcc provides. This has led to poor
codegen in some circumstances.
For instance, the gcc backend of rustc currently emits the following for
a function that implements popcount for a u32 (x86_64 targeting AVX2,
using the standard Unix calling convention):

    popcount:
        mov     eax, edi
        and     edi, 1431655765
        shr     eax
        and     eax, 1431655765
        add     edi, eax
        mov     edx, edi
        and     edi, 858993459
        shr     edx, 2
        and     edx, 858993459
        add     edx, edi
        mov     eax, edx
        and     edx, 252645135
        shr     eax, 4
        and     eax, 252645135
        add     eax, edx
        mov     edx, eax
        and     eax, 16711935
        shr     edx, 8
        and     edx, 16711935
        add     edx, eax
        movzx   eax, dx
        shr     edx, 16
        add     eax, edx
        ret

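For reference, the shift-and-mask sequence in this listing matches the
usual bit-parallel popcount reduction. The Rust below is only a sketch
of that technique, not the backend's actual code: `popcount_swar` is a
hypothetical name, and the body of `popcount` is an assumption about
what the Rust source for the example might look like.

```rust
/// Assumed shape of the source function: the standard library's
/// `count_ones` is the popcount operation that the backend lowers.
pub fn popcount(x: u32) -> u32 {
    x.count_ones()
}

/// Hypothetical helper mirroring the mask-and-add steps in the listing.
pub fn popcount_swar(x: u32) -> u32 {
    // 1431655765 == 0x5555_5555: add neighbouring bits into 2-bit sums.
    let x = (x & 0x5555_5555) + ((x >> 1) & 0x5555_5555);
    // 858993459 == 0x3333_3333: add 2-bit sums into 4-bit sums.
    let x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);
    // 252645135 == 0x0F0F_0F0F: add 4-bit sums into 8-bit sums.
    let x = (x & 0x0F0F_0F0F) + ((x >> 4) & 0x0F0F_0F0F);
    // 16711935 == 0x00FF_00FF: add 8-bit sums into 16-bit sums.
    let x = (x & 0x00FF_00FF) + ((x >> 8) & 0x00FF_00FF);
    // Add the low and high 16-bit halves (the movzx/shr/add tail).
    (x & 0xFFFF) + (x >> 16)
}
```

Both functions should return the same count for any u32 input; the
difference is only in how many instructions the lowering costs.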
Rather than using this custom implementation, the backend could call
gcc's built-in popcount functions. The same function would then compile
to the following:

    popcount:
        mov     eax, edi
        popcnt  rax, rax
        ret

This patch implements the popcount operation in terms of gcc's built-ins
in all cases, not just the 128-bit case.
Signed-off-by: Andy Sadler <[email protected]>