You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[AArch64] Improve scalar and Neon popcount with SVE CNT. (#143870)
When available, we can use SVE's CNT instruction to improve the lowering
of scalar and fixed-length popcount (CTPOP) since the SVE instruction
supports types that the Neon variant doesn't.
For the scalar types, I see the following speedups on NVIDIA Grace CPU:
| size (bits) | before (Gibit/s) | after (Gibit/s) | speedup |
|------------:|-----------------:|----------------:|--------:|
| 32 | 75.20 | 86.79 | 1.15 |
| 64 | 149.87 | 173.70 | 1.16 |
| 128 | 158.56 | 164.88 | 1.04 |
0 commit comments