You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, LLVM lowers a cttz8 on x86_64 to these instructions:
```asm
movzbl %dil, %eax
bsfl %eax, %eax
movl $32, %ecx
cmovnel %eax, %ecx
cmpl $32, %ecx
movl $8, %eax
cmovnel %ecx, %eax
```
To improve the codegen, we can zero extend the 8 bit integer, then set
bit 8 and perform a cttz operation on the extended value. That way
there's no conditional operation involved at all.
This was discovered by this benchmark: https://github.com/Kimundi/long_strings_without_repeats
Timings on my box with the current nightly:
```
running 4 tests
test bench_cpp_naive_big ... bench: 5479222 ns/iter (+/- 254222)
test bench_noop_big ... bench: 571405 ns/iter (+/- 111950)
test bench_rust_naive_big ... bench: 7798102 ns/iter (+/- 148841)
test bench_rust_unsafe_big ... bench: 6606488 ns/iter (+/- 67529)
```
Timings with the patch applied:
```
running 4 tests
test bench_cpp_naive_big ... bench: 5470944 ns/iter (+/- 7109)
test bench_noop_big ... bench: 568944 ns/iter (+/- 6895)
test bench_rust_naive_big ... bench: 6795901 ns/iter (+/- 43806)
test bench_rust_unsafe_big ... bench: 5584879 ns/iter (+/- 5291)
```
0 commit comments