Always inline byteswap functions #11280
Merged
After writing some benchmarks for ebml::reader::vuint_at(), I noticed that LLVM doesn't seem to inline the from_be32 function, even though on x86_64 it only makes a call to the bswap32 intrinsic. Marking these functions with #[inline(always)] fixes that and seems like a reasonable thing to do (a rough sketch of the idea appears after the numbers below). I got the following measurements in my vuint_at() benchmarks:
Before (without #[inline(always)]):
test ebml::bench::vuint_at_A_aligned ... bench: 1075 ns/iter (+/- 58)
test ebml::bench::vuint_at_A_unaligned ... bench: 1073 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_aligned ... bench: 1150 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_unaligned ... bench: 1151 ns/iter (+/- 6)

After adding #[inline(always)]:
test ebml::bench::vuint_at_A_aligned ... bench: 769 ns/iter (+/- 9)
test ebml::bench::vuint_at_A_unaligned ... bench: 795 ns/iter (+/- 6)
test ebml::bench::vuint_at_D_aligned ... bench: 758 ns/iter (+/- 8)
test ebml::bench::vuint_at_D_unaligned ... bench: 759 ns/iter (+/- 8)

The "slow" version (for comparison):
test ebml::bench::vuint_at_A_aligned ... bench: 646 ns/iter (+/- 7)
test ebml::bench::vuint_at_A_unaligned ... bench: 645 ns/iter (+/- 3)
test ebml::bench::vuint_at_D_aligned ... bench: 907 ns/iter (+/- 4)
test ebml::bench::vuint_at_D_unaligned ... bench: 1085 ns/iter (+/- 16)
As expected, inlining from_be32() gave a considerable speedup. I also compared the "slow" version against the optimized one and noticed that it's actually a bit faster for small class A integers (which use only two bytes) but slower for big class D integers (which use four bytes).
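For illustration, here is a minimal sketch of the kind of change involved. It is not the actual libstd code touched by this PR: the function names, the cfg-based endianness check, and the use of u32::swap_bytes standing in for the raw bswap32 intrinsic are all assumptions for the example. The point is only that the wrapper is a single instruction once inlined, so #[inline(always)] removes a call whose overhead would otherwise dominate in a hot loop like vuint_at(). A byte-by-byte "slow" variant of the sort compared above is included as well.

```rust
// Illustrative sketch only -- not the exact code from this PR.

/// Convert a big-endian u32 to host byte order.
/// Without #[inline(always)], the call overhead can cost more than
/// the byteswap itself, since the body is a single bswap on x86_64.
#[inline(always)]
pub fn from_be32(x: u32) -> u32 {
    if cfg!(target_endian = "little") {
        x.swap_bytes() // compiles down to one bswap instruction
    } else {
        x // already in host order on big-endian targets
    }
}

/// A rough "slow" byte-by-byte variant: it only touches the bytes it
/// needs, which is why it can win for short (class A) integers but
/// loses to the single bswap for full four-byte (class D) integers.
#[inline(always)]
pub fn from_be32_slow(b: &[u8]) -> u32 {
    ((b[0] as u32) << 24) | ((b[1] as u32) << 16) | ((b[2] as u32) << 8) | (b[3] as u32)
}

fn main() {
    assert_eq!(from_be32(0x0102_0304u32.to_be()), 0x0102_0304);
    assert_eq!(from_be32_slow(&[1, 2, 3, 4]), 0x0102_0304);
}
```

With the attribute in place the wrapper disappears entirely into the caller, which is consistent with the roughly 30% improvement seen in the vuint_at() numbers above.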