-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Improve performance of heterogeneous binary integer ==
and <
#63034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Improve performance of heterogeneous binary integer ==
and <
#63034
Conversation
==
and <
==
and <
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
In the implementation comments, I'm not sure that "bitcasting" or "bitcast operation" are the correct terms. Could the |
What would you suggest? I’d like something more succinct than “constructing a value of
It could, just as most if not all uses of literal I’d rather the workaround until then be (a) visually thorny rather than potentially appear like a stylistic preference; (b) generally applicable for all values—we can’t very well write |
https://github.com/apple/swift/blob/release/5.7.0/stdlib/public/core/Integers.swift#L419-L425 |
4f5380a
to
563c682
Compare
563c682
to
f671e50
Compare
This comment was marked as outdated.
This comment was marked as outdated.
@swift-ci benchmark |
==
and <
==
and <
This PR also apparently affects the temporary allocation codegen test on macOS, which I'll need to update I guess.
|
Most of the benchmark regressions are marked with |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
The string comparison benchmarks are consistently showing up, so it'll take some manual inspection of the generated code to know if it's noise or real. |
@swift-ci please build toolchain |
Can confirm identical SIL is generated for all benchmarks in |
This comment was marked as outdated.
This comment was marked as outdated.
ce54c11
to
263f611
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
3 similar comments
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
7b720eb
to
e19bb87
Compare
@grynspan I've had to make edits to the temporary allocation codegen test to keep CI happy. In the IR, the relative order in which basic blocks for stack versus heap allocation are emitted for the large allocation test is arbitrary, but the existing test only accepts one specific order. With this PR, the basic blocks are now omitted in a different order on macOS because the generated code (rightly) has some different inlining choices earlier in the file with respect to a comparison operation with a randomly generated fixed-width integer. To make the test robust to arbitrary changes in the order of basic blocks, I've changed certain "CHECK"s to "CHECK-DAG"s. At some point in the past decade, folks proposed a "CHECK-LABEL-DAG" feature to be added to FileCheck, but unfortunately it didn't gain traction and it doesn't appear possible to check entire basic blocks being reordered relative to one another in one invocation of FileCheck, just single lines. Therefore, I've also changed the test to invoke FileCheck three times, with the first run testing for presence of the expected branch instructions, and the latter two specifically testing either the heap allocation IR or the stack allocation IR. Note that the test as it currently exists on Let me know if this change looks at least reasonable to you. I'd like to defer any detailed tinkering to you or others as it's not salient to this PR, but some degree of change was required to unblock this work and I figured I'd at least make a real effort to leave it better than I found it. |
@swift-ci test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't object to the changes to the temp-alloc unit tests, so long as they continue to pass. ;)
I'm planning to rework these further, but this seems like a reasonable start... |
One style nit, otherwise LGTM. |
@swift-ci test and merge |
We are seeing multiple aarch64 bots failing:
https://ci.swift.org/job/oss-swift-package-ubuntu-20_04-aarch64//1442/console cc: @xwu @stephentyrone |
🫤 this is what I get for trying to unblock things by fixing rather than just disabling the test. Most of the tests weren't actually being checked and my good deed was properly enabling it, but looks like cc @grynspan |
Incidentally, @shahmishal, given CI passed multiple times, was there any way to spot this issue pre-emptively? |
As it turns out, we can simplify this logic quite a bit and get more optimal generated code in the process:
In the first iteration of this PR, we unconditionally apply the "roundtripping" logic used in the existing code to widen a comparand without checking the instance property
bitWidth
:rhs
can be represented as a value of typeSelf
, or iflhs
can be represented as a value of typeOther
, then the comparison is straightforward.lhs
andrhs
is smaller just by knowing which ofSelf
orOther
is a signed type (see comments in the code).This change allows Swift to generate the same, optimal code (at
-O
) for the following input as reported in #62260, which isn't the case today:However, because the roundtripping logic has to start with either the left-hand side or the right-hand side, the same optimal output can't be obtained when we swap the parameters above—i.e., when
x
is of typeUInt32
andy
is of typeInt
. The generated code is still better than the status quo, but it's not the best possible.To address this issue, in the second iteration of this PR, we compare comparand bit widths explicitly, but only once along each code path in case it's expensive to compute for values of a hypothetical arbitrary-width type. (I'd expect this should be negligible as compared to the widening operation that follows anyway.)
For fixed-width integer types, this approach produces optimal code generation for heterogeneous comparison of any two types. This is because we defer comparison to zero until we reach the only scenario where it's required (when a signed and therefore possibly negative value is being compared to an unsigned value of greater bit width) and otherwise stick to operations that can be optimized away in the concrete case.
This change apparently speeds up generic floating-point conversion severalfold, improves code size for a number of integer and floating-point algorithms, and even shrinks
libswiftCore.dylib
code size by 2%. It also revealed brittleness in an IRGen test which has been modified as a result to improve robustness.Resolves #62260.