[AArch64] Mark AESD and AESE instructions as commutative. #83390
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

This comes from https://discourse.llvm.org/t/combining-aes-and-xor-can-be-improved-further/77248. These instructions start out with:

XOR Vd, Vn
<some complicated math>

The initial XOR means that they can be treated as commutative, removing some of the unnecessary movs introduced during register allocation (a source-level sketch follows the diff below).

Full diff: https://github.com/llvm/llvm-project/pull/83390.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index b01a8cd00025f8..0fc91be1ad56d2 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -8216,8 +8216,10 @@ defm ST4 : SIMDLdSt4SingleAliases<"st4">;
//----------------------------------------------------------------------------
let Predicates = [HasAES] in {
+let isCommutable = 1 in {
def AESErr : AESTiedInst<0b0100, "aese", int_aarch64_crypto_aese>;
def AESDrr : AESTiedInst<0b0101, "aesd", int_aarch64_crypto_aesd>;
+}
def AESMCrr : AESInst< 0b0110, "aesmc", int_aarch64_crypto_aesmc>;
def AESIMCrr : AESInst< 0b0111, "aesimc", int_aarch64_crypto_aesimc>;
}
diff --git a/llvm/test/CodeGen/AArch64/aes.ll b/llvm/test/CodeGen/AArch64/aes.ll
index 2bef28de895baf..386114f4a0d79d 100644
--- a/llvm/test/CodeGen/AArch64/aes.ll
+++ b/llvm/test/CodeGen/AArch64/aes.ll
@@ -16,8 +16,7 @@ define <16 x i8> @aese(<16 x i8> %a, <16 x i8> %b) {
define <16 x i8> @aese_c(<16 x i8> %a, <16 x i8> %b) {
; CHECK-LABEL: aese_c:
; CHECK: // %bb.0:
-; CHECK-NEXT: aese v1.16b, v0.16b
-; CHECK-NEXT: mov v0.16b, v1.16b
+; CHECK-NEXT: aese v0.16b, v1.16b
; CHECK-NEXT: ret
%r = call <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8> %b, <16 x i8> %a)
ret <16 x i8> %r
@@ -35,8 +34,7 @@ define <16 x i8> @aesd(<16 x i8> %a, <16 x i8> %b) {
define <16 x i8> @aesd_c(<16 x i8> %a, <16 x i8> %b) {
; CHECK-LABEL: aesd_c:
; CHECK: // %bb.0:
-; CHECK-NEXT: aesd v1.16b, v0.16b
-; CHECK-NEXT: mov v0.16b, v1.16b
+; CHECK-NEXT: aesd v0.16b, v1.16b
; CHECK-NEXT: ret
%r = call <16 x i8> @llvm.aarch64.crypto.aesd(<16 x i8> %b, <16 x i8> %a)
ret <16 x i8> %r
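For a source-level view of the pattern this improves, here is a minimal sketch (my own example, not from the patch; it assumes clang with the ACLE AES intrinsics enabled and that `vaeseq_u8` lowers to `aese`):

```c
#include <arm_neon.h>

// vaeseq_u8(d, k) lowers to "aese Vd, Vn" with the first argument tied
// to the destination register. A possible compile line (an assumption,
// adjust for your toolchain):
//   clang -O2 --target=aarch64 -march=armv8-a+aes -S aese_swap.c

uint8x16_t round_ab(uint8x16_t a, uint8x16_t b) {
  return vaeseq_u8(a, b); // %a arrives in v0, so no move is needed
}

uint8x16_t round_ba(uint8x16_t a, uint8x16_t b) {
  return vaeseq_u8(b, a); // previously compiled to "aese v1, v0" plus
                          // "mov v0, v1"; with isCommutable the operands
                          // are swapped and "aese v0.16b, v1.16b" suffices
}
```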
@iucoen FYI - I didn't seem to be able to add you as a reviewer.
I think this is correct. Reading the Arm ARM, I can see that both AESE and AESD seem to perform an exclusive OR on their arguments before applying ShiftRows/SubBytes to the result.
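As a quick empirical cross-check of that reading (my own sketch, not part of the review; assumes AArch64 hardware with the AES extension and a compiler flag such as `-march=armv8-a+aes`):

```c
#include <arm_neon.h>
#include <stdio.h>
#include <string.h>

// If AESE and AESD really start with an EOR of their two inputs,
// swapping the operands must produce identical results.
int main(void) {
  uint8_t av[16], bv[16], e1[16], e2[16], d1[16], d2[16];
  for (int i = 0; i < 16; i++) {
    av[i] = (uint8_t)i;
    bv[i] = (uint8_t)(0xf0 ^ (i * 7));
  }
  uint8x16_t a = vld1q_u8(av), b = vld1q_u8(bv);
  vst1q_u8(e1, vaeseq_u8(a, b));
  vst1q_u8(e2, vaeseq_u8(b, a));
  vst1q_u8(d1, vaesdq_u8(a, b));
  vst1q_u8(d2, vaesdq_u8(b, a));
  printf("aese %s, aesd %s\n",
         memcmp(e1, e2, 16) ? "differs" : "commutes",
         memcmp(d1, d2, 16) ? "differs" : "commutes");
  return 0;
}
```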
force-pushed from fdef355 to 3c2905d
Thanks. I forgot to update this aes fusion test, which was showing differences in the assembly compared to before (it is a bit difficult to update, as it often differs between CPUs).
32-bit Arm NEON also has the same instructions. This should be applied there as well.
Yeah thanks, will do. We want to keep them as separate patches though, for the different backends.
Similar to #83390, this marks AESD and AESE as commutative, as the logic of the instructions starts with an XOR between the two operands.

This comes from https://discourse.llvm.org/t/combining-aes-and-xor-can-be-improved-further/77248.

These instructions start out with:

XOR Vd, Vn
<some complicated math>

The initial XOR means that they can be treated as commutative, removing some of the unnecessary movs introduced during register allocation.
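For completeness, the same swap at the source level in AArch32 state (a hedged sketch of mine; the ACLE exposes the same intrinsics on 32-bit Armv8-A when the crypto extension is enabled, though the exact target and arch flags below are assumptions):

```c
#include <arm_neon.h>

// In AArch32 state the instruction is AESD.8 Qd, Qm, which likewise
// begins with an EOR of its two inputs. A possible compile line
// (an assumption, adjust for your toolchain):
//   clang -O2 --target=armv8a-none-linux-gnueabihf \
//         -march=armv8-a+crypto -S aesd_swap.c

uint8x16_t decrypt_round_swapped(uint8x16_t a, uint8x16_t b) {
  return vaesdq_u8(b, a); // with the operands marked commutable, the
                          // backend no longer needs an extra vector move
}
```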