[AArch64][GlobalISel] Improve and expand fcopysign lowering #71283
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel
Author: David Green (davemgreen)
Changes
This alters the lowering of G_FCOPYSIGN to support vector types. The general idea is that we just lower it to vector operations using and/or and a mask, which are now converted to a BIF/BIT/BSP.
In the process the existing AArch64LegalizerInfo::legalizeFCopySign can be removed, relying on expanding the scalar versions to vector instead, which just needs a small adjustment to allow widening scalars to vectors.
With vector immediates now supported, the masks are lowered to movi instructions. The exception is the f64 "negative zero", which was previously lowered as a fneg(mov 0); that can be added as a separate optimization. (But that hasn't been written yet, for neither SDAG nor GlobalISel.)
Patch is 42.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71283.diff
7 Files Affected:
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 00d9f3f7c30c95f..304bdbe4bed0054 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -5008,6 +5008,7 @@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
case TargetOpcode::G_FSUB:
case TargetOpcode::G_FMUL:
case TargetOpcode::G_FDIV:
+ case TargetOpcode::G_FCOPYSIGN:
case TargetOpcode::G_UADDSAT:
case TargetOpcode::G_USUBSAT:
case TargetOpcode::G_SADDSAT:
diff --git a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
index 5b4e2b725e1dd76..7365d588283ee09 100644
--- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
@@ -269,14 +269,18 @@ MachineIRBuilder::buildDeleteTrailingVectorElements(const DstOp &Res,
LLT ResTy = Res.getLLTTy(*getMRI());
LLT Op0Ty = Op0.getLLTTy(*getMRI());
- assert((ResTy.isVector() && Op0Ty.isVector()) && "Non vector type");
- assert((ResTy.getElementType() == Op0Ty.getElementType()) &&
+ assert(Op0Ty.isVector() && "Non vector type");
+ assert(((ResTy.isScalar() && (ResTy == Op0Ty.getElementType())) ||
+ (ResTy.isVector() &&
+ (ResTy.getElementType() == Op0Ty.getElementType()))) &&
"Different vector element types");
- assert((ResTy.getNumElements() < Op0Ty.getNumElements()) &&
+ assert((ResTy.isScalar() || (ResTy.getNumElements() < Op0Ty.getNumElements())) &&
"Op0 has fewer elements");
- SmallVector<Register, 8> Regs;
auto Unmerge = buildUnmerge(Op0Ty.getElementType(), Op0);
+ if (ResTy.isScalar())
+ return buildCopy(Res, Unmerge.getReg(0));
+ SmallVector<Register, 8> Regs;
for (unsigned i = 0; i < ResTy.getNumElements(); ++i)
Regs.push_back(Unmerge.getReg(i));
return buildMergeLikeInstr(Res, Regs);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 7edfa41d237836a..742581a90d5e778 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1070,10 +1070,15 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
getActionDefinitionsBuilder({G_LROUND, G_LLROUND})
.legalFor({{s64, s32}, {s64, s64}});
- // TODO: Custom legalization for vector types.
// TODO: Custom legalization for mismatched types.
- // TODO: s16 support.
- getActionDefinitionsBuilder(G_FCOPYSIGN).customFor({{s32, s32}, {s64, s64}});
+ getActionDefinitionsBuilder(G_FCOPYSIGN)
+ .moreElementsIf(
+ [](const LegalityQuery &Query) { return Query.Types[0].isScalar(); },
+ [=](const LegalityQuery &Query) {
+ const LLT Ty = Query.Types[0];
+ return std::pair(0, LLT::fixed_vector(Ty == s16 ? 4 : 2, Ty));
+ })
+ .lower();
getActionDefinitionsBuilder(G_FMAD).lower();
@@ -1124,8 +1129,6 @@ bool AArch64LegalizerInfo::legalizeCustom(LegalizerHelper &Helper,
case TargetOpcode::G_MEMMOVE:
case TargetOpcode::G_MEMSET:
return legalizeMemOps(MI, Helper);
- case TargetOpcode::G_FCOPYSIGN:
- return legalizeFCopySign(MI, Helper);
case TargetOpcode::G_EXTRACT_VECTOR_ELT:
return legalizeExtractVectorElt(MI, MRI, Helper);
}
@@ -1829,66 +1832,6 @@ bool AArch64LegalizerInfo::legalizeMemOps(MachineInstr &MI,
return false;
}
-bool AArch64LegalizerInfo::legalizeFCopySign(MachineInstr &MI,
- LegalizerHelper &Helper) const {
- MachineIRBuilder &MIRBuilder = Helper.MIRBuilder;
- MachineRegisterInfo &MRI = *MIRBuilder.getMRI();
- Register Dst = MI.getOperand(0).getReg();
- LLT DstTy = MRI.getType(Dst);
- assert(DstTy.isScalar() && "Only expected scalars right now!");
- const unsigned DstSize = DstTy.getSizeInBits();
- assert((DstSize == 32 || DstSize == 64) && "Unexpected dst type!");
- assert(MRI.getType(MI.getOperand(2).getReg()) == DstTy &&
- "Expected homogeneous types!");
-
- // We want to materialize a mask with the high bit set.
- uint64_t EltMask;
- LLT VecTy;
-
- // TODO: s16 support.
- switch (DstSize) {
- default:
- llvm_unreachable("Unexpected type for G_FCOPYSIGN!");
- case 64: {
- // AdvSIMD immediate moves cannot materialize out mask in a single
- // instruction for 64-bit elements. Instead, materialize zero and then
- // negate it.
- EltMask = 0;
- VecTy = LLT::fixed_vector(2, DstTy);
- break;
- }
- case 32:
- EltMask = 0x80000000ULL;
- VecTy = LLT::fixed_vector(4, DstTy);
- break;
- }
-
- // Widen In1 and In2 to 128 bits. We want these to eventually become
- // INSERT_SUBREGs.
- auto Undef = MIRBuilder.buildUndef(VecTy);
- auto Zero = MIRBuilder.buildConstant(DstTy, 0);
- auto Ins1 = MIRBuilder.buildInsertVectorElement(
- VecTy, Undef, MI.getOperand(1).getReg(), Zero);
- auto Ins2 = MIRBuilder.buildInsertVectorElement(
- VecTy, Undef, MI.getOperand(2).getReg(), Zero);
-
- // Construct the mask.
- auto Mask = MIRBuilder.buildConstant(VecTy, EltMask);
- if (DstSize == 64)
- Mask = MIRBuilder.buildFNeg(VecTy, Mask);
-
- auto Sel = MIRBuilder.buildInstr(AArch64::G_BSP, {VecTy}, {Mask, Ins2, Ins1});
-
- // Build an unmerge whose 0th elt is the original G_FCOPYSIGN destination. We
- // want this to eventually become an EXTRACT_SUBREG.
- SmallVector<Register, 2> DstRegs(1, Dst);
- for (unsigned I = 1, E = VecTy.getNumElements(); I < E; ++I)
- DstRegs.push_back(MRI.createGenericVirtualRegister(DstTy));
- MIRBuilder.buildUnmerge(DstRegs, Sel);
- MI.eraseFromParent();
- return true;
-}
-
bool AArch64LegalizerInfo::legalizeExtractVectorElt(
MachineInstr &MI, MachineRegisterInfo &MRI, LegalizerHelper &Helper) const {
assert(MI.getOpcode() == TargetOpcode::G_EXTRACT_VECTOR_ELT);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
index e6c9182da912dba..4925896db6d076b 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h
@@ -60,7 +60,6 @@ class AArch64LegalizerInfo : public LegalizerInfo {
LegalizerHelper &Helper) const;
bool legalizeCTTZ(MachineInstr &MI, LegalizerHelper &Helper) const;
bool legalizeMemOps(MachineInstr &MI, LegalizerHelper &Helper) const;
- bool legalizeFCopySign(MachineInstr &MI, LegalizerHelper &Helper) const;
bool legalizeExtractVectorElt(MachineInstr &MI, MachineRegisterInfo &MRI,
LegalizerHelper &Helper) const;
const AArch64Subtarget *ST;
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-fcopysign.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-fcopysign.mir
index 86824127132da28..dd794b7af946689 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-fcopysign.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-fcopysign.mir
@@ -13,14 +13,18 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %val:_(s32) = COPY $s0
; CHECK-NEXT: %sign:_(s32) = COPY $s1
- ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s32>) = G_IMPLICIT_DEF
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
- ; CHECK-NEXT: [[IVEC:%[0-9]+]]:_(<4 x s32>) = G_INSERT_VECTOR_ELT [[DEF]], %val(s32), [[C]](s32)
- ; CHECK-NEXT: [[IVEC1:%[0-9]+]]:_(<4 x s32>) = G_INSERT_VECTOR_ELT [[DEF]], %sign(s32), [[C]](s32)
- ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 -2147483648
- ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32), [[C1]](s32), [[C1]](s32)
- ; CHECK-NEXT: [[BSP:%[0-9]+]]:_(<4 x s32>) = G_BSP [[BUILD_VECTOR]], [[IVEC1]], [[IVEC]]
- ; CHECK-NEXT: %fcopysign:_(s32), %10:_(s32), %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES [[BSP]](<4 x s32>)
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR %val(s32), [[DEF]](s32)
+ ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR %sign(s32), [[DEF]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 -2147483648
+ ; CHECK-NEXT: [[BUILD_VECTOR2:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[C]](s32), [[C]](s32)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 2147483647
+ ; CHECK-NEXT: [[BUILD_VECTOR3:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(<2 x s32>) = G_AND [[BUILD_VECTOR]], [[BUILD_VECTOR3]]
+ ; CHECK-NEXT: [[AND1:%[0-9]+]]:_(<2 x s32>) = G_AND [[BUILD_VECTOR1]], [[BUILD_VECTOR2]]
+ ; CHECK-NEXT: [[OR:%[0-9]+]]:_(<2 x s32>) = G_OR [[AND]], [[AND1]]
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[OR]](<2 x s32>)
+ ; CHECK-NEXT: %fcopysign:_(s32) = COPY [[UV]](s32)
; CHECK-NEXT: $s0 = COPY %fcopysign(s32)
; CHECK-NEXT: RET_ReallyLR implicit $s0
%val:_(s32) = COPY $s0
@@ -41,14 +45,18 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %val:_(s64) = COPY $d0
; CHECK-NEXT: %sign:_(s64) = COPY $d1
- ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<2 x s64>) = G_IMPLICIT_DEF
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
- ; CHECK-NEXT: [[IVEC:%[0-9]+]]:_(<2 x s64>) = G_INSERT_VECTOR_ELT [[DEF]], %val(s64), [[C]](s64)
- ; CHECK-NEXT: [[IVEC1:%[0-9]+]]:_(<2 x s64>) = G_INSERT_VECTOR_ELT [[DEF]], %sign(s64), [[C]](s64)
- ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[C]](s64), [[C]](s64)
- ; CHECK-NEXT: [[FNEG:%[0-9]+]]:_(<2 x s64>) = G_FNEG [[BUILD_VECTOR]]
- ; CHECK-NEXT: [[BSP:%[0-9]+]]:_(<2 x s64>) = G_BSP [[FNEG]], [[IVEC1]], [[IVEC]]
- ; CHECK-NEXT: %fcopysign:_(s64), %10:_(s64) = G_UNMERGE_VALUES [[BSP]](<2 x s64>)
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s64) = G_IMPLICIT_DEF
+ ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR %val(s64), [[DEF]](s64)
+ ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR %sign(s64), [[DEF]](s64)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 -9223372036854775808
+ ; CHECK-NEXT: [[BUILD_VECTOR2:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[C]](s64), [[C]](s64)
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 9223372036854775807
+ ; CHECK-NEXT: [[BUILD_VECTOR3:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[C1]](s64), [[C1]](s64)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(<2 x s64>) = G_AND [[BUILD_VECTOR]], [[BUILD_VECTOR3]]
+ ; CHECK-NEXT: [[AND1:%[0-9]+]]:_(<2 x s64>) = G_AND [[BUILD_VECTOR1]], [[BUILD_VECTOR2]]
+ ; CHECK-NEXT: [[OR:%[0-9]+]]:_(<2 x s64>) = G_OR [[AND]], [[AND1]]
+ ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[OR]](<2 x s64>)
+ ; CHECK-NEXT: %fcopysign:_(s64) = COPY [[UV]](s64)
; CHECK-NEXT: $d0 = COPY %fcopysign(s64)
; CHECK-NEXT: RET_ReallyLR implicit $d0
%val:_(s64) = COPY $d0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index f7493b128de1e23..e008990ddc72c31 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -521,8 +521,8 @@
# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: G_FCOPYSIGN (opcode {{[0-9]+}}): 2 type indices
-# DEBUG-NEXT: .. the first uncovered type index: 2, OK
-# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
+# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: G_IS_FPCLASS (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
diff --git a/llvm/test/CodeGen/AArch64/fcopysign.ll b/llvm/test/CodeGen/AArch64/fcopysign.ll
index 4abd115da21c1f4..83baf154d48b33b 100644
--- a/llvm/test/CodeGen/AArch64/fcopysign.ll
+++ b/llvm/test/CodeGen/AArch64/fcopysign.ll
@@ -1,6 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
; RUN: llc -mtriple=aarch64-none-eabi -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-SD
-; RUN: llc -mtriple=aarch64-none-eabi -global-isel -global-isel-abort=2 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-GI
+; RUN: llc -mtriple=aarch64-none-eabi -global-isel -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-GI
define double @copysign_f64(double %a, double %b) {
; CHECK-SD-LABEL: copysign_f64:
@@ -15,13 +15,11 @@ define double @copysign_f64(double %a, double %b) {
;
; CHECK-GI-LABEL: copysign_f64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: movi v2.2d, #0000000000000000
+; CHECK-GI-NEXT: adrp x8, .LCPI0_0
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
; CHECK-GI-NEXT: // kill: def $d1 killed $d1 def $q1
-; CHECK-GI-NEXT: mov v0.d[0], v0.d[0]
-; CHECK-GI-NEXT: mov v1.d[0], v1.d[0]
-; CHECK-GI-NEXT: fneg v2.2d, v2.2d
-; CHECK-GI-NEXT: bit v0.16b, v1.16b, v2.16b
+; CHECK-GI-NEXT: ldr q2, [x8, :lo12:.LCPI0_0]
+; CHECK-GI-NEXT: bif v0.16b, v1.16b, v2.16b
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-GI-NEXT: ret
entry:
@@ -41,12 +39,10 @@ define float @copysign_f32(float %a, float %b) {
;
; CHECK-GI-LABEL: copysign_f32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: // kill: def $s0 killed $s0 def $q0
-; CHECK-GI-NEXT: // kill: def $s1 killed $s1 def $q1
-; CHECK-GI-NEXT: movi v2.4s, #128, lsl #24
-; CHECK-GI-NEXT: mov v0.s[0], v0.s[0]
-; CHECK-GI-NEXT: mov v1.s[0], v1.s[0]
-; CHECK-GI-NEXT: bit v0.16b, v1.16b, v2.16b
+; CHECK-GI-NEXT: mvni v2.2s, #128, lsl #24
+; CHECK-GI-NEXT: // kill: def $s0 killed $s0 def $d0
+; CHECK-GI-NEXT: // kill: def $s1 killed $s1 def $d1
+; CHECK-GI-NEXT: bif v0.8b, v1.8b, v2.8b
; CHECK-GI-NEXT: // kill: def $s0 killed $s0 killed $q0
; CHECK-GI-NEXT: ret
entry:
@@ -55,64 +51,109 @@ entry:
}
define half @copysign_f16(half %a, half %b) {
-; CHECK-LABEL: copysign_f16:
-; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: fcvt s1, h1
-; CHECK-NEXT: fcvt s0, h0
-; CHECK-NEXT: mvni v2.4s, #128, lsl #24
-; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
-; CHECK-NEXT: fcvt h0, s0
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: copysign_f16:
+; CHECK-SD: // %bb.0: // %entry
+; CHECK-SD-NEXT: fcvt s1, h1
+; CHECK-SD-NEXT: fcvt s0, h0
+; CHECK-SD-NEXT: mvni v2.4s, #128, lsl #24
+; CHECK-SD-NEXT: bif v0.16b, v1.16b, v2.16b
+; CHECK-SD-NEXT: fcvt h0, s0
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: copysign_f16:
+; CHECK-GI: // %bb.0: // %entry
+; CHECK-GI-NEXT: mvni v2.4h, #128, lsl #8
+; CHECK-GI-NEXT: // kill: def $h0 killed $h0 def $d0
+; CHECK-GI-NEXT: // kill: def $h1 killed $h1 def $d1
+; CHECK-GI-NEXT: bif v0.8b, v1.8b, v2.8b
+; CHECK-GI-NEXT: // kill: def $h0 killed $h0 killed $q0
+; CHECK-GI-NEXT: ret
entry:
%c = call half @llvm.copysign.f16(half %a, half %b)
ret half %c
}
define <2 x double> @copysign_v2f64(<2 x double> %a, <2 x double> %b) {
-; CHECK-LABEL: copysign_v2f64:
-; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: movi v2.2d, #0xffffffffffffffff
-; CHECK-NEXT: fneg v2.2d, v2.2d
-; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: copysign_v2f64:
+; CHECK-SD: // %bb.0: // %entry
+; CHECK-SD-NEXT: movi v2.2d, #0xffffffffffffffff
+; CHECK-SD-NEXT: fneg v2.2d, v2.2d
+; CHECK-SD-NEXT: bif v0.16b, v1.16b, v2.16b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: copysign_v2f64:
+; CHECK-GI: // %bb.0: // %entry
+; CHECK-GI-NEXT: adrp x8, .LCPI3_0
+; CHECK-GI-NEXT: ldr q2, [x8, :lo12:.LCPI3_0]
+; CHECK-GI-NEXT: bif v0.16b, v1.16b, v2.16b
+; CHECK-GI-NEXT: ret
entry:
%c = call <2 x double> @llvm.copysign.v2f64(<2 x double> %a, <2 x double> %b)
ret <2 x double> %c
}
define <3 x double> @copysign_v3f64(<3 x double> %a, <3 x double> %b) {
-; CHECK-LABEL: copysign_v3f64:
-; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: movi v6.2d, #0xffffffffffffffff
-; CHECK-NEXT: // kill: def $d3 killed $d3 def $q3
-; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
-; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-NEXT: // kill: def $d4 killed $d4 def $q4
-; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT: // kill: def $d5 killed $d5 def $q5
-; CHECK-NEXT: mov v3.d[1], v4.d[0]
-; CHECK-NEXT: mov v0.d[1], v1.d[0]
-; CHECK-NEXT: fneg v1.2d, v6.2d
-; CHECK-NEXT: bif v0.16b, v3.16b, v1.16b
-; CHECK-NEXT: bif v2.16b, v5.16b, v1.16b
-; CHECK-NEXT: // kill: def $d2 killed $d2 killed $q2
-; CHECK-NEXT: ext v1.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-NEXT: // kill: def $d1 killed $d1 killed $q1
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: copysign_v3f64:
+; CHECK-SD: // %bb.0: // %entry
+; CHECK-SD-NEXT: movi v6.2d, #0xffffffffffffffff
+; CHECK-SD-NEXT: // kill: def $d3 killed $d3 def $q3
+; CHECK-SD-NEXT: // kill: def $d1 killed $d1 def $q1
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-SD-NEXT: // kill: def $d4 killed $d4 def $q4
+; CHECK-SD-NEXT: // kill: def $d2 killed $d2 def $q2
+; CHECK-SD-NEXT: // kill: def $d5 killed $d5 def $q5
+; CHECK-SD-NEXT: mov v3.d[1], v4.d[0]
+; CHECK-SD-NEXT: mov v0.d[1], v1.d[0]
+; CHECK-SD-NEXT: fneg v1.2d, v6.2d
+; CHECK-SD-NEXT: bif v0.16b, v3.16b, v1.16b
+; CHECK-SD-NEXT: bif v2.16b, v5.16b, v1.16b
+; CHECK-SD-NEXT: // kill: def $d2 killed $d2 killed $q2
+; CHECK-SD-NEXT: ext v1.16b, v0.16b, v0.16b, #8
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
+; CHECK-SD-NEXT: // kill: def $d1 killed $d1 killed $q1
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: copysign_v3f64:
+; CHECK-GI: // %bb.0: // %entry
+; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-GI-NEXT: // kill: def $d3 killed $d3 def $q3
+; CHECK-GI-NEXT: // kill: def $d1 killed $d1 def $q1
+; CHECK-GI-NEXT: // kill: def $d4 killed $d4 def $q4
+; CHECK-GI-NEXT: adrp x8, .LCPI4_0
+; CHECK-GI-NEXT: fmov x9, d5
+; CHECK-GI-NEXT: mov v0.d[1], v1.d[0]
+; CHECK-GI-NEXT: mov v3.d[1], v4.d[0]
+; CHECK-GI-NEXT: ldr q1, [x8, :lo12:.LCPI4_0]
+; CHECK-GI-NEXT: fmov x8, d2
+; CHECK-GI-NEXT: and x9, x9, #0x8000000000000000
+; CHECK-GI-NEXT: bif v0.16b, v3.16b, v1.16b
+; CHECK-GI-NEXT: and x8, x8, #0x7fffffffffffffff
+; CHECK-GI-NEXT: orr x8, x8, x9
+; CHECK-GI-NEXT: fmov d2, x8
+; CHECK-GI-NEXT: mov d1, v0.d[1]
+; CHECK-GI-NEXT: // kill: def $d0 killed $d0 killed $q0
+; CHECK-GI-NEXT: ret
entry:
%c = call <3 x double> @llvm.copysign.v3f64(<3 x double> %a, <3 x double> %b)
ret <3 x double> %c
}
define <4 x double> @copysign_v4f64(<4 x double> %a, <4 x double> %b) {
-; CHECK-LABEL: copysign_v4f64:
-; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: movi v4.2d, #0xffffffffffffffff
-; CHECK-NEXT: fneg v4.2d, v4.2d
-; CHECK-NEXT: bif v0.16b, v2.16b, v4.16b
-; CHECK-NEXT: bif v1.16b, v3.16b, v4.16b
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: copysign_v4f64:
+; CHECK-SD: // %bb.0: // %entry
+; CHECK-SD-NEXT: movi v4.2d, #0xffffffffffffffff
+; CHECK-SD-NEXT: fneg v4.2d, v4.2d
+; CHECK-SD-NEXT: bif v0.16b, v2.16b, v4.16b
+; CHECK-SD-NEXT: bif v1.16b, v3.16b, v4.16b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: copysign_v4f64:
+; CHECK-GI: // %bb.0: // %entry
+; CHECK-GI-NEXT: adrp x8, .LCPI5_0
+; CHECK-GI-NEXT: ldr q4, [x8, :lo12:.LCPI5_0]
+; CHECK-GI-NEXT: bif v0.16b, v2.16b, v4.16b
+; CHECK-GI-NEXT: bif v1.16b, v3.16b, v4.16b
+; CHECK-GI-NEXT: ret
entry:
%c = call <4 x double> @llvm.copysign.v4f64(<4 x double> %a, <4 x double> %b)
ret <4 x double> %c
@@ -130,11 +171,28 @@ entry:
}
define <3 x float> @copysign_v3f32(<3 x float> %a, <3 x float> %b) {
-; CHECK-LABEL: ...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
E.g. G_ABS does not support scalars currently. With the new
; CHECK-SD-NEXT: movi v4.2d, #0xffffffffffffffff
; CHECK-SD-NEXT: fneg v4.2d, v4.2d
; CHECK-SD-NEXT: bif v0.16b, v2.16b, v4.16b
; CHECK-SD-NEXT: bif v1.16b, v3.16b, v4.16b
; CHECK-SD-NEXT: ret
DAG looks better?
Thanks for the reviews and sorry for the delay on this one. There is now a separate optimization to produce fneg(movi 0), which should cover the places where this was previously potentially performing worse.
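For readers following the fneg(movi 0) point, here is a minimal sketch of why that sequence yields a usable f64 mask. It models FNEG purely as a flip of bit 63 on the raw 64-bit lane pattern; the helper name `fnegBits` is hypothetical and is not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of FNEG on one 64-bit lane: negation only flips the
// sign bit (bit 63) and leaves every other bit untouched.
uint64_t fnegBits(uint64_t Lane) {
  return Lane ^ 0x8000000000000000ull;
}

// Small self-check of the two mask constructions discussed in this thread.
void checkFNegMasks() {
  // fneg(movi 0): all-zero lane becomes the sign-bit mask.
  assert(fnegBits(0x0ull) == 0x8000000000000000ull);
  // fneg(movi all-ones), as in the CHECK-SD lines quoted above:
  // all-ones lane becomes the "everything but the sign bit" mask.
  assert(fnegBits(~0x0ull) == 0x7fffffffffffffffull);
}
```

Either mask works for selecting the sign bit; which one is needed depends on whether the select is emitted as BIT or BIF.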
This alters the lowering of G_FCOPYSIGN to support vector types. The general idea is that we just lower it to vector operations using and/or and a mask, which are now converted to a BIF/BIT/BSP.
In the process the existing AArch64LegalizerInfo::legalizeFCopySign can be removed, relying on expanding the scalar versions to vector instead, which just needs a small adjustment to allow widening scalars to vectors.
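The and/or-with-mask idea can be sketched on a single f32 lane as below. This is an illustrative standalone model, not the code in the patch: the helper names (`toBits`, `fromBits`, `copySignAndOr`, `bif`, `copySignBif`) are hypothetical, and the two constants are the ones visible in the updated legalize-fcopysign.mir checks. The `bif` helper models the AArch64 BIF behaviour (insert bits from the first source where the mask bit is 0).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw 32-bit pattern (memcpy avoids aliasing UB).
uint32_t toBits(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof(B));
  return B;
}

// Reinterpret a raw 32-bit pattern as a float.
float fromBits(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof(F));
  return F;
}

// Model of the generic lowering: magnitude bits from Val, sign bit from Sign.
float copySignAndOr(float Val, float Sign) {
  const uint32_t SignMask = 0x80000000u; // G_CONSTANT i32 -2147483648
  const uint32_t MagMask = 0x7fffffffu;  // G_CONSTANT i32 2147483647
  return fromBits((toBits(Val) & MagMask) | (toBits(Sign) & SignMask));
}

// Model of BIF on one lane: keep D where M is 1, take N where M is 0.
uint32_t bif(uint32_t D, uint32_t N, uint32_t M) {
  return (D & M) | (N & ~M);
}

// The same copysign expressed as a single BIF with the magnitude mask.
float copySignBif(float Val, float Sign) {
  return fromBits(bif(toBits(Val), toBits(Sign), 0x7fffffffu));
}

// Small self-check that both forms agree on a few inputs.
void checkCopySignModels() {
  assert(copySignAndOr(1.5f, -2.0f) == -1.5f);
  assert(copySignBif(-3.0f, 4.0f) == 3.0f);
  assert(toBits(copySignBif(0.0f, -1.0f)) == 0x80000000u);
}
```

The two-AND-plus-OR form and the single BIF compute the same bits, which is why the and/or sequence can be matched to one bitwise-select instruction.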