[LoongArch] [CodeGen] Add options for Clang to generate LoongArch-specific frecipe & frsqrte instructions #109917
Thank you for submitting a Pull Request (PR) to the LLVM Project!
This PR will be automatically labeled and the relevant teams will be notified.
If you wish to, you can add reviewers by using the "Reviewers" section on this page.
If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.
If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.
If you have further questions, they may be answered by the LLVM GitHub User Guide.
You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-backend-loongarch @llvm/pr-subscribers-clang
Author: None (tangaac)
Changes
Two options: -mfrecipe and -mno-frecipe, which enable or disable generation of frecipe.{s/d} and frsqrte.{s/d}.
Patch is 39.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109917.diff
14 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 23bd686a85f526..811fb5490d6707 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5373,6 +5373,10 @@ def mno_lasx : Flag<["-"], "mno-lasx">, Group<m_loongarch_Features_Group>,
def msimd_EQ : Joined<["-"], "msimd=">, Group<m_loongarch_Features_Group>,
Flags<[TargetSpecific]>,
HelpText<"Select the SIMD extension(s) to be enabled in LoongArch either 'none', 'lsx', 'lasx'.">;
+def mfrecipe : Flag<["-"], "mfrecipe">, Group<m_loongarch_Features_Group>,
+ HelpText<"Enable frecipe.{s/d} and frsqrte.{s/d}">;
+def mno_frecipe : Flag<["-"], "mno-frecipe">, Group<m_loongarch_Features_Group>,
+ HelpText<"Disable frecipe.{s/d} and frsqrte.{s/d}">;
def mnop_mcount : Flag<["-"], "mnop-mcount">, HelpText<"Generate mcount/__fentry__ calls as nops. To activate they need to be patched in.">,
Visibility<[ClangOption, CC1Option]>, Group<m_Group>,
MarshallingInfoFlag<CodeGenOpts<"MNopMCount">>;
diff --git a/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp b/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
index 771adade93813f..62233a32d0d396 100644
--- a/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
@@ -251,6 +251,20 @@ void loongarch::getLoongArchTargetFeatures(const Driver &D,
} else /*-mno-lasx*/
Features.push_back("-lasx");
}
+
+ // Select frecipe feature determined by -m[no-]frecipe.
+ if (const Arg *A =
+ Args.getLastArg(options::OPT_mfrecipe, options::OPT_mno_frecipe)) {
+ // FRECIPE depends on 64-bit FPU.
+ // -mno-frecipe conflicts with -mfrecipe.
+ if (A->getOption().matches(options::OPT_mfrecipe)) {
+ if (llvm::find(Features, "-d") != Features.end())
+ D.Diag(diag::err_drv_loongarch_wrong_fpu_width) << /*FRECIPE*/ 2;
+ else /*-mfrecipe*/
+ Features.push_back("+frecipe");
+ } else /*-mno-frecipe*/
+ Features.push_back("-frecipe");
+ }
}
std::string loongarch::postProcessTargetCPUString(const std::string &CPU,
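The driver hunk above resolves -mfrecipe / -mno-frecipe with last-flag-wins semantics and rejects -mfrecipe when the feature list already disables the 64-bit FPU ("-d"). A minimal standalone sketch of that decision follows, assuming a plain std::vector<std::string> in place of the driver's feature list. The names FrecipeArg, FrecipeResult and selectFrecipe are hypothetical and exist only for illustration; they are not part of Clang.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of which of the two flags appeared last, if any.
enum class FrecipeArg { None, Enable, Disable };

struct FrecipeResult {
  std::vector<std::string> features; // feature list after the selection
  bool diagnosed = false;            // models err_drv_loongarch_wrong_fpu_width
};

// Mirrors the shape of the added driver logic on a plain string vector.
FrecipeResult selectFrecipe(std::vector<std::string> features,
                            FrecipeArg lastArg) {
  FrecipeResult result;
  result.features = std::move(features);

  if (lastArg == FrecipeArg::None)
    return result; // neither flag given: the list is left untouched

  if (lastArg == FrecipeArg::Enable) {
    // FRECIPE depends on the 64-bit FPU, so an earlier "-d" is a conflict.
    bool noDoubleFPU = std::find(result.features.begin(),
                                 result.features.end(),
                                 "-d") != result.features.end();
    if (noDoubleFPU)
      result.diagnosed = true;
    else
      result.features.push_back("+frecipe");
  } else {
    result.features.push_back("-frecipe");
  }
  return result;
}
```

Calling selectFrecipe({"+d"}, FrecipeArg::Enable) appends "+frecipe", while selectFrecipe({"-d"}, FrecipeArg::Enable) leaves the list unchanged and sets diagnosed.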
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
index d6a83c0c8cd8fb..8f909d26cfd08a 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
@@ -19,12 +19,15 @@ def SDT_LoongArchMOVGR2FR_W_LA64
def SDT_LoongArchMOVFR2GR_S_LA64
: SDTypeProfile<1, 1, [SDTCisVT<0, i64>, SDTCisVT<1, f32>]>;
def SDT_LoongArchFTINT : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
+def SDT_LoongArchFRECIPE : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
def loongarch_movgr2fr_w_la64
: SDNode<"LoongArchISD::MOVGR2FR_W_LA64", SDT_LoongArchMOVGR2FR_W_LA64>;
def loongarch_movfr2gr_s_la64
: SDNode<"LoongArchISD::MOVFR2GR_S_LA64", SDT_LoongArchMOVFR2GR_S_LA64>;
def loongarch_ftint : SDNode<"LoongArchISD::FTINT", SDT_LoongArchFTINT>;
+def loongarch_frecipe_s : SDNode<"LoongArchISD::FRECIPE_S", SDT_LoongArchFRECIPE>;
+def loongarch_frsqrte_s : SDNode<"LoongArchISD::FRSQRTE_S", SDT_LoongArchFRECIPE>;
//===----------------------------------------------------------------------===//
// Instructions
@@ -286,6 +289,8 @@ let Predicates = [HasFrecipe] in {
// FP approximate reciprocal operation
def : Pat<(int_loongarch_frecipe_s FPR32:$src), (FRECIPE_S FPR32:$src)>;
def : Pat<(int_loongarch_frsqrte_s FPR32:$src), (FRSQRTE_S FPR32:$src)>;
+def : Pat<(loongarch_frecipe_s FPR32:$src), (FRECIPE_S FPR32:$src)>;
+def : Pat<(loongarch_frsqrte_s FPR32:$src), (FRSQRTE_S FPR32:$src)>;
}
// fmadd.s: fj * fk + fa
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
index 30cce8439640f1..aabb58c0d68eff 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
@@ -10,6 +10,13 @@
//
//===----------------------------------------------------------------------===//
+//===----------------------------------------------------------------------===//
+// LoongArch specific DAG Nodes.
+//===----------------------------------------------------------------------===//
+
+def loongarch_frecipe_d : SDNode<"LoongArchISD::FRECIPE_D", SDT_LoongArchFRECIPE>;
+def loongarch_frsqrte_d : SDNode<"LoongArchISD::FRSQRTE_D", SDT_LoongArchFRECIPE>;
+
//===----------------------------------------------------------------------===//
// Instructions
//===----------------------------------------------------------------------===//
@@ -253,6 +260,8 @@ let Predicates = [HasFrecipe] in {
// FP approximate reciprocal operation
def : Pat<(int_loongarch_frecipe_d FPR64:$src), (FRECIPE_D FPR64:$src)>;
def : Pat<(int_loongarch_frsqrte_d FPR64:$src), (FRSQRTE_D FPR64:$src)>;
+def : Pat<(loongarch_frecipe_d FPR64:$src), (FRECIPE_D FPR64:$src)>;
+def : Pat<(loongarch_frsqrte_d FPR64:$src), (FRSQRTE_D FPR64:$src)>;
}
// fmadd.d: fj * fk + fa
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index bfafb331752108..bbff8d097a80e7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -4697,6 +4697,18 @@ const char *LoongArchTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(VANY_ZERO)
NODE_NAME_CASE(VALL_NONZERO)
NODE_NAME_CASE(VANY_NONZERO)
+ NODE_NAME_CASE(FRECIPE_S)
+ NODE_NAME_CASE(FRECIPE_D)
+ NODE_NAME_CASE(FRSQRTE_S)
+ NODE_NAME_CASE(FRSQRTE_D)
+ NODE_NAME_CASE(VFRECIPE_S)
+ NODE_NAME_CASE(VFRECIPE_D)
+ NODE_NAME_CASE(VFRSQRTE_S)
+ NODE_NAME_CASE(VFRSQRTE_D)
+ NODE_NAME_CASE(XVFRECIPE_S)
+ NODE_NAME_CASE(XVFRECIPE_D)
+ NODE_NAME_CASE(XVFRSQRTE_S)
+ NODE_NAME_CASE(XVFRSQRTE_D)
}
#undef NODE_NAME_CASE
return nullptr;
@@ -5902,6 +5914,91 @@ Register LoongArchTargetLowering::getExceptionSelectorRegister(
return LoongArch::R5;
}
+//===----------------------------------------------------------------------===//
+// Target Optimization Hooks
+//===----------------------------------------------------------------------===//
+
+static int getEstimateRefinementSteps(EVT VT, const LoongArchSubtarget &Subtarget) {
+ // FRECIPE instructions have a relative accuracy of 2^-14.
+ // IEEE float has a 23-bit mantissa and double has a 52-bit mantissa.
+ int RefinementSteps = VT.getScalarType() == MVT::f64 ? 2 : 1;
+ return RefinementSteps;
+}
+
+SDValue LoongArchTargetLowering::getSqrtEstimate(SDValue Operand,
+ SelectionDAG &DAG, int Enabled,
+ int &RefinementSteps,
+ bool &UseOneConstNR,
+ bool Reciprocal) const {
+ if (Subtarget.hasFrecipe()) {
+ SDLoc DL(Operand);
+ EVT VT = Operand.getValueType();
+ unsigned Opcode;
+
+ if (VT == MVT::f32) {
+ Opcode = LoongArchISD::FRSQRTE_S;
+ } else if (VT == MVT::f64 && Subtarget.hasBasicD()) {
+ Opcode = LoongArchISD::FRSQRTE_D;
+ } else if (VT == MVT::v4f32 && Subtarget.hasExtLSX()) {
+ Opcode = LoongArchISD::VFRSQRTE_S;
+ } else if (VT == MVT::v2f64 && Subtarget.hasExtLSX()) {
+ Opcode = LoongArchISD::VFRSQRTE_D;
+ } else if (VT == MVT::v8f32 && Subtarget.hasExtLASX()) {
+ Opcode = LoongArchISD::XVFRSQRTE_S;
+ } else if (VT == MVT::v4f64 && Subtarget.hasExtLASX()) {
+ Opcode = LoongArchISD::XVFRSQRTE_D;
+ } else {
+ return SDValue();
+ }
+
+ UseOneConstNR = false;
+ if (RefinementSteps == ReciprocalEstimate::Unspecified)
+ RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+
+ SDValue Estimate = DAG.getNode(Opcode, DL, VT, Operand);
+ if (Reciprocal) {
+ Estimate = DAG.getNode(ISD::FMUL, DL, VT, Operand, Estimate);
+ }
+ return Estimate;
+ }
+
+ return SDValue();
+}
+
+SDValue LoongArchTargetLowering::getRecipEstimate(SDValue Operand,
+ SelectionDAG &DAG,
+ int Enabled,
+ int &RefinementSteps) const {
+ if (Subtarget.hasFrecipe()) {
+ SDLoc DL(Operand);
+ EVT VT = Operand.getValueType();
+ unsigned Opcode;
+
+ if (VT == MVT::f32) {
+ Opcode = LoongArchISD::FRECIPE_S;
+ } else if (VT == MVT::f64 && Subtarget.hasBasicD()) {
+ Opcode = LoongArchISD::FRECIPE_D;
+ } else if (VT == MVT::v4f32 && Subtarget.hasExtLSX()) {
+ Opcode = LoongArchISD::VFRECIPE_S;
+ } else if (VT == MVT::v2f64 && Subtarget.hasExtLSX()) {
+ Opcode = LoongArchISD::VFRECIPE_D;
+ } else if (VT == MVT::v8f32 && Subtarget.hasExtLASX()) {
+ Opcode = LoongArchISD::XVFRECIPE_S;
+ } else if (VT == MVT::v4f64 && Subtarget.hasExtLASX()) {
+ Opcode = LoongArchISD::XVFRECIPE_D;
+ } else {
+ return SDValue();
+ }
+
+ if (RefinementSteps == ReciprocalEstimate::Unspecified)
+ RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+
+ return DAG.getNode(Opcode, DL, VT, Operand);
+ }
+
+ return SDValue();
+}
+
//===----------------------------------------------------------------------===//
// LoongArch Inline Assembly Support
//===----------------------------------------------------------------------===//
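The step counts returned by getEstimateRefinementSteps are consistent with the usual rule of thumb that one Newton-Raphson step roughly doubles the number of correct bits. A small sketch of that arithmetic follows, assuming the 2^-14 accuracy stated in the patch comment and counting the implicit leading bit (24 bits for float, 53 for double). refinementSteps is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>

// Number of Newton-Raphson steps needed to reach targetBits of precision
// from an estimate accurate to estimateBits, under the assumption that each
// step doubles the number of correct bits (quadratic convergence).
int refinementSteps(int estimateBits, int targetBits) {
  assert(estimateBits > 0 && "doubling needs a positive starting precision");
  int steps = 0;
  for (int bits = estimateBits; bits < targetBits; bits *= 2)
    ++steps;
  return steps;
}
```

Starting from 14 bits, one step gives about 28 bits (enough for float) and two steps give about 56 bits (enough for double), which matches the values 1 and 2 chosen in the patch.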
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
index 6177884bd19501..a721cfc5f518e1 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
@@ -141,6 +141,22 @@ enum NodeType : unsigned {
VALL_NONZERO,
VANY_NONZERO,
+ // Floating point approximate reciprocal operation
+ FRECIPE_S,
+ FRECIPE_D,
+ FRSQRTE_S,
+ FRSQRTE_D,
+
+ VFRECIPE_S,
+ VFRECIPE_D,
+ VFRSQRTE_S,
+ VFRSQRTE_D,
+
+ XVFRECIPE_S,
+ XVFRECIPE_D,
+ XVFRSQRTE_S,
+ XVFRSQRTE_D,
+
// Intrinsic operations end =============================================
};
} // end namespace LoongArchISD
@@ -216,6 +232,17 @@ class LoongArchTargetLowering : public TargetLowering {
Register
getExceptionSelectorRegister(const Constant *PersonalityFn) const override;
+ bool isFsqrtCheap(SDValue Operand, SelectionDAG &DAG) const override {
+ return true;
+ }
+
+ SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+ int &RefinementSteps, bool &UseOneConstNR,
+ bool Reciprocal) const override;
+
+ SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+ int &RefinementSteps) const override;
+
ISD::NodeType getExtendForAtomicOps() const override {
return ISD::SIGN_EXTEND;
}
diff --git a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
index dd7e5713e45fe9..23ae6f038dceff 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
@@ -9,9 +9,18 @@
// This file describes the Advanced SIMD extension instructions.
//
//===----------------------------------------------------------------------===//
+def SDT_LoongArchXVFRECIPE_S : SDTypeProfile<1, 1, [SDTCisVT<0, v8f32>, SDTCisVT<1, v8f32>]>;
+def SDT_LoongArchXVFRECIPE_D : SDTypeProfile<1, 1, [SDTCisVT<0, v4f64>, SDTCisVT<1, v4f64>]>;
+// Target nodes.
def loongarch_xvpermi: SDNode<"LoongArchISD::XVPERMI", SDT_LoongArchV1RUimm>;
+def loongarch_xvfrecipe_s: SDNode<"LoongArchISD::XVFRECIPE_S", SDT_LoongArchXVFRECIPE_S>;
+def loongarch_xvfrecipe_d: SDNode<"LoongArchISD::XVFRECIPE_D", SDT_LoongArchXVFRECIPE_D>;
+def loongarch_xvfrsqrte_s: SDNode<"LoongArchISD::XVFRSQRTE_S", SDT_LoongArchXVFRECIPE_S>;
+def loongarch_xvfrsqrte_d: SDNode<"LoongArchISD::XVFRSQRTE_D", SDT_LoongArchXVFRECIPE_D>;
+
+
def lasxsplati8
: PatFrag<(ops node:$e0),
(v32i8 (build_vector node:$e0, node:$e0, node:$e0, node:$e0,
@@ -2094,6 +2103,15 @@ foreach Inst = ["XVFRECIPE_S", "XVFRSQRTE_S"] in
foreach Inst = ["XVFRECIPE_D", "XVFRSQRTE_D"] in
def : Pat<(deriveLASXIntrinsic<Inst>.ret (v4f64 LASX256:$xj)),
(!cast<LAInst>(Inst) LASX256:$xj)>;
+
+def : Pat<(loongarch_xvfrecipe_s v8f32:$src),
+ (XVFRECIPE_S v8f32:$src)>;
+def : Pat<(loongarch_xvfrecipe_d v4f64:$src),
+ (XVFRECIPE_D v4f64:$src)>;
+def : Pat<(loongarch_xvfrsqrte_s v8f32:$src),
+ (XVFRSQRTE_S v8f32:$src)>;
+def : Pat<(loongarch_xvfrsqrte_d v4f64:$src),
+ (XVFRSQRTE_D v4f64:$src)>;
}
def : Pat<(int_loongarch_lasx_xvpickve_w_f v8f32:$xj, timm:$imm),
diff --git a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
index e7ac9f3bd04cbf..510b1241edd4e0 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
@@ -23,6 +23,8 @@ def SDT_LoongArchV2R : SDTypeProfile<1, 2, [SDTCisVec<0>,
SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>]>;
def SDT_LoongArchV1RUimm: SDTypeProfile<1, 2, [SDTCisVec<0>,
SDTCisSameAs<0,1>, SDTCisVT<2, i64>]>;
+def SDT_LoongArchVFRECIPE_S : SDTypeProfile<1, 1, [SDTCisVT<0, v4f32>, SDTCisVT<1, v4f32>]>;
+def SDT_LoongArchVFRECIPE_D : SDTypeProfile<1, 1, [SDTCisVT<0, v2f64>, SDTCisVT<1, v2f64>]>;
// Target nodes.
def loongarch_vreplve : SDNode<"LoongArchISD::VREPLVE", SDT_LoongArchVreplve>;
@@ -50,6 +52,10 @@ def loongarch_vilvh: SDNode<"LoongArchISD::VILVH", SDT_LoongArchV2R>;
def loongarch_vshuf4i: SDNode<"LoongArchISD::VSHUF4I", SDT_LoongArchV1RUimm>;
def loongarch_vreplvei: SDNode<"LoongArchISD::VREPLVEI", SDT_LoongArchV1RUimm>;
+def loongarch_vfrecipe_s: SDNode<"LoongArchISD::VFRECIPE_S", SDT_LoongArchVFRECIPE_S>;
+def loongarch_vfrecipe_d: SDNode<"LoongArchISD::VFRECIPE_D", SDT_LoongArchVFRECIPE_D>;
+def loongarch_vfrsqrte_s: SDNode<"LoongArchISD::VFRSQRTE_S", SDT_LoongArchVFRECIPE_S>;
+def loongarch_vfrsqrte_d: SDNode<"LoongArchISD::VFRSQRTE_D", SDT_LoongArchVFRECIPE_D>;
def immZExt1 : ImmLeaf<i64, [{return isUInt<1>(Imm);}]>;
def immZExt2 : ImmLeaf<i64, [{return isUInt<2>(Imm);}]>;
@@ -2238,6 +2244,15 @@ foreach Inst = ["VFRECIPE_S", "VFRSQRTE_S"] in
foreach Inst = ["VFRECIPE_D", "VFRSQRTE_D"] in
def : Pat<(deriveLSXIntrinsic<Inst>.ret (v2f64 LSX128:$vj)),
(!cast<LAInst>(Inst) LSX128:$vj)>;
+
+def : Pat<(loongarch_vfrecipe_s v4f32:$src),
+ (VFRECIPE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrecipe_d v2f64:$src),
+ (VFRECIPE_D v2f64:$src)>;
+def : Pat<(loongarch_vfrsqrte_s v4f32:$src),
+ (VFRSQRTE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrsqrte_d v2f64:$src),
+ (VFRSQRTE_D v2f64:$src)>;
}
// load
diff --git a/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..b4b280a43055f1
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
@@ -0,0 +1,43 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+;; Exercise the 'fdiv' LLVM IR: https://llvm.org/docs/LangRef.html#fdiv-instruction
+
+define float @fdiv_s(float %x, float %y) {
+; FAULT-LABEL: fdiv_s:
+; FAULT: # %bb.0:
+; FAULT-NEXT: fdiv.s $fa0, $fa0, $fa1
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: fdiv_s:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frecipe.s $fa2, $fa1
+; CHECK-NEXT: fmul.s $fa3, $fa0, $fa2
+; CHECK-NEXT: fnmsub.s $fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT: fmadd.s $fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT: ret
+ %div = fdiv fast float %x, %y
+ ret float %div
+}
+
+define double @fdiv_d(double %x, double %y) {
+; FAULT-LABEL: fdiv_d:
+; FAULT: # %bb.0:
+; FAULT-NEXT: fdiv.d $fa0, $fa0, $fa1
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: fdiv_d:
+; CHECK: # %bb.0:
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT: fld.d $fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT: frecipe.d $fa3, $fa1
+; CHECK-NEXT: fmadd.d $fa2, $fa1, $fa3, $fa2
+; CHECK-NEXT: fnmsub.d $fa2, $fa2, $fa3, $fa3
+; CHECK-NEXT: fmul.d $fa3, $fa0, $fa2
+; CHECK-NEXT: fnmsub.d $fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT: fmadd.d $fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT: ret
+ %div = fdiv fast double %x, %y
+ ret double %div
+}
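The fdiv_s CHECK lines show the shape of the expansion: estimate 1/y, form a first quotient, compute the residual with a fused multiply-subtract, then correct the quotient. The sketch below mirrors those four instructions in portable C++. coarseRecip is a hypothetical stand-in for frecipe.s that truncates the exact reciprocal to roughly 2^-14 relative accuracy; the real instruction's rounding behavior is not modeled, and special values (zero, infinity, NaN, subnormals) are ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for frecipe.s: the exact reciprocal with the low
// 9 mantissa bits cleared, so the relative error stays below 2^-14.
float coarseRecip(float y) {
  assert(y != 0.0f && "this sketch does not model division by zero");
  float r = 1.0f / y;
  std::uint32_t bits;
  std::memcpy(&bits, &r, sizeof bits);
  bits &= ~((1u << 9) - 1u);
  std::memcpy(&r, &bits, sizeof bits);
  return r;
}

// One refinement step applied to the quotient, following the CHECK lines.
float fastDiv(float x, float y) {
  float est = coarseRecip(y);        // frecipe.s: est ~ 1/y
  float q0 = x * est;                // fmul.s:    q0 = x * est
  float res = std::fma(-y, q0, x);   // fnmsub.s:  res = x - y * q0
  return std::fma(est, res, q0);     // fmadd.s:   q = q0 + est * res
}
```

If the estimate has relative error e, the corrected quotient has relative error of about e^2, so a single step takes a 2^-14 estimate to roughly float precision. The fdiv_d sequence adds one extra Newton-Raphson step on the estimate itself before the same quotient correction.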
diff --git a/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..d683487fdd4073
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+declare float @llvm.sqrt.f32(float)
+declare double @llvm.sqrt.f64(double)
+
+define float @frsqrt_f32(float %a) nounwind {
+; FAULT-LABEL: frsqrt_f32:
+; FAULT: # %bb.0:
+; FAULT-NEXT: frsqrt.s $fa0, $fa0
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: frsqrt_f32:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frsqrte.s $fa1, $fa0
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_0)
+; CHECK-NEXT: fld.s $fa2, $a0, %pc_lo12(.LCPI0_0)
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_1)
+; CHECK-NEXT: fld.s $fa3, $a0, %pc_lo12(.LCPI0_1)
+; CHECK-NEXT: fmul.s $fa1, $fa0, $fa1
+; CHECK-NEXT: fmul.s $fa0, $fa0, $fa1
+; CHECK-NEXT: fmadd.s $fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT: fmul.s $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.s $fa0, $fa1, $fa0
+; CHECK-NEXT: ret
+
+ %1 = call fast float @llvm.sqrt.f32(float %a)
+ %2 = fdiv fast float 1.0, %1
+ ret float %2
+}
+
+define double @frsqrt_f64(double %a) nounwind {
+; FAULT-LABEL: frsqrt_f64:
+; FAULT: # %bb.0:
+; FAULT-NEXT: frsqrt.d $fa0, $fa0
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: frsqrt_f64:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frsqrte.d $fa1, $fa0
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT: fld.d $fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI1_1)
+; CHECK-NEXT: fld.d $fa3, $a0, %pc_lo12(.LCPI1_1)
+; CHECK-NEXT: fmul.d $fa1, $fa0, $fa1
+; CHECK-NEXT: fmul.d $fa4, $fa0, $fa1
+; CHECK-NEXT: fmadd.d $fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa4
+; CHECK-NEXT: fmul.d $fa0, $fa0, $fa1
+; CHECK-NEXT: fmadd.d $fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.d $fa0, $fa1, $fa0
+; CHECK-NEXT: ret
+ %1 = call fast double @llvm.sqrt.f64(double %a)
+ %2 = fdiv fast double 1.0, %1
+ ret double %2
+}
+
+define double @sqrt_simplify_before_recip_3_uses(double %x, ptr %p1, ptr %p2) nounwind {
+; FAULT-LABEL: sqrt_simplify_before_recip_3_uses:
+; FAULT: # %bb.0:
+; FAULT-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_0)
+; FAULT-NEXT: fld.d $fa2, $a2, %pc_lo12(.LCPI2_0)
+; FAULT-NEXT: fsqrt.d $fa1, $fa0
+; FAULT-NEXT: frsqrt.d $fa0, $fa0
+; FAULT-NEXT: fdiv.d $fa2, $fa2, $fa1
+; FAULT-NEXT: fst.d $fa0, $a0, 0
+; FAULT-NEXT: fst.d $fa2, $a1, 0
+; FAULT-NEXT: fmov.d $fa0, $fa1
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: sqrt_simplify_before_recip_3_uses:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frsqrte.d $fa1, $fa0
+; CHECK-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_0)
+; CHECK-NEXT: fld.d $fa2, $a2, %pc_lo12(.LCPI2_0)
+; CHECK-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_1)
+; CHECK-NEXT: fld.d $fa3, $a2, %pc_lo12(.LCPI2_1)
+; CHECK-NEXT: fmul.d $fa1, $fa0, $fa1
+; CHECK-NEXT: fmul.d $fa4, $fa0, $fa1
+; CHECK-NEXT: fmadd.d $fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa4
+; CHECK-NEXT: fmul.d $fa4, $fa0, $fa1
+; CHECK-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_2)
+; CHECK-NEXT: fld.d $fa5, $a2, %pc_lo12(.LCPI2_2)
+; CHECK-NEXT: fmadd.d $fa2, $fa4, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1...
[truncated]
+def loongarch_vfrsqrte_d: SDNode<"LoongArchISD::VFRSQRTE_D", SDT_LoongArchVFRECIPE_D>;
def immZExt1 : ImmLeaf<i64, [{return isUInt<1>(Imm);}]>;
def immZExt2 : ImmLeaf<i64, [{return isUInt<2>(Imm);}]>;
@@ -2238,6 +2244,15 @@ foreach Inst = ["VFRECIPE_S", "VFRSQRTE_S"] in
foreach Inst = ["VFRECIPE_D", "VFRSQRTE_D"] in
def : Pat<(deriveLSXIntrinsic<Inst>.ret (v2f64 LSX128:$vj)),
(!cast<LAInst>(Inst) LSX128:$vj)>;
+
+def : Pat<(loongarch_vfrecipe_s v4f32:$src),
+ (VFRECIPE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrecipe_d v2f64:$src),
+ (VFRECIPE_D v2f64:$src)>;
+def : Pat<(loongarch_vfrsqrte_s v4f32:$src),
+ (VFRSQRTE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrsqrte_d v2f64:$src),
+ (VFRSQRTE_D v2f64:$src)>;
}
// load
diff --git a/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..b4b280a43055f1
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
@@ -0,0 +1,43 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+;; Exercise the 'fdiv' LLVM IR: https://llvm.org/docs/LangRef.html#fdiv-instruction
+
+define float @fdiv_s(float %x, float %y) {
+; FAULT-LABEL: fdiv_s:
+; FAULT: # %bb.0:
+; FAULT-NEXT: fdiv.s $fa0, $fa0, $fa1
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: fdiv_s:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frecipe.s $fa2, $fa1
+; CHECK-NEXT: fmul.s $fa3, $fa0, $fa2
+; CHECK-NEXT: fnmsub.s $fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT: fmadd.s $fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT: ret
+ %div = fdiv fast float %x, %y
+ ret float %div
+}
+
+define double @fdiv_d(double %x, double %y) {
+; FAULT-LABEL: fdiv_d:
+; FAULT: # %bb.0:
+; FAULT-NEXT: fdiv.d $fa0, $fa0, $fa1
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: fdiv_d:
+; CHECK: # %bb.0:
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT: fld.d $fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT: frecipe.d $fa3, $fa1
+; CHECK-NEXT: fmadd.d $fa2, $fa1, $fa3, $fa2
+; CHECK-NEXT: fnmsub.d $fa2, $fa2, $fa3, $fa3
+; CHECK-NEXT: fmul.d $fa3, $fa0, $fa2
+; CHECK-NEXT: fnmsub.d $fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT: fmadd.d $fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT: ret
+ %div = fdiv fast double %x, %y
+ ret double %div
+}
diff --git a/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..d683487fdd4073
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+declare float @llvm.sqrt.f32(float)
+declare double @llvm.sqrt.f64(double)
+
+define float @frsqrt_f32(float %a) nounwind {
+; FAULT-LABEL: frsqrt_f32:
+; FAULT: # %bb.0:
+; FAULT-NEXT: frsqrt.s $fa0, $fa0
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: frsqrt_f32:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frsqrte.s $fa1, $fa0
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_0)
+; CHECK-NEXT: fld.s $fa2, $a0, %pc_lo12(.LCPI0_0)
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_1)
+; CHECK-NEXT: fld.s $fa3, $a0, %pc_lo12(.LCPI0_1)
+; CHECK-NEXT: fmul.s $fa1, $fa0, $fa1
+; CHECK-NEXT: fmul.s $fa0, $fa0, $fa1
+; CHECK-NEXT: fmadd.s $fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT: fmul.s $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.s $fa0, $fa1, $fa0
+; CHECK-NEXT: ret
+
+ %1 = call fast float @llvm.sqrt.f32(float %a)
+ %2 = fdiv fast float 1.0, %1
+ ret float %2
+}
+
+define double @frsqrt_f64(double %a) nounwind {
+; FAULT-LABEL: frsqrt_f64:
+; FAULT: # %bb.0:
+; FAULT-NEXT: frsqrt.d $fa0, $fa0
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: frsqrt_f64:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frsqrte.d $fa1, $fa0
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT: fld.d $fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI1_1)
+; CHECK-NEXT: fld.d $fa3, $a0, %pc_lo12(.LCPI1_1)
+; CHECK-NEXT: fmul.d $fa1, $fa0, $fa1
+; CHECK-NEXT: fmul.d $fa4, $fa0, $fa1
+; CHECK-NEXT: fmadd.d $fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa4
+; CHECK-NEXT: fmul.d $fa0, $fa0, $fa1
+; CHECK-NEXT: fmadd.d $fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.d $fa0, $fa1, $fa0
+; CHECK-NEXT: ret
+ %1 = call fast double @llvm.sqrt.f64(double %a)
+ %2 = fdiv fast double 1.0, %1
+ ret double %2
+}
+
+define double @sqrt_simplify_before_recip_3_uses(double %x, ptr %p1, ptr %p2) nounwind {
+; FAULT-LABEL: sqrt_simplify_before_recip_3_uses:
+; FAULT: # %bb.0:
+; FAULT-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_0)
+; FAULT-NEXT: fld.d $fa2, $a2, %pc_lo12(.LCPI2_0)
+; FAULT-NEXT: fsqrt.d $fa1, $fa0
+; FAULT-NEXT: frsqrt.d $fa0, $fa0
+; FAULT-NEXT: fdiv.d $fa2, $fa2, $fa1
+; FAULT-NEXT: fst.d $fa0, $a0, 0
+; FAULT-NEXT: fst.d $fa2, $a1, 0
+; FAULT-NEXT: fmov.d $fa0, $fa1
+; FAULT-NEXT: ret
+;
+; CHECK-LABEL: sqrt_simplify_before_recip_3_uses:
+; CHECK: # %bb.0:
+; CHECK-NEXT: frsqrte.d $fa1, $fa0
+; CHECK-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_0)
+; CHECK-NEXT: fld.d $fa2, $a2, %pc_lo12(.LCPI2_0)
+; CHECK-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_1)
+; CHECK-NEXT: fld.d $fa3, $a2, %pc_lo12(.LCPI2_1)
+; CHECK-NEXT: fmul.d $fa1, $fa0, $fa1
+; CHECK-NEXT: fmul.d $fa4, $fa0, $fa1
+; CHECK-NEXT: fmadd.d $fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa3
+; CHECK-NEXT: fmul.d $fa1, $fa1, $fa4
+; CHECK-NEXT: fmul.d $fa4, $fa0, $fa1
+; CHECK-NEXT: pcalau12i $a2, %pc_hi20(.LCPI2_2)
+; CHECK-NEXT: fld.d $fa5, $a2, %pc_lo12(.LCPI2_2)
+; CHECK-NEXT: fmadd.d $fa2, $fa4, $fa1, $fa2
+; CHECK-NEXT: fmul.d $fa1, $fa1...
[truncated]
LGTM, Thanks.
LGTM except a nit.
Co-authored-by: Lu Weining <[email protected]>
@tangaac Congratulations on having your first Pull Request (PR) merged into the LLVM Project!
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7233
Two options: -mfrecipe & -mno-frecipe. Enable or disable the frecipe.{s/d} and frsqrte.{s/d} instructions. The default is -mno-frecipe.
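For readers unfamiliar with estimate-plus-refinement lowering, the following is a minimal C++ sketch of the arithmetic shape seen in the fdiv_s CHECK lines above (frecipe.s, fmul.s, fnmsub.s, fmadd.s). It is not code from the patch. The helper approx_recip is hypothetical and merely stands in for a hardware estimate by perturbing the exact reciprocal with an arbitrarily chosen relative error of 2^-14; the real accuracy of frecipe is not stated in this PR. The sketch also assumes fnmsub computes -(a*b - c).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate: the exact
// reciprocal scaled by (1 + 2^-14). The error size is an assumption made
// for illustration only; real hardware behaves differently.
static float approx_recip(float y) {
    assert(y != 0.0f);
    return (1.0f / y) * (1.0f + 1.0f / 16384.0f);
}

// Division x / y with one Newton-Raphson correction folded in,
// mirroring the four-instruction sequence in the fdiv_s test:
//   e = estimate(1 / y)      (frecipe.s)
//   q = x * e                (fmul.s)
//   r = x - y * q            (fnmsub.s, assumed to be -(y*q - x))
//   result = e * r + q       (fmadd.s)
static float div_refined(float x, float y) {
    float e = approx_recip(y);
    float q = x * e;
    float r = std::fma(-y, q, x);
    return std::fma(e, r, q);
}
```

Calling div_refined(3.0f, 7.0f) and comparing against 3.0f / 7.0f shows the point of the extra instructions: if the estimate has relative error d, the crude product x * e inherits d, while the refined result has error on the order of d squared.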