[LoongArch] [CodeGen] Add options for Clang to generate LoongArch-specific frecipe & frsqrte instructions #109917

Merged: 5 commits, Oct 18, 2024

tangaac (Contributor) commented Sep 25, 2024

Adds two Clang options, -mfrecipe and -mno-frecipe, which enable or disable generation of the LoongArch frecipe.{s/d} and frsqrte.{s/d} instructions.
The default is -mno-frecipe.
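
As an illustration of the kind of source these options affect, here is a minimal sketch. The function names are hypothetical. Judging from the `fdiv fast` tests in this patch, the estimate instructions are presumably selected only when fast-math style flags are also in effect. The code itself is portable C++ and behaves normally on any target.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only. With -mfrecipe and fast-math
// flags, a LoongArch build may lower the division to frecipe.s and the
// reciprocal square root to frsqrte.d, each followed by refinement steps.
// Without those flags they compile to the ordinary fdiv/fsqrt forms.
float reciprocal(float x) { return 1.0f / x; }

double inv_sqrt(double x) { return 1.0 / std::sqrt(x); }
```

Comparing the assembly from a build with -mfrecipe against one with -mno-frecipe is a quick way to confirm which form the compiler chose.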

llvmbot added the clang, clang:driver, and backend:loongarch labels on Sep 25, 2024
llvmbot (Member) commented Sep 25, 2024

@llvm/pr-subscribers-backend-loongarch

@llvm/pr-subscribers-clang

Author: None (tangaac)

Changes

Two options: -mfrecipe & -mno-frecipe.
Enable or Disable frecipe.{s/d} and frsqrte.{s/d} instructions.
The default is -mno-frecipe.
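
The driver hunk in this patch resolves the two flags roughly as sketched below: the last flag on the command line wins, nothing is pushed when neither flag is given, and -mfrecipe is rejected when the 64-bit FPU has been disabled. This is a standalone approximation, not the Clang API. `resolveFrecipe` and its parameters are hypothetical names, and a plain `bool` stands in for the diagnostic.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver logic; not the Clang API.
// `args` is the command line in order, `features` the list built so far.
// Returns false when -mfrecipe is requested while "-d" (no 64-bit FPU) is
// already in the feature list, which the real driver reports as an error.
bool resolveFrecipe(const std::vector<std::string> &args,
                    std::vector<std::string> &features) {
  // Scan all arguments; the last occurrence of either flag wins.
  std::optional<bool> enable;
  for (const std::string &arg : args) {
    if (arg == "-mfrecipe")
      enable = true;
    else if (arg == "-mno-frecipe")
      enable = false;
  }
  if (!enable)
    return true; // Neither flag given: nothing is pushed.
  if (!*enable) {
    features.push_back("-frecipe");
    return true;
  }
  if (std::find(features.begin(), features.end(), "-d") != features.end())
    return false; // FRECIPE depends on the 64-bit FPU.
  features.push_back("+frecipe");
  return true;
}
```

Passing `-mfrecipe -mno-frecipe` in that order therefore yields `-frecipe`, while the reverse order yields `+frecipe`.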


Patch is 39.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109917.diff

14 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/Driver/ToolChains/Arch/LoongArch.cpp (+14)
  • (modified) llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td (+5)
  • (modified) llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td (+9)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+97)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+27)
  • (modified) llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td (+18)
  • (modified) llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (+15)
  • (added) llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll (+43)
  • (added) llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll (+209)
  • (added) llvm/test/CodeGen/LoongArch/lasx/fdiv-reciprocal-estimate.ll (+114)
  • (added) llvm/test/CodeGen/LoongArch/lasx/fsqrt-reciprocal-estimate.ll (+75)
  • (added) llvm/test/CodeGen/LoongArch/lsx/fdiv-reciprocal-estimate.ll (+114)
  • (added) llvm/test/CodeGen/LoongArch/lsx/fsqrt-reciprocal-estimate.ll (+75)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 23bd686a85f526..811fb5490d6707 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5373,6 +5373,10 @@ def mno_lasx : Flag<["-"], "mno-lasx">, Group<m_loongarch_Features_Group>,
 def msimd_EQ : Joined<["-"], "msimd=">, Group<m_loongarch_Features_Group>,
   Flags<[TargetSpecific]>,
   HelpText<"Select the SIMD extension(s) to be enabled in LoongArch either 'none', 'lsx', 'lasx'.">;
+def mfrecipe : Flag<["-"], "mfrecipe">, Group<m_loongarch_Features_Group>,
+  HelpText<"Enable frecipe.{s/d} and frsqrte.{s/d}">;
+def mno_frecipe : Flag<["-"], "mno-frecipe">, Group<m_loongarch_Features_Group>,
+  HelpText<"Disable frecipe.{s/d} and frsqrte.{s/d}">;
 def mnop_mcount : Flag<["-"], "mnop-mcount">, HelpText<"Generate mcount/__fentry__ calls as nops. To activate they need to be patched in.">,
   Visibility<[ClangOption, CC1Option]>, Group<m_Group>,
   MarshallingInfoFlag<CodeGenOpts<"MNopMCount">>;
diff --git a/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp b/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
index 771adade93813f..62233a32d0d396 100644
--- a/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
@@ -251,6 +251,20 @@ void loongarch::getLoongArchTargetFeatures(const Driver &D,
     } else /*-mno-lasx*/
       Features.push_back("-lasx");
   }
+
+  // Select frecipe feature determined by -m[no-]frecipe.
+  if (const Arg *A =
+          Args.getLastArg(options::OPT_mfrecipe, options::OPT_mno_frecipe)) {
+    // FRECIPE depends on 64-bit FPU.
+    // -mno-frecipe conflicts with -mfrecipe.
+    if (A->getOption().matches(options::OPT_mfrecipe)) {
+      if (llvm::find(Features, "-d") != Features.end())
+        D.Diag(diag::err_drv_loongarch_wrong_fpu_width) << /*FRECIPE*/ 2;
+      else /*-mfrecipe*/
+        Features.push_back("+frecipe");
+    } else /*-mno-frecipe*/
+      Features.push_back("-frecipe");
+  }
 }
 
 std::string loongarch::postProcessTargetCPUString(const std::string &CPU,
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
index d6a83c0c8cd8fb..8f909d26cfd08a 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
@@ -19,12 +19,15 @@ def SDT_LoongArchMOVGR2FR_W_LA64
 def SDT_LoongArchMOVFR2GR_S_LA64
     : SDTypeProfile<1, 1, [SDTCisVT<0, i64>, SDTCisVT<1, f32>]>;
 def SDT_LoongArchFTINT : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
+def SDT_LoongArchFRECIPE : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
 
 def loongarch_movgr2fr_w_la64
     : SDNode<"LoongArchISD::MOVGR2FR_W_LA64", SDT_LoongArchMOVGR2FR_W_LA64>;
 def loongarch_movfr2gr_s_la64
     : SDNode<"LoongArchISD::MOVFR2GR_S_LA64", SDT_LoongArchMOVFR2GR_S_LA64>;
 def loongarch_ftint : SDNode<"LoongArchISD::FTINT", SDT_LoongArchFTINT>;
+def loongarch_frecipe_s : SDNode<"LoongArchISD::FRECIPE_S", SDT_LoongArchFRECIPE>;
+def loongarch_frsqrte_s : SDNode<"LoongArchISD::FRSQRTE_S", SDT_LoongArchFRECIPE>;
 
 //===----------------------------------------------------------------------===//
 // Instructions
@@ -286,6 +289,8 @@ let Predicates = [HasFrecipe] in {
 // FP approximate reciprocal operation
 def : Pat<(int_loongarch_frecipe_s FPR32:$src), (FRECIPE_S FPR32:$src)>;
 def : Pat<(int_loongarch_frsqrte_s FPR32:$src), (FRSQRTE_S FPR32:$src)>;
+def : Pat<(loongarch_frecipe_s FPR32:$src), (FRECIPE_S FPR32:$src)>;
+def : Pat<(loongarch_frsqrte_s FPR32:$src), (FRSQRTE_S FPR32:$src)>;
 }
 
 // fmadd.s: fj * fk + fa
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
index 30cce8439640f1..aabb58c0d68eff 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
@@ -10,6 +10,13 @@
 //
 //===----------------------------------------------------------------------===//
 
+// ===----------------------------------------------------------------------===//
+// LoongArch specific DAG Nodes.
+// ===----------------------------------------------------------------------===//
+
+def loongarch_frecipe_d : SDNode<"LoongArchISD::FRECIPE_D", SDT_LoongArchFRECIPE>;
+def loongarch_frsqrte_d : SDNode<"LoongArchISD::FRSQRTE_D", SDT_LoongArchFRECIPE>;
+
 //===----------------------------------------------------------------------===//
 // Instructions
 //===----------------------------------------------------------------------===//
@@ -253,6 +260,8 @@ let Predicates = [HasFrecipe] in {
 // FP approximate reciprocal operation
 def : Pat<(int_loongarch_frecipe_d FPR64:$src), (FRECIPE_D FPR64:$src)>;
 def : Pat<(int_loongarch_frsqrte_d FPR64:$src), (FRSQRTE_D FPR64:$src)>;
+def : Pat<(loongarch_frecipe_d FPR64:$src), (FRECIPE_D FPR64:$src)>;
+def : Pat<(loongarch_frsqrte_d FPR64:$src), (FRSQRTE_D FPR64:$src)>;
 }
 
 // fmadd.d: fj * fk + fa
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index bfafb331752108..bbff8d097a80e7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -4697,6 +4697,18 @@ const char *LoongArchTargetLowering::getTargetNodeName(unsigned Opcode) const {
     NODE_NAME_CASE(VANY_ZERO)
     NODE_NAME_CASE(VALL_NONZERO)
     NODE_NAME_CASE(VANY_NONZERO)
+    NODE_NAME_CASE(FRECIPE_S)
+    NODE_NAME_CASE(FRECIPE_D)
+    NODE_NAME_CASE(FRSQRTE_S)
+    NODE_NAME_CASE(FRSQRTE_D)
+    NODE_NAME_CASE(VFRECIPE_S)
+    NODE_NAME_CASE(VFRECIPE_D)
+    NODE_NAME_CASE(VFRSQRTE_S)
+    NODE_NAME_CASE(VFRSQRTE_D)
+    NODE_NAME_CASE(XVFRECIPE_S)
+    NODE_NAME_CASE(XVFRECIPE_D)
+    NODE_NAME_CASE(XVFRSQRTE_S)
+    NODE_NAME_CASE(XVFRSQRTE_D)
   }
 #undef NODE_NAME_CASE
   return nullptr;
@@ -5902,6 +5914,91 @@ Register LoongArchTargetLowering::getExceptionSelectorRegister(
   return LoongArch::R5;
 }
 
+//===----------------------------------------------------------------------===//
+// Target Optimization Hooks
+//===----------------------------------------------------------------------===//
+
+static int getEstimateRefinementSteps(EVT VT, const LoongArchSubtarget &Subtarget) {
+  // The FRECIPE feature's instructions have a relative accuracy of 2^-14.
+  // IEEE float has 23 fraction bits and double has 52 fraction bits.
+  int RefinementSteps = VT.getScalarType() == MVT::f64 ? 2: 1;
+  return RefinementSteps;
+}
+
+SDValue LoongArchTargetLowering::getSqrtEstimate(SDValue Operand,
+                                                 SelectionDAG &DAG, int Enabled,
+                                                 int &RefinementSteps,
+                                                 bool &UseOneConstNR,
+                                                 bool Reciprocal) const {
+  if (Subtarget.hasFrecipe()) {
+    SDLoc DL(Operand);
+    EVT VT = Operand.getValueType();
+    unsigned Opcode;
+
+    if (VT == MVT::f32) {
+      Opcode = LoongArchISD::FRSQRTE_S;
+    } else if (VT == MVT::f64 && Subtarget.hasBasicD()) {
+      Opcode = LoongArchISD::FRSQRTE_D;
+    } else if (VT == MVT::v4f32 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRSQRTE_S;
+    } else if (VT == MVT::v2f64 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRSQRTE_D;
+    } else if (VT == MVT::v8f32 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRSQRTE_S;
+    } else if (VT == MVT::v4f64 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRSQRTE_D;
+    } else {
+      return SDValue();
+    }
+
+    UseOneConstNR = false;
+    if (RefinementSteps == ReciprocalEstimate::Unspecified)
+      RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+
+    SDValue Estimate = DAG.getNode(Opcode, DL, VT, Operand);
+    if (Reciprocal) {
+      Estimate = DAG.getNode(ISD::FMUL, DL, VT, Operand, Estimate);
+    }
+    return Estimate;
+  }
+
+  return SDValue();
+}
+
+SDValue LoongArchTargetLowering::getRecipEstimate(SDValue Operand,
+                                                  SelectionDAG &DAG,
+                                                  int Enabled,
+                                                  int &RefinementSteps) const {
+  if (Subtarget.hasFrecipe()) {
+    SDLoc DL(Operand);
+    EVT VT = Operand.getValueType();
+    unsigned Opcode;
+
+    if (VT == MVT::f32) {
+      Opcode = LoongArchISD::FRECIPE_S;
+    } else if (VT == MVT::f64 && Subtarget.hasBasicD()) {
+      Opcode = LoongArchISD::FRECIPE_D;
+    } else if (VT == MVT::v4f32 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRECIPE_S;
+    } else if (VT == MVT::v2f64 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRECIPE_D;
+    } else if (VT == MVT::v8f32 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRECIPE_S;
+    } else if (VT == MVT::v4f64 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRECIPE_D;
+    } else {
+      return SDValue();
+    }
+
+    if (RefinementSteps == ReciprocalEstimate::Unspecified)
+      RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+
+    return DAG.getNode(Opcode, DL, VT, Operand);
+  }
+
+  return SDValue();
+}
+
 //===----------------------------------------------------------------------===//
 //                           LoongArch Inline Assembly Support
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
index 6177884bd19501..a721cfc5f518e1 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
@@ -141,6 +141,22 @@ enum NodeType : unsigned {
   VALL_NONZERO,
   VANY_NONZERO,
 
+  // Floating point approximate reciprocal operation
+  FRECIPE_S,
+  FRECIPE_D,
+  FRSQRTE_S,
+  FRSQRTE_D,
+
+  VFRECIPE_S,
+  VFRECIPE_D,
+  VFRSQRTE_S,
+  VFRSQRTE_D,
+
+  XVFRECIPE_S,
+  XVFRECIPE_D,
+  XVFRSQRTE_S,
+  XVFRSQRTE_D,
+
   // Intrinsic operations end =============================================
 };
 } // end namespace LoongArchISD
@@ -216,6 +232,17 @@ class LoongArchTargetLowering : public TargetLowering {
   Register
   getExceptionSelectorRegister(const Constant *PersonalityFn) const override;
 
+  bool isFsqrtCheap(SDValue Operand, SelectionDAG &DAG) const override {
+    return true;
+  }
+
+  SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+                          int &RefinementSteps, bool &UseOneConstNR,
+                          bool Reciprocal) const override;
+
+  SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+                           int &RefinementSteps) const override;
+
   ISD::NodeType getExtendForAtomicOps() const override {
     return ISD::SIGN_EXTEND;
   }
diff --git a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
index dd7e5713e45fe9..23ae6f038dceff 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
@@ -9,9 +9,18 @@
 // This file describes the Advanced SIMD extension instructions.
 //
 //===----------------------------------------------------------------------===//
+def SDT_LoongArchXVFRECIPE_S : SDTypeProfile<1, 1, [SDTCisVT<0, v8f32>, SDTCisVT<1, v8f32>]>;
+def SDT_LoongArchXVFRECIPE_D : SDTypeProfile<1, 1, [SDTCisVT<0, v4f64>, SDTCisVT<1, v4f64>]>;
 
+// Target nodes.
 def loongarch_xvpermi: SDNode<"LoongArchISD::XVPERMI", SDT_LoongArchV1RUimm>;
 
+def loongarch_xvfrecipe_s: SDNode<"LoongArchISD::XVFRECIPE_S", SDT_LoongArchXVFRECIPE_S>;
+def loongarch_xvfrecipe_d: SDNode<"LoongArchISD::XVFRECIPE_D", SDT_LoongArchXVFRECIPE_D>;
+def loongarch_xvfrsqrte_s: SDNode<"LoongArchISD::XVFRSQRTE_S", SDT_LoongArchXVFRECIPE_S>;
+def loongarch_xvfrsqrte_d: SDNode<"LoongArchISD::XVFRSQRTE_D", SDT_LoongArchXVFRECIPE_D>;
+
+
 def lasxsplati8
   : PatFrag<(ops node:$e0),
             (v32i8 (build_vector node:$e0, node:$e0, node:$e0, node:$e0,
@@ -2094,6 +2103,15 @@ foreach Inst = ["XVFRECIPE_S", "XVFRSQRTE_S"] in
 foreach Inst = ["XVFRECIPE_D", "XVFRSQRTE_D"] in
   def : Pat<(deriveLASXIntrinsic<Inst>.ret (v4f64 LASX256:$xj)),
             (!cast<LAInst>(Inst) LASX256:$xj)>;
+
+def : Pat<(loongarch_xvfrecipe_s v8f32:$src), 
+          (XVFRECIPE_S v8f32:$src)>;
+def : Pat<(loongarch_xvfrecipe_d v4f64:$src), 
+          (XVFRECIPE_D v4f64:$src)>;
+def : Pat<(loongarch_xvfrsqrte_s v8f32:$src), 
+          (XVFRSQRTE_S v8f32:$src)>;
+def : Pat<(loongarch_xvfrsqrte_d v4f64:$src), 
+          (XVFRSQRTE_D v4f64:$src)>;
 }
 
 def : Pat<(int_loongarch_lasx_xvpickve_w_f v8f32:$xj, timm:$imm),
diff --git a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
index e7ac9f3bd04cbf..510b1241edd4e0 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
@@ -23,6 +23,8 @@ def SDT_LoongArchV2R : SDTypeProfile<1, 2, [SDTCisVec<0>,
                                      SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>]>;
 def SDT_LoongArchV1RUimm: SDTypeProfile<1, 2, [SDTCisVec<0>,
                                         SDTCisSameAs<0,1>, SDTCisVT<2, i64>]>;
+def SDT_LoongArchVFRECIPE_S : SDTypeProfile<1, 1, [SDTCisVT<0, v4f32>, SDTCisVT<1, v4f32>]>;
+def SDT_LoongArchVFRECIPE_D : SDTypeProfile<1, 1, [SDTCisVT<0, v2f64>, SDTCisVT<1, v2f64>]>;
 
 // Target nodes.
 def loongarch_vreplve : SDNode<"LoongArchISD::VREPLVE", SDT_LoongArchVreplve>;
@@ -50,6 +52,10 @@ def loongarch_vilvh: SDNode<"LoongArchISD::VILVH", SDT_LoongArchV2R>;
 
 def loongarch_vshuf4i: SDNode<"LoongArchISD::VSHUF4I", SDT_LoongArchV1RUimm>;
 def loongarch_vreplvei: SDNode<"LoongArchISD::VREPLVEI", SDT_LoongArchV1RUimm>;
+def loongarch_vfrecipe_s: SDNode<"LoongArchISD::VFRECIPE_S", SDT_LoongArchVFRECIPE_S>;
+def loongarch_vfrecipe_d: SDNode<"LoongArchISD::VFRECIPE_D", SDT_LoongArchVFRECIPE_D>;
+def loongarch_vfrsqrte_s: SDNode<"LoongArchISD::VFRSQRTE_S", SDT_LoongArchVFRECIPE_S>;
+def loongarch_vfrsqrte_d: SDNode<"LoongArchISD::VFRSQRTE_D", SDT_LoongArchVFRECIPE_D>;
 
 def immZExt1 : ImmLeaf<i64, [{return isUInt<1>(Imm);}]>;
 def immZExt2 : ImmLeaf<i64, [{return isUInt<2>(Imm);}]>;
@@ -2238,6 +2244,15 @@ foreach Inst = ["VFRECIPE_S", "VFRSQRTE_S"] in
 foreach Inst = ["VFRECIPE_D", "VFRSQRTE_D"] in
   def : Pat<(deriveLSXIntrinsic<Inst>.ret (v2f64 LSX128:$vj)),
             (!cast<LAInst>(Inst) LSX128:$vj)>;
+
+def : Pat<(loongarch_vfrecipe_s v4f32:$src), 
+          (VFRECIPE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrecipe_d v2f64:$src), 
+          (VFRECIPE_D v2f64:$src)>;
+def : Pat<(loongarch_vfrsqrte_s v4f32:$src), 
+          (VFRSQRTE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrsqrte_d v2f64:$src), 
+          (VFRSQRTE_D v2f64:$src)>;
 }
 
 // load
diff --git a/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..b4b280a43055f1
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
@@ -0,0 +1,43 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+;; Exercise the 'fdiv' LLVM IR: https://llvm.org/docs/LangRef.html#fdiv-instruction
+
+define float @fdiv_s(float %x, float %y) {
+; FAULT-LABEL: fdiv_s:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    fdiv.s $fa0, $fa0, $fa1
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: fdiv_s:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frecipe.s	$fa2, $fa1
+; CHECK-NEXT:    fmul.s	$fa3, $fa0, $fa2
+; CHECK-NEXT:    fnmsub.s	$fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT:    fmadd.s	$fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT:    ret
+  %div = fdiv fast float %x, %y
+  ret float %div
+}
+
+define double @fdiv_d(double %x, double %y) {
+; FAULT-LABEL: fdiv_d:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    fdiv.d $fa0, $fa0, $fa1
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: fdiv_d:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT:    fld.d	$fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT:    frecipe.d	$fa3, $fa1
+; CHECK-NEXT:    fmadd.d	$fa2, $fa1, $fa3, $fa2
+; CHECK-NEXT:    fnmsub.d	$fa2, $fa2, $fa3, $fa3
+; CHECK-NEXT:    fmul.d	$fa3, $fa0, $fa2
+; CHECK-NEXT:    fnmsub.d	$fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT:    fmadd.d	$fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT:    ret
+  %div = fdiv fast double %x, %y
+  ret double %div
+}
diff --git a/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..d683487fdd4073
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+declare float @llvm.sqrt.f32(float)
+declare double @llvm.sqrt.f64(double)
+
+define float @frsqrt_f32(float %a) nounwind {
+; FAULT-LABEL: frsqrt_f32:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    frsqrt.s $fa0, $fa0
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: frsqrt_f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frsqrte.s	$fa1, $fa0
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI0_0)
+; CHECK-NEXT:    fld.s	$fa2, $a0, %pc_lo12(.LCPI0_0)
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI0_1)
+; CHECK-NEXT:    fld.s	$fa3, $a0, %pc_lo12(.LCPI0_1)
+; CHECK-NEXT:    fmul.s	$fa1, $fa0, $fa1
+; CHECK-NEXT:    fmul.s	$fa0, $fa0, $fa1
+; CHECK-NEXT:    fmadd.s	$fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT:    fmul.s	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.s	$fa0, $fa1, $fa0
+; CHECK-NEXT:    ret
+
+  %1 = call fast float @llvm.sqrt.f32(float %a)
+  %2 = fdiv fast float 1.0, %1
+  ret float %2
+}
+
+define double @frsqrt_f64(double %a) nounwind {
+; FAULT-LABEL: frsqrt_f64:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    frsqrt.d $fa0, $fa0
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: frsqrt_f64:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frsqrte.d	$fa1, $fa0
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT:    fld.d	$fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI1_1)
+; CHECK-NEXT:    fld.d	$fa3, $a0, %pc_lo12(.LCPI1_1)
+; CHECK-NEXT:    fmul.d	$fa1, $fa0, $fa1
+; CHECK-NEXT:    fmul.d	$fa4, $fa0, $fa1
+; CHECK-NEXT:    fmadd.d	$fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa4
+; CHECK-NEXT:    fmul.d	$fa0, $fa0, $fa1
+; CHECK-NEXT:    fmadd.d	$fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.d	$fa0, $fa1, $fa0
+; CHECK-NEXT:    ret
+  %1 = call fast double @llvm.sqrt.f64(double %a)
+  %2 = fdiv fast double 1.0, %1
+  ret double %2
+}
+
+define double @sqrt_simplify_before_recip_3_uses(double %x, ptr %p1, ptr %p2) nounwind {
+; FAULT-LABEL: sqrt_simplify_before_recip_3_uses:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_0)
+; FAULT-NEXT:    fld.d	$fa2, $a2, %pc_lo12(.LCPI2_0)
+; FAULT-NEXT:    fsqrt.d	$fa1, $fa0
+; FAULT-NEXT:    frsqrt.d	$fa0, $fa0
+; FAULT-NEXT:    fdiv.d	$fa2, $fa2, $fa1
+; FAULT-NEXT:    fst.d	$fa0, $a0, 0
+; FAULT-NEXT:    fst.d	$fa2, $a1, 0
+; FAULT-NEXT:    fmov.d	$fa0, $fa1
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: sqrt_simplify_before_recip_3_uses:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frsqrte.d	$fa1, $fa0
+; CHECK-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_0)
+; CHECK-NEXT:    fld.d	$fa2, $a2, %pc_lo12(.LCPI2_0)
+; CHECK-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_1)
+; CHECK-NEXT:    fld.d	$fa3, $a2, %pc_lo12(.LCPI2_1)
+; CHECK-NEXT:    fmul.d	$fa1, $fa0, $fa1
+; CHECK-NEXT:    fmul.d	$fa4, $fa0, $fa1
+; CHECK-NEXT:    fmadd.d	$fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa4
+; CHECK-NEXT:    fmul.d	$fa4, $fa0, $fa1
+; CHECK-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_2)
+; CHECK-NEXT:    fld.d	$fa5, $a2, %pc_lo12(.LCPI2_2)
+; CHECK-NEXT:    fmadd.d	$fa2, $fa4, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1...
[truncated]
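
The refinement-step counts in `getEstimateRefinementSteps` (1 for f32, 2 for f64) follow from the stated 2^-14 accuracy. Each Newton-Raphson step roughly squares the relative error, so 2^-14 becomes 2^-28 after one step, enough for float's 24-bit significand, and 2^-56 after two, enough for double's 53 bits. The sketch below mirrors the `fdiv_s` and `fdiv_d` CHECK sequences under stated assumptions. `estimateRecip` is a hypothetical stand-in that simulates an estimate with relative error 2^-14 rather than reproducing the hardware result, and the constant loaded in `fdiv_d` is assumed to be -1.0.

```cpp
#include <cmath>

// Simulated estimate of 1/y with a relative error of about 2^-14, standing in
// for frecipe.{s,d}. An assumption for illustration, not the hardware value.
template <typename T> T estimateRecip(T y) {
  return (T(1) / y) * (T(1) + std::ldexp(T(1), -14));
}

// Mirrors the fdiv_s CHECK lines: one refinement step folded into x / y.
float divOneStep(float x, float y) {
  float est = estimateRecip(y);  // frecipe.s
  float q = x * est;             // fmul.s
  float r = std::fma(-y, q, x);  // fnmsub.s: x - y * q
  return std::fma(est, r, q);    // fmadd.s: q + est * r
}

// Mirrors the fdiv_d CHECK lines: refine the estimate once, then fold the
// second step into the division. The loaded constant is assumed to be -1.0.
double divTwoSteps(double x, double y) {
  double est = estimateRecip(y);        // frecipe.d
  double e = std::fma(y, est, -1.0);    // fmadd.d: y * est - 1
  double ref = std::fma(-e, est, est);  // fnmsub.d: est - e * est
  double q = x * ref;                   // fmul.d
  double r = std::fma(-y, q, x);        // fnmsub.d: x - y * q
  return std::fma(ref, r, q);           // fmadd.d: q + ref * r
}
```

With an estimate error of e, the refined reciprocal is (1/y)(1 - e^2), which is why the double path needs the extra step before the final multiply.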

llvmbot (Member) commented Sep 25, 2024

@llvm/pr-subscribers-clang-driver

Author: None (tangaac)

Changes

Two options: -mfrecipe & -mno-frecipe.
Enable or Disable frecipe.{s/d} and frsqrte.{s/d} instructions.
The default is -mno-frecipe.


Patch is 39.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109917.diff

14 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/Driver/ToolChains/Arch/LoongArch.cpp (+14)
  • (modified) llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td (+5)
  • (modified) llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td (+9)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+97)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+27)
  • (modified) llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td (+18)
  • (modified) llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (+15)
  • (added) llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll (+43)
  • (added) llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll (+209)
  • (added) llvm/test/CodeGen/LoongArch/lasx/fdiv-reciprocal-estimate.ll (+114)
  • (added) llvm/test/CodeGen/LoongArch/lasx/fsqrt-reciprocal-estimate.ll (+75)
  • (added) llvm/test/CodeGen/LoongArch/lsx/fdiv-reciprocal-estimate.ll (+114)
  • (added) llvm/test/CodeGen/LoongArch/lsx/fsqrt-reciprocal-estimate.ll (+75)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 23bd686a85f526..811fb5490d6707 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5373,6 +5373,10 @@ def mno_lasx : Flag<["-"], "mno-lasx">, Group<m_loongarch_Features_Group>,
 def msimd_EQ : Joined<["-"], "msimd=">, Group<m_loongarch_Features_Group>,
   Flags<[TargetSpecific]>,
   HelpText<"Select the SIMD extension(s) to be enabled in LoongArch either 'none', 'lsx', 'lasx'.">;
+def mfrecipe : Flag<["-"], "mfrecipe">, Group<m_loongarch_Features_Group>,
+  HelpText<"Enable frecipe.{s/d} and frsqrte.{s/d}">;
+def mno_frecipe : Flag<["-"], "mno-frecipe">, Group<m_loongarch_Features_Group>,
+  HelpText<"Disable frecipe.{s/d} and frsqrte.{s/d}">;
 def mnop_mcount : Flag<["-"], "mnop-mcount">, HelpText<"Generate mcount/__fentry__ calls as nops. To activate they need to be patched in.">,
   Visibility<[ClangOption, CC1Option]>, Group<m_Group>,
   MarshallingInfoFlag<CodeGenOpts<"MNopMCount">>;
diff --git a/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp b/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
index 771adade93813f..62233a32d0d396 100644
--- a/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
@@ -251,6 +251,20 @@ void loongarch::getLoongArchTargetFeatures(const Driver &D,
     } else /*-mno-lasx*/
       Features.push_back("-lasx");
   }
+
+  // Select frecipe feature determined by -m[no-]frecipe.
+  if (const Arg *A =
+          Args.getLastArg(options::OPT_mfrecipe, options::OPT_mno_frecipe)) {
+    // FRECIPE depends on 64-bit FPU.
+    // -mno-frecipe conflicts with -mfrecipe.
+    if (A->getOption().matches(options::OPT_mfrecipe)) {
+      if (llvm::find(Features, "-d") != Features.end())
+        D.Diag(diag::err_drv_loongarch_wrong_fpu_width) << /*FRECIPE*/ 2;
+      else /*-mfrecipe*/
+        Features.push_back("+frecipe");
+    } else /*-mno-frecipe*/
+      Features.push_back("-frecipe");
+  }
 }
 
 std::string loongarch::postProcessTargetCPUString(const std::string &CPU,
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
index d6a83c0c8cd8fb..8f909d26cfd08a 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td
@@ -19,12 +19,15 @@ def SDT_LoongArchMOVGR2FR_W_LA64
 def SDT_LoongArchMOVFR2GR_S_LA64
     : SDTypeProfile<1, 1, [SDTCisVT<0, i64>, SDTCisVT<1, f32>]>;
 def SDT_LoongArchFTINT : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
+def SDT_LoongArchFRECIPE : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisFP<1>]>;
 
 def loongarch_movgr2fr_w_la64
     : SDNode<"LoongArchISD::MOVGR2FR_W_LA64", SDT_LoongArchMOVGR2FR_W_LA64>;
 def loongarch_movfr2gr_s_la64
     : SDNode<"LoongArchISD::MOVFR2GR_S_LA64", SDT_LoongArchMOVFR2GR_S_LA64>;
 def loongarch_ftint : SDNode<"LoongArchISD::FTINT", SDT_LoongArchFTINT>;
+def loongarch_frecipe_s : SDNode<"LoongArchISD::FRECIPE_S", SDT_LoongArchFRECIPE>;
+def loongarch_frsqrte_s : SDNode<"LoongArchISD::FRSQRTE_S", SDT_LoongArchFRECIPE>;
 
 //===----------------------------------------------------------------------===//
 // Instructions
@@ -286,6 +289,8 @@ let Predicates = [HasFrecipe] in {
 // FP approximate reciprocal operation
 def : Pat<(int_loongarch_frecipe_s FPR32:$src), (FRECIPE_S FPR32:$src)>;
 def : Pat<(int_loongarch_frsqrte_s FPR32:$src), (FRSQRTE_S FPR32:$src)>;
+def : Pat<(loongarch_frecipe_s FPR32:$src), (FRECIPE_S FPR32:$src)>;
+def : Pat<(loongarch_frsqrte_s FPR32:$src), (FRSQRTE_S FPR32:$src)>;
 }
 
 // fmadd.s: fj * fk + fa
diff --git a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
index 30cce8439640f1..aabb58c0d68eff 100644
--- a/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td
@@ -10,6 +10,13 @@
 //
 //===----------------------------------------------------------------------===//
 
+// ===----------------------------------------------------------------------===//
+// LoongArch specific DAG Nodes.
+// ===----------------------------------------------------------------------===//
+
+def loongarch_frecipe_d : SDNode<"LoongArchISD::FRECIPE_D", SDT_LoongArchFRECIPE>;
+def loongarch_frsqrte_d : SDNode<"LoongArchISD::FRSQRTE_D", SDT_LoongArchFRECIPE>;
+
 //===----------------------------------------------------------------------===//
 // Instructions
 //===----------------------------------------------------------------------===//
@@ -253,6 +260,8 @@ let Predicates = [HasFrecipe] in {
 // FP approximate reciprocal operation
 def : Pat<(int_loongarch_frecipe_d FPR64:$src), (FRECIPE_D FPR64:$src)>;
 def : Pat<(int_loongarch_frsqrte_d FPR64:$src), (FRSQRTE_D FPR64:$src)>;
+def : Pat<(loongarch_frecipe_d FPR64:$src), (FRECIPE_D FPR64:$src)>;
+def : Pat<(loongarch_frsqrte_d FPR64:$src), (FRSQRTE_D FPR64:$src)>;
 }
 
 // fmadd.d: fj * fk + fa
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index bfafb331752108..bbff8d097a80e7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -4697,6 +4697,18 @@ const char *LoongArchTargetLowering::getTargetNodeName(unsigned Opcode) const {
     NODE_NAME_CASE(VANY_ZERO)
     NODE_NAME_CASE(VALL_NONZERO)
     NODE_NAME_CASE(VANY_NONZERO)
+    NODE_NAME_CASE(FRECIPE_S)
+    NODE_NAME_CASE(FRECIPE_D)
+    NODE_NAME_CASE(FRSQRTE_S)
+    NODE_NAME_CASE(FRSQRTE_D)
+    NODE_NAME_CASE(VFRECIPE_S)
+    NODE_NAME_CASE(VFRECIPE_D)
+    NODE_NAME_CASE(VFRSQRTE_S)
+    NODE_NAME_CASE(VFRSQRTE_D)
+    NODE_NAME_CASE(XVFRECIPE_S)
+    NODE_NAME_CASE(XVFRECIPE_D)
+    NODE_NAME_CASE(XVFRSQRTE_S)
+    NODE_NAME_CASE(XVFRSQRTE_D)
   }
 #undef NODE_NAME_CASE
   return nullptr;
@@ -5902,6 +5914,91 @@ Register LoongArchTargetLowering::getExceptionSelectorRegister(
   return LoongArch::R5;
 }
 
+//===----------------------------------------------------------------------===//
+// Target Optimization Hooks
+//===----------------------------------------------------------------------===//
+
+static int getEstimateRefinementSteps(EVT VT, const LoongArchSubtarget &Subtarget) {
+  // The FRECIPE feature's instructions have a relative accuracy of 2^-14.
+  // IEEE float has 23 fraction bits and double has 52 fraction bits.
+  int RefinementSteps = VT.getScalarType() == MVT::f64 ? 2: 1;
+  return RefinementSteps;
+}
+
+SDValue LoongArchTargetLowering::getSqrtEstimate(SDValue Operand,
+                                                 SelectionDAG &DAG, int Enabled,
+                                                 int &RefinementSteps,
+                                                 bool &UseOneConstNR,
+                                                 bool Reciprocal) const {
+  if (Subtarget.hasFrecipe()) {
+    SDLoc DL(Operand);
+    EVT VT = Operand.getValueType();
+    unsigned Opcode;
+
+    if (VT == MVT::f32) {
+      Opcode = LoongArchISD::FRSQRTE_S;
+    } else if (VT == MVT::f64 && Subtarget.hasBasicD()) {
+      Opcode = LoongArchISD::FRSQRTE_D;
+    } else if (VT == MVT::v4f32 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRSQRTE_S;
+    } else if (VT == MVT::v2f64 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRSQRTE_D;
+    } else if (VT == MVT::v8f32 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRSQRTE_S;
+    } else if (VT == MVT::v4f64 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRSQRTE_D;
+    } else {
+      return SDValue();
+    }
+
+    UseOneConstNR = false;
+    if (RefinementSteps == ReciprocalEstimate::Unspecified)
+      RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+
+    SDValue Estimate = DAG.getNode(Opcode, DL, VT, Operand);
+    if (Reciprocal) {
+      Estimate = DAG.getNode(ISD::FMUL, DL, VT, Operand, Estimate);
+    }
+    return Estimate;
+  }
+
+  return SDValue();
+}
+
+SDValue LoongArchTargetLowering::getRecipEstimate(SDValue Operand,
+                                                  SelectionDAG &DAG,
+                                                  int Enabled,
+                                                  int &RefinementSteps) const {
+  if (Subtarget.hasFrecipe()) {
+    SDLoc DL(Operand);
+    EVT VT = Operand.getValueType();
+    unsigned Opcode;
+
+    if (VT == MVT::f32) {
+      Opcode = LoongArchISD::FRECIPE_S;
+    } else if (VT == MVT::f64 && Subtarget.hasBasicD()) {
+      Opcode = LoongArchISD::FRECIPE_D;
+    } else if (VT == MVT::v4f32 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRECIPE_S;
+    } else if (VT == MVT::v2f64 && Subtarget.hasExtLSX()) {
+      Opcode = LoongArchISD::VFRECIPE_D;
+    } else if (VT == MVT::v8f32 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRECIPE_S;
+    } else if (VT == MVT::v4f64 && Subtarget.hasExtLASX()) {
+      Opcode = LoongArchISD::XVFRECIPE_D;
+    } else {
+      return SDValue();
+    }
+
+    if (RefinementSteps == ReciprocalEstimate::Unspecified)
+      RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+
+    return DAG.getNode(Opcode, DL, VT, Operand);
+  }
+
+  return SDValue();
+}
+
 //===----------------------------------------------------------------------===//
 //                           LoongArch Inline Assembly Support
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
index 6177884bd19501..a721cfc5f518e1 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
@@ -141,6 +141,22 @@ enum NodeType : unsigned {
   VALL_NONZERO,
   VANY_NONZERO,
 
+  // Floating point approximate reciprocal operation
+  FRECIPE_S,
+  FRECIPE_D,
+  FRSQRTE_S,
+  FRSQRTE_D,
+
+  VFRECIPE_S,
+  VFRECIPE_D,
+  VFRSQRTE_S,
+  VFRSQRTE_D,
+
+  XVFRECIPE_S,
+  XVFRECIPE_D,
+  XVFRSQRTE_S,
+  XVFRSQRTE_D,
+
   // Intrinsic operations end =============================================
 };
 } // end namespace LoongArchISD
@@ -216,6 +232,17 @@ class LoongArchTargetLowering : public TargetLowering {
   Register
   getExceptionSelectorRegister(const Constant *PersonalityFn) const override;
 
+  bool isFsqrtCheap(SDValue Operand, SelectionDAG &DAG) const override {
+    return true;
+  }
+
+  SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+                          int &RefinementSteps, bool &UseOneConstNR,
+                          bool Reciprocal) const override;
+
+  SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+                           int &RefinementSteps) const override;
+
   ISD::NodeType getExtendForAtomicOps() const override {
     return ISD::SIGN_EXTEND;
   }
diff --git a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
index dd7e5713e45fe9..23ae6f038dceff 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
@@ -9,9 +9,18 @@
 // This file describes the Advanced SIMD extension instructions.
 //
 //===----------------------------------------------------------------------===//
+def SDT_LoongArchXVFRECIPE_S : SDTypeProfile<1, 1, [SDTCisVT<0, v8f32>, SDTCisVT<1, v8f32>]>;
+def SDT_LoongArchXVFRECIPE_D : SDTypeProfile<1, 1, [SDTCisVT<0, v4f64>, SDTCisVT<1, v4f64>]>;
 
+// Target nodes.
 def loongarch_xvpermi: SDNode<"LoongArchISD::XVPERMI", SDT_LoongArchV1RUimm>;
 
+def loongarch_xvfrecipe_s: SDNode<"LoongArchISD::XVFRECIPE_S", SDT_LoongArchXVFRECIPE_S>;
+def loongarch_xvfrecipe_d: SDNode<"LoongArchISD::XVFRECIPE_D", SDT_LoongArchXVFRECIPE_D>;
+def loongarch_xvfrsqrte_s: SDNode<"LoongArchISD::XVFRSQRTE_S", SDT_LoongArchXVFRECIPE_S>;
+def loongarch_xvfrsqrte_d: SDNode<"LoongArchISD::XVFRSQRTE_D", SDT_LoongArchXVFRECIPE_D>;
+
+
 def lasxsplati8
   : PatFrag<(ops node:$e0),
             (v32i8 (build_vector node:$e0, node:$e0, node:$e0, node:$e0,
@@ -2094,6 +2103,15 @@ foreach Inst = ["XVFRECIPE_S", "XVFRSQRTE_S"] in
 foreach Inst = ["XVFRECIPE_D", "XVFRSQRTE_D"] in
   def : Pat<(deriveLASXIntrinsic<Inst>.ret (v4f64 LASX256:$xj)),
             (!cast<LAInst>(Inst) LASX256:$xj)>;
+
+def : Pat<(loongarch_xvfrecipe_s v8f32:$src), 
+          (XVFRECIPE_S v8f32:$src)>;
+def : Pat<(loongarch_xvfrecipe_d v4f64:$src), 
+          (XVFRECIPE_D v4f64:$src)>;
+def : Pat<(loongarch_xvfrsqrte_s v8f32:$src), 
+          (XVFRSQRTE_S v8f32:$src)>;
+def : Pat<(loongarch_xvfrsqrte_d v4f64:$src), 
+          (XVFRSQRTE_D v4f64:$src)>;
 }
 
 def : Pat<(int_loongarch_lasx_xvpickve_w_f v8f32:$xj, timm:$imm),
diff --git a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
index e7ac9f3bd04cbf..510b1241edd4e0 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
@@ -23,6 +23,8 @@ def SDT_LoongArchV2R : SDTypeProfile<1, 2, [SDTCisVec<0>,
                                      SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>]>;
 def SDT_LoongArchV1RUimm: SDTypeProfile<1, 2, [SDTCisVec<0>,
                                         SDTCisSameAs<0,1>, SDTCisVT<2, i64>]>;
+def SDT_LoongArchVFRECIPE_S : SDTypeProfile<1, 1, [SDTCisVT<0, v4f32>, SDTCisVT<1, v4f32>]>;
+def SDT_LoongArchVFRECIPE_D : SDTypeProfile<1, 1, [SDTCisVT<0, v2f64>, SDTCisVT<1, v2f64>]>;
 
 // Target nodes.
 def loongarch_vreplve : SDNode<"LoongArchISD::VREPLVE", SDT_LoongArchVreplve>;
@@ -50,6 +52,10 @@ def loongarch_vilvh: SDNode<"LoongArchISD::VILVH", SDT_LoongArchV2R>;
 
 def loongarch_vshuf4i: SDNode<"LoongArchISD::VSHUF4I", SDT_LoongArchV1RUimm>;
 def loongarch_vreplvei: SDNode<"LoongArchISD::VREPLVEI", SDT_LoongArchV1RUimm>;
+def loongarch_vfrecipe_s: SDNode<"LoongArchISD::VFRECIPE_S", SDT_LoongArchVFRECIPE_S>;
+def loongarch_vfrecipe_d: SDNode<"LoongArchISD::VFRECIPE_D", SDT_LoongArchVFRECIPE_D>;
+def loongarch_vfrsqrte_s: SDNode<"LoongArchISD::VFRSQRTE_S", SDT_LoongArchVFRECIPE_S>;
+def loongarch_vfrsqrte_d: SDNode<"LoongArchISD::VFRSQRTE_D", SDT_LoongArchVFRECIPE_D>;
 
 def immZExt1 : ImmLeaf<i64, [{return isUInt<1>(Imm);}]>;
 def immZExt2 : ImmLeaf<i64, [{return isUInt<2>(Imm);}]>;
@@ -2238,6 +2244,15 @@ foreach Inst = ["VFRECIPE_S", "VFRSQRTE_S"] in
 foreach Inst = ["VFRECIPE_D", "VFRSQRTE_D"] in
   def : Pat<(deriveLSXIntrinsic<Inst>.ret (v2f64 LSX128:$vj)),
             (!cast<LAInst>(Inst) LSX128:$vj)>;
+
+def : Pat<(loongarch_vfrecipe_s v4f32:$src), 
+          (VFRECIPE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrecipe_d v2f64:$src), 
+          (VFRECIPE_D v2f64:$src)>;
+def : Pat<(loongarch_vfrsqrte_s v4f32:$src), 
+          (VFRSQRTE_S v4f32:$src)>;
+def : Pat<(loongarch_vfrsqrte_d v2f64:$src), 
+          (VFRSQRTE_D v2f64:$src)>;
 }
 
 // load
diff --git a/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..b4b280a43055f1
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fdiv-reciprocal-estimate.ll
@@ -0,0 +1,43 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+;; Exercise the 'fdiv' LLVM IR: https://llvm.org/docs/LangRef.html#fdiv-instruction
+
+define float @fdiv_s(float %x, float %y) {
+; FAULT-LABEL: fdiv_s:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    fdiv.s $fa0, $fa0, $fa1
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: fdiv_s:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frecipe.s	$fa2, $fa1
+; CHECK-NEXT:    fmul.s	$fa3, $fa0, $fa2
+; CHECK-NEXT:    fnmsub.s	$fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT:    fmadd.s	$fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT:    ret
+  %div = fdiv fast float %x, %y
+  ret float %div
+}
+
+define double @fdiv_d(double %x, double %y) {
+; FAULT-LABEL: fdiv_d:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    fdiv.d $fa0, $fa0, $fa1
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: fdiv_d:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT:    fld.d	$fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT:    frecipe.d	$fa3, $fa1
+; CHECK-NEXT:    fmadd.d	$fa2, $fa1, $fa3, $fa2
+; CHECK-NEXT:    fnmsub.d	$fa2, $fa2, $fa3, $fa3
+; CHECK-NEXT:    fmul.d	$fa3, $fa0, $fa2
+; CHECK-NEXT:    fnmsub.d	$fa0, $fa1, $fa3, $fa0
+; CHECK-NEXT:    fmadd.d	$fa0, $fa2, $fa0, $fa3
+; CHECK-NEXT:    ret
+  %div = fdiv fast double %x, %y
+  ret double %div
+}
diff --git a/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
new file mode 100644
index 00000000000000..d683487fdd4073
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=loongarch64 --mattr=+d,-frecipe < %s | FileCheck %s --check-prefix=FAULT
+; RUN: llc --mtriple=loongarch64 --mattr=+d,+frecipe < %s | FileCheck %s
+
+declare float @llvm.sqrt.f32(float)
+declare double @llvm.sqrt.f64(double)
+
+define float @frsqrt_f32(float %a) nounwind {
+; FAULT-LABEL: frsqrt_f32:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    frsqrt.s $fa0, $fa0
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: frsqrt_f32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frsqrte.s	$fa1, $fa0
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI0_0)
+; CHECK-NEXT:    fld.s	$fa2, $a0, %pc_lo12(.LCPI0_0)
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI0_1)
+; CHECK-NEXT:    fld.s	$fa3, $a0, %pc_lo12(.LCPI0_1)
+; CHECK-NEXT:    fmul.s	$fa1, $fa0, $fa1
+; CHECK-NEXT:    fmul.s	$fa0, $fa0, $fa1
+; CHECK-NEXT:    fmadd.s	$fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT:    fmul.s	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.s	$fa0, $fa1, $fa0
+; CHECK-NEXT:    ret
+
+  %1 = call fast float @llvm.sqrt.f32(float %a)
+  %2 = fdiv fast float 1.0, %1
+  ret float %2
+}
+
+define double @frsqrt_f64(double %a) nounwind {
+; FAULT-LABEL: frsqrt_f64:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    frsqrt.d $fa0, $fa0
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: frsqrt_f64:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frsqrte.d	$fa1, $fa0
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT:    fld.d	$fa2, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI1_1)
+; CHECK-NEXT:    fld.d	$fa3, $a0, %pc_lo12(.LCPI1_1)
+; CHECK-NEXT:    fmul.d	$fa1, $fa0, $fa1
+; CHECK-NEXT:    fmul.d	$fa4, $fa0, $fa1
+; CHECK-NEXT:    fmadd.d	$fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa4
+; CHECK-NEXT:    fmul.d	$fa0, $fa0, $fa1
+; CHECK-NEXT:    fmadd.d	$fa0, $fa0, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.d	$fa0, $fa1, $fa0
+; CHECK-NEXT:    ret
+  %1 = call fast double @llvm.sqrt.f64(double %a)
+  %2 = fdiv fast double 1.0, %1
+  ret double %2
+}
+
+define double @sqrt_simplify_before_recip_3_uses(double %x, ptr %p1, ptr %p2) nounwind {
+; FAULT-LABEL: sqrt_simplify_before_recip_3_uses:
+; FAULT:       # %bb.0:
+; FAULT-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_0)
+; FAULT-NEXT:    fld.d	$fa2, $a2, %pc_lo12(.LCPI2_0)
+; FAULT-NEXT:    fsqrt.d	$fa1, $fa0
+; FAULT-NEXT:    frsqrt.d	$fa0, $fa0
+; FAULT-NEXT:    fdiv.d	$fa2, $fa2, $fa1
+; FAULT-NEXT:    fst.d	$fa0, $a0, 0
+; FAULT-NEXT:    fst.d	$fa2, $a1, 0
+; FAULT-NEXT:    fmov.d	$fa0, $fa1
+; FAULT-NEXT:    ret
+;
+; CHECK-LABEL: sqrt_simplify_before_recip_3_uses:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    frsqrte.d	$fa1, $fa0
+; CHECK-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_0)
+; CHECK-NEXT:    fld.d	$fa2, $a2, %pc_lo12(.LCPI2_0)
+; CHECK-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_1)
+; CHECK-NEXT:    fld.d	$fa3, $a2, %pc_lo12(.LCPI2_1)
+; CHECK-NEXT:    fmul.d	$fa1, $fa0, $fa1
+; CHECK-NEXT:    fmul.d	$fa4, $fa0, $fa1
+; CHECK-NEXT:    fmadd.d	$fa4, $fa4, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa3
+; CHECK-NEXT:    fmul.d	$fa1, $fa1, $fa4
+; CHECK-NEXT:    fmul.d	$fa4, $fa0, $fa1
+; CHECK-NEXT:    pcalau12i	$a2, %pc_hi20(.LCPI2_2)
+; CHECK-NEXT:    fld.d	$fa5, $a2, %pc_lo12(.LCPI2_2)
+; CHECK-NEXT:    fmadd.d	$fa2, $fa4, $fa1, $fa2
+; CHECK-NEXT:    fmul.d	$fa1, $fa1...
[truncated]
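
For readers checking the step counts chosen in getEstimateRefinementSteps (1 for f32, 2 for f64), here is a minimal sketch of the arithmetic behind them. It assumes the idealized Newton-Raphson model in which each step squares the relative error, so the number of correct bits doubles, and it ignores rounding in the refinement arithmetic. The helper names refinementStepsNeeded, refineRecip and recipRelError are hypothetical and are not part of this patch or of LLVM.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: number of Newton-Raphson steps needed to go from an
// estimate with `estimateBits` correct bits to at least `targetBits`.
// Idealized model: every step doubles the number of correct bits.
int refinementStepsNeeded(int estimateBits, int targetBits) {
  assert(estimateBits > 0 && "estimate must carry at least one correct bit");
  int steps = 0;
  for (int bits = estimateBits; bits < targetBits; bits *= 2)
    ++steps;
  return steps;
}

// One Newton-Raphson step for the reciprocal 1/a: x' = x * (2 - a * x).
// If x has relative error e, then x' has relative error e^2.
double refineRecip(double a, double x) { return x * (2.0 - a * x); }

// Relative error of x as an approximation of 1/a, computed as |1 - a * x|.
double recipRelError(double a, double x) { return std::fabs(1.0 - a * x); }
```

With a 14-bit estimate, refinementStepsNeeded(14, 24) gives 1 for the 24-bit float significand and refinementStepsNeeded(14, 53) gives 2 for the 53-bit double significand, which matches the values in the patch. The reciprocal square root iteration converges quadratically as well, but with a different constant factor, so the same counts are only an approximation there.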

github-actions bot commented Sep 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@tangaac tangaac force-pushed the feat-frecipe-loongarch branch from eff8b3b to b720995 Compare September 25, 2024 06:56
Member

@heiher heiher left a comment

LGTM, Thanks.

@tangaac tangaac force-pushed the feat-frecipe-loongarch branch from 587c4e0 to 37454c9 Compare October 15, 2024 08:23
Contributor

@SixWeining SixWeining left a comment

LGTM except a nit.

@SixWeining SixWeining merged commit e9eec14 into llvm:main Oct 18, 2024
8 checks passed

@tangaac Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Collaborator

llvm-ci commented Oct 18, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building clang,llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7233

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp   -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/ompt /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/ompt /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


@tangaac tangaac deleted the feat-frecipe-loongarch branch November 19, 2024 08:19
Labels
backend:loongarch clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category

6 participants