[SPARC] Prefer RDPC over CALL to implement GETPCX for 64-bit target #77196

Contributor

@koachan koachan commented Jan 6, 2024

On 64-bit targets, prefer using RDPC over CALL to get the value of %pc.
This is faster on modern processors (Niagara T1 and newer) and avoids polluting
the processor's predictor state.

The old behavior of using a fake CALL is kept when tuning for classic
UltraSPARC processors, since RDPC is much slower there.

A quick pgbench test on a SPARC T4 shows about 2% speedup on SELECT loads,
and about 7% speedup on INSERT/UPDATE loads.
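
A minimal sketch of the selection rule described above, written as standalone C++ rather than against the LLVM API. The names `GetPCXSequence`, `SubtargetModel`, and `selectGetPCXSequence` are hypothetical; the two boolean fields only stand in for the 64-bit check and the slow-RDPC tuning flag.

```cpp
// Illustrative model only: none of these names exist in LLVM.
enum class GetPCXSequence { FakeCall, ReadPC };

struct SubtargetModel {
  bool Is64Bit;  // stands in for the "is this a 64-bit SPARC target" check
  bool SlowRDPC; // stands in for the slow-rdpc tuning flag
};

// Fall back to the fake call on 32-bit targets, or when the tuning
// flag says that reading %pc is slow; otherwise read %pc directly.
constexpr GetPCXSequence selectGetPCXSequence(SubtargetModel ST) {
  return (!ST.Is64Bit || ST.SlowRDPC) ? GetPCXSequence::FakeCall
                                       : GetPCXSequence::ReadPC;
}

// First instruction of each sequence, as spelled in the test comments.
constexpr const char *firstInstruction(GetPCXSequence Seq) {
  return Seq == GetPCXSequence::FakeCall ? "call" : "rd %pc, %o7";
}
```

Under this model, a 32-bit target or a 64-bit target tuned for a classic UltraSPARC keeps the fake `call`, and every other 64-bit target reads `%pc` into `%o7`.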

Created using spr 1.3.4
Member

llvmbot commented Jan 6, 2024

@llvm/pr-subscribers-backend-sparc

Author: Koakuma (koachan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/77196.diff

3 Files Affected:

  • (modified) llvm/lib/Target/Sparc/Sparc.td (+14-4)
  • (modified) llvm/lib/Target/Sparc/SparcAsmPrinter.cpp (+19-2)
  • (added) llvm/test/CodeGen/SPARC/tune-getpcx.ll (+18)
diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td
index 1a71cfed3128f0..7b103395652433 100644
--- a/llvm/lib/Target/Sparc/Sparc.td
+++ b/llvm/lib/Target/Sparc/Sparc.td
@@ -62,6 +62,13 @@ def UsePopc : SubtargetFeature<"popc", "UsePopc", "true",
 def FeatureSoftFloat : SubtargetFeature<"soft-float", "UseSoftFloat", "true",
                               "Use software emulation for floating point">;
 
+//===----------------------------------------------------------------------===//
+// SPARC Subtarget tuning features.
+//
+
+def TuneSlowRDPC : SubtargetFeature<"slow-rdpc", "HasSlowRDPC", "true",
+                                    "rd %pc, %XX is slow", [FeatureV9]>;
+
 //==== Features added predmoninantly for LEON subtarget support
 include "LeonFeatures.td"
 
@@ -89,8 +96,9 @@ def SparcAsmParserVariant : AsmParserVariant {
 // SPARC processors supported.
 //===----------------------------------------------------------------------===//
 
-class Proc<string Name, list<SubtargetFeature> Features>
- : Processor<Name, NoItineraries, Features>;
+class Proc<string Name, list<SubtargetFeature> Features,
+           list<SubtargetFeature> TuneFeatures = []>
+ : Processor<Name, NoItineraries, Features, TuneFeatures>;
 
 def : Proc<"generic",         []>;
 def : Proc<"v7",              [FeatureSoftMulDiv, FeatureNoFSMULD]>;
@@ -118,9 +126,11 @@ def : Proc<"ma2480",          [FeatureLeon, LeonCASA]>;
 def : Proc<"ma2485",          [FeatureLeon, LeonCASA]>;
 def : Proc<"ma2x8x",          [FeatureLeon, LeonCASA]>;
 def : Proc<"v9",              [FeatureV9]>;
-def : Proc<"ultrasparc",      [FeatureV9, FeatureV8Deprecated, FeatureVIS]>;
+def : Proc<"ultrasparc",      [FeatureV9, FeatureV8Deprecated, FeatureVIS],
+                              [TuneSlowRDPC]>;
 def : Proc<"ultrasparc3",     [FeatureV9, FeatureV8Deprecated, FeatureVIS,
-                               FeatureVIS2]>;
+                               FeatureVIS2],
+                              [TuneSlowRDPC]>;
 def : Proc<"niagara",         [FeatureV9, FeatureV8Deprecated, FeatureVIS,
                                FeatureVIS2]>;
 def : Proc<"niagara2",        [FeatureV9, FeatureV8Deprecated, UsePopc,
diff --git a/llvm/lib/Target/Sparc/SparcAsmPrinter.cpp b/llvm/lib/Target/Sparc/SparcAsmPrinter.cpp
index cca624e0926796..97abf10b18540d 100644
--- a/llvm/lib/Target/Sparc/SparcAsmPrinter.cpp
+++ b/llvm/lib/Target/Sparc/SparcAsmPrinter.cpp
@@ -13,6 +13,7 @@
 
 #include "MCTargetDesc/SparcInstPrinter.h"
 #include "MCTargetDesc/SparcMCExpr.h"
+#include "MCTargetDesc/SparcMCTargetDesc.h"
 #include "MCTargetDesc/SparcTargetStreamer.h"
 #include "Sparc.h"
 #include "SparcInstrInfo.h"
@@ -111,6 +112,15 @@ static void EmitCall(MCStreamer &OutStreamer,
   OutStreamer.emitInstruction(CallInst, STI);
 }
 
+static void EmitRDPC(MCStreamer &OutStreamer, MCOperand &RD,
+                     const MCSubtargetInfo &STI) {
+  MCInst RDPCInst;
+  RDPCInst.setOpcode(SP::RDASR);
+  RDPCInst.addOperand(RD);
+  RDPCInst.addOperand(MCOperand::createReg(SP::ASR5));
+  OutStreamer.emitInstruction(RDPCInst, STI);
+}
+
 static void EmitSETHI(MCStreamer &OutStreamer,
                       MCOperand &Imm, MCOperand &RD,
                       const MCSubtargetInfo &STI)
@@ -234,8 +244,15 @@ void SparcAsmPrinter::LowerGETPCXAndEmitMCInsts(const MachineInstr *MI,
   //   add <MO>, %o7, <MO>
 
   OutStreamer->emitLabel(StartLabel);
-  MCOperand Callee =  createPCXCallOP(EndLabel, OutContext);
-  EmitCall(*OutStreamer, Callee, STI);
+  if (!STI.getTargetTriple().isSPARC64() ||
+      STI.hasFeature(Sparc::TuneSlowRDPC)) {
+    MCOperand Callee = createPCXCallOP(EndLabel, OutContext);
+    EmitCall(*OutStreamer, Callee, STI);
+  } else {
+    // TODO make it possible to store PC in other registers
+    // so that leaf function optimization becomes possible.
+    EmitRDPC(*OutStreamer, RegO7, STI);
+  }
   OutStreamer->emitLabel(SethiLabel);
   MCOperand hiImm = createPCXRelExprOp(SparcMCExpr::VK_Sparc_PC22,
                                        GOTLabel, StartLabel, SethiLabel,
diff --git a/llvm/test/CodeGen/SPARC/tune-getpcx.ll b/llvm/test/CodeGen/SPARC/tune-getpcx.ll
new file mode 100644
index 00000000000000..7454fea0e38d57
--- /dev/null
+++ b/llvm/test/CodeGen/SPARC/tune-getpcx.ll
@@ -0,0 +1,18 @@
+; RUN: llc < %s -relocation-model=pic -mtriple=sparc | FileCheck --check-prefix=CALL %s
+; RUN: llc < %s -relocation-model=pic -mtriple=sparcv9 -mcpu=ultrasparc | FileCheck --check-prefix=CALL %s
+; RUN: llc < %s -relocation-model=pic -mtriple=sparcv9 | FileCheck --check-prefix=RDPC %s
+
+;; SPARC32 and SPARC64 for classic UltraSPARCs implement GETPCX
+;; with a fake `call`.
+;; All other SPARC64 targets implement it with `rd %pc, %o7`.
+
+@value = external global i32
+
+; CALL: call
+; CALL-NOT: rd %pc
+; RDPC: rd %pc
+; RDPC-NOT: call
+define i32 @test() {
+  %1 = load i32, i32* @value
+  ret i32 %1
+}

@@ -118,9 +126,11 @@ def : Proc<"ma2480",          [FeatureLeon, LeonCASA]>;
 def : Proc<"ma2485",          [FeatureLeon, LeonCASA]>;
 def : Proc<"ma2x8x",          [FeatureLeon, LeonCASA]>;
 def : Proc<"v9",              [FeatureV9]>;
-def : Proc<"ultrasparc",      [FeatureV9, FeatureV8Deprecated, FeatureVIS]>;
+def : Proc<"ultrasparc",      [FeatureV9, FeatureV8Deprecated, FeatureVIS],
+                              [TuneSlowRDPC]>;
Contributor

Why is it in TuneFeatures? Sparc doesn't seem to support -mtune.

Contributor Author

-mtune enablement is at PR #77195.
Clang already accepts, recognises, and passes the flag on to the backend; it's just that the backend hasn't made any use of the provided info yet.

Contributor

@s-barannikov s-barannikov Jan 8, 2024

I see. IIUC -mtune is translated into -tune-cpu llc option. If so, the test should use this option.
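
For illustration, the tuning lists attached to the `Proc` entries in this patch can be pictured as a lookup table. This is a hedged standalone C++17 sketch, not how TableGen or `MCSubtargetInfo` actually represents them; `ProcEntry`, `Procs`, and `tunesForSlowRDPC` are made-up names, and only processors visible in the diff are listed.

```cpp
#include <string_view>

// Hypothetical model of a few Proc<> entries from Sparc.td.
struct ProcEntry {
  std::string_view Name;
  bool SlowRDPC; // true when the entry lists TuneSlowRDPC as a tune feature
};

constexpr ProcEntry Procs[] = {
    {"generic", false},
    {"v9", false},
    {"ultrasparc", true},
    {"ultrasparc3", true},
    {"niagara", false},
};

// Look up whether tuning for the named CPU marks `rd %pc` as slow.
// Names missing from the table are assumed to carry no tuning flags.
constexpr bool tunesForSlowRDPC(std::string_view CPU) {
  for (const ProcEntry &P : Procs)
    if (P.Name == CPU)
      return P.SlowRDPC;
  return false;
}
```

In this picture the tuning flag is kept apart from the ISA feature list, so it changes which instruction sequence is preferred without changing which instructions are legal.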

koachan added a commit to koachan/llvm-project that referenced this pull request Jan 9, 2024
Pull Request: llvm#77196
Created using spr 1.3.4
@brad0 brad0 merged commit 63f9829 into users/koachan/main.sparc-prefer-rdpc-over-call-to-implement-getpcx-for-64-bit-target Jan 14, 2024
@brad0 brad0 deleted the users/koachan/sparc-prefer-rdpc-over-call-to-implement-getpcx-for-64-bit-target branch January 14, 2024 21:28
@brad0 brad0 restored the users/koachan/sparc-prefer-rdpc-over-call-to-implement-getpcx-for-64-bit-target branch January 14, 2024 21:33
@s-barannikov s-barannikov deleted the users/koachan/sparc-prefer-rdpc-over-call-to-implement-getpcx-for-64-bit-target branch January 14, 2024 21:42
@brad0 brad0 restored the users/koachan/sparc-prefer-rdpc-over-call-to-implement-getpcx-for-64-bit-target branch January 14, 2024 22:48
Contributor

brad0 commented Jan 14, 2024

I intentionally restored the branch as I only noticed after the merge that the target branch was set incorrectly. It wasn't merged into the LLVM repo.

Is there a way of re-targeting the PR or making a new PR?

Contributor Author

koachan commented Jan 15, 2024

Lemme see if I can do it
I used spr to make the PRs, it should still be mergeable from there, I think?

Contributor Author

koachan commented Jan 15, 2024

Uh oh, seems like I couldn't merge it with the tool either...
Guess I'll open a new PR to merge this, would that be okay? @brad0 @s-barannikov

@s-barannikov
Contributor

Yes, sure

Contributor

brad0 commented Jan 15, 2024

> Uh oh, seems like I couldn't merge it with the tool either... Guess I'll open a new PR to merge this, would that be okay? @brad0 @s-barannikov

Ya, sure. Whatever path is easier and quickest for you.

Contributor Author

koachan commented Jan 16, 2024

Okay, new PR is at #78280.

@brad0 brad0 deleted the users/koachan/sparc-prefer-rdpc-over-call-to-implement-getpcx-for-64-bit-target branch January 28, 2025 20:08