[RISCV][GISel] Support G_MERGE_VALUES/G_UNMERGE_VALUES with Zfa. #120379

topperc · 2024-12-18T08:27:45Z

Without Zfa we use pseudos that are lowered to a stack load/store. With Zfa we have instructions that can move a pair of GPRs to an FPR, or move the high or low half of an FPR to a GPR.

I've used a GINodeEquiv to make use of 3 of the 4 tablegen patterns. The split case with Zfa requires 2 instructions, which I'm doing through custom isel like we do in SelectionDAG. One concern I have is that I'm not sure if it's a good idea to make a GINodeEquiv between a target-independent generic opcode and a target-dependent SelectionDAG opcode. Something similar is done on Mips, and I saw some G_LOAD/G_STORE equivalents in AMDGPU, so maybe it's ok?
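As a reading aid, the data movement described above can be modelled in plain C++. This is only a sketch of the bit-level behaviour, not LLVM code: the function names `buildPairF64` and `splitF64` are hypothetical (borrowed from the pseudo names in the diff), and an IEEE-754 64-bit `double` is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Models the merge direction (fmvp.d.x on RV32): two 32-bit GPR values
// become the low and high halves of one 64-bit FP register.
double buildPairF64(uint32_t lo, uint32_t hi) {
  uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
  double d;
  std::memcpy(&d, &bits, sizeof d); // bit-for-bit move, no conversion
  return d;
}

// Models the split direction: fmv.x.w reads the low half and
// fmvh.x.d reads the high half of the same source register.
std::pair<uint32_t, uint32_t> splitF64(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return {static_cast<uint32_t>(bits), static_cast<uint32_t>(bits >> 32)};
}
```

The split direction needs two reads of the same source, which matches the two-instruction custom selection in the diff.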


llvmbot · 2024-12-18T08:28:23Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-globalisel

Author: Craig Topper (topperc)

Changes

Without Zfa we use pseudos that are lowered to a stack load/store. With Zfa we have instructions that can move a pair of GPRs to an FPR, or move the high or low half of an FPR to a GPR.

I've used a GINodeEquiv to make use of 3 of the 4 tablegen patterns. The split case with Zfa requires 2 instructions, which I'm doing through custom isel like we do in SelectionDAG. One concern I have is that I'm not sure if it's a good idea to make a GINodeEquiv between a target-independent generic opcode and a target-dependent SelectionDAG opcode. Something similar is done on Mips, and I saw some G_LOAD/G_STORE equivalents in AMDGPU, so maybe it's ok?

Full diff: https://github.com/llvm/llvm-project/pull/120379.diff

3 Files Affected:

(modified) llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp (+14-21)
(modified) llvm/lib/Target/RISCV/RISCVInstrInfoD.td (+2)
(modified) llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll (+31-3)

diff --git a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
index 985264c591e105..a9a16f209c24f7 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
@@ -80,7 +80,6 @@ class RISCVInstructionSelector : public InstructionSelector {
   bool selectFPCompare(MachineInstr &MI, MachineIRBuilder &MIB) const;
   void emitFence(AtomicOrdering FenceOrdering, SyncScope::ID FenceSSID,
                  MachineIRBuilder &MIB) const;
-  bool selectMergeValues(MachineInstr &MI, MachineIRBuilder &MIB) const;
   bool selectUnmergeValues(MachineInstr &MI, MachineIRBuilder &MIB) const;
 
   ComplexRendererFns selectShiftMask(MachineOperand &Root,
@@ -733,8 +732,6 @@ bool RISCVInstructionSelector::select(MachineInstr &MI) {
   }
   case TargetOpcode::G_IMPLICIT_DEF:
     return selectImplicitDef(MI, MIB);
-  case TargetOpcode::G_MERGE_VALUES:
-    return selectMergeValues(MI, MIB);
   case TargetOpcode::G_UNMERGE_VALUES:
     return selectUnmergeValues(MI, MIB);
   default:
@@ -742,26 +739,13 @@ bool RISCVInstructionSelector::select(MachineInstr &MI) {
   }
 }
 
-bool RISCVInstructionSelector::selectMergeValues(MachineInstr &MI,
-                                                 MachineIRBuilder &MIB) const {
-  assert(MI.getOpcode() == TargetOpcode::G_MERGE_VALUES);
-
-  // Build a F64 Pair from operands
-  if (MI.getNumOperands() != 3)
-    return false;
-  Register Dst = MI.getOperand(0).getReg();
-  Register Lo = MI.getOperand(1).getReg();
-  Register Hi = MI.getOperand(2).getReg();
-  if (!isRegInFprb(Dst) || !isRegInGprb(Lo) || !isRegInGprb(Hi))
-    return false;
-  MI.setDesc(TII.get(RISCV::BuildPairF64Pseudo));
-  return constrainSelectedInstRegOperands(MI, TII, TRI, RBI);
-}
-
 bool RISCVInstructionSelector::selectUnmergeValues(
     MachineInstr &MI, MachineIRBuilder &MIB) const {
   assert(MI.getOpcode() == TargetOpcode::G_UNMERGE_VALUES);
 
+  if (!Subtarget->hasStdExtZfa())
+    return false;
+
   // Split F64 Src into two s32 parts
   if (MI.getNumOperands() != 3)
     return false;
@@ -770,8 +754,17 @@ bool RISCVInstructionSelector::selectUnmergeValues(
   Register Hi = MI.getOperand(1).getReg();
   if (!isRegInFprb(Src) || !isRegInGprb(Lo) || !isRegInGprb(Hi))
     return false;
-  MI.setDesc(TII.get(RISCV::SplitF64Pseudo));
-  return constrainSelectedInstRegOperands(MI, TII, TRI, RBI);
+
+  MachineInstr *ExtractLo = MIB.buildInstr(RISCV::FMV_X_W_FPR64, {Lo}, {Src});
+  if (!constrainSelectedInstRegOperands(*ExtractLo, TII, TRI, RBI))
+    return false;
+
+  MachineInstr *ExtractHi = MIB.buildInstr(RISCV::FMVH_X_D, {Hi}, {Src});
+  if (!constrainSelectedInstRegOperands(*ExtractHi, TII, TRI, RBI))
+    return false;
+
+  MI.eraseFromParent();
+  return true;
 }
 
 bool RISCVInstructionSelector::replacePtrWithInt(MachineOperand &Op,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
index ae969bff82fd12..349bc361c90fe8 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
@@ -23,7 +23,9 @@ def SDT_RISCVSplitF64     : SDTypeProfile<2, 1, [SDTCisVT<0, i32>,
                                                  SDTCisVT<2, f64>]>;
 
 def RISCVBuildPairF64 : SDNode<"RISCVISD::BuildPairF64", SDT_RISCVBuildPairF64>;
+def : GINodeEquiv<G_MERGE_VALUES, RISCVBuildPairF64>;
 def RISCVSplitF64     : SDNode<"RISCVISD::SplitF64", SDT_RISCVSplitF64>;
+def : GINodeEquiv<G_UNMERGE_VALUES, RISCVSplitF64>;
 
 def AddrRegImmINX : ComplexPattern<iPTR, 2, "SelectAddrRegImmRV32Zdinx">;
 
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll
index 385156b3b99d48..48786992265824 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll
@@ -1,9 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-
 ; RUN: llc -mtriple=riscv32 -mattr=+zfa,d -global-isel < %s \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK,RV32IDZFA
 ; RUN: llc -mtriple=riscv64 -mattr=+zfa,d -global-isel < %s \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK,RV64DZFA
 
 
 define double @fceil(double %a) {
@@ -86,3 +85,32 @@ define double @fminimum(double %a, double %b) {
   %c = call double @llvm.minimum.f64(double %a, double %b)
   ret double %c
 }
+
+define i64 @fmvh_x_d(double %fa) {
+; RV32IDZFA-LABEL: fmvh_x_d:
+; RV32IDZFA:       # %bb.0:
+; RV32IDZFA-NEXT:    fmv.x.w a0, fa0
+; RV32IDZFA-NEXT:    fmvh.x.d a1, fa0
+; RV32IDZFA-NEXT:    ret
+;
+; RV64DZFA-LABEL: fmvh_x_d:
+; RV64DZFA:       # %bb.0:
+; RV64DZFA-NEXT:    fmv.x.d a0, fa0
+; RV64DZFA-NEXT:    ret
+  %i = bitcast double %fa to i64
+  ret i64 %i
+}
+
+define double @fmvp_d_x(i64 %a) {
+; RV32IDZFA-LABEL: fmvp_d_x:
+; RV32IDZFA:       # %bb.0:
+; RV32IDZFA-NEXT:    fmvp.d.x fa0, a0, a1
+; RV32IDZFA-NEXT:    ret
+;
+; RV64DZFA-LABEL: fmvp_d_x:
+; RV64DZFA:       # %bb.0:
+; RV64DZFA-NEXT:    fmv.d.x fa0, a0
+; RV64DZFA-NEXT:    ret
+  %or = bitcast i64 %a to double
+  ret double %or
+}

aemerson · 2024-12-20T02:11:14Z

> I'm not sure if its a good idea to make GINodeEquiv between a target independent generic opcode and a target dependent SelectionDAG opcode. Similar is done on Mips. And I saw some G_LOAD/G_STORE equivalents in AMDGPU so maybe it's ok?

Is the general case of merge/unmerge really a 1-1 semantic mapping with your target node? If not I wouldn't advise going down this route, maybe lowering them into something more specific (like G_RISCV_MERGE) and then specifying the node equivalence would be a more precise route.

topperc · 2024-12-20T04:14:49Z

> > I'm not sure if its a good idea to make GINodeEquiv between a target independent generic opcode and a target dependent SelectionDAG opcode. Similar is done on Mips. And I saw some G_LOAD/G_STORE equivalents in AMDGPU so maybe it's ok?

> Is the general case of merge/unmerge really a 1-1 semantic mapping with your target node? If not I wouldn't advise going down this route, maybe lowering them into something more specific (like G_RISCV_MERGE) and then specifying the node equivalence would be a more precise route.

It's the only case we have of G_MERGE_VALUES/G_UNMERGE_VALUES right now. I'm not sure if we will need more in the future. Looking at tablegen, it looks like the mapping is from SelectionDAG node to GISelEquiv, so the same GISel opcode can be mapped to multiple SelectionDAG opcodes?

Where should I do the "lowering" if I were going to add G_RISCV_MERGE?

arsenm · 2024-12-20T04:46:57Z

> And I saw some G_LOAD/G_STORE equivalents in AMDGPU so maybe its ok?

This is mostly a hack for glue in SelectionDAG. We have to hack in a glue input in some cases on load/store/atomicrmw, and these are boilerplate to keep the patterns importing.

aemerson · 2024-12-20T04:47:29Z

> > > I'm not sure if its a good idea to make GINodeEquiv between a target independent generic opcode and a target dependent SelectionDAG opcode. Similar is done on Mips. And I saw some G_LOAD/G_STORE equivalents in AMDGPU so maybe it's ok?

> > Is the general case of merge/unmerge really a 1-1 semantic mapping with your target node? If not I wouldn't advise going down this route, maybe lowering them into something more specific (like G_RISCV_MERGE) and then specifying the node equivalence would be a more precise route.

> Its the only case we have of G_MERGE_VALUES/UNMERGE_VALUES right now. Not sure if we will need more in the future. Looking at tablegen it looks like the mapping is from SelectionDAG node to GISelEquiv so the same GISel opcode can be mapped to multiple SelectionDAG opcodes?

What I meant by 1-1 was that G_MERGE_VALUES takes any number of scalars and merges into a larger scalar. If your target node exactly implements the legal G_MERGE_VALUES variants for RISC-V then I guess it's ok. But if not you may run into issues later where the selector can't handle some of the edge cases.

> Where should I do the "lowering" if I were going to add G_RISCV_MERGE?

If you don't want to implement a PostLegalizerLowering pass like AArch64 then I guess you could do it in preISelLower() instead? It's your call though, if you think this is ok for now go ahead. It's fairly easy to fix later if needed.
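To make the "1-1 mapping" concern concrete: the generic merge concatenates any number of equally sized scalars, lowest bits first, while the RISC-V node only builds one s64 from two s32 values. The sketch below models the generic behaviour in plain C++. `mergeValues` is a hypothetical helper, not an LLVM API, and it caps the result at 64 bits purely to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Concatenates equally sized parts into one wider value. parts[0] supplies
// the lowest bits, matching the operand order of a generic merge.
uint64_t mergeValues(const std::vector<uint64_t> &parts, unsigned partBits) {
  assert(partBits > 0 && partBits < 64 && parts.size() * partBits <= 64);
  const uint64_t mask = (uint64_t{1} << partBits) - 1;
  uint64_t result = 0;
  for (std::size_t i = 0; i < parts.size(); ++i)
    result |= (parts[i] & mask) << (i * partBits);
  return result;
}
```

Only the call shape `mergeValues({lo, hi}, 32)` corresponds to the BuildPairF64 case; a four-way merge of 8-bit parts, for example, is a valid generic merge that the target node does not describe.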

arsenm · 2024-12-20T04:52:11Z

llvm/lib/Target/RISCV/RISCVInstrInfoD.td

@@ -23,7 +23,9 @@ def SDT_RISCVSplitF64     : SDTypeProfile<2, 1, [SDTCisVT<0, i32>,
                                                 SDTCisVT<2, f64>]>;

 def RISCVBuildPairF64 : SDNode<"RISCVISD::BuildPairF64", SDT_RISCVBuildPairF64>;
+def : GINodeEquiv<G_MERGE_VALUES, RISCVBuildPairF64>;


If you have more variants of RISCVBuildPairF64, you'd eventually need some custom emitter code to differentiate them.

The tablegen-generated code looks like this:

    GIM_CheckNumOperands, /*MI*/0, /*Expected*/3,
    GIM_RootCheckType, /*Op*/0, /*Type*/GILLT_s64,
    GIM_RootCheckType, /*Op*/1, /*Type*/GILLT_s32,
    GIM_RootCheckType, /*Op*/2, /*Type*/GILLT_s32,
    GIM_RootCheckRegBankForClass, /*Op*/0, /*RC*/GIMT_Encode2(RISCV::FPR64RegClassID),
    GIM_RootCheckRegBankForClass, /*Op*/1, /*RC*/GIMT_Encode2(RISCV::GPRRegClassID),
    GIM_RootCheckRegBankForClass, /*Op*/2, /*RC*/GIMT_Encode2(RISCV::GPRRegClassID),

That seems to be disambiguated down to the exact number of operands, types, and regbank/class.
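Read as ordinary code, the checks quoted above amount to a conjunction of operand-count, type, and register-bank tests. The following is a standalone model of that predicate, not the generated matcher itself. The `Ty`, `Bank`, and `Operand` types and the function name are invented for illustration, and the register-class check is simplified to a bank comparison.

```cpp
#include <cassert>
#include <vector>

enum class Ty { s32, s64 };
enum class Bank { GPR, FPR };

struct Operand {
  Ty type;
  Bank bank;
};

// Hypothetical stand-in for the generated check sequence: exactly three
// operands, an s64 result on the FPR bank, and two s32 sources on the GPR bank.
bool matchesBuildPairF64(const std::vector<Operand> &ops) {
  if (ops.size() != 3)
    return false;
  if (ops[0].type != Ty::s64 || ops[1].type != Ty::s32 || ops[2].type != Ty::s32)
    return false;
  return ops[0].bank == Bank::FPR && ops[1].bank == Bank::GPR &&
         ops[2].bank == Bank::GPR;
}
```

Under this model, a merge with a different operand count, different types, or a GPR destination fails the predicate and would have to be handled by some other pattern or by custom selection.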

topperc · 2024-12-20T05:01:53Z

> If you don't want to implement a PostLegalizerLowering pass like AArch64 then I guess you could do it in preISelLower() instead? It's your call though, if you think this is ok for now go ahead. It's fairly easy to fix later if needed.

I think we tried to add a PostLegalizerLowering pass for something vector related in the past and got some negative feedback.

arsenm · 2024-12-20T05:06:59Z

> I think we tried to add a PostLegalizerLowering pass for something vector related in the past and got some negative feedback.

I still consider a pre-selection lowering a hack papering over missing selection patterns or missing legalization.

topperc · 2024-12-23T18:33:16Z

@aemerson @arsenm any issues with taking this as is?

topperc · 2025-01-03T18:46:02Z

Ping

aemerson

I don't have strong objections to it as it's internal to RISC-V.

topperc requested review from aemerson and arsenm December 18, 2024 08:27

llvmbot added backend:RISC-V llvm:globalisel labels Dec 18, 2024

arsenm reviewed Dec 20, 2024

aemerson approved these changes Jan 6, 2025

topperc merged commit 785b16a into llvm:main Jan 7, 2025
11 checks passed

topperc deleted the pr/gisel-split-pair branch January 7, 2025 15:50
