[RISCV][GISel] Support G_MERGE_VALUES/G_UNMERGE_VALUES with Zfa. #120379
Without Zfa we use pseudos that are lowered to a stack load/store. With Zfa we have instructions that can move a pair of GPRs to an FPR, or move the high or low half of an FPR to a GPR. I've used a GINodeEquiv to make use of 3 of the 4 tablegen patterns. The split case with Zfa requires 2 instructions, which I'm doing through custom isel like we do in SelectionDAG. One concern: I'm not sure it's a good idea to make a GINodeEquiv between a target-independent generic opcode and a target-dependent SelectionDAG opcode. Something similar is done on Mips, and I saw some G_LOAD/G_STORE equivalents in AMDGPU, so maybe it's OK?
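For readers less familiar with the Zfa moves mentioned above, here is a rough C++ model of what the merge and split cases compute on RV32. This is only a sketch of the bit-level semantics, not code from the patch. The helper names (`bitsOf`, `buildPairF64`, `extractLo`, `extractHi`) are made up for illustration, and it assumes a 64-bit IEEE 754 `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit bit pattern (hypothetical helper).
static uint64_t bitsOf(double d) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  return bits;
}

// Models fmvp.d.x on RV32: two 32-bit GPR values (low half first) form one f64.
double buildPairF64(uint32_t lo, uint32_t hi) {
  uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d;
}

// Models fmv.x.w applied to an FPR64: yields the low 32 bits of the f64.
uint32_t extractLo(double d) { return static_cast<uint32_t>(bitsOf(d)); }

// Models fmvh.x.d: yields the high 32 bits of the f64.
uint32_t extractHi(double d) { return static_cast<uint32_t>(bitsOf(d) >> 32); }
```

The split needs two separate extractions, which matches the description's point that the Zfa split case takes 2 instructions while the merge is a single instruction.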
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-globalisel
Author: Craig Topper (topperc)
Full diff: https://github.com/llvm/llvm-project/pull/120379.diff
3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
index 985264c591e105..a9a16f209c24f7 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
@@ -80,7 +80,6 @@ class RISCVInstructionSelector : public InstructionSelector {
bool selectFPCompare(MachineInstr &MI, MachineIRBuilder &MIB) const;
void emitFence(AtomicOrdering FenceOrdering, SyncScope::ID FenceSSID,
MachineIRBuilder &MIB) const;
- bool selectMergeValues(MachineInstr &MI, MachineIRBuilder &MIB) const;
bool selectUnmergeValues(MachineInstr &MI, MachineIRBuilder &MIB) const;
ComplexRendererFns selectShiftMask(MachineOperand &Root,
@@ -733,8 +732,6 @@ bool RISCVInstructionSelector::select(MachineInstr &MI) {
}
case TargetOpcode::G_IMPLICIT_DEF:
return selectImplicitDef(MI, MIB);
- case TargetOpcode::G_MERGE_VALUES:
- return selectMergeValues(MI, MIB);
case TargetOpcode::G_UNMERGE_VALUES:
return selectUnmergeValues(MI, MIB);
default:
@@ -742,26 +739,13 @@ bool RISCVInstructionSelector::select(MachineInstr &MI) {
}
}
-bool RISCVInstructionSelector::selectMergeValues(MachineInstr &MI,
- MachineIRBuilder &MIB) const {
- assert(MI.getOpcode() == TargetOpcode::G_MERGE_VALUES);
-
- // Build a F64 Pair from operands
- if (MI.getNumOperands() != 3)
- return false;
- Register Dst = MI.getOperand(0).getReg();
- Register Lo = MI.getOperand(1).getReg();
- Register Hi = MI.getOperand(2).getReg();
- if (!isRegInFprb(Dst) || !isRegInGprb(Lo) || !isRegInGprb(Hi))
- return false;
- MI.setDesc(TII.get(RISCV::BuildPairF64Pseudo));
- return constrainSelectedInstRegOperands(MI, TII, TRI, RBI);
-}
-
bool RISCVInstructionSelector::selectUnmergeValues(
MachineInstr &MI, MachineIRBuilder &MIB) const {
assert(MI.getOpcode() == TargetOpcode::G_UNMERGE_VALUES);
+ if (!Subtarget->hasStdExtZfa())
+ return false;
+
// Split F64 Src into two s32 parts
if (MI.getNumOperands() != 3)
return false;
@@ -770,8 +754,17 @@ bool RISCVInstructionSelector::selectUnmergeValues(
Register Hi = MI.getOperand(1).getReg();
if (!isRegInFprb(Src) || !isRegInGprb(Lo) || !isRegInGprb(Hi))
return false;
- MI.setDesc(TII.get(RISCV::SplitF64Pseudo));
- return constrainSelectedInstRegOperands(MI, TII, TRI, RBI);
+
+ MachineInstr *ExtractLo = MIB.buildInstr(RISCV::FMV_X_W_FPR64, {Lo}, {Src});
+ if (!constrainSelectedInstRegOperands(*ExtractLo, TII, TRI, RBI))
+ return false;
+
+ MachineInstr *ExtractHi = MIB.buildInstr(RISCV::FMVH_X_D, {Hi}, {Src});
+ if (!constrainSelectedInstRegOperands(*ExtractHi, TII, TRI, RBI))
+ return false;
+
+ MI.eraseFromParent();
+ return true;
}
bool RISCVInstructionSelector::replacePtrWithInt(MachineOperand &Op,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
index ae969bff82fd12..349bc361c90fe8 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
@@ -23,7 +23,9 @@ def SDT_RISCVSplitF64 : SDTypeProfile<2, 1, [SDTCisVT<0, i32>,
SDTCisVT<2, f64>]>;
def RISCVBuildPairF64 : SDNode<"RISCVISD::BuildPairF64", SDT_RISCVBuildPairF64>;
+def : GINodeEquiv<G_MERGE_VALUES, RISCVBuildPairF64>;
def RISCVSplitF64 : SDNode<"RISCVISD::SplitF64", SDT_RISCVSplitF64>;
+def : GINodeEquiv<G_UNMERGE_VALUES, RISCVSplitF64>;
def AddrRegImmINX : ComplexPattern<iPTR, 2, "SelectAddrRegImmRV32Zdinx">;
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll
index 385156b3b99d48..48786992265824 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-zfa.ll
@@ -1,9 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-
; RUN: llc -mtriple=riscv32 -mattr=+zfa,d -global-isel < %s \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK,RV32IDZFA
; RUN: llc -mtriple=riscv64 -mattr=+zfa,d -global-isel < %s \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK,RV64DZFA
define double @fceil(double %a) {
@@ -86,3 +85,32 @@ define double @fminimum(double %a, double %b) {
%c = call double @llvm.minimum.f64(double %a, double %b)
ret double %c
}
+
+define i64 @fmvh_x_d(double %fa) {
+; RV32IDZFA-LABEL: fmvh_x_d:
+; RV32IDZFA: # %bb.0:
+; RV32IDZFA-NEXT: fmv.x.w a0, fa0
+; RV32IDZFA-NEXT: fmvh.x.d a1, fa0
+; RV32IDZFA-NEXT: ret
+;
+; RV64DZFA-LABEL: fmvh_x_d:
+; RV64DZFA: # %bb.0:
+; RV64DZFA-NEXT: fmv.x.d a0, fa0
+; RV64DZFA-NEXT: ret
+ %i = bitcast double %fa to i64
+ ret i64 %i
+}
+
+define double @fmvp_d_x(i64 %a) {
+; RV32IDZFA-LABEL: fmvp_d_x:
+; RV32IDZFA: # %bb.0:
+; RV32IDZFA-NEXT: fmvp.d.x fa0, a0, a1
+; RV32IDZFA-NEXT: ret
+;
+; RV64DZFA-LABEL: fmvp_d_x:
+; RV64DZFA: # %bb.0:
+; RV64DZFA-NEXT: fmv.d.x fa0, a0
+; RV64DZFA-NEXT: ret
+ %or = bitcast i64 %a to double
+ ret double %or
+}
Is the general case of merge/unmerge really a 1-1 semantic mapping with your target node? If not, I wouldn't advise going down this route. Lowering them into something more specific (like G_RISCV_MERGE) and then specifying the node equivalence would be a more precise route.
It's the only case we have of G_MERGE_VALUES/G_UNMERGE_VALUES right now. Not sure if we will need more in the future. Looking at tablegen, it looks like the mapping is from SelectionDAG node to GISelEquiv, so the same GISel opcode can be mapped to multiple SelectionDAG opcodes? Where should I do the "lowering" if I were going to add G_RISCV_MERGE?
This is mostly a hack for glue in SelectionDAG. We have to hack in a glue input in some cases on load/store/atomicrmw, and these are boilerplate to keep the patterns importing.
What I meant by 1-1 was that G_MERGE_VALUES takes any number of scalars and merges them into a larger scalar. If your target node exactly implements the legal G_MERGE_VALUES variants for RISC-V, then I guess it's ok. But if not, you may run into issues later where the selector can't handle some of the edge cases.
If you don't want to implement a PostLegalizerLowering pass like AArch64, then I guess you could do it in
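To make the 1-1 concern concrete, here is a small C++ sketch of the generic merge semantics: any number of equal-width scalars concatenated, lowest bits first. It is an illustrative model only. `mergeValues` is a hypothetical name, and the 64-bit cap on the result is a simplification of this sketch, not a property of G_MERGE_VALUES.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a generic merge: concatenate equal-width parts,
// with parts[0] supplying the least significant bits of the result.
uint64_t mergeValues(const std::vector<uint64_t> &parts, unsigned partBits) {
  // Simplification for this sketch: the merged value must fit in 64 bits.
  assert(partBits > 0 && partBits < 64 && "part width out of range");
  assert(parts.size() * partBits <= 64 && "result must fit in 64 bits");

  const uint64_t mask = (uint64_t{1} << partBits) - 1;
  uint64_t result = 0;
  for (std::size_t i = 0; i < parts.size(); ++i)
    result |= (parts[i] & mask) << (i * partBits);
  return result;
}
```

Merging two s32 parts into an s64 is the shape BuildPairF64 covers. A call with four 16-bit parts is equally valid for this generic model but has no BuildPairF64 counterpart, which is the kind of edge case the comment above is asking about.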
@@ -23,7 +23,9 @@ def SDT_RISCVSplitF64 : SDTypeProfile<2, 1, [SDTCisVT<0, i32>,
SDTCisVT<2, f64>]>;
def RISCVBuildPairF64 : SDNode<"RISCVISD::BuildPairF64", SDT_RISCVBuildPairF64>;
def : GINodeEquiv<G_MERGE_VALUES, RISCVBuildPairF64>;
If you have more variants of RISCVBuildPairF64, you'd eventually need some custom emitter code to differentiate them
The tablegen-generated code looks like this:
GIM_CheckNumOperands, /*MI*/0, /*Expected*/3,
GIM_RootCheckType, /*Op*/0, /*Type*/GILLT_s64,
GIM_RootCheckType, /*Op*/1, /*Type*/GILLT_s32,
GIM_RootCheckType, /*Op*/2, /*Type*/GILLT_s32,
GIM_RootCheckRegBankForClass, /*Op*/0, /*RC*/GIMT_Encode2(RISCV::FPR64RegClassID),
GIM_RootCheckRegBankForClass, /*Op*/1, /*RC*/GIMT_Encode2(RISCV::GPRRegClassID),
GIM_RootCheckRegBankForClass, /*Op*/2, /*RC*/GIMT_Encode2(RISCV::GPRRegClassID),
That seems disambiguated to the exact number of operands, types and regbank/class.
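As a rough illustration of what those matcher entries check, the following C++ predicate mirrors them one by one. It is a simplified sketch, not the generated selector. `RegBank`, `Operand`, and `matchesBuildPairF64` are names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for register banks and operands (hypothetical types).
enum class RegBank { GPR, FPR };

struct Operand {
  unsigned sizeInBits;
  RegBank bank;
};

// Mirrors the quoted checks: exactly 3 operands, types s64/s32/s32,
// and register banks FPR/GPR/GPR.
bool matchesBuildPairF64(const std::vector<Operand> &ops) {
  if (ops.size() != 3)
    return false;
  if (ops[0].sizeInBits != 64 || ops[1].sizeInBits != 32 ||
      ops[2].sizeInBits != 32)
    return false;
  return ops[0].bank == RegBank::FPR && ops[1].bank == RegBank::GPR &&
         ops[2].bank == RegBank::GPR;
}
```

Any other G_MERGE_VALUES shape (more sources, different widths, or a GPR destination) fails this predicate and so would not be picked up by the imported pattern.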
I think we tried to add a PostLegalizerLowering pass for something vector related in the past and got some negative feedback.
I still consider a pre-selection lowering a hack papering over missing selection patterns or missing legalization.
Ping
I don't have strong objections to it as it's internal to RISC-V.