[AArch64] Create set.fpmr intrinsic and assembly lowering #114248
Conversation
This patch introduces a new llvm.set.fpmr intrinsic for setting the value of the FPMR register and adds its lowering to a series of read-compare-write instructions. This intrinsic will be generated during the lowering of FP8 C intrinsics into LLVM IR, introduced in a later patch.
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-ir
Author: None (Lukacma)
Full diff: https://github.com/llvm/llvm-project/pull/114248.diff (6 files affected)
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 594069c619ceb0..4579f3fda523c3 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -778,6 +778,9 @@ def int_aarch64_get_fpcr : FPENV_Get_Intrinsic;
def int_aarch64_set_fpcr : FPENV_Set_Intrinsic;
def int_aarch64_get_fpsr : FPENV_Get_Intrinsic;
def int_aarch64_set_fpsr : FPENV_Set_Intrinsic;
+def int_aarch64_set_fpmr : DefaultAttrsIntrinsic<[], [llvm_i64_ty], [IntrInaccessibleMemOnly]>{
+ let TargetPrefix = "aarch64";
+}
// Armv8.5-A Random number generation intrinsics
def int_aarch64_rndr : RNDR_Intrinsic;
diff --git a/llvm/lib/CodeGen/LivePhysRegs.cpp b/llvm/lib/CodeGen/LivePhysRegs.cpp
index 96380d40848257..330ea65770847a 100644
--- a/llvm/lib/CodeGen/LivePhysRegs.cpp
+++ b/llvm/lib/CodeGen/LivePhysRegs.cpp
@@ -262,7 +262,7 @@ void llvm::addLiveIns(MachineBasicBlock &MBB, const LivePhysRegs &LiveRegs) {
const MachineRegisterInfo &MRI = MF.getRegInfo();
const TargetRegisterInfo &TRI = *MRI.getTargetRegisterInfo();
for (MCPhysReg Reg : LiveRegs) {
- if (MRI.isReserved(Reg))
+ if (TRI.getReservedRegs(MF).test(Reg))
continue;
// Skip the register if we are about to add one of its super registers.
if (any_of(TRI.superregs(Reg), [&](MCPhysReg SReg) {
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 32ba2866ac8180..f5d08bf8d57a4d 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3007,6 +3007,49 @@ MachineBasicBlock *AArch64TargetLowering::EmitLoweredCatchRet(
return BB;
}
+MachineBasicBlock *
+AArch64TargetLowering::EmitLoweredSetFpmr(MachineInstr &MI,
+ MachineBasicBlock *MBB) const {
+ MachineFunction *MF = MBB->getParent();
+ const TargetInstrInfo *TII = Subtarget->getInstrInfo();
+ DebugLoc DL = MI.getDebugLoc();
+ const BasicBlock *LLVM_BB = MBB->getBasicBlock();
+ Register NewFpmrVal = MI.getOperand(0).getReg();
+
+ // Test if FPMR is set correctly already
+ Register OldFpmrVal =
+ MI.getMF()->getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
+ BuildMI(*MBB, MI, DL, TII->get(AArch64::MRS), OldFpmrVal)
+ .addImm(0xda22)
+ .addUse(AArch64::FPMR, RegState::Implicit);
+ BuildMI(*MBB, MI, DL, TII->get(AArch64::SUBSXrs), AArch64::XZR)
+ .addReg(OldFpmrVal)
+ .addReg(NewFpmrVal)
+ .addImm(0);
+
+ MachineBasicBlock *MsrBB = MF->CreateMachineBasicBlock(LLVM_BB);
+ // Transfer rest of current basic-block to EndBB
+ MachineBasicBlock *EndBB = MBB->splitAt(MI);
+ MF->insert(++MBB->getIterator(), MsrBB);
+
+ // If already set continue
+ BuildMI(*MBB, MI, DL, TII->get(AArch64::Bcc))
+ .addImm(AArch64CC::EQ)
+ .addMBB(EndBB);
+
+ BuildMI(*MsrBB, MsrBB->begin(), DL, TII->get(AArch64::MSR))
+ .addImm(0xda22)
+ .addReg(NewFpmrVal)
+ .addDef(AArch64::FPMR, RegState::Implicit);
+
+ MBB->addSuccessor(MsrBB);
+ // MsrBB falls through to the end.
+ MsrBB->addSuccessor(EndBB);
+
+ MI.eraseFromParent();
+ return EndBB;
+}
+
MachineBasicBlock *
AArch64TargetLowering::EmitDynamicProbedAlloc(MachineInstr &MI,
MachineBasicBlock *MBB) const {
@@ -3292,6 +3335,8 @@ MachineBasicBlock *AArch64TargetLowering::EmitInstrWithCustomInserter(
return EmitZTInstr(MI, BB, AArch64::ZERO_T, /*Op0IsDef=*/true);
case AArch64::MOVT_TIZ_PSEUDO:
return EmitZTInstr(MI, BB, AArch64::MOVT_TIZ, /*Op0IsDef=*/true);
+ case AArch64::SET_FPMR:
+ return EmitLoweredSetFpmr(MI, BB);
}
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index d696355bb062a8..4d72c93d255821 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -647,6 +647,9 @@ class AArch64TargetLowering : public TargetLowering {
MachineBasicBlock *EmitLoweredCatchRet(MachineInstr &MI,
MachineBasicBlock *BB) const;
+ MachineBasicBlock *EmitLoweredSetFpmr(MachineInstr &MI,
+ MachineBasicBlock *BB) const;
+
MachineBasicBlock *EmitDynamicProbedAlloc(MachineInstr &MI,
MachineBasicBlock *MBB) const;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 6194de2d56b630..be62daceeed750 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -2145,6 +2145,11 @@ def MSR_FPSR : Pseudo<(outs), (ins GPR64:$val),
PseudoInstExpansion<(MSR 0xda21, GPR64:$val)>,
Sched<[WriteSys]>;
+let Uses = [FPMR], Defs = [FPMR, NZCV], usesCustomInserter = 1 in
+def SET_FPMR : Pseudo<(outs), (ins GPR64:$val),
+ [(int_aarch64_set_fpmr i64:$val)]>,
+ Sched<[WriteSys]>;
+
// Generic system instructions
def SYSxt : SystemXtI<0, "sys">;
def SYSLxt : SystemLXtI<1, "sysl">;
diff --git a/llvm/test/CodeGen/AArch64/arm64-fpenv.ll b/llvm/test/CodeGen/AArch64/arm64-fpenv.ll
index 030809caee3394..deb0a2df7b7fba 100644
--- a/llvm/test/CodeGen/AArch64/arm64-fpenv.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-fpenv.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
-; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64 -mattr=+fpmr -verify-machineinstrs < %s | FileCheck %s
define i64 @get_fpcr() #0 {
; CHECK-LABEL: get_fpcr:
@@ -37,6 +37,20 @@ define void @set_fpsr(i64 %sr) {
ret void
}
+define dso_local void @set_fpmr(i64 %sr) {
+; CHECK-LABEL: set_fpmr:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mrs x8, FPMR
+; CHECK-NEXT: cmp x8, x0
+; CHECK-NEXT: b.eq .LBB4_2
+; CHECK-NEXT: // %bb.1:
+; CHECK-NEXT: msr FPMR, x0
+; CHECK-NEXT: .LBB4_2:
+; CHECK-NEXT: ret
+ call void @llvm.aarch64.set.fpmr(i64 %sr)
+ ret void
+}
+
declare i64 @llvm.aarch64.get.fpcr()
declare void @llvm.aarch64.set.fpcr(i64)
declare i64 @llvm.aarch64.get.fpsr()
Can you explain how these are intended to be used? I am maybe a bit out of touch but I was under the impression that the FP8 intrinsics had fpmr operands to make data-flow analysis possible. Does a stand-alone intrinsic not go against that?
Do you mean that the compiler would have a pass to improve the writes into FPMR? For this implementation the compiler would generate a write to FPMR for every FP8 intrinsic.
Hi. Yeah, sorry, there were two issues I had.
Hello David, thank you for your feedback! I have now fixed the first issue you had with the patch. As for the second one, for now I don't think creating data dependencies is necessary. Functionally, dependencies between defs and uses of FPMR are going to be handled as a memory dependency through inaccessible memory. I do not see a "nice" way to handle overlapping ranges without using a memory dependency, but if you know of one I would be keen to hear about it, and we might change to a data-only dependency later. I agree that it will help with optimization not to have to search through the program for where the FPMR value might be coming from, but that is something we will think about once we start optimizing these intrinsics.
As you ask here I will try to reply: we had a conversation about this internally, and it seemed the response was that we plan to look into this more and hopefully change it later. (My worry with that is that it will be harder to do the more time is spent with them working this way.) As for handling overlapping ranges, it would just be a case of sinking the intrinsic to the use; it sounds like this will need a pass to do the placement in either case. But I don't actually mean to say that it needs to remove the memory side effect entirely. It could keep the side effect to limit how much the intrinsics move, and have the benefits of knowing the FPMR value and no overlapping ranges at the same time. You can even remove one of them later if it is decided they are not useful (going the other way is obviously more difficult). Thanks for simplifying the codegen though; hopefully we won't end up needing it in the future, and if we do we can always resurrect it.
This patch introduces a new llvm.set.fpmr intrinsic for setting the value of the FPMR register and adds its lowering to a series of read-compare-write instructions. This intrinsic will be generated during the lowering of FP8 C intrinsics into LLVM IR, introduced in a later patch. ***This is an experimental implementation of handling FP8 intrinsics and is likely to change in the future.***