-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[X86] Enhance FABS/FNEG lowering for scalar _Float16 with bitwise operations #128637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 Author: Prashanth (StarOne01) ChangesThis patch optimizes _Float16 FABS and FNEG operations by using direct Before this patch, f16 FABS/FNEG would:
With this patch, we now:
Test Plan:
Performance Impact:
Full diff: https://github.com/llvm/llvm-project/pull/128637.diff 1 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a4357197e2843..e77dd30b3c367 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -10,7 +10,6 @@
// selection DAG.
//
//===----------------------------------------------------------------------===//
-
#include "X86ISelLowering.h"
#include "MCTargetDesc/X86ShuffleDecode.h"
#include "X86.h"
@@ -21987,8 +21986,6 @@ SDValue X86TargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(X86ISD::STRICT_VFPEXT, DL, {VT, MVT::Other},
{Op->getOperand(0), Res});
return DAG.getNode(X86ISD::VFPEXT, DL, VT, Res);
- } else if (VT == MVT::v4f64 || VT == MVT::v8f64) {
- return Op;
}
assert(SVT == MVT::v2f32 && "Only customize MVT::v2f32 type legalization!");
@@ -22285,10 +22282,45 @@ static SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) {
/// The only differences between FABS and FNEG are the mask and the logic op.
/// FNEG also has a folding opportunity for FNEG(FABS(x)).
static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
+
assert((Op.getOpcode() == ISD::FABS || Op.getOpcode() == ISD::FNEG) &&
"Wrong opcode for lowering FABS or FNEG.");
bool IsFABS = (Op.getOpcode() == ISD::FABS);
+ SDLoc dl(Op);
+ MVT VT = Op.getSimpleValueType();
+
+ // Handle scalar _Float16 (f16) directly via integer bitwise operations.
+ if (VT == MVT::f16) {
+ SDValue Op0 = Op.getOperand(0);
+ bool IsFNABS = !IsFABS && Op0.getOpcode() == ISD::FABS;
+
+ // For FNABS (FNEG of FABS), bypass FABS.
+ if (IsFNABS)
+ Op0 = Op0.getOperand(0);
+
+ // Bitcast f16 to i16 for bitwise operations.
+ SDValue IntVal = DAG.getNode(ISD::BITCAST, dl, MVT::i16, Op0);
+
+ APInt MaskVal;
+ unsigned LogicOp;
+ if (IsFABS) {
+ MaskVal = APInt(16, 0x7FFF); // Clear sign bit.
+ LogicOp = ISD::AND;
+ } else if (IsFNABS) {
+ MaskVal = APInt(16, 0x8000); // Combine masks via OR.
+ LogicOp = ISD::OR;
+ } else {
+ MaskVal = APInt(16, 0x8000); // Flip sign bit.
+ LogicOp = ISD::XOR;
+ }
+
+ SDValue Mask = DAG.getConstant(MaskVal, dl, MVT::i16);
+ SDValue LogicNode = DAG.getNode(LogicOp, dl, MVT::i16, IntVal, Mask);
+
+ // Bitcast back to f16.
+ return DAG.getNode(ISD::BITCAST, dl, MVT::f16, LogicNode);
+ }
// If this is a FABS and it has an FNEG user, bail out to fold the combination
// into an FNABS. We'll lower the FABS after that if it is still in use.
@@ -22297,8 +22329,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
if (User->getOpcode() == ISD::FNEG)
return Op;
- SDLoc dl(Op);
- MVT VT = Op.getSimpleValueType();
+ VT = Op.getSimpleValueType();
bool IsF128 = (VT == MVT::f128);
assert(VT.isFloatingPoint() && VT != MVT::f80 &&
@@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
// generate a 16-byte vector constant and logic op even for the scalar case.
// Using a 16-byte mask allows folding the load of the mask with
// the logic op, so it can save (~4 bytes) on code size.
- bool IsFakeVector = !VT.isVector() && !IsF128;
+ bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16;
MVT LogicVT = VT;
if (IsFakeVector)
LogicVT = (VT == MVT::f64) ? MVT::v2f64
|
@@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) { | |||
// generate a 16-byte vector constant and logic op even for the scalar case. | |||
// Using a 16-byte mask allows folding the load of the mask with | |||
// the logic op, so it can save (~4 bytes) on code size. | |||
bool IsFakeVector = !VT.isVector() && !IsF128; | |||
bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the IsFakeVector/LogicVT code already handles MVT::f16 cases - so my guess is the ISD::FABS/FNEG/FCOPYSIGN actions aren't set to Custom for SSE2+ targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm... not very sure about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SSE2 setF16Action(MVT::f16, Promote)
call is setting the default values - these need to set to Custom for ISD::FABS/FNEG/FCOPYSIGN afterward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you suggest me something where i could read about the whole structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We make the MVT::f16 type legal for all SSE2+ targets:
addRegisterClass(MVT::f16, Subtarget.hasAVX512() ? &X86::FR16XRegClass
: &X86::FR16RegClass);
Later on we setup default actions for all the F16 opcodes with the setF16Action helper (including FNEG/FABS/FCOPYSIGN), and then override some of them:
setF16Action(MVT::f16, Promote);
setOperationAction(ISD::FADD, MVT::f16, Promote);
setOperationAction(ISD::FSUB, MVT::f16, Promote);
setOperationAction(ISD::FMUL, MVT::f16, Promote);
setOperationAction(ISD::FDIV, MVT::f16, Promote);
setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom);
setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);
For MVT::v8f16 we do this again, this time remembering to reset FNEG/FABS/FCOPYSIGN to Custom:
setF16Action(MVT::v8f16, Expand);
setOperationAction(ISD::FADD, MVT::v8f16, Expand);
setOperationAction(ISD::FSUB, MVT::v8f16, Expand);
setOperationAction(ISD::FMUL, MVT::v8f16, Expand);
setOperationAction(ISD::FDIV, MVT::v8f16, Expand);
setOperationAction(ISD::FNEG, MVT::v8f16, Custom);
setOperationAction(ISD::FABS, MVT::v8f16, Custom);
setOperationAction(ISD::FCOPYSIGN, MVT::v8f16, Custom);
We should have done something similar for the MVT::f16 cases earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for explaining. that's it ? No need of the current changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - it ok to reuse this PR and force push or start a new one - no strong preference.
This patch optimizes _Float16 FABS and FNEG operations by using direct
bitwise operations instead of converting to/from float (#126892).
Before this patch, f16 FABS/FNEG would:
With this patch, we now:
Test Plan:
Performance Impact: