Skip to content

[X86] Enhance FABS/FNEG lowering for scalar _Float16 with bitwise operations #128637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 37 additions & 6 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
// selection DAG.
//
//===----------------------------------------------------------------------===//

#include "X86ISelLowering.h"
#include "MCTargetDesc/X86ShuffleDecode.h"
#include "X86.h"
Expand Down Expand Up @@ -21987,8 +21986,6 @@ SDValue X86TargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(X86ISD::STRICT_VFPEXT, DL, {VT, MVT::Other},
{Op->getOperand(0), Res});
return DAG.getNode(X86ISD::VFPEXT, DL, VT, Res);
} else if (VT == MVT::v4f64 || VT == MVT::v8f64) {
return Op;
}

assert(SVT == MVT::v2f32 && "Only customize MVT::v2f32 type legalization!");
Expand Down Expand Up @@ -22285,10 +22282,45 @@ static SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) {
/// The only differences between FABS and FNEG are the mask and the logic op.
/// FNEG also has a folding opportunity for FNEG(FABS(x)).
static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {

assert((Op.getOpcode() == ISD::FABS || Op.getOpcode() == ISD::FNEG) &&
"Wrong opcode for lowering FABS or FNEG.");

bool IsFABS = (Op.getOpcode() == ISD::FABS);
SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();

// Handle scalar _Float16 (f16) directly via integer bitwise operations.
if (VT == MVT::f16) {
SDValue Op0 = Op.getOperand(0);
bool IsFNABS = !IsFABS && Op0.getOpcode() == ISD::FABS;

// For FNABS (FNEG of FABS), bypass FABS.
if (IsFNABS)
Op0 = Op0.getOperand(0);

// Bitcast f16 to i16 for bitwise operations.
SDValue IntVal = DAG.getNode(ISD::BITCAST, dl, MVT::i16, Op0);

APInt MaskVal;
unsigned LogicOp;
if (IsFABS) {
MaskVal = APInt(16, 0x7FFF); // Clear sign bit.
LogicOp = ISD::AND;
} else if (IsFNABS) {
MaskVal = APInt(16, 0x8000); // Combine masks via OR.
LogicOp = ISD::OR;
} else {
MaskVal = APInt(16, 0x8000); // Flip sign bit.
LogicOp = ISD::XOR;
}

SDValue Mask = DAG.getConstant(MaskVal, dl, MVT::i16);
SDValue LogicNode = DAG.getNode(LogicOp, dl, MVT::i16, IntVal, Mask);

// Bitcast back to f16.
return DAG.getNode(ISD::BITCAST, dl, MVT::f16, LogicNode);
}

// If this is a FABS and it has an FNEG user, bail out to fold the combination
// into an FNABS. We'll lower the FABS after that if it is still in use.
Expand All @@ -22297,8 +22329,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
if (User->getOpcode() == ISD::FNEG)
return Op;

SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();
VT = Op.getSimpleValueType();

bool IsF128 = (VT == MVT::f128);
assert(VT.isFloatingPoint() && VT != MVT::f80 &&
Expand All @@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
// generate a 16-byte vector constant and logic op even for the scalar case.
// Using a 16-byte mask allows folding the load of the mask with
// the logic op, so it can save (~4 bytes) on code size.
bool IsFakeVector = !VT.isVector() && !IsF128;
bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the IsFakeVector/LogicVT code already handles MVT::f16 cases - so my guess is the ISD::FABS/FNEG/FCOPYSIGN actions aren't set to Custom for SSE2+ targets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm... not very sure about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SSE2 setF16Action(MVT::f16, Promote) call is setting the default values - these need to set to Custom for ISD::FABS/FNEG/FCOPYSIGN afterward.

Copy link
Contributor Author

@StarOne01 StarOne01 Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you suggest me something where i could read about the whole structure?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We make the MVT::f16 type legal for all SSE2+ targets:

    addRegisterClass(MVT::f16, Subtarget.hasAVX512() ? &X86::FR16XRegClass
                                                     : &X86::FR16RegClass);

Later on we setup default actions for all the F16 opcodes with the setF16Action helper (including FNEG/FABS/FCOPYSIGN), and then override some of them:

    setF16Action(MVT::f16, Promote);
    setOperationAction(ISD::FADD, MVT::f16, Promote);
    setOperationAction(ISD::FSUB, MVT::f16, Promote);
    setOperationAction(ISD::FMUL, MVT::f16, Promote);
    setOperationAction(ISD::FDIV, MVT::f16, Promote);
    setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);

For MVT::v8f16 we do this again, this time remembering to reset FNEG/FABS/FCOPYSIGN to Custom:

    setF16Action(MVT::v8f16, Expand);
    setOperationAction(ISD::FADD, MVT::v8f16, Expand);
    setOperationAction(ISD::FSUB, MVT::v8f16, Expand);
    setOperationAction(ISD::FMUL, MVT::v8f16, Expand);
    setOperationAction(ISD::FDIV, MVT::v8f16, Expand);
    setOperationAction(ISD::FNEG, MVT::v8f16, Custom);
    setOperationAction(ISD::FABS, MVT::v8f16, Custom);
    setOperationAction(ISD::FCOPYSIGN, MVT::v8f16, Custom);

We should have done something similar for the MVT::f16 cases earlier.

Copy link
Contributor Author

@StarOne01 StarOne01 Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for explaining. that's it ? No need of the current changes ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - it ok to reuse this PR and force push or start a new one - no strong preference.

MVT LogicVT = VT;
if (IsFakeVector)
LogicVT = (VT == MVT::f64) ? MVT::v2f64
Expand Down
Loading