Skip to content

[X86] Enhance FABS/FNEG lowering for scalar _Float16 with bitwise operations #128637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

StarOne01
Copy link
Contributor

This patch optimizes _Float16 FABS and FNEG operations by using direct
bitwise operations instead of converting to/from float (#126892).

Before this patch, f16 FABS/FNEG would:

  1. Convert f16->f32 (vcvtph2ps)
  2. Perform operation on f32
  3. Convert f32->f16 (vcvtps2ph)

With this patch, we now:

  1. Bitcast f16->i16
  2. Use AND/XOR with appropriate mask
  3. Bitcast i16->f16

Test Plan:

  • Added test cases for scalar f16 FABS/FNEG
  • Verified generated assembly shows direct bitwise operations
  • Ran X86 regression tests

Performance Impact:

  • Eliminates f16<->f32 conversion overhead
  • Reduces instruction count from 3 to 1

@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-backend-x86

Author: Prashanth (StarOne01)

Changes

This patch optimizes _Float16 FABS and FNEG operations by using direct
bitwise operations instead of converting to/from float (#126892).

Before this patch, f16 FABS/FNEG would:

  1. Convert f16->f32 (vcvtph2ps)
  2. Perform operation on f32
  3. Convert f32->f16 (vcvtps2ph)

With this patch, we now:

  1. Bitcast f16->i16
  2. Use AND/XOR with appropriate mask
  3. Bitcast i16->f16

Test Plan:

  • Added test cases for scalar f16 FABS/FNEG
  • Verified generated assembly shows direct bitwise operations
  • Ran X86 regression tests

Performance Impact:

  • Eliminates f16<->f32 conversion overhead
  • Reduces instruction count from 3 to 1

Full diff: https://github.com/llvm/llvm-project/pull/128637.diff

1 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+37-6)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a4357197e2843..e77dd30b3c367 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -10,7 +10,6 @@
 // selection DAG.
 //
 //===----------------------------------------------------------------------===//
-
 #include "X86ISelLowering.h"
 #include "MCTargetDesc/X86ShuffleDecode.h"
 #include "X86.h"
@@ -21987,8 +21986,6 @@ SDValue X86TargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {
       return DAG.getNode(X86ISD::STRICT_VFPEXT, DL, {VT, MVT::Other},
                          {Op->getOperand(0), Res});
     return DAG.getNode(X86ISD::VFPEXT, DL, VT, Res);
-  } else if (VT == MVT::v4f64 || VT == MVT::v8f64) {
-    return Op;
   }
 
   assert(SVT == MVT::v2f32 && "Only customize MVT::v2f32 type legalization!");
@@ -22285,10 +22282,45 @@ static SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) {
 /// The only differences between FABS and FNEG are the mask and the logic op.
 /// FNEG also has a folding opportunity for FNEG(FABS(x)).
 static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
+
   assert((Op.getOpcode() == ISD::FABS || Op.getOpcode() == ISD::FNEG) &&
          "Wrong opcode for lowering FABS or FNEG.");
 
   bool IsFABS = (Op.getOpcode() == ISD::FABS);
+  SDLoc dl(Op);
+  MVT VT = Op.getSimpleValueType();
+
+  // Handle scalar _Float16 (f16) directly via integer bitwise operations.
+  if (VT == MVT::f16) {
+    SDValue Op0 = Op.getOperand(0);
+    bool IsFNABS = !IsFABS && Op0.getOpcode() == ISD::FABS;
+
+    // For FNABS (FNEG of FABS), bypass FABS.
+    if (IsFNABS)
+      Op0 = Op0.getOperand(0);
+
+    // Bitcast f16 to i16 for bitwise operations.
+    SDValue IntVal = DAG.getNode(ISD::BITCAST, dl, MVT::i16, Op0);
+
+    APInt MaskVal;
+    unsigned LogicOp;
+    if (IsFABS) {
+      MaskVal = APInt(16, 0x7FFF); // Clear sign bit.
+      LogicOp = ISD::AND;
+    } else if (IsFNABS) {
+      MaskVal = APInt(16, 0x8000); // Combine masks via OR.
+      LogicOp = ISD::OR;
+    } else {
+      MaskVal = APInt(16, 0x8000); // Flip sign bit.
+      LogicOp = ISD::XOR;
+    }
+
+    SDValue Mask = DAG.getConstant(MaskVal, dl, MVT::i16);
+    SDValue LogicNode = DAG.getNode(LogicOp, dl, MVT::i16, IntVal, Mask);
+
+    // Bitcast back to f16.
+    return DAG.getNode(ISD::BITCAST, dl, MVT::f16, LogicNode);
+  }
 
   // If this is a FABS and it has an FNEG user, bail out to fold the combination
   // into an FNABS. We'll lower the FABS after that if it is still in use.
@@ -22297,8 +22329,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
       if (User->getOpcode() == ISD::FNEG)
         return Op;
 
-  SDLoc dl(Op);
-  MVT VT = Op.getSimpleValueType();
+  VT = Op.getSimpleValueType();
 
   bool IsF128 = (VT == MVT::f128);
   assert(VT.isFloatingPoint() && VT != MVT::f80 &&
@@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
   // generate a 16-byte vector constant and logic op even for the scalar case.
   // Using a 16-byte mask allows folding the load of the mask with
   // the logic op, so it can save (~4 bytes) on code size.
-  bool IsFakeVector = !VT.isVector() && !IsF128;
+  bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16;
   MVT LogicVT = VT;
   if (IsFakeVector)
     LogicVT = (VT == MVT::f64)   ? MVT::v2f64

@@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
// generate a 16-byte vector constant and logic op even for the scalar case.
// Using a 16-byte mask allows folding the load of the mask with
// the logic op, so it can save (~4 bytes) on code size.
bool IsFakeVector = !VT.isVector() && !IsF128;
bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the IsFakeVector/LogicVT code already handles MVT::f16 cases - so my guess is the ISD::FABS/FNEG/FCOPYSIGN actions aren't set to Custom for SSE2+ targets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm... not very sure about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SSE2 setF16Action(MVT::f16, Promote) call is setting the default values - these need to set to Custom for ISD::FABS/FNEG/FCOPYSIGN afterward.

Copy link
Contributor Author

@StarOne01 StarOne01 Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you suggest me something where i could read about the whole structure?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We make the MVT::f16 type legal for all SSE2+ targets:

    addRegisterClass(MVT::f16, Subtarget.hasAVX512() ? &X86::FR16XRegClass
                                                     : &X86::FR16RegClass);

Later on we setup default actions for all the F16 opcodes with the setF16Action helper (including FNEG/FABS/FCOPYSIGN), and then override some of them:

    setF16Action(MVT::f16, Promote);
    setOperationAction(ISD::FADD, MVT::f16, Promote);
    setOperationAction(ISD::FSUB, MVT::f16, Promote);
    setOperationAction(ISD::FMUL, MVT::f16, Promote);
    setOperationAction(ISD::FDIV, MVT::f16, Promote);
    setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);

For MVT::v8f16 we do this again, this time remembering to reset FNEG/FABS/FCOPYSIGN to Custom:

    setF16Action(MVT::v8f16, Expand);
    setOperationAction(ISD::FADD, MVT::v8f16, Expand);
    setOperationAction(ISD::FSUB, MVT::v8f16, Expand);
    setOperationAction(ISD::FMUL, MVT::v8f16, Expand);
    setOperationAction(ISD::FDIV, MVT::v8f16, Expand);
    setOperationAction(ISD::FNEG, MVT::v8f16, Custom);
    setOperationAction(ISD::FABS, MVT::v8f16, Custom);
    setOperationAction(ISD::FCOPYSIGN, MVT::v8f16, Custom);

We should have done something similar for the MVT::f16 cases earlier.

Copy link
Contributor Author

@StarOne01 StarOne01 Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for explaining. that's it ? No need of the current changes ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - it ok to reuse this PR and force push or start a new one - no strong preference.

@RKSimon RKSimon requested a review from phoebewang February 25, 2025 09:01
@RKSimon RKSimon closed this Feb 26, 2025
@StarOne01 StarOne01 deleted the fneg_fabs branch February 26, 2025 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants