[X86] Enhance FABS/FNEG lowering for scalar _Float16 with bitwise operations #128637

StarOne01 · 2025-02-25T06:29:38Z

This patch optimizes _Float16 FABS and FNEG operations by using direct
bitwise operations instead of converting to/from float (#126892).

Before this patch, f16 FABS/FNEG would:

1. Convert f16->f32 (vcvtph2ps)
2. Perform operation on f32
3. Convert f32->f16 (vcvtps2ph)

With this patch, we now:

1. Bitcast f16->i16
2. Use AND/XOR with appropriate mask
3. Bitcast i16->f16

Test Plan:

- Added test cases for scalar f16 FABS/FNEG
- Verified generated assembly shows direct bitwise operations
- Ran X86 regression tests

Performance Impact:

- Eliminates f16<->f32 conversion overhead
- Reduces instruction count from 3 to 1
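
The bit-level effect of the bitcast, mask, bitcast sequence above can be sketched on raw IEEE 754 binary16 bit patterns. This is a minimal standalone illustration, not the patch's SelectionDAG code: the helper names (`fabsBits`, `fnegBits`, `fnabsBits`) are hypothetical, and the `uint16_t` argument stands in for the i16 value produced by the bitcast. The OR case corresponds to FNEG(FABS(x)), which the diff in this PR handles as a separate case.

```cpp
#include <cstdint>

// IEEE 754 binary16 layout: bit 15 is the sign, bits 14..10 the exponent,
// bits 9..0 the mantissa. All three operations touch only the sign bit.
constexpr std::uint16_t kSignMask = 0x8000;
constexpr std::uint16_t kAbsMask = 0x7FFF;

// FABS: clear the sign bit (AND with 0x7FFF).
constexpr std::uint16_t fabsBits(std::uint16_t h) {
  return static_cast<std::uint16_t>(h & kAbsMask);
}

// FNEG: flip the sign bit (XOR with 0x8000).
constexpr std::uint16_t fnegBits(std::uint16_t h) {
  return static_cast<std::uint16_t>(h ^ kSignMask);
}

// FNEG(FABS(x)): force the sign bit on (OR with 0x8000).
constexpr std::uint16_t fnabsBits(std::uint16_t h) {
  return static_cast<std::uint16_t>(h | kSignMask);
}

// 0x3C00 encodes 1.0 and 0xBC00 encodes -1.0 in binary16.
static_assert(fabsBits(0xBC00) == 0x3C00, "fabs(-1.0) is 1.0");
static_assert(fnegBits(0x3C00) == 0xBC00, "fneg(1.0) is -1.0");
static_assert(fnabsBits(0x3C00) == 0xBC00, "fneg(fabs(1.0)) is -1.0");
```

Because only the sign bit changes, the same masks apply unchanged to zeros, infinities, and NaN payloads.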

llvmbot · 2025-02-25T06:30:11Z

@llvm/pr-subscribers-backend-x86

Author: Prashanth (StarOne01)

Full diff: https://github.com/llvm/llvm-project/pull/128637.diff

1 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+37-6)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a4357197e2843..e77dd30b3c367 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -10,7 +10,6 @@
 // selection DAG.
 //
 //===----------------------------------------------------------------------===//
-
 #include "X86ISelLowering.h"
 #include "MCTargetDesc/X86ShuffleDecode.h"
 #include "X86.h"
@@ -21987,8 +21986,6 @@ SDValue X86TargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {
       return DAG.getNode(X86ISD::STRICT_VFPEXT, DL, {VT, MVT::Other},
                          {Op->getOperand(0), Res});
     return DAG.getNode(X86ISD::VFPEXT, DL, VT, Res);
-  } else if (VT == MVT::v4f64 || VT == MVT::v8f64) {
-    return Op;
   }
 
   assert(SVT == MVT::v2f32 && "Only customize MVT::v2f32 type legalization!");
@@ -22285,10 +22282,45 @@ static SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) {
 /// The only differences between FABS and FNEG are the mask and the logic op.
 /// FNEG also has a folding opportunity for FNEG(FABS(x)).
 static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
+
   assert((Op.getOpcode() == ISD::FABS || Op.getOpcode() == ISD::FNEG) &&
          "Wrong opcode for lowering FABS or FNEG.");
 
   bool IsFABS = (Op.getOpcode() == ISD::FABS);
+  SDLoc dl(Op);
+  MVT VT = Op.getSimpleValueType();
+
+  // Handle scalar _Float16 (f16) directly via integer bitwise operations.
+  if (VT == MVT::f16) {
+    SDValue Op0 = Op.getOperand(0);
+    bool IsFNABS = !IsFABS && Op0.getOpcode() == ISD::FABS;
+
+    // For FNABS (FNEG of FABS), bypass FABS.
+    if (IsFNABS)
+      Op0 = Op0.getOperand(0);
+
+    // Bitcast f16 to i16 for bitwise operations.
+    SDValue IntVal = DAG.getNode(ISD::BITCAST, dl, MVT::i16, Op0);
+
+    APInt MaskVal;
+    unsigned LogicOp;
+    if (IsFABS) {
+      MaskVal = APInt(16, 0x7FFF); // Clear sign bit.
+      LogicOp = ISD::AND;
+    } else if (IsFNABS) {
+      MaskVal = APInt(16, 0x8000); // Combine masks via OR.
+      LogicOp = ISD::OR;
+    } else {
+      MaskVal = APInt(16, 0x8000); // Flip sign bit.
+      LogicOp = ISD::XOR;
+    }
+
+    SDValue Mask = DAG.getConstant(MaskVal, dl, MVT::i16);
+    SDValue LogicNode = DAG.getNode(LogicOp, dl, MVT::i16, IntVal, Mask);
+
+    // Bitcast back to f16.
+    return DAG.getNode(ISD::BITCAST, dl, MVT::f16, LogicNode);
+  }
 
   // If this is a FABS and it has an FNEG user, bail out to fold the combination
   // into an FNABS. We'll lower the FABS after that if it is still in use.
@@ -22297,8 +22329,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
       if (User->getOpcode() == ISD::FNEG)
         return Op;
 
-  SDLoc dl(Op);
-  MVT VT = Op.getSimpleValueType();
+  VT = Op.getSimpleValueType();
 
   bool IsF128 = (VT == MVT::f128);
   assert(VT.isFloatingPoint() && VT != MVT::f80 &&
@@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
   // generate a 16-byte vector constant and logic op even for the scalar case.
   // Using a 16-byte mask allows folding the load of the mask with
   // the logic op, so it can save (~4 bytes) on code size.
-  bool IsFakeVector = !VT.isVector() && !IsF128;
+  bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16;
   MVT LogicVT = VT;
   if (IsFakeVector)
     LogicVT = (VT == MVT::f64)   ? MVT::v2f64

RKSimon · 2025-02-25T09:00:59Z

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ -22313,7 +22344,7 @@ static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
  // generate a 16-byte vector constant and logic op even for the scalar case.
  // Using a 16-byte mask allows folding the load of the mask with
  // the logic op, so it can save (~4 bytes) on code size.
-  bool IsFakeVector = !VT.isVector() && !IsF128;
+  bool IsFakeVector = !VT.isVector() && !IsF128 && VT != MVT::f16;


Looks like the IsFakeVector/LogicVT code already handles MVT::f16 cases - so my guess is the ISD::FABS/FNEG/FCOPYSIGN actions aren't set to Custom for SSE2+ targets?

StarOne01 · 2025-02-25

Umm... not very sure about it.

RKSimon · 2025-02-25

The SSE2 setF16Action(MVT::f16, Promote) call sets the default actions - these need to be set to Custom for ISD::FABS/FNEG/FCOPYSIGN afterward.

StarOne01 · 2025-02-26

Could you suggest something I could read to understand the overall structure?

RKSimon · 2025-02-26

We make the MVT::f16 type legal for all SSE2+ targets:

    addRegisterClass(MVT::f16, Subtarget.hasAVX512() ? &X86::FR16XRegClass
                                                     : &X86::FR16RegClass);

Later on we set up default actions for all the F16 opcodes with the setF16Action helper (including FNEG/FABS/FCOPYSIGN), and then override some of them:

    setF16Action(MVT::f16, Promote);
    setOperationAction(ISD::FADD, MVT::f16, Promote);
    setOperationAction(ISD::FSUB, MVT::f16, Promote);
    setOperationAction(ISD::FMUL, MVT::f16, Promote);
    setOperationAction(ISD::FDIV, MVT::f16, Promote);
    setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f32, Custom);
    setOperationAction(ISD::FP_EXTEND, MVT::f64, Custom);

For MVT::v8f16 we do this again, this time remembering to reset FNEG/FABS/FCOPYSIGN to Custom:

    setF16Action(MVT::v8f16, Expand);
    setOperationAction(ISD::FADD, MVT::v8f16, Expand);
    setOperationAction(ISD::FSUB, MVT::v8f16, Expand);
    setOperationAction(ISD::FMUL, MVT::v8f16, Expand);
    setOperationAction(ISD::FDIV, MVT::v8f16, Expand);
    setOperationAction(ISD::FNEG, MVT::v8f16, Custom);
    setOperationAction(ISD::FABS, MVT::v8f16, Custom);
    setOperationAction(ISD::FCOPYSIGN, MVT::v8f16, Custom);

We should have done something similar for the MVT::f16 cases earlier.
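
The default-then-override pattern described above can be modelled with a small lookup table. This is a toy sketch, not LLVM's TargetLowering: the `Op`, `Ty`, and `Action` enums, the `ToyLowering` class, and `makeToyLowering` are hypothetical stand-ins, and the f16 overrides to Custom are the change the review suggests, assumed here rather than taken from an actual commit.

```cpp
#include <initializer_list>
#include <map>
#include <utility>

// Toy stand-ins for ISD opcodes, MVT types, and legalize actions.
enum class Op { FADD, FSUB, FMUL, FDIV, FNEG, FABS, FCOPYSIGN };
enum class Ty { f16, v8f16 };
enum class Action { Legal, Promote, Expand, Custom };

class ToyLowering {
  std::map<std::pair<Op, Ty>, Action> Actions;

public:
  // A later call for the same (opcode, type) pair replaces the earlier one.
  void setOperationAction(Op O, Ty T, Action A) { Actions[{O, T}] = A; }

  // Plays the role of the setF16Action helper: one blanket default per type.
  void setF16Action(Ty T, Action A) {
    for (Op O : {Op::FADD, Op::FSUB, Op::FMUL, Op::FDIV, Op::FNEG, Op::FABS,
                 Op::FCOPYSIGN})
      setOperationAction(O, T, A);
  }

  // Unset entries fall back to Legal in this toy model.
  Action getOperationAction(Op O, Ty T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

inline ToyLowering makeToyLowering() {
  ToyLowering TL;
  // Scalar f16: blanket Promote, then the overrides suggested in the review
  // (an assumption about the eventual fix).
  TL.setF16Action(Ty::f16, Action::Promote);
  TL.setOperationAction(Op::FNEG, Ty::f16, Action::Custom);
  TL.setOperationAction(Op::FABS, Ty::f16, Action::Custom);
  TL.setOperationAction(Op::FCOPYSIGN, Ty::f16, Action::Custom);
  // Vector v8f16: blanket Expand, then reset to Custom as quoted above.
  TL.setF16Action(Ty::v8f16, Action::Expand);
  TL.setOperationAction(Op::FNEG, Ty::v8f16, Action::Custom);
  TL.setOperationAction(Op::FABS, Ty::v8f16, Action::Custom);
  TL.setOperationAction(Op::FCOPYSIGN, Ty::v8f16, Action::Custom);
  return TL;
}
```

The point of the model is ordering: the blanket default has to come first, because any override issued before it would be overwritten.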

StarOne01 · 2025-02-26

Hey, thanks for explaining. Is that it? No need for the current changes?

RKSimon · 2025-02-26

I don't think so - it's OK to reuse this PR and force-push, or start a new one - no strong preference.

StarOne01 added 2 commits February 25, 2025 11:44

[X86] Enhance FABS/FNEG lowering for scalar _Float16 with bitwise operations

dca9106

[X86] Refactor whitespace in LowerFABSorFNEG for improved readability (format)

0613d36

llvmbot added the backend:X86 label Feb 25, 2025

RKSimon reviewed Feb 25, 2025

RKSimon requested a review from phoebewang February 25, 2025 09:01

RKSimon closed this Feb 26, 2025

StarOne01 deleted the fneg_fabs branch February 26, 2025 14:14
