AMDGPU: Add round-to-odd rounding during f64 to bf16 conversion #133995
Conversation
This is to upstream the gfx950 change 9dd15535ec. In the f64 -> bf16 conversion, the following code sequence performs double rounding and generates incorrect results for some input values:

v_cvt_f32_f64_e32 v0, v[0:1]
v_cvt_pk_bf16_f32 v0, v0, s0

We need to add round-to-odd rounding during f64 -> f32. Here is the description from commit 9dd15535ec to gfx950: In the f64 to bf16 conversion, we do f64 -> f32 followed by f32 -> bf16. We should do round-to-odd rounding in f64 -> f32, and then nearest-even rounding in f32 -> bf16. Refer to llvm#82399 for details. On hardware with f32 -> bf16 instructions, round-to-odd rounding was missing in the f64 -> f32 step. This patch adds that rounding support by custom lowering FP_ROUND.

NOTE: we have the same issue with the f64 -> f16 conversion. We will add round-to-odd rounding for it in a separate patch, which fixes SWDEV-523856.
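For illustration, here is a minimal, self-contained C++ sketch of the failure and the fix (assumptions: this is not the in-tree lowering; the helper names F32ToBF16 and F64ToF32RoundToOdd are made up for this example; NaN/Inf handling is omitted; C++20 is required for std::bit_cast). It mirrors the fabs/compare/adjust/copysign sequence visible in the GFX-950 test output further down:

#include <bit>
#include <cmath>
#include <cstdint>
#include <cstdio>

// f32 -> bf16 with round-to-nearest-even, returned as the 16-bit pattern
// (the rounding the hardware f32 -> bf16 conversion performs).
static uint16_t F32ToBF16(float F) {
  uint32_t Bits = std::bit_cast<uint32_t>(F);
  uint32_t Bias = 0x7FFF + ((Bits >> 16) & 1); // ties go to the even mantissa
  return static_cast<uint16_t>((Bits + Bias) >> 16);
}

// f64 -> f32 with round-to-odd: convert, and if the result is inexact with an
// even mantissa LSB, step one ULP toward the source so the LSB becomes 1.
static float F64ToF32RoundToOdd(double D) {
  double A = std::fabs(D);
  float Narrow = static_cast<float>(A);      // nearest-even narrowing
  double Widened = static_cast<double>(Narrow);
  uint32_t Bits = std::bit_cast<uint32_t>(Narrow);
  if (Widened != A && !(Bits & 1))           // inexact and even LSB
    Bits += (A > Widened) ? 1u : -1u;        // one ULP toward A; LSB -> 1
  return std::copysign(std::bit_cast<float>(Bits), static_cast<float>(D));
}

int main() {
  // 1 + 2^-8 + 2^-50 lies just above the bf16 tie point between 1.0 (0x3F80)
  // and 1.0078125 (0x3F81), so a correct single rounding gives 0x3F81. The
  // first nearest-even step to f32 collapses it onto the tie point exactly,
  // and the second step then breaks the tie downward to 0x3F80.
  double X = 1.0 + 0x1p-8 + 0x1p-50;
  std::printf("double-rounded: 0x%04X\n", (unsigned)F32ToBF16((float)X));              // 0x3F80 (wrong)
  std::printf("round-to-odd:   0x%04X\n", (unsigned)F32ToBF16(F64ToF32RoundToOdd(X))); // 0x3F81 (correct)
}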
You can test this locally with the following command:

git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AMDGPU/SIISelLowering.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 21f8c7cfe..e8d6b76a4 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6910,8 +6910,8 @@ SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
// Round-inexact-to-odd f64 to f32, then do the final rounding using the
// hardware f32 -> bf16 instruction.
- EVT F32VT = SrcVT.isVector() ? SrcVT.changeVectorElementType(MVT::f32) :
- MVT::f32;
+ EVT F32VT =
+ SrcVT.isVector() ? SrcVT.changeVectorElementType(MVT::f32) : MVT::f32;
SDValue Rod = expandRoundInexactToOdd(F32VT, Src, DL, DAG);
return DAG.getNode(ISD::FP_ROUND, DL, DstVT, Rod,
DAG.getTargetConstant(0, DL, MVT::i32));
Can you clean up the description to refer to just the changes here, without referring to the out-of-tree commit context?
// hardware f32 -> bf16 instruction.
  EVT F32VT = SrcVT.isVector() ? SrcVT.changeVectorElementType(MVT::f32) :
                                 MVT::f32;
  SDValue Rod = expandRoundInexactToOdd(F32VT, Src, DL, DAG);
Do we need to custom lower here at all? This seems like the default action. Can you just return the original node?
No. If we return the original node, it will generate the two conversions and have the double-rounding issue.
Also, if we setOperationAction to "Expand", it will do the round-inexact-to-odd, but won't generate the desired v_cvt_pk_bf16_f32 instruction for f32 -> bf16.
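To summarize the trade-off being discussed (a sketch of the three action choices as described in this thread, matching the hunk in SIISelLowering.cpp below; not an authoritative statement of the generic legalizer's behavior):

// Legal:  FP_ROUND f64 -> bf16 is selected as v_cvt_f32_f64 +
//         v_cvt_pk_bf16_f32, i.e. two nearest-even roundings (incorrect).
// Expand: the generic expansion applies round-inexact-to-odd, but then also
//         expands f32 -> bf16 generically instead of emitting
//         v_cvt_pk_bf16_f32 (correct but suboptimal).
// Custom: this patch; round-inexact-to-odd for f64 -> f32, then the Legal
//         hardware f32 -> bf16 FP_ROUND (correct and optimal).
setOperationAction(ISD::FP_ROUND, {MVT::bf16, MVT::v2bf16}, Custom);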
Then the default expansion is buggy? We shouldn't have unique bf16 legalization needs
The default expansion for bf16 is not optimal, at least. I can take a look in a follow-up patch.
Okay. Thanks.
Co-authored-by: Matt Arsenault <[email protected]>
@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

f64 -> bf16 conversion can be lowered to f64 -> f32 followed by f32 -> bf16:

v_cvt_f32_f64_e32 v0, v[0:1]
v_cvt_pk_bf16_f32 v0, v0, s0

Both conversion instructions perform round-to-nearest-even rounding, so we get a double-rounding issue that may generate incorrect results in some data ranges. We need to add round-to-odd rounding during f64 -> f32 to avoid the double rounding.

NOTE: we have the same issue with the f64 -> f16 conversion. We will add round-to-odd rounding for it in a separate patch, which fixes SWDEV-523856.

Full diff: https://github.com/llvm/llvm-project/pull/133995.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 96c113cc5d24c..e9aef744abcd9 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -911,8 +911,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
setOperationAction(ISD::MUL, MVT::i1, Promote);
if (Subtarget->hasBF16ConversionInsts()) {
- setOperationAction(ISD::FP_ROUND, MVT::v2bf16, Legal);
- setOperationAction(ISD::FP_ROUND, MVT::bf16, Legal);
+ setOperationAction(ISD::FP_ROUND, {MVT::bf16, MVT::v2bf16}, Custom);
setOperationAction(ISD::BUILD_VECTOR, MVT::v2bf16, Legal);
}
@@ -6888,23 +6887,33 @@ SDValue SITargetLowering::getFPExtOrFPRound(SelectionDAG &DAG, SDValue Op,
}
SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
- assert(Op.getValueType() == MVT::f16 &&
- "Do not know how to custom lower FP_ROUND for non-f16 type");
-
SDValue Src = Op.getOperand(0);
EVT SrcVT = Src.getValueType();
- if (SrcVT != MVT::f64)
- return Op;
-
- // TODO: Handle strictfp
- if (Op.getOpcode() != ISD::FP_ROUND)
+ if (SrcVT.getScalarType() != MVT::f64)
return Op;
+ EVT DstVT = Op.getValueType();
SDLoc DL(Op);
+ if (DstVT == MVT::f16) {
+ // TODO: Handle strictfp
+ if (Op.getOpcode() != ISD::FP_ROUND)
+ return Op;
+
+ SDValue FpToFp16 = DAG.getNode(ISD::FP_TO_FP16, DL, MVT::i32, Src);
+ SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::i16, FpToFp16);
+ return DAG.getNode(ISD::BITCAST, DL, MVT::f16, Trunc);
+ }
+
+ assert(DstVT.getScalarType() == MVT::bf16 &&
+ "custom lower FP_ROUND for f16 or bf16");
+ assert(Subtarget->hasBF16ConversionInsts() && "f32 -> bf16 is legal");
- SDValue FpToFp16 = DAG.getNode(ISD::FP_TO_FP16, DL, MVT::i32, Src);
- SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::i16, FpToFp16);
- return DAG.getNode(ISD::BITCAST, DL, MVT::f16, Trunc);
+ // Round-inexact-to-odd f64 to f32, then do the final rounding using the
+ // hardware f32 -> bf16 instruction.
+ EVT F32VT = SrcVT.isVector() ? SrcVT.changeVectorElementType(MVT::f32) :
+ MVT::f32;
+ SDValue Rod = expandRoundInexactToOdd(F32VT, Src, DL, DAG);
+ return getFPExtOrFPRound(DAG, Rod, DL, DstVT);
}
SDValue SITargetLowering::lowerFMINNUM_FMAXNUM(SDValue Op,
diff --git a/llvm/test/CodeGen/AMDGPU/bf16-conversions.ll b/llvm/test/CodeGen/AMDGPU/bf16-conversions.ll
index 4c01e583713a7..3be911ab9e7f4 100644
--- a/llvm/test/CodeGen/AMDGPU/bf16-conversions.ll
+++ b/llvm/test/CodeGen/AMDGPU/bf16-conversions.ll
@@ -153,9 +153,34 @@ define amdgpu_ps float @v_test_cvt_v2f64_v2bf16_v(<2 x double> %src) {
;
; GFX-950-LABEL: v_test_cvt_v2f64_v2bf16_v:
; GFX-950: ; %bb.0:
-; GFX-950-NEXT: v_cvt_f32_f64_e32 v2, v[2:3]
-; GFX-950-NEXT: v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX-950-NEXT: v_cvt_pk_bf16_f32 v0, v0, v2
+; GFX-950-NEXT: v_mov_b32_e32 v4, v3
+; GFX-950-NEXT: v_and_b32_e32 v3, 0x7fffffff, v4
+; GFX-950-NEXT: v_mov_b32_e32 v5, v1
+; GFX-950-NEXT: v_cvt_f32_f64_e32 v1, v[2:3]
+; GFX-950-NEXT: v_cvt_f64_f32_e32 v[6:7], v1
+; GFX-950-NEXT: v_and_b32_e32 v8, 1, v1
+; GFX-950-NEXT: v_cmp_gt_f64_e64 s[2:3], v[2:3], v[6:7]
+; GFX-950-NEXT: v_cmp_nlg_f64_e32 vcc, v[2:3], v[6:7]
+; GFX-950-NEXT: v_cmp_eq_u32_e64 s[0:1], 1, v8
+; GFX-950-NEXT: v_cndmask_b32_e64 v2, -1, 1, s[2:3]
+; GFX-950-NEXT: v_add_u32_e32 v2, v1, v2
+; GFX-950-NEXT: s_or_b64 vcc, vcc, s[0:1]
+; GFX-950-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
+; GFX-950-NEXT: s_brev_b32 s4, 1
+; GFX-950-NEXT: v_and_or_b32 v4, v4, s4, v1
+; GFX-950-NEXT: v_and_b32_e32 v1, 0x7fffffff, v5
+; GFX-950-NEXT: v_cvt_f32_f64_e32 v6, v[0:1]
+; GFX-950-NEXT: v_cvt_f64_f32_e32 v[2:3], v6
+; GFX-950-NEXT: v_and_b32_e32 v7, 1, v6
+; GFX-950-NEXT: v_cmp_gt_f64_e64 s[2:3], v[0:1], v[2:3]
+; GFX-950-NEXT: v_cmp_nlg_f64_e32 vcc, v[0:1], v[2:3]
+; GFX-950-NEXT: v_cmp_eq_u32_e64 s[0:1], 1, v7
+; GFX-950-NEXT: v_cndmask_b32_e64 v0, -1, 1, s[2:3]
+; GFX-950-NEXT: v_add_u32_e32 v0, v6, v0
+; GFX-950-NEXT: s_or_b64 vcc, vcc, s[0:1]
+; GFX-950-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc
+; GFX-950-NEXT: v_and_or_b32 v0, v5, s4, v0
+; GFX-950-NEXT: v_cvt_pk_bf16_f32 v0, v0, v4
; GFX-950-NEXT: ; return to shader part epilog
%res = fptrunc <2 x double> %src to <2 x bfloat>
%cast = bitcast <2 x bfloat> %res to float
@@ -347,7 +372,18 @@ define amdgpu_ps void @fptrunc_f64_to_bf16(double %a, ptr %out) {
;
; GFX-950-LABEL: fptrunc_f64_to_bf16:
; GFX-950: ; %bb.0: ; %entry
-; GFX-950-NEXT: v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX-950-NEXT: v_cvt_f32_f64_e64 v6, |v[0:1]|
+; GFX-950-NEXT: v_cvt_f64_f32_e32 v[4:5], v6
+; GFX-950-NEXT: v_and_b32_e32 v7, 1, v6
+; GFX-950-NEXT: v_cmp_gt_f64_e64 s[2:3], |v[0:1]|, v[4:5]
+; GFX-950-NEXT: v_cmp_nlg_f64_e64 s[0:1], |v[0:1]|, v[4:5]
+; GFX-950-NEXT: v_cmp_eq_u32_e32 vcc, 1, v7
+; GFX-950-NEXT: v_cndmask_b32_e64 v0, -1, 1, s[2:3]
+; GFX-950-NEXT: v_add_u32_e32 v0, v6, v0
+; GFX-950-NEXT: s_or_b64 vcc, s[0:1], vcc
+; GFX-950-NEXT: v_cndmask_b32_e32 v0, v0, v6, vcc
+; GFX-950-NEXT: s_brev_b32 s0, 1
+; GFX-950-NEXT: v_and_or_b32 v0, v1, s0, v0
; GFX-950-NEXT: v_cvt_pk_bf16_f32 v0, v0, s0
; GFX-950-NEXT: flat_store_short v[2:3], v0
; GFX-950-NEXT: s_endpgm
@@ -385,7 +421,19 @@ define amdgpu_ps void @fptrunc_f64_to_bf16_neg(double %a, ptr %out) {
;
; GFX-950-LABEL: fptrunc_f64_to_bf16_neg:
; GFX-950: ; %bb.0: ; %entry
-; GFX-950-NEXT: v_cvt_f32_f64_e64 v0, -v[0:1]
+; GFX-950-NEXT: v_cvt_f32_f64_e64 v7, |v[0:1]|
+; GFX-950-NEXT: v_cvt_f64_f32_e32 v[4:5], v7
+; GFX-950-NEXT: v_and_b32_e32 v8, 1, v7
+; GFX-950-NEXT: v_cmp_gt_f64_e64 s[2:3], |v[0:1]|, v[4:5]
+; GFX-950-NEXT: v_cmp_nlg_f64_e64 s[0:1], |v[0:1]|, v[4:5]
+; GFX-950-NEXT: v_cmp_eq_u32_e32 vcc, 1, v8
+; GFX-950-NEXT: v_cndmask_b32_e64 v0, -1, 1, s[2:3]
+; GFX-950-NEXT: v_add_u32_e32 v0, v7, v0
+; GFX-950-NEXT: s_or_b64 vcc, s[0:1], vcc
+; GFX-950-NEXT: s_brev_b32 s4, 1
+; GFX-950-NEXT: v_xor_b32_e32 v6, 0x80000000, v1
+; GFX-950-NEXT: v_cndmask_b32_e32 v0, v0, v7, vcc
+; GFX-950-NEXT: v_and_or_b32 v0, v6, s4, v0
; GFX-950-NEXT: v_cvt_pk_bf16_f32 v0, v0, s0
; GFX-950-NEXT: flat_store_short v[2:3], v0
; GFX-950-NEXT: s_endpgm
@@ -424,7 +472,19 @@ define amdgpu_ps void @fptrunc_f64_to_bf16_abs(double %a, ptr %out) {
;
; GFX-950-LABEL: fptrunc_f64_to_bf16_abs:
; GFX-950: ; %bb.0: ; %entry
-; GFX-950-NEXT: v_cvt_f32_f64_e64 v0, |v[0:1]|
+; GFX-950-NEXT: v_cvt_f32_f64_e64 v7, |v[0:1]|
+; GFX-950-NEXT: v_cvt_f64_f32_e32 v[4:5], v7
+; GFX-950-NEXT: v_and_b32_e32 v8, 1, v7
+; GFX-950-NEXT: v_cmp_gt_f64_e64 s[2:3], |v[0:1]|, v[4:5]
+; GFX-950-NEXT: v_cmp_nlg_f64_e64 s[0:1], |v[0:1]|, v[4:5]
+; GFX-950-NEXT: v_cmp_eq_u32_e32 vcc, 1, v8
+; GFX-950-NEXT: v_cndmask_b32_e64 v0, -1, 1, s[2:3]
+; GFX-950-NEXT: v_add_u32_e32 v0, v7, v0
+; GFX-950-NEXT: s_or_b64 vcc, s[0:1], vcc
+; GFX-950-NEXT: v_and_b32_e32 v6, 0x7fffffff, v1
+; GFX-950-NEXT: v_cndmask_b32_e32 v0, v0, v7, vcc
+; GFX-950-NEXT: s_brev_b32 s0, 1
+; GFX-950-NEXT: v_and_or_b32 v0, v6, s0, v0
; GFX-950-NEXT: v_cvt_pk_bf16_f32 v0, v0, s0
; GFX-950-NEXT: flat_store_short v[2:3], v0
; GFX-950-NEXT: s_endpgm
Should also delete the incorrect isel patterns in
Right, we should remove these incorrect patterns. The work is done.
Remove incorrect f64->f16 patterns.
Ping. Are there any further suggestions or comments?
LGTM but we shouldn't really need anything custom here
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/19592

Here is the relevant piece of the build log for the reference:
Thanks. It seems we have a lot of work to do with f64 -> f16 (Custom) and f64 -> bf16 (Expand). We do not actually take advantage of the existing f32 -> f16 (or f32 -> bf16 for gfx950+) hardware instructions, and we generate a much longer code sequence (without the unsafe fast-math flag set).
Just realized that we may still need these patterns for unsafe fast math. Will take a further look and may put them back. Thanks.
f64 -> bf16 conversion can be lowered to f64 -> f32 followed by f32 -> bf16:
v_cvt_f32_f64_e32 v0, v[0:1]
v_cvt_pk_bf16_f32 v0, v0, s0
Both conversion instructions perform round-to-nearest-even rounding, so we have a double-rounding issue that may generate incorrect results in some data ranges. We need to add round-to-odd rounding during f64 -> f32 to avoid the double rounding.
NOTE: we have the same issue with the f64 -> f16 conversion. We will add round-to-odd rounding for it in a separate patch, which fixes SWDEV-523856.