-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Fix FMA Combine #119217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Fix FMA Combine #119217
Conversation
Update the check in the FMA combine to check dot10-insts instead of dot7-insts. The target of the combine, v_dot2_f32_f16, is available only if dot10-insts target feature is enabled. The issue probably dates back to the change that split out dot10-insts out of dot7-insts. As far as I can see, this does not affect any current targets, but if a future target has dot7-insts, but not dot10-insts that would cause a crash ("cannot select") for the input ir in the test.
@llvm/pr-subscribers-backend-amdgpu Author: Piotr Sobczak (piotrAMD) ChangesUpdate the check in the FMA combine to check dot10-insts instead of dot7-insts. The target of the combine, v_dot2_f32_f16, is available only if dot10-insts target feature is enabled. The issue probably dates back to the change that split out dot10-insts out of dot7-insts. As far as I can see, this does not affect any current targets, but if a future target has dot7-insts, but not dot10-insts that would cause a crash ("cannot select") for the input ir in the test. Full diff: https://github.com/llvm/llvm-project/pull/119217.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index f89fe8faa600ba..8dfebd36a962e1 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -14696,7 +14696,7 @@ SDValue SITargetLowering::performFMACombine(SDNode *N,
EVT VT = N->getValueType(0);
SDLoc SL(N);
- if (!Subtarget->hasDot7Insts() || VT != MVT::f32)
+ if (!Subtarget->hasDot10Insts() || VT != MVT::f32)
return SDValue();
// FMA((F32)S0.x, (F32)S1. x, FMA((F32)S0.y, (F32)S1.y, (F32)z)) ->
diff --git a/llvm/test/CodeGen/AMDGPU/fma_combine.ll b/llvm/test/CodeGen/AMDGPU/fma_combine.ll
new file mode 100644
index 00000000000000..0414018862d071
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fma_combine.ll
@@ -0,0 +1,70 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr="+dot7-insts,-dot10-insts,-dot5-insts" < %s | FileCheck %s -check-prefix=DISABLED
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr="+dot7-insts,+dot10-insts,-dot5-insts" < %s | FileCheck %s -check-prefix=ENABLED
+
+; Test that FMACombine is disabled for a target without dot10-insts, and enabled for a target with dot10-insts.
+
+define amdgpu_kernel void @func() {
+; DISABLED-LABEL: func:
+; DISABLED: ; %bb.0: ; %bb
+; DISABLED-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
+; DISABLED-NEXT: v_mov_b32_e32 v2, 0
+; DISABLED-NEXT: s_mov_b32 vcc_lo, exec_lo
+; DISABLED-NEXT: .LBB0_1: ; %main
+; DISABLED-NEXT: ; =>This Inner Loop Header: Depth=1
+; DISABLED-NEXT: ds_load_b128 v[3:6], v1
+; DISABLED-NEXT: ds_store_b32 v1, v0
+; DISABLED-NEXT: s_waitcnt lgkmcnt(1)
+; DISABLED-NEXT: v_fma_mix_f32 v2, v4, v3, v2 op_sel_hi:[1,1,0]
+; DISABLED-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; DISABLED-NEXT: v_fma_mix_f32 v2, v4, v3, v2 op_sel:[1,1,0] op_sel_hi:[1,1,0]
+; DISABLED-NEXT: v_add_f32_e32 v2, 0, v2
+; DISABLED-NEXT: s_cbranch_vccnz .LBB0_1
+; DISABLED-NEXT: ; %bb.2: ; %DummyReturnBlock
+; DISABLED-NEXT: s_endpgm
+;
+; ENABLED-LABEL: func:
+; ENABLED: ; %bb.0: ; %bb
+; ENABLED-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
+; ENABLED-NEXT: v_mov_b32_e32 v2, 0
+; ENABLED-NEXT: s_mov_b32 vcc_lo, exec_lo
+; ENABLED-NEXT: .LBB0_1: ; %main
+; ENABLED-NEXT: ; =>This Inner Loop Header: Depth=1
+; ENABLED-NEXT: ds_load_b128 v[3:6], v1
+; ENABLED-NEXT: ds_store_b32 v1, v0
+; ENABLED-NEXT: s_waitcnt lgkmcnt(1)
+; ENABLED-NEXT: v_dot2_f32_f16 v2, v4, v3, v2
+; ENABLED-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; ENABLED-NEXT: v_add_f32_e32 v2, 0, v2
+; ENABLED-NEXT: s_cbranch_vccnz .LBB0_1
+; ENABLED-NEXT: ; %bb.2: ; %DummyReturnBlock
+; ENABLED-NEXT: s_endpgm
+bb:
+ br label %main
+
+main: ; preds = %main, %bb
+ %.sroa.3 = phi float [ 0.000000e+00, %bb ], [ %i17, %main ]
+ %.sroa.4 = phi float [ 0.000000e+00, %bb ], [ %.sroa.3, %main ]
+ %i = load <8 x half>, ptr addrspace(3) null, align 16
+ %i1 = bitcast <8 x half> %i to i128
+ %i2 = trunc i128 %i1 to i32
+ %i3 = bitcast i32 %i2 to <2 x half>
+ %.sroa.1 = extractelement <2 x half> %i3, i64 0
+ %.sroa.2 = extractelement <2 x half> %i3, i64 1
+ %i4 = shufflevector <8 x half> %i, <8 x half> zeroinitializer, <2 x i32> <i32 2, i32 3>
+ %i5 = fpext <2 x half> %i4 to <2 x float>
+ %i6 = fpext half %.sroa.1 to float
+ %i7 = fpext half %.sroa.2 to float
+ %i8 = extractelement <2 x float> %i5, i64 0
+ %i9 = fmul contract float %i8, %i6
+ %i10 = fadd contract float %.sroa.3, %i9
+ %i11 = extractelement <2 x float> %i5, i64 1
+ %i12 = fmul contract float %i11, %i7
+ %i13 = fadd contract float %i12, %i10
+ %i14 = insertelement <2 x float> zeroinitializer, float %i13, i64 0
+ %i15 = insertelement <2 x float> %i14, float %.sroa.4, i64 1
+ %i16 = fadd <2 x float> %i15, zeroinitializer
+ store <2 x half> zeroinitializer, ptr addrspace(3) null, align 4
+ %i17 = extractelement <2 x float> %i16, i64 0
+ br label %main
+}
|
I haven't actually checked that, but I think this may be related to https://reviews.llvm.org/D142507 that split out dot10-insts out of dot7-insts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code change looks good.
@@ -14696,7 +14696,7 @@ SDValue SITargetLowering::performFMACombine(SDNode *N, | |||
EVT VT = N->getValueType(0); | |||
SDLoc SL(N); | |||
|
|||
if (!Subtarget->hasDot7Insts() || VT != MVT::f32) | |||
if (!Subtarget->hasDot10Insts() || VT != MVT::f32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have no idea what these numbers mean
Update the check in the FMA combine to check dot10-insts instead of dot7-insts.
The target of the combine, v_dot2_f32_f16, is available only if dot10-insts target feature is enabled.
The issue probably dates back to the change that split out dot10-insts out of dot7-insts.
As far as I can see, this does not affect any current targets, but if a future target has dot7-insts, but not dot10-insts that would cause a crash ("cannot select") for the input ir in the test.