Skip to content

AMDGPU: Make v2f32 -> v2f16 legal when target supports v_cvt_pk_f16_f32 #139956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 15, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -919,6 +919,9 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BUILD_VECTOR, MVT::v2bf16, Legal);
}

if (Subtarget->hasCvtPkF16F32Inst())
setOperationAction(ISD::FP_ROUND, MVT::v2f16, Custom);

setTargetDAGCombine({ISD::ADD,
ISD::UADDO_CARRY,
ISD::SUB,
Expand Down Expand Up @@ -6899,10 +6902,16 @@ SDValue SITargetLowering::getFPExtOrFPRound(SelectionDAG &DAG, SDValue Op,
SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
SDValue Src = Op.getOperand(0);
EVT SrcVT = Src.getValueType();
EVT DstVT = Op.getValueType();

if (DstVT == MVT::v2f16) {
assert(Subtarget->hasCvtPkF16F32Inst() && "support v_cvt_pk_f16_f32");
Comment on lines +6907 to +6908
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same should apply to all vector types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same should apply to all vector types.

I don't think so. You can not make v3f32 and v4f32 legal because there is no corresponding instructions. Instead, we have to "Expand" (split) for wider vector types. It is another subject how to split the wider vector to take advantage of the packed conversion.

return SrcVT == MVT::v2f32 ? Op : SDValue();
}

if (SrcVT.getScalarType() != MVT::f64)
return Op;

EVT DstVT = Op.getValueType();
SDLoc DL(Op);
if (DstVT == MVT::f16) {
// TODO: Handle strictfp
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,20 @@ define <2 x half> @v_test_cvt_v2f32_v2f16(<2 x float> %src) {
ret <2 x half> %res
}

define half @fptrunc_v2f32_v2f16_then_extract(<2 x float> %src) {
; GFX950-LABEL: fptrunc_v2f32_v2f16_then_extract:
; GFX950: ; %bb.0:
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT: v_cvt_pk_f16_f32 v0, v0, v1
; GFX950-NEXT: v_add_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
; GFX950-NEXT: s_setpc_b64 s[30:31]
%vec_half = fptrunc <2 x float> %src to <2 x half>
%first = extractelement <2 x half> %vec_half, i64 1
%second = extractelement <2 x half> %vec_half, i64 0
%res = fadd half %first, %second
ret half %res
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we really missing test coverage for this and wider vector types? Can you add that? In particular we need to make sure we get 2 x pk instructions in the 4 x case, and 1 pk and one scalar op in the 3 x case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This change only takes care of v2f16. It is beyond this work to test wider vectors.
Additional work is needed for wider vector splitting (in order to generate the packed conversions)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not beyond this to test wider vectors. Please add the missing tests in a follow up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fadd use here should also be unnecessary

Copy link
Contributor Author

@changpeng changpeng May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not beyond this to test wider vectors. Please add the missing tests in a follow up

Do you want to simply add the wider vector tests, or also add the logic to split the wider vector wisely to form the packed conversion?

Copy link
Contributor Author

@changpeng changpeng May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fadd use here should also be unnecessary

Then what to do after extracts? I think it is the extracts that prevent instcombining. We need two uses of the extracts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the extract use is interesting, should be another test in addition to the base one with a direct return or store of the vector value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the extract use is interesting, should be another test in addition to the base one with a direct return or store of the vector value

The base test was already there in the same file. In the base test, the expected v_cvt_pk_f16_f32 will always be generated. It is the extract use case that inspired this patch to make v2f32->v2f16 lega.

define <2 x half> @v_test_cvt_v2f64_v2f16(<2 x double> %src) {
; GFX950-SDAG-LABEL: v_test_cvt_v2f64_v2f16:
; GFX950-SDAG: ; %bb.0:
Expand Down