Skip to content

AMDGPU: Make v2f64 -> v2f16 conversion Legal only when unsafe fast math is set #134738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 29 additions & 4 deletions llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1053,10 +1053,10 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
}

auto &FPTruncActions = getActionDefinitionsBuilder(G_FPTRUNC);
if (ST.hasCvtPkF16F32Inst())
FPTruncActions.legalFor(
{{S32, S64}, {S16, S32}, {V2S16, V2S32}, {V2S16, V2S64}});
else
if (ST.hasCvtPkF16F32Inst()) {
FPTruncActions.legalFor({{S32, S64}, {S16, S32}, {V2S16, V2S32}})
.customFor({V2S16, V2S64});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's suspect that we need custom handling for the v2 case, but not the regular scalar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's suspect that we need custom handling for the v2 case, but not the regular scalar

Previously only v2f16 was made Legal in order to generate v_cvt_pk_f16_f32, I think.

} else
FPTruncActions.legalFor({{S32, S64}, {S16, S32}});
FPTruncActions.scalarize(0).lower();

Expand Down Expand Up @@ -2155,6 +2155,8 @@ bool AMDGPULegalizerInfo::legalizeCustom(
case TargetOpcode::G_FMINNUM_IEEE:
case TargetOpcode::G_FMAXNUM_IEEE:
return legalizeMinNumMaxNum(Helper, MI);
case TargetOpcode::G_FPTRUNC:
return legalizeFPTrunc(Helper, MI, MRI);
case TargetOpcode::G_EXTRACT_VECTOR_ELT:
return legalizeExtractVectorElt(MI, MRI, B);
case TargetOpcode::G_INSERT_VECTOR_ELT:
Expand Down Expand Up @@ -2741,6 +2743,29 @@ bool AMDGPULegalizerInfo::legalizeMinNumMaxNum(LegalizerHelper &Helper,
return Helper.lowerFMinNumMaxNum(MI) == LegalizerHelper::Legalized;
}

bool AMDGPULegalizerInfo::legalizeFPTrunc(LegalizerHelper &Helper,
MachineInstr &MI,
MachineRegisterInfo &MRI) const {
// TODO: We should only use fast math flag. But the global option is
// still used here to be consistent, especially when the fast math flag is
// not working for FP_ROUND on the SelectDAG path at this moment.
MachineFunction &MF = Helper.MIRBuilder.getMF();
bool AllowInaccurateFPTRUNC = MI.getFlag(MachineInstr::FmAfn) ||
MF.getTarget().Options.UnsafeFPMath;

if (AllowInaccurateFPTRUNC) {
// Use the tablegen pattern to select native instructions.
return true;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handling logic doesn't make sense. The decision to scalarize or not has nothing to do with the FP properties? The tablegen pattern also handles the build_vector case, which is the MIR I would expect to select from. Can we just drop the direct vector handling, and always scalarize?

The tablegen patterns should also be guarded with the fast checks if we're going to have them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handling logic doesn't make sense. The decision to scalarize or not has nothing to do with the FP properties? The tablegen pattern also handles the build_vector case, which is the MIR I would expect to select from. Can we just drop the direct vector handling, and always scalarize?

The tablegen patterns should also be guarded with the fast checks if we're going to have them.

The logic to generate instructions based on the pattern, but this is only correct when the unsafe fpmath is set.
Yes, we may always scalarize. Should we do instcombine to generate v_cvt_pk_f16_f32?

Register DstReg = MI.getOperand(0).getReg();
LLT DstTy = MRI.getType(DstReg);

// Scalarize the vector and fall through to lower f64 -> f16.
return Helper.fewerElementsVector(MI, 0, DstTy.getElementType()) ==
LegalizerHelper::Legalized;
Comment on lines +2749 to +2766
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be a combine and not a modification of legalization rules

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be a combine and not a modification of legalization rules

I am looking at instcombine to generate v_cvt_pk_f16_f32 instructions. Should we implement in AMDGPU because it is target dependent? Thanks.

}

bool AMDGPULegalizerInfo::legalizeExtractVectorElt(
MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B) const {
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,8 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
bool legalizeFPTOI(MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B, bool Signed) const;
bool legalizeMinNumMaxNum(LegalizerHelper &Helper, MachineInstr &MI) const;
bool legalizeFPTrunc(LegalizerHelper &Helper, MachineInstr &MI,
MachineRegisterInfo &MRI) const;
bool legalizeExtractVectorElt(MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B) const;
bool legalizeInsertVectorElt(MachineInstr &MI, MachineRegisterInfo &MRI,
Expand Down
15 changes: 14 additions & 1 deletion llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -916,7 +916,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
}

if (Subtarget->hasCvtPkF16F32Inst()) {
setOperationAction(ISD::FP_ROUND, MVT::v2f16, Legal);
setOperationAction(ISD::FP_ROUND, MVT::v2f16, Custom);
}

setTargetDAGCombine({ISD::ADD,
Expand Down Expand Up @@ -6893,6 +6893,19 @@ SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
return Op;

EVT DstVT = Op.getValueType();

if (DstVT == MVT::v2f16) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as the globalisel path. The vectoriness shouldn't matter, we don't need to directly consume the operation

Copy link
Contributor Author

@changpeng changpeng Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as the globalisel path. The vectoriness shouldn't matter, we don't need to directly consume the operation

Okay, we can do as #135659, which essentially scalarize v2f16. We may do instcombine to generate the vector cvt as long as the target supports. Thanks.

// FIXME: We should only use fast math flag here. However, the fast math
// flag is lost during fptrunc to fp_round lowering. In addition, the flag
// is not propagated during subsequent lowering.
bool AllowInaccurateFP_ROUND = Op->getFlags().hasApproximateFuncs() ||
DAG.getTarget().Options.UnsafeFPMath;
// With fast math, the tablegen pattern is used to select native
// instructions. Otherwise, the vector will be scalarized and custom lowered
// to preserve the precision.
return AllowInaccurateFP_ROUND ? Op : SDValue();
}

SDLoc DL(Op);
if (DstVT == MVT::f16) {
// TODO: Handle strictfp
Expand Down
Loading
Loading