-
Notifications
You must be signed in to change notification settings - Fork 14.3k
AMDGPU: Make v2f64 -> v2f16 conversion Legal only when unsafe fast math is set #134738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
9bdaf1d
6d970b0
62078af
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1053,10 +1053,10 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_, | |
} | ||
|
||
auto &FPTruncActions = getActionDefinitionsBuilder(G_FPTRUNC); | ||
if (ST.hasCvtPkF16F32Inst()) | ||
FPTruncActions.legalFor( | ||
{{S32, S64}, {S16, S32}, {V2S16, V2S32}, {V2S16, V2S64}}); | ||
else | ||
if (ST.hasCvtPkF16F32Inst()) { | ||
FPTruncActions.legalFor({{S32, S64}, {S16, S32}, {V2S16, V2S32}}) | ||
.customFor({V2S16, V2S64}); | ||
} else | ||
FPTruncActions.legalFor({{S32, S64}, {S16, S32}}); | ||
FPTruncActions.scalarize(0).lower(); | ||
|
||
|
@@ -2155,6 +2155,8 @@ bool AMDGPULegalizerInfo::legalizeCustom( | |
case TargetOpcode::G_FMINNUM_IEEE: | ||
case TargetOpcode::G_FMAXNUM_IEEE: | ||
return legalizeMinNumMaxNum(Helper, MI); | ||
case TargetOpcode::G_FPTRUNC: | ||
return legalizeFPTrunc(Helper, MI, MRI); | ||
case TargetOpcode::G_EXTRACT_VECTOR_ELT: | ||
return legalizeExtractVectorElt(MI, MRI, B); | ||
case TargetOpcode::G_INSERT_VECTOR_ELT: | ||
|
@@ -2741,6 +2743,29 @@ bool AMDGPULegalizerInfo::legalizeMinNumMaxNum(LegalizerHelper &Helper, | |
return Helper.lowerFMinNumMaxNum(MI) == LegalizerHelper::Legalized; | ||
} | ||
|
||
bool AMDGPULegalizerInfo::legalizeFPTrunc(LegalizerHelper &Helper, | ||
MachineInstr &MI, | ||
MachineRegisterInfo &MRI) const { | ||
// TODO: We should only use fast math flag. But the global option is | ||
// still used here to be consistent, especially when the fast math flag is | ||
// not working for FP_ROUND on the SelectDAG path at this moment. | ||
MachineFunction &MF = Helper.MIRBuilder.getMF(); | ||
bool AllowInaccurateFPTRUNC = MI.getFlag(MachineInstr::FmAfn) || | ||
MF.getTarget().Options.UnsafeFPMath; | ||
|
||
if (AllowInaccurateFPTRUNC) { | ||
// Use the tablegen pattern to select native instructions. | ||
return true; | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This handling logic doesn't make sense. The decision to scalarize or not has nothing to do with the FP properties? The tablegen pattern also handles the build_vector case, which is the MIR I would expect to select from. Can we just drop the direct vector handling, and always scalarize? The tablegen patterns should also be guarded with the fast checks if we're going to have them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The logic to generate instructions based on the pattern, but this is only correct when the unsafe fpmath is set. |
||
Register DstReg = MI.getOperand(0).getReg(); | ||
LLT DstTy = MRI.getType(DstReg); | ||
|
||
// Scalarize the vector and fall through to lower f64 -> f16. | ||
return Helper.fewerElementsVector(MI, 0, DstTy.getElementType()) == | ||
LegalizerHelper::Legalized; | ||
Comment on lines
+2749
to
+2766
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like it should be a combine and not a modification of legalization rules There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am looking at instcombine to generate v_cvt_pk_f16_f32 instructions. Should we implement in AMDGPU because it is target dependent? Thanks. |
||
} | ||
|
||
bool AMDGPULegalizerInfo::legalizeExtractVectorElt( | ||
MachineInstr &MI, MachineRegisterInfo &MRI, | ||
MachineIRBuilder &B) const { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -916,7 +916,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, | |
} | ||
|
||
if (Subtarget->hasCvtPkF16F32Inst()) { | ||
setOperationAction(ISD::FP_ROUND, MVT::v2f16, Legal); | ||
setOperationAction(ISD::FP_ROUND, MVT::v2f16, Custom); | ||
} | ||
|
||
setTargetDAGCombine({ISD::ADD, | ||
|
@@ -6893,6 +6893,19 @@ SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const { | |
return Op; | ||
|
||
EVT DstVT = Op.getValueType(); | ||
|
||
if (DstVT == MVT::v2f16) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comments as the globalisel path. The vectoriness shouldn't matter, we don't need to directly consume the operation There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Okay, we can do as #135659, which essentially scalarize v2f16. We may do instcombine to generate the vector cvt as long as the target supports. Thanks. |
||
// FIXME: We should only use fast math flag here. However, the fast math | ||
// flag is lost during fptrunc to fp_round lowering. In addition, the flag | ||
// is not propagated during subsequent lowering. | ||
bool AllowInaccurateFP_ROUND = Op->getFlags().hasApproximateFuncs() || | ||
DAG.getTarget().Options.UnsafeFPMath; | ||
// With fast math, the tablegen pattern is used to select native | ||
// instructions. Otherwise, the vector will be scalarized and custom lowered | ||
// to preserve the precision. | ||
return AllowInaccurateFP_ROUND ? Op : SDValue(); | ||
} | ||
|
||
SDLoc DL(Op); | ||
if (DstVT == MVT::f16) { | ||
// TODO: Handle strictfp | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's suspect that we need custom handling for the v2 case, but not the regular scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously only v2f16 was made Legal in order to generate v_cvt_pk_f16_f32, I think.