-
Notifications
You must be signed in to change notification settings - Fork 14.3k
AMDGPU: Make v2f32 -> v2f16 legal when target supports v_cvt_pk_f16_f32 #139956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,20 @@ define <2 x half> @v_test_cvt_v2f32_v2f16(<2 x float> %src) { | |
ret <2 x half> %res | ||
} | ||
|
||
define half @fptrunc_v2f32_v2f16_then_extract(<2 x float> %src) { | ||
; GFX950-LABEL: fptrunc_v2f32_v2f16_then_extract: | ||
; GFX950: ; %bb.0: | ||
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) | ||
; GFX950-NEXT: v_cvt_pk_f16_f32 v0, v0, v1 | ||
; GFX950-NEXT: v_add_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD | ||
; GFX950-NEXT: s_setpc_b64 s[30:31] | ||
%vec_half = fptrunc <2 x float> %src to <2 x half> | ||
%first = extractelement <2 x half> %vec_half, i64 1 | ||
%second = extractelement <2 x half> %vec_half, i64 0 | ||
%res = fadd half %first, %second | ||
ret half %res | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we really missing test coverage for this and wider vector types? Can you add that? In particular we need to make sure we get 2 x pk instructions in the 4 x case, and 1 pk and one scalar op in the 3 x case There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. This change only takes care of v2f16. It is beyond this work to test wider vectors. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is not beyond this to test wider vectors. Please add the missing tests in a follow up There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The fadd use here should also be unnecessary There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do you want to simply add the wider vector tests, or also add the logic to split the wider vector wisely to form the packed conversion? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Then what to do after extracts? I think it is the extracts that prevent instcombining. We need two uses of the extracts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the extract use is interesting, should be another test in addition to the base one with a direct return or store of the vector value There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The base test was already there in the same file. In the base test, the expected v_cvt_pk_f16_f32 will always be generated. It is the extract use case that inspired this patch to make v2f32->v2f16 lega. |
||
define <2 x half> @v_test_cvt_v2f64_v2f16(<2 x double> %src) { | ||
; GFX950-SDAG-LABEL: v_test_cvt_v2f64_v2f16: | ||
; GFX950-SDAG: ; %bb.0: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same should apply to all vector types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. You can not make v3f32 and v4f32 legal because there is no corresponding instructions. Instead, we have to "Expand" (split) for wider vector types. It is another subject how to split the wider vector to take advantage of the packed conversion.