
[AMDGPU] Set total VGPRs to 1536 for gfx12 #96272

Merged
3 changes: 2 additions & 1 deletion llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1572,7 +1572,8 @@ def FeatureISAVersion12 : FeatureSet<
 FeatureVGPRSingleUseHintInsts,
 FeatureScalarDwordx3Loads,
 FeatureDPPSrc1SGPR,
-FeatureMaxHardClauseLength32]>;
+FeatureMaxHardClauseLength32,
+Feature1_5xVGPRs]>;
Contributor
Is it true for all GFX12? The SGP seems to say something else.

Contributor
Yes, it is true for gfx1200 and gfx1201.


 def FeatureISAVersion12_Generic: FeatureSet<
 !listconcat(FeatureISAVersion12.Features,
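
The PR title says the total VGPR count for gfx12 becomes 1536, and the feature name suggests a 1.5x multiplier on a baseline. A minimal Python sketch of that arithmetic follows. The baselines of 1024 VGPRs for wave32 and 512 for wave64 are assumptions, not stated in this PR, and `total_vgprs` is a hypothetical helper, not an LLVM function.

```python
def total_vgprs(wave32: bool, has_1_5x_vgprs: bool) -> int:
    """Return the modeled total VGPR count for one SIMD.

    Assumption: the baseline is 1024 registers in wave32 mode and 512 in
    wave64 mode. Only the 1536 figure comes from the PR title.
    """
    base = 1024 if wave32 else 512
    # A 1.5x feature scales the baseline by 3/2 (exact for these values).
    return base * 3 // 2 if has_1_5x_vgprs else base


if __name__ == "__main__":
    print(total_vgprs(wave32=True, has_1_5x_vgprs=True))
    print(total_vgprs(wave32=True, has_1_5x_vgprs=False))
```

Under these assumptions, wave32 with the feature enabled gives the 1536 from the title.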
98 changes: 48 additions & 50 deletions llvm/test/CodeGen/AMDGPU/llvm.maximum.f64.ll
@@ -2868,74 +2868,72 @@ define <16 x double> @v_maximum_v16f64(<16 x double> %src0, <16 x double> %src1)
 ; GFX12-NEXT: s_wait_samplecnt 0x0
 ; GFX12-NEXT: s_wait_bvhcnt 0x0
 ; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: s_clause 0x1b
+; GFX12-NEXT: s_clause 0x1f
+; GFX12-NEXT: scratch_load_b32 v31, off, s32
 ; GFX12-NEXT: scratch_load_b32 v33, off, s32 offset:8
 ; GFX12-NEXT: scratch_load_b32 v32, off, s32 offset:4
 ; GFX12-NEXT: scratch_load_b32 v35, off, s32 offset:16
 ; GFX12-NEXT: scratch_load_b32 v34, off, s32 offset:12
-; GFX12-NEXT: scratch_load_b32 v31, off, s32
-; GFX12-NEXT: scratch_load_b32 v37, off, s32 offset:120
-; GFX12-NEXT: scratch_load_b32 v39, off, s32 offset:104
-; GFX12-NEXT: scratch_load_b32 v49, off, s32 offset:24
-; GFX12-NEXT: scratch_load_b32 v48, off, s32 offset:20
-; GFX12-NEXT: scratch_load_b32 v51, off, s32 offset:32
-; GFX12-NEXT: scratch_load_b32 v50, off, s32 offset:28
-; GFX12-NEXT: scratch_load_b32 v53, off, s32 offset:40
-; GFX12-NEXT: scratch_load_b32 v52, off, s32 offset:36
-; GFX12-NEXT: scratch_load_b32 v55, off, s32 offset:48
-; GFX12-NEXT: scratch_load_b32 v54, off, s32 offset:44
-; GFX12-NEXT: scratch_load_b32 v65, off, s32 offset:56
-; GFX12-NEXT: scratch_load_b32 v64, off, s32 offset:52
-; GFX12-NEXT: scratch_load_b32 v67, off, s32 offset:64
-; GFX12-NEXT: scratch_load_b32 v66, off, s32 offset:60
-; GFX12-NEXT: scratch_load_b32 v69, off, s32 offset:72
-; GFX12-NEXT: scratch_load_b32 v68, off, s32 offset:68
-; GFX12-NEXT: scratch_load_b32 v71, off, s32 offset:80
-; GFX12-NEXT: scratch_load_b32 v70, off, s32 offset:76
-; GFX12-NEXT: scratch_load_b32 v81, off, s32 offset:88
-; GFX12-NEXT: scratch_load_b32 v80, off, s32 offset:84
-; GFX12-NEXT: scratch_load_b32 v83, off, s32 offset:96
-; GFX12-NEXT: scratch_load_b32 v82, off, s32 offset:92
-; GFX12-NEXT: scratch_load_b32 v38, off, s32 offset:100
-; GFX12-NEXT: s_wait_loadcnt 0x1a
+; GFX12-NEXT: scratch_load_b32 v37, off, s32 offset:24
+; GFX12-NEXT: scratch_load_b32 v36, off, s32 offset:20
+; GFX12-NEXT: scratch_load_b32 v39, off, s32 offset:32
+; GFX12-NEXT: scratch_load_b32 v38, off, s32 offset:28
+; GFX12-NEXT: scratch_load_b32 v49, off, s32 offset:40
+; GFX12-NEXT: scratch_load_b32 v48, off, s32 offset:36
+; GFX12-NEXT: scratch_load_b32 v51, off, s32 offset:48
+; GFX12-NEXT: scratch_load_b32 v50, off, s32 offset:44
+; GFX12-NEXT: scratch_load_b32 v53, off, s32 offset:56
+; GFX12-NEXT: scratch_load_b32 v52, off, s32 offset:52
+; GFX12-NEXT: scratch_load_b32 v55, off, s32 offset:64
+; GFX12-NEXT: scratch_load_b32 v54, off, s32 offset:60
+; GFX12-NEXT: scratch_load_b32 v65, off, s32 offset:72
+; GFX12-NEXT: scratch_load_b32 v64, off, s32 offset:68
+; GFX12-NEXT: scratch_load_b32 v67, off, s32 offset:80
+; GFX12-NEXT: scratch_load_b32 v66, off, s32 offset:76
+; GFX12-NEXT: scratch_load_b32 v69, off, s32 offset:88
+; GFX12-NEXT: scratch_load_b32 v68, off, s32 offset:84
+; GFX12-NEXT: scratch_load_b32 v71, off, s32 offset:96
+; GFX12-NEXT: scratch_load_b32 v70, off, s32 offset:92
+; GFX12-NEXT: scratch_load_b32 v81, off, s32 offset:104
+; GFX12-NEXT: scratch_load_b32 v80, off, s32 offset:100
+; GFX12-NEXT: scratch_load_b32 v83, off, s32 offset:112
+; GFX12-NEXT: scratch_load_b32 v82, off, s32 offset:108
+; GFX12-NEXT: scratch_load_b32 v85, off, s32 offset:120
+; GFX12-NEXT: scratch_load_b32 v84, off, s32 offset:116
+; GFX12-NEXT: scratch_load_b32 v87, off, s32 offset:128
+; GFX12-NEXT: scratch_load_b32 v86, off, s32 offset:124
+; GFX12-NEXT: s_wait_loadcnt 0x1e
 ; GFX12-NEXT: v_maximum_f64 v[0:1], v[0:1], v[32:33]
-; GFX12-NEXT: s_clause 0x2
-; GFX12-NEXT: scratch_load_b32 v33, off, s32 offset:112
-; GFX12-NEXT: scratch_load_b32 v32, off, s32 offset:108
-; GFX12-NEXT: scratch_load_b32 v36, off, s32 offset:116
-; GFX12-NEXT: s_wait_loadcnt 0x1b
+; GFX12-NEXT: s_wait_loadcnt 0x1c
 ; GFX12-NEXT: v_maximum_f64 v[2:3], v[2:3], v[34:35]
-; GFX12-NEXT: s_clause 0x1
-; GFX12-NEXT: scratch_load_b32 v35, off, s32 offset:128
-; GFX12-NEXT: scratch_load_b32 v34, off, s32 offset:124
+; GFX12-NEXT: s_wait_loadcnt 0x1a
+; GFX12-NEXT: v_maximum_f64 v[4:5], v[4:5], v[36:37]
 ; GFX12-NEXT: s_wait_loadcnt 0x18
-; GFX12-NEXT: v_maximum_f64 v[4:5], v[4:5], v[48:49]
+; GFX12-NEXT: v_maximum_f64 v[6:7], v[6:7], v[38:39]
 ; GFX12-NEXT: s_wait_loadcnt 0x16
-; GFX12-NEXT: v_maximum_f64 v[6:7], v[6:7], v[50:51]
+; GFX12-NEXT: v_maximum_f64 v[8:9], v[8:9], v[48:49]
 ; GFX12-NEXT: s_wait_loadcnt 0x14
-; GFX12-NEXT: v_maximum_f64 v[8:9], v[8:9], v[52:53]
+; GFX12-NEXT: v_maximum_f64 v[10:11], v[10:11], v[50:51]
 ; GFX12-NEXT: s_wait_loadcnt 0x12
-; GFX12-NEXT: v_maximum_f64 v[10:11], v[10:11], v[54:55]
+; GFX12-NEXT: v_maximum_f64 v[12:13], v[12:13], v[52:53]
 ; GFX12-NEXT: s_wait_loadcnt 0x10
-; GFX12-NEXT: v_maximum_f64 v[12:13], v[12:13], v[64:65]
+; GFX12-NEXT: v_maximum_f64 v[14:15], v[14:15], v[54:55]
 ; GFX12-NEXT: s_wait_loadcnt 0xe
-; GFX12-NEXT: v_maximum_f64 v[14:15], v[14:15], v[66:67]
+; GFX12-NEXT: v_maximum_f64 v[16:17], v[16:17], v[64:65]
 ; GFX12-NEXT: s_wait_loadcnt 0xc
-; GFX12-NEXT: v_maximum_f64 v[16:17], v[16:17], v[68:69]
+; GFX12-NEXT: v_maximum_f64 v[18:19], v[18:19], v[66:67]
 ; GFX12-NEXT: s_wait_loadcnt 0xa
-; GFX12-NEXT: v_maximum_f64 v[18:19], v[18:19], v[70:71]
+; GFX12-NEXT: v_maximum_f64 v[20:21], v[20:21], v[68:69]
 ; GFX12-NEXT: s_wait_loadcnt 0x8
-; GFX12-NEXT: v_maximum_f64 v[20:21], v[20:21], v[80:81]
+; GFX12-NEXT: v_maximum_f64 v[22:23], v[22:23], v[70:71]
 ; GFX12-NEXT: s_wait_loadcnt 0x6
-; GFX12-NEXT: v_maximum_f64 v[22:23], v[22:23], v[82:83]
-; GFX12-NEXT: s_wait_loadcnt 0x5
-; GFX12-NEXT: v_maximum_f64 v[24:25], v[24:25], v[38:39]
-; GFX12-NEXT: s_wait_loadcnt 0x3
-; GFX12-NEXT: v_maximum_f64 v[26:27], v[26:27], v[32:33]
+; GFX12-NEXT: v_maximum_f64 v[24:25], v[24:25], v[80:81]
+; GFX12-NEXT: s_wait_loadcnt 0x4
+; GFX12-NEXT: v_maximum_f64 v[26:27], v[26:27], v[82:83]
 ; GFX12-NEXT: s_wait_loadcnt 0x2
-; GFX12-NEXT: v_maximum_f64 v[28:29], v[28:29], v[36:37]
+; GFX12-NEXT: v_maximum_f64 v[28:29], v[28:29], v[84:85]
 ; GFX12-NEXT: s_wait_loadcnt 0x0
-; GFX12-NEXT: v_maximum_f64 v[30:31], v[30:31], v[34:35]
+; GFX12-NEXT: v_maximum_f64 v[30:31], v[30:31], v[86:87]
 ; GFX12-NEXT: s_setpc_b64 s[30:31]
 %op = call <16 x double> @llvm.maximum.v16f64(<16 x double> %src0, <16 x double> %src1)
 ret <16 x double> %op
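
In the checks above, each `s_clause` immediate appears to be one less than the number of loads that follow it. A short Python sketch of that reading follows. The "immediate plus one" decoding is inferred from the load counts in this test and labeled as an assumption; `clause_length` is a hypothetical helper.

```python
# Immediates copied from the GFX12 checks in this test.
OLD_CLAUSES = [0x1B, 0x2, 0x1]  # before the change: three separate clauses
NEW_CLAUSES = [0x1F]            # after the change: one clause
TOTAL_LOADS = 33                # scratch_load_b32 count in either version


def clause_length(simm: int) -> int:
    """Number of instructions covered by a clause.

    Assumption: the s_clause immediate encodes (length - 1).
    """
    return simm + 1


old_covered = sum(clause_length(c) for c in OLD_CLAUSES)  # 28 + 3 + 2
new_covered = sum(clause_length(c) for c in NEW_CLAUSES)  # 32

if __name__ == "__main__":
    print(old_covered, TOTAL_LOADS - old_covered)
    print(new_covered, TOTAL_LOADS - new_covered)
```

Read this way, the old output covers all 33 loads with clauses of 28, 3 and 2 instructions. The new output puts 32 loads in a single clause, which matches the limit implied by `FeatureMaxHardClauseLength32`, and leaves the final load outside it.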
98 changes: 48 additions & 50 deletions llvm/test/CodeGen/AMDGPU/llvm.minimum.f64.ll
@@ -2868,74 +2868,72 @@ define <16 x double> @v_minimum_v16f64(<16 x double> %src0, <16 x double> %src1)
 ; GFX12-NEXT: s_wait_samplecnt 0x0
 ; GFX12-NEXT: s_wait_bvhcnt 0x0
 ; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: s_clause 0x1b
+; GFX12-NEXT: s_clause 0x1f
+; GFX12-NEXT: scratch_load_b32 v31, off, s32
 ; GFX12-NEXT: scratch_load_b32 v33, off, s32 offset:8
 ; GFX12-NEXT: scratch_load_b32 v32, off, s32 offset:4
 ; GFX12-NEXT: scratch_load_b32 v35, off, s32 offset:16
 ; GFX12-NEXT: scratch_load_b32 v34, off, s32 offset:12
-; GFX12-NEXT: scratch_load_b32 v31, off, s32
-; GFX12-NEXT: scratch_load_b32 v37, off, s32 offset:120
-; GFX12-NEXT: scratch_load_b32 v39, off, s32 offset:104
-; GFX12-NEXT: scratch_load_b32 v49, off, s32 offset:24
-; GFX12-NEXT: scratch_load_b32 v48, off, s32 offset:20
-; GFX12-NEXT: scratch_load_b32 v51, off, s32 offset:32
-; GFX12-NEXT: scratch_load_b32 v50, off, s32 offset:28
-; GFX12-NEXT: scratch_load_b32 v53, off, s32 offset:40
-; GFX12-NEXT: scratch_load_b32 v52, off, s32 offset:36
-; GFX12-NEXT: scratch_load_b32 v55, off, s32 offset:48
-; GFX12-NEXT: scratch_load_b32 v54, off, s32 offset:44
-; GFX12-NEXT: scratch_load_b32 v65, off, s32 offset:56
-; GFX12-NEXT: scratch_load_b32 v64, off, s32 offset:52
-; GFX12-NEXT: scratch_load_b32 v67, off, s32 offset:64
-; GFX12-NEXT: scratch_load_b32 v66, off, s32 offset:60
-; GFX12-NEXT: scratch_load_b32 v69, off, s32 offset:72
-; GFX12-NEXT: scratch_load_b32 v68, off, s32 offset:68
-; GFX12-NEXT: scratch_load_b32 v71, off, s32 offset:80
-; GFX12-NEXT: scratch_load_b32 v70, off, s32 offset:76
-; GFX12-NEXT: scratch_load_b32 v81, off, s32 offset:88
-; GFX12-NEXT: scratch_load_b32 v80, off, s32 offset:84
-; GFX12-NEXT: scratch_load_b32 v83, off, s32 offset:96
-; GFX12-NEXT: scratch_load_b32 v82, off, s32 offset:92
-; GFX12-NEXT: scratch_load_b32 v38, off, s32 offset:100
-; GFX12-NEXT: s_wait_loadcnt 0x1a
+; GFX12-NEXT: scratch_load_b32 v37, off, s32 offset:24
+; GFX12-NEXT: scratch_load_b32 v36, off, s32 offset:20
+; GFX12-NEXT: scratch_load_b32 v39, off, s32 offset:32
+; GFX12-NEXT: scratch_load_b32 v38, off, s32 offset:28
+; GFX12-NEXT: scratch_load_b32 v49, off, s32 offset:40
+; GFX12-NEXT: scratch_load_b32 v48, off, s32 offset:36
+; GFX12-NEXT: scratch_load_b32 v51, off, s32 offset:48
+; GFX12-NEXT: scratch_load_b32 v50, off, s32 offset:44
+; GFX12-NEXT: scratch_load_b32 v53, off, s32 offset:56
+; GFX12-NEXT: scratch_load_b32 v52, off, s32 offset:52
+; GFX12-NEXT: scratch_load_b32 v55, off, s32 offset:64
+; GFX12-NEXT: scratch_load_b32 v54, off, s32 offset:60
+; GFX12-NEXT: scratch_load_b32 v65, off, s32 offset:72
+; GFX12-NEXT: scratch_load_b32 v64, off, s32 offset:68
+; GFX12-NEXT: scratch_load_b32 v67, off, s32 offset:80
+; GFX12-NEXT: scratch_load_b32 v66, off, s32 offset:76
+; GFX12-NEXT: scratch_load_b32 v69, off, s32 offset:88
+; GFX12-NEXT: scratch_load_b32 v68, off, s32 offset:84
+; GFX12-NEXT: scratch_load_b32 v71, off, s32 offset:96
+; GFX12-NEXT: scratch_load_b32 v70, off, s32 offset:92
+; GFX12-NEXT: scratch_load_b32 v81, off, s32 offset:104
+; GFX12-NEXT: scratch_load_b32 v80, off, s32 offset:100
+; GFX12-NEXT: scratch_load_b32 v83, off, s32 offset:112
+; GFX12-NEXT: scratch_load_b32 v82, off, s32 offset:108
+; GFX12-NEXT: scratch_load_b32 v85, off, s32 offset:120
+; GFX12-NEXT: scratch_load_b32 v84, off, s32 offset:116
+; GFX12-NEXT: scratch_load_b32 v87, off, s32 offset:128
+; GFX12-NEXT: scratch_load_b32 v86, off, s32 offset:124
+; GFX12-NEXT: s_wait_loadcnt 0x1e
 ; GFX12-NEXT: v_minimum_f64 v[0:1], v[0:1], v[32:33]
-; GFX12-NEXT: s_clause 0x2
-; GFX12-NEXT: scratch_load_b32 v33, off, s32 offset:112
-; GFX12-NEXT: scratch_load_b32 v32, off, s32 offset:108
-; GFX12-NEXT: scratch_load_b32 v36, off, s32 offset:116
-; GFX12-NEXT: s_wait_loadcnt 0x1b
+; GFX12-NEXT: s_wait_loadcnt 0x1c
 ; GFX12-NEXT: v_minimum_f64 v[2:3], v[2:3], v[34:35]
-; GFX12-NEXT: s_clause 0x1
-; GFX12-NEXT: scratch_load_b32 v35, off, s32 offset:128
-; GFX12-NEXT: scratch_load_b32 v34, off, s32 offset:124
+; GFX12-NEXT: s_wait_loadcnt 0x1a
+; GFX12-NEXT: v_minimum_f64 v[4:5], v[4:5], v[36:37]
 ; GFX12-NEXT: s_wait_loadcnt 0x18
-; GFX12-NEXT: v_minimum_f64 v[4:5], v[4:5], v[48:49]
+; GFX12-NEXT: v_minimum_f64 v[6:7], v[6:7], v[38:39]
 ; GFX12-NEXT: s_wait_loadcnt 0x16
-; GFX12-NEXT: v_minimum_f64 v[6:7], v[6:7], v[50:51]
+; GFX12-NEXT: v_minimum_f64 v[8:9], v[8:9], v[48:49]
 ; GFX12-NEXT: s_wait_loadcnt 0x14
-; GFX12-NEXT: v_minimum_f64 v[8:9], v[8:9], v[52:53]
+; GFX12-NEXT: v_minimum_f64 v[10:11], v[10:11], v[50:51]
 ; GFX12-NEXT: s_wait_loadcnt 0x12
-; GFX12-NEXT: v_minimum_f64 v[10:11], v[10:11], v[54:55]
+; GFX12-NEXT: v_minimum_f64 v[12:13], v[12:13], v[52:53]
 ; GFX12-NEXT: s_wait_loadcnt 0x10
-; GFX12-NEXT: v_minimum_f64 v[12:13], v[12:13], v[64:65]
+; GFX12-NEXT: v_minimum_f64 v[14:15], v[14:15], v[54:55]
 ; GFX12-NEXT: s_wait_loadcnt 0xe
-; GFX12-NEXT: v_minimum_f64 v[14:15], v[14:15], v[66:67]
+; GFX12-NEXT: v_minimum_f64 v[16:17], v[16:17], v[64:65]
 ; GFX12-NEXT: s_wait_loadcnt 0xc
-; GFX12-NEXT: v_minimum_f64 v[16:17], v[16:17], v[68:69]
+; GFX12-NEXT: v_minimum_f64 v[18:19], v[18:19], v[66:67]
 ; GFX12-NEXT: s_wait_loadcnt 0xa
-; GFX12-NEXT: v_minimum_f64 v[18:19], v[18:19], v[70:71]
+; GFX12-NEXT: v_minimum_f64 v[20:21], v[20:21], v[68:69]
 ; GFX12-NEXT: s_wait_loadcnt 0x8
-; GFX12-NEXT: v_minimum_f64 v[20:21], v[20:21], v[80:81]
+; GFX12-NEXT: v_minimum_f64 v[22:23], v[22:23], v[70:71]
 ; GFX12-NEXT: s_wait_loadcnt 0x6
-; GFX12-NEXT: v_minimum_f64 v[22:23], v[22:23], v[82:83]
-; GFX12-NEXT: s_wait_loadcnt 0x5
-; GFX12-NEXT: v_minimum_f64 v[24:25], v[24:25], v[38:39]
-; GFX12-NEXT: s_wait_loadcnt 0x3
-; GFX12-NEXT: v_minimum_f64 v[26:27], v[26:27], v[32:33]
+; GFX12-NEXT: v_minimum_f64 v[24:25], v[24:25], v[80:81]
+; GFX12-NEXT: s_wait_loadcnt 0x4
+; GFX12-NEXT: v_minimum_f64 v[26:27], v[26:27], v[82:83]
 ; GFX12-NEXT: s_wait_loadcnt 0x2
-; GFX12-NEXT: v_minimum_f64 v[28:29], v[28:29], v[36:37]
+; GFX12-NEXT: v_minimum_f64 v[28:29], v[28:29], v[84:85]
 ; GFX12-NEXT: s_wait_loadcnt 0x0
-; GFX12-NEXT: v_minimum_f64 v[30:31], v[30:31], v[34:35]
+; GFX12-NEXT: v_minimum_f64 v[30:31], v[30:31], v[86:87]
 ; GFX12-NEXT: s_setpc_b64 s[30:31]
 %op = call <16 x double> @llvm.minimum.v16f64(<16 x double> %src0, <16 x double> %src1)
 ret <16 x double> %op
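
The `s_wait_loadcnt` values in the updated checks step down by two from 0x1e to 0x0. One way to read that is sketched below in Python. The model is an assumption: the counter is treated as the number of loads still outstanding, and loads are treated as completing in issue order. `loadcnt_needed` is a hypothetical helper.

```python
TOTAL_LOADS = 33  # scratch_load_b32 instructions issued up front
NUM_PAIRS = 16    # one v_minimum_f64 per 64-bit element of <16 x double>


def loadcnt_needed(total_issued: int, last_needed: int) -> int:
    """Largest outstanding-load count at which load number `last_needed`
    (1-based, in issue order) is guaranteed complete.

    Assumption: loads complete in the order they were issued.
    """
    return total_issued - last_needed


# In the updated checks the first operand pair (v[32:33]) is ready once the
# first three loads have landed (v31, v33, v32). Each later pair needs two
# more loads than the one before it.
new_waits = [loadcnt_needed(TOTAL_LOADS, 3 + 2 * i) for i in range(NUM_PAIRS)]

if __name__ == "__main__":
    print([hex(w) for w in new_waits])
```

Under this model, loading every operand in issue order lets each wait release exactly one more register pair. That would explain why the irregular 0x5 and 0x3 waits from the old output no longer appear.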