Skip to content

[AMDGPU] Fix using wrong register in frame index shift #101649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

PankajDwivedi-25
Copy link
Contributor

@PankajDwivedi-25 PankajDwivedi-25 commented Aug 2, 2024

In case of v_mad we have materialized the offset in vgpr and mad is performed in wave space, later vgpr have to be shifted back in lane space. #99556 introduces a bug.

@llvmbot
Copy link
Member

llvmbot commented Aug 2, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Pankaj Dwivedi (PankajDwivedi-25)

Changes

In case of v_mad we have materialized the offset in vgpr and mad is performed in wave space, later vgpr have to be shifted back in lane space.


Full diff: https://github.com/llvm/llvm-project/pull/101649.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index e824e95610a65..ba49f4a309ebb 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -2591,7 +2591,7 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
                 BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64),
                         TmpResultReg)
                     .addImm(ST.getWavefrontSizeLog2())
-                    .addReg(FrameReg);
+                    .addReg(TmpResultReg);
               }
 
               Register NewDest = IsCopy ? ResultReg
diff --git a/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir b/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
index 78fb25a76d25e..08c2904c601ad 100644
--- a/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+++ b/llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
@@ -708,7 +708,7 @@ body:   |
     ; GFX8-NEXT: S_CMP_EQ_I32 $sgpr4, $sgpr5, implicit-def $scc
     ; GFX8-NEXT: $vgpr0 = V_MOV_B32_e32 64, implicit $exec
     ; GFX8-NEXT: $vgpr0 = V_MAD_U32_U24_e64 killed $vgpr0, 64, $sgpr32, 0, implicit $exec
-    ; GFX8-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr32, implicit $exec
+    ; GFX8-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $vgpr0, implicit $exec
     ; GFX8-NEXT: $sgpr4 = V_READFIRSTLANE_B32 $vgpr0, implicit $exec
     ; GFX8-NEXT: S_NOP 0, implicit $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7
     ; GFX8-NEXT: S_NOP 0, implicit $sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15
@@ -903,7 +903,7 @@ body:             |
     ; GFX8-NEXT: S_CMP_EQ_I32 $sgpr4, $sgpr5, implicit-def $scc
     ; GFX8-NEXT: $vgpr0 = V_MOV_B32_e32 68, implicit $exec
     ; GFX8-NEXT: $vgpr0 = V_MAD_U32_U24_e64 killed $vgpr0, 64, $sgpr32, 0, implicit $exec
-    ; GFX8-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr32, implicit $exec
+    ; GFX8-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $vgpr0, implicit $exec
     ; GFX8-NEXT: $sgpr4 = V_READFIRSTLANE_B32 $vgpr0, implicit $exec
     ; GFX8-NEXT: S_NOP 0, implicit $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7
     ; GFX8-NEXT: S_NOP 0, implicit $sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15
diff --git a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
index 9cd92dcd5c94d..87cfaec208897 100644
--- a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
@@ -77,7 +77,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX7-NEXT:    buffer_store_dword v0, off, s[0:3], s32
 ; GFX7-NEXT:    v_mov_b32_e32 v0, 0x4040
 ; GFX7-NEXT:    v_mad_u32_u24 v0, v0, 64, s32
-; GFX7-NEXT:    v_lshr_b32_e64 v0, s32, 6
+; GFX7-NEXT:    v_lshrrev_b32_e32 v0, 6, v0
 ; GFX7-NEXT:    v_writelane_b32 v23, s59, 28
 ; GFX7-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX7-NEXT:    buffer_load_dword v0, off, s[0:3], s32
@@ -168,7 +168,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX8-NEXT:    buffer_store_dword v0, off, s[0:3], s32
 ; GFX8-NEXT:    v_mov_b32_e32 v0, 0x4040
 ; GFX8-NEXT:    v_mad_u32_u24 v0, v0, 64, s32
-; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
+; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 6, v0
 ; GFX8-NEXT:    v_writelane_b32 v23, s59, 28
 ; GFX8-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX8-NEXT:    buffer_load_dword v0, off, s[0:3], s32
@@ -841,7 +841,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX7-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX7-NEXT:    ;;#ASMEND
 ; GFX7-NEXT:    v_mad_u32_u24 v22, v22, 64, s32
-; GFX7-NEXT:    v_lshr_b32_e64 v22, s32, 6
+; GFX7-NEXT:    v_lshrrev_b32_e32 v22, 6, v22
 ; GFX7-NEXT:    v_writelane_b32 v21, s59, 28
 ; GFX7-NEXT:    v_readfirstlane_b32 s59, v22
 ; GFX7-NEXT:    ;;#ASMSTART
@@ -924,7 +924,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX8-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    v_mad_u32_u24 v22, v22, 64, s32
-; GFX8-NEXT:    v_lshrrev_b32_e64 v22, 6, s32
+; GFX8-NEXT:    v_lshrrev_b32_e32 v22, 6, v22
 ; GFX8-NEXT:    v_writelane_b32 v21, s59, 28
 ; GFX8-NEXT:    v_readfirstlane_b32 s59, v22
 ; GFX8-NEXT:    ;;#ASMSTART

@cdevadas
Copy link
Collaborator

cdevadas commented Aug 2, 2024

Avoid such a long name for the PR. Change it to something like:
[AMDGPU] Fixed the wrong register used in the shift operation.

Indicate the PR in the description where this bug was introduced (#99556).
Why is this PR marked WIP?

@PankajDwivedi-25 PankajDwivedi-25 changed the title [WIP] v_lshrrev uses vector register operand in wave space after scaling offset materialization in v_mad [AMDGPU] Fixed the wrong register used in the shift operation Aug 2, 2024
@arsenm arsenm changed the title [AMDGPU] Fixed the wrong register used in the shift operation [AMDGPU] Fixed the wrong register used in frame index shift Aug 2, 2024
@arsenm arsenm changed the title [AMDGPU] Fixed the wrong register used in frame index shift [AMDGPU] Fixed using wrong register in frame index shift Aug 2, 2024
@arsenm arsenm changed the title [AMDGPU] Fixed using wrong register in frame index shift [AMDGPU] Fix using wrong register in frame index shift Aug 2, 2024
@PankajDwivedi-25 PankajDwivedi-25 merged commit adac04f into llvm:main Aug 2, 2024
9 checks passed
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Aug 16, 2024
In case of v_mad we have materialized the offset in vgpr and mad is
performed in wave space, later vgpr have to be shifted back in lane
space. [llvm#99556](llvm#99556)
introduces a bug.

Co-authored-by: Pankajdwivedi-25 <[email protected]>
Change-Id: Ic02338cf087cb3c95964b7273609a73a3e7b2d03
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 11, 2024
In case of v_mad we have materialized the offset in vgpr and mad is
performed in wave space, later vgpr have to be shifted back in lane
space. [llvm#99556](llvm#99556)
introduces a bug.

Co-authored-by: Pankajdwivedi-25 <[email protected]>
Change-Id: Ic02338cf087cb3c95964b7273609a73a3e7b2d03
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 22, 2024
In case of v_mad we have materialized the offset in vgpr and mad is
performed in wave space, later vgpr have to be shifted back in lane
space. [llvm#99556](llvm#99556)
introduces a bug.

Co-authored-by: Pankajdwivedi-25 <[email protected]>
Change-Id: I23db2a321c9bf3e28b5e173ddc95fa98c4d16fd6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants