This repository was archived by the owner on Mar 28, 2020. It is now read-only.

Commit 46ba685

[AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs
Summary:
Prior to GCN3, s_load_dword offsets are in dwords rather than bytes. Thus the
scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56496

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353530 91177308-0d34-0410-b5e6-96231b3b80d8
1 parent 9274280 commit 46ba685

2 files changed: +8 -3 lines changed

lib/Target/AMDGPU/SIFrameLowering.cpp

Lines changed: 3 additions & 1 deletion
@@ -422,9 +422,11 @@ void SIFrameLowering::emitEntryFunctionScratchSetup(const GCNSubtarget &ST,
                                          MachineMemOperand::MODereferenceable,
                                          16, 4);
       unsigned Offset = Fn.getCallingConv() == CallingConv::AMDGPU_CS ? 16 : 0;
+      const GCNSubtarget &Subtarget = MF.getSubtarget<GCNSubtarget>();
+      unsigned EncodedOffset = AMDGPU::getSMRDEncodedOffset(Subtarget, Offset);
       BuildMI(MBB, I, DL, LoadDwordX4, ScratchRsrcReg)
         .addReg(Rsrc01)
-        .addImm(Offset) // offset
+        .addImm(EncodedOffset) // offset
         .addImm(0) // glc
         .addReg(ScratchRsrcReg, RegState::ImplicitDefine)
         .addMemOperand(MMO);
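
For illustration only, the offset adjustment described in the commit summary can be sketched as a standalone snippet. This is not the LLVM implementation of AMDGPU::getSMRDEncodedOffset: the names SubtargetSketch, encodeSmrdOffsetSketch and scratchDescriptorByteOffset are hypothetical, and the sketch assumes only what the summary states, namely that pre-GCN3 targets take the offset in dwords (byte offset divided by 4) while GCN3 and later take it in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the subtarget query; not an LLVM type.
struct SubtargetSketch {
  bool smrdOffsetsInBytes; // assumed: true for GCN3 and later, false before GCN3
};

// Byte offset of the scratch descriptor, as in the diff:
// 16 for a compute shader (AMDGPU_CS), 0 otherwise.
constexpr int64_t scratchDescriptorByteOffset(bool isComputeShader) {
  return isComputeShader ? 16 : 0;
}

// Convert a byte offset into the immediate the load instruction expects.
// Pre-GCN3 targets count in dwords, so the byte offset is divided by 4.
constexpr int64_t encodeSmrdOffsetSketch(SubtargetSketch st, int64_t byteOffset) {
  return st.smrdOffsetsInBytes ? byteOffset : byteOffset >> 2;
}

// Compile-time checks matching the offsets the updated test expects.
static_assert(encodeSmrdOffsetSketch(SubtargetSketch{false},
                                     scratchDescriptorByteOffset(true)) == 0x4,
              "pre-GCN3 compute shader: dword offset");
static_assert(encodeSmrdOffsetSketch(SubtargetSketch{true},
                                     scratchDescriptorByteOffset(true)) == 0x10,
              "GCN3+ compute shader: byte offset");
```

Under these assumptions a compute shader gets immediate 0x4 on a pre-GCN3 target and 0x10 on GCN3 or later, which lines up with the CI and VI check lines in the test change.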

test/CodeGen/AMDGPU/amdpal.ll

Lines changed: 5 additions & 2 deletions
@@ -1,4 +1,5 @@
-; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tahiti | FileCheck --check-prefix=PAL --enable-var-scope %s
+; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tahiti | FileCheck --check-prefixes=PAL,CI --enable-var-scope %s
+; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tonga | FileCheck --check-prefixes=PAL,VI --enable-var-scope %s
 
 ; PAL-NOT: .AMDGPU.config
 ; PAL-LABEL: {{^}}simple:
@@ -55,11 +56,13 @@ entry:
 ; Check code sequence for amdpal use of scratch for alloca in a compute shader.
 ; The scratch descriptor is loaded from offset 0x10 of the GIT, rather than offset
 ; 0 in a graphics shader.
+; Prior to GCN3 s_load_dword offsets are dwords, so the offset will be 0x4.
 
 ; PAL-LABEL: {{^}}scratch2_cs:
 ; PAL: s_movk_i32 s{{[0-9]+}}, 0x1234
 ; PAL: s_mov_b32 s[[GITPTR:[0-9]+]], s0
-; PAL: s_load_dwordx4 s{{\[}}[[SCRATCHDESC:[0-9]+]]:{{[0-9]+]}}, s{{\[}}[[GITPTR]]:{{[0-9]+\]}}, 0x10
+; CI: s_load_dwordx4 s{{\[}}[[SCRATCHDESC:[0-9]+]]:{{[0-9]+]}}, s{{\[}}[[GITPTR]]:{{[0-9]+\]}}, 0x4
+; VI: s_load_dwordx4 s{{\[}}[[SCRATCHDESC:[0-9]+]]:{{[0-9]+]}}, s{{\[}}[[GITPTR]]:{{[0-9]+\]}}, 0x10
 ; PAL: buffer_store{{.*}}, s{{\[}}[[SCRATCHDESC]]:
 
 define amdgpu_cs void @scratch2_cs(i32 inreg, i32 inreg, i32 inreg, <3 x i32> inreg, i32 inreg, <3 x i32> %coord, <2 x i32> %in, i32 %extra, i32 %idx) #0 {
