Commit ab1d3a9

AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test:
spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

llvm-svn: 266105
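The summary contrasts this conservative fix with a future hazard recognizer. A more precise check would pad only an SMRD that actually reads an SGPR written by a recent VALU instruction, and only by the wait states still missing. The sketch below is hypothetical and is not LLVM's hazard recognizer: `Inst`, `waitStatesNeeded` and `kVALUWriteSGPRToSMRD` are illustrative names, and the required gap is assumed to be the 4 wait states this commit inserts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
struct Inst {
  std::string opcode;          // e.g. "v_readfirstlane_b32", "s_load_dword"
  bool isVALU = false;         // executes on the vector ALU
  bool isSMRD = false;         // scalar memory read
  std::vector<int> sgprDefs;   // SGPRs written
  std::vector<int> sgprUses;   // SGPRs read
  int waitStates = 1;          // wait states the instruction itself covers
};

// Assumption: the VALU-writes-SGPR -> SMRD-reads-SGPR gap equals the
// 4 wait states this commit inserts.
constexpr int kVALUWriteSGPRToSMRD = 4;

// Returns how many wait states must still be inserted before insts[idx].
// Only an SMRD reading an SGPR written by a recent VALU needs padding,
// and only for the part of the window not already covered.
int waitStatesNeeded(const std::vector<Inst>& insts, std::size_t idx) {
  assert(idx < insts.size());
  const Inst& mi = insts[idx];
  if (!mi.isSMRD) return 0;

  int needed = 0;
  int elapsed = 0;  // wait states between insts[j] and mi
  for (std::size_t j = idx; j-- > 0;) {
    const Inst& prev = insts[j];
    if (prev.isVALU) {
      for (int def : prev.sgprDefs)
        for (int use : mi.sgprUses)
          if (def == use && kVALUWriteSGPRToSMRD - elapsed > needed)
            needed = kVALUWriteSGPRToSMRD - elapsed;
    }
    elapsed += prev.waitStates;
    if (elapsed >= kVALUWriteSGPRToSMRD) break;  // older writes are safe
  }
  return needed;
}
```

With two back-to-back `v_readfirstlane_b32` writes feeding an `s_load_dword`, this returns 4, and it returns 0 once enough wait states already separate them. The conservative rule in this commit instead pads after every `v_readfirstlane_b32` on SI, including ones no SMRD reads.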
1 parent 3b08238 commit ab1d3a9

3 files changed: +8 -0 lines

llvm/lib/Target/AMDGPU/SIInsertWaits.cpp

Lines changed: 6 additions & 0 deletions
```diff
@@ -601,6 +601,12 @@ bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
         insertDPPWaitStates(I);
       }
 
+      // Insert required wait states for SMRD reading an SGPR written by a VALU
+      // instruction.
+      if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS &&
+          I->getOpcode() == AMDGPU::V_READFIRSTLANE_B32)
+        TII->insertWaitStates(MBB, std::next(I), 4);
+
       // Wait for everything before a barrier.
       if (I->getOpcode() == AMDGPU::S_BARRIER)
         Changes |= insertWait(MBB, I, LastIssued);
```
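The new check can be modeled outside LLVM as a single pass over a flat instruction list. This is a minimal sketch under stated assumptions, not the SIInsertWaits code: `Generation`, `appendWaitStateNops` and `insertReadFirstLaneWaits` are hypothetical names, instructions are plain assembly strings, and the helper assumes `s_nop N` supplies N + 1 wait states with N at most 7.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the subtarget generation; only the ordering
// (SI before CI before VI) matters here.
enum class Generation { SouthernIslands, SeaIslands, VolcanicIslands };

// Hypothetical helper: appends s_nop instructions covering `count` wait
// states, assuming "s_nop N" provides N + 1 wait states with N <= 7.
void appendWaitStateNops(std::vector<std::string>& out, int count) {
  assert(count >= 0);
  while (count > 0) {
    int arg = count >= 8 ? 7 : count - 1;
    out.push_back("s_nop " + std::to_string(arg));
    count -= arg + 1;
  }
}

// Conservative rule from the commit: on SI (or older), follow every
// v_readfirstlane_b32 with 4 wait states, whether or not an SMRD reads it.
std::vector<std::string> insertReadFirstLaneWaits(
    const std::vector<std::string>& program, Generation gen) {
  std::vector<std::string> out;
  for (const std::string& inst : program) {
    out.push_back(inst);
    bool isReadFirstLane = inst.rfind("v_readfirstlane_b32", 0) == 0;
    if (gen <= Generation::SouthernIslands && isReadFirstLane)
      appendWaitStateNops(out, 4);
  }
  return out;
}
```

In this model a single `v_readfirstlane_b32` on SI gains one `s_nop 3`, while the same input for `SeaIslands` is left unchanged, mirroring the `<= SOUTHERN_ISLANDS` gate. Inserting unconditionally trades a few cycles for not having to track which SGPRs a later SMRD reads.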

llvm/test/CodeGen/AMDGPU/missing-store.ll

Lines changed: 1 addition & 0 deletions
```diff
@@ -10,6 +10,7 @@
 ; SI: buffer_store_dword
 ; SI: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; SI: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}
 ; SI: buffer_store_dword
 ; SI: s_endpgm
```

llvm/test/CodeGen/AMDGPU/salu-to-valu.ll

Lines changed: 1 addition & 0 deletions
```diff
@@ -56,6 +56,7 @@ done: ; preds = %loop
 ; SI: s_movk_i32 [[OFFSET:s[0-9]+]], 0x2ee0
 ; GCN: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; GCN: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, [[OFFSET]]
 ; CI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0xbb8
 ; GCN: v_mov_b32_e32 [[V_OUT:v[0-9]+]], [[OUT]]
```
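Both test updates pin the padding with `SI-NEXT: s_nop`. A plain check line (`SI:` or `GCN:`) may match any later line, whereas a `-NEXT` line must match the line directly after the previous match, so the nop has to immediately follow the second `v_readfirstlane_b32`; no `CI` line is added because the pass only fires on SI or older. The toy matcher below illustrates just those two rules; it is a hypothetical sketch (`fileCheck`, `Directive` and `Kind` are illustrative names) that uses plain substring matching instead of FileCheck's `{{...}}` regex and `[[VAR]]` variable syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of two FileCheck directives, using plain substring
// matching and at most one match per line.
enum class Kind { Check, CheckNext };

struct Directive {
  Kind kind;
  std::string pattern;
};

// Returns true if every directive matches, in order.
// Check:     pattern appears on some line after the previous match.
// CheckNext: pattern appears on exactly the line after the previous match.
bool fileCheck(const std::vector<std::string>& lines,
               const std::vector<Directive>& directives) {
  std::size_t next = 0;  // first line a later directive may match
  bool havePrev = false;
  for (const Directive& d : directives) {
    if (d.kind == Kind::CheckNext) {
      assert(havePrev && "CHECK-NEXT cannot be the first directive");
      if (next >= lines.size() ||
          lines[next].find(d.pattern) == std::string::npos)
        return false;
      ++next;
    } else {
      std::size_t i = next;
      while (i < lines.size() &&
             lines[i].find(d.pattern) == std::string::npos)
        ++i;
      if (i == lines.size()) return false;
      next = i + 1;
    }
    havePrev = true;
  }
  return true;
}
```

Because the pass also pads after the first `v_readfirstlane_b32`, that earlier nop is simply skipped by the plain check on the second `v_readfirstlane_b32`; only the nop after the second one is pinned by the `-NEXT` line.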
