Commit 6540f16
[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952)
This commit adds the -lower-buffer-fat-pointers pass, which is applicable to all AMDGCN compilations. The purpose of this pass is to remove the type `ptr addrspace(7)` from incoming IR. This must be done at the LLVM IR level because `ptr addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled by SelectionDAG. The detailed operation of the pass is described in comments, but, in summary, the removal proceeds by:

1. Rewriting loads and stores of `ptr addrspace(7)` to loads and stores of i160 (including vectors and aggregates). This is needed because the in-register representation of these pointers will stop matching their in-memory representation in step 2, so ptrtoint/inttoptr operations are used to preserve the expected memory layout.
2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with the type `{ptr addrspace(8), ptr addrspace(6)}`, which makes the two parts of a buffer fat pointer (the 128-bit address space 8 resource and the 32-bit address space 6 offset) visible in the IR. This also impacts the argument and return types of functions.
3. *Splitting* the resource and offset parts. All instructions that produce or consume buffer fat pointers (like GEP or load) are rewritten to produce or consume the resource and offset parts separately. For example, GEP updates the offset part of the result, and a load uses the resource and offset parts to populate the relevant llvm.amdgcn.raw.ptr.buffer.load intrinsic call. At the end of this process, the original mutated instructions are replaced by their new split counterparts, ensuring no invalidly-typed IR escapes this pass. (For operations like call, where the struct form is needed, insertelement operations are inserted.)
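The split described in steps 1-3 can be modeled numerically. The sketch below is illustrative only (plain Python, not LLVM code): it treats the i160 in-memory form as the 128-bit resource concatenated with the 32-bit offset. The exact field order inside the i160 is an assumption here; the commit only specifies the 128-bit/32-bit split and that GEP touches only the offset.

```python
def split_fat_pointer(p160: int) -> tuple[int, int]:
    """Split a 160-bit fat pointer into (resource, offset).

    Assumed layout: low 32 bits = offset (addrspace(6) part),
    high 128 bits = resource (addrspace(8) part)."""
    offset = p160 & 0xFFFFFFFF   # low 32 bits: buffer offset
    resource = p160 >> 32        # high 128 bits: resource descriptor
    return resource, offset

def join_fat_pointer(resource: int, offset: int) -> int:
    """Rebuild the i160 in-memory representation from the two parts."""
    assert resource < (1 << 128) and offset < (1 << 32)
    return (resource << 32) | offset

def gep(resource: int, offset: int, byte_delta: int) -> tuple[int, int]:
    """A GEP on a fat pointer only updates the 32-bit offset part
    (mirroring step 3), wrapping modulo 2**32."""
    return resource, (offset + byte_delta) & 0xFFFFFFFF

r, o = split_fat_pointer(join_fat_pointer(0xABCD, 100))
r2, o2 = gep(r, o, 123)   # resource unchanged, offset advanced by 123
```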
Compared to LGC's PatchBufferOp (https://github.com/GPUOpen-Drivers/llpc/blob/32cda89776980202597d5bf4ed4447a1bae64047/lgc/patch/PatchBufferOp.cpp), this pass:

- Also handles vectors of `ptr addrspace(7)`s
- Also handles function boundaries
- Includes the same uniform buffer optimization for loops and conditionals
- Does *not* handle memcpy() and friends (this is future work)
- Does *not* break up large loads and stores into smaller parts. This should be handled by extending the legalization of *.buffer.{load,store} to handle larger types by producing multiple instructions (the same way ordinary LOAD and STORE are legalized). That work is planned for a followup commit.
- Does *not* have special logic for handling divergent buffer descriptors. The logic in LGC is, as far as I can tell, incorrect in general and, per discussions with @nhaehnle, isn't widely used. Therefore, divergent descriptors are handled with waterfall loops later in legalization.

As a final matter, this commit updates atomic expansion to treat buffer operations analogously to global ones.

(One question for reviewers: is the new pass in the right place? Should it be later in the pipeline?)

Differential Revision: https://reviews.llvm.org/D158463
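As a rough sketch of the pass's overall effect, consider a load through a buffer fat pointer. The "after" IR below is hand-written for illustration, not actual pass output; the exact intrinsic signature and operand values (soffset and cache-policy arguments shown as 0) are assumptions.

```llvm
; Before: ptr addrspace(7) is a 160-bit primitive that SelectionDAG
; cannot handle correctly.
define float @load_elt(ptr addrspace(7) %p, i32 %i) {
  %q = getelementptr float, ptr addrspace(7) %p, i32 %i
  %v = load float, ptr addrspace(7) %q
  ret float %v
}

; After (roughly): the pointer is split into a 128-bit resource and a
; 32-bit offset, the GEP becomes offset arithmetic, and the load becomes
; a raw buffer-load intrinsic call on the resource/offset pair.
define float @load_elt(ptr addrspace(8) %p.rsrc, i32 %p.off, i32 %i) {
  %scaled = mul i32 %i, 4
  %q.off = add i32 %p.off, %scaled
  %v = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(
           ptr addrspace(8) %p.rsrc, i32 %q.off, i32 0, i32 0)
  ret float %v
}
```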
1 parent 3711329 commit 6540f16

16 files changed (+3898, -14 lines)

llvm/lib/Target/AMDGPU/AMDGPU.h

Lines changed: 13 additions & 0 deletions
@@ -59,6 +59,7 @@ FunctionPass *createAMDGPUMachineCFGStructurizerPass();
 FunctionPass *createAMDGPURewriteOutArgumentsPass();
 ModulePass *
 createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM = nullptr);
+ModulePass *createAMDGPULowerBufferFatPointersPass();
 FunctionPass *createSIModeRegisterPass();
 FunctionPass *createGCNPreRAOptimizationsPass();

@@ -136,6 +137,18 @@ struct AMDGPULowerModuleLDSPass : PassInfoMixin<AMDGPULowerModuleLDSPass> {
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };

+void initializeAMDGPULowerBufferFatPointersPass(PassRegistry &);
+extern char &AMDGPULowerBufferFatPointersID;
+
+struct AMDGPULowerBufferFatPointersPass
+    : PassInfoMixin<AMDGPULowerBufferFatPointersPass> {
+  AMDGPULowerBufferFatPointersPass(const TargetMachine &TM) : TM(TM) {}
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+
+private:
+  const TargetMachine &TM;
+};
+
 void initializeAMDGPURewriteOutArgumentsPass(PassRegistry &);
 extern char &AMDGPURewriteOutArgumentsID;

llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp

Lines changed: 2012 additions & 0 deletions
Large diffs are not rendered by default.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Lines changed: 29 additions & 0 deletions
@@ -34,6 +34,7 @@
 #include "TargetInfo/AMDGPUTargetInfo.h"
 #include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/Analysis/CGSCCPassManager.h"
+#include "llvm/Analysis/CallGraphSCCPass.h"
 #include "llvm/CodeGen/GlobalISel/CSEInfo.h"
 #include "llvm/CodeGen/GlobalISel/IRTranslator.h"
 #include "llvm/CodeGen/GlobalISel/InstructionSelect.h"

@@ -420,6 +421,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPULateCodeGenPreparePass(*PR);
   initializeAMDGPURemoveIncompatibleFunctionsPass(*PR);
   initializeAMDGPULowerModuleLDSLegacyPass(*PR);
+  initializeAMDGPULowerBufferFatPointersPass(*PR);
   initializeAMDGPURewriteOutArgumentsPass(*PR);
   initializeAMDGPURewriteUndefForPHILegacyPass(*PR);
   initializeAMDGPUUnifyMetadataPass(*PR);

@@ -654,6 +656,10 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(
           PM.addPass(AMDGPULowerModuleLDSPass(*this));
           return true;
         }
+        if (PassName == "amdgpu-lower-buffer-fat-pointers") {
+          PM.addPass(AMDGPULowerBufferFatPointersPass(*this));
+          return true;
+        }
         if (PassName == "amdgpu-lower-ctor-dtor") {
           PM.addPass(AMDGPUCtorDtorLoweringPass());
           return true;

@@ -1121,6 +1127,29 @@ void AMDGPUPassConfig::addCodeGenPrepare() {
       EnableLowerKernelArguments)
     addPass(createAMDGPULowerKernelArgumentsPass());

+  if (TM->getTargetTriple().getArch() == Triple::amdgcn) {
+    // This lowering has been placed after codegenprepare to take advantage of
+    // address mode matching (which is why it isn't put with the LDS lowerings).
+    // It could be placed anywhere before uniformity annotations (an analysis
+    // that it changes by splitting up fat pointers into their components)
+    // but has been put before switch lowering and CFG flattening so that those
+    // passes can run on the more optimized control flow this pass creates in
+    // many cases.
+    //
+    // FIXME: This should ideally be put after the LoadStoreVectorizer.
+    // However, due to some annoying facts about ResourceUsageAnalysis,
+    // (especially as exercised in the resource-usage-dead-function test),
+    // we need all the function passes codegenprepare all the way through
+    // said resource usage analysis to run on the call graph produced
+    // before codegenprepare runs (because codegenprepare will knock some
+    // nodes out of the graph, which leads to function-level passes not
+    // being run on them, which causes crashes in the resource usage analysis).
+    addPass(createAMDGPULowerBufferFatPointersPass());
+    // In accordance with the above FIXME, manually force all the
+    // function-level passes into a CGSCCPassManager.
+    addPass(new DummyCGSCCPass());
+  }
+
   TargetPassConfig::addCodeGenPrepare();

   if (isPassEnabled(EnableLoadStoreVectorizer))

llvm/lib/Target/AMDGPU/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ add_llvm_target(AMDGPUCodeGen
   AMDGPULibCalls.cpp
   AMDGPUImageIntrinsicOptimizer.cpp
   AMDGPULibFunc.cpp
+  AMDGPULowerBufferFatPointers.cpp
   AMDGPULowerKernelArguments.cpp
   AMDGPULowerKernelAttributes.cpp
   AMDGPULowerModuleLDSPass.cpp

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Lines changed: 9 additions & 5 deletions
@@ -16013,7 +16013,8 @@ SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
     if (!Ty->isFloatTy() && (!Subtarget->hasGFX90AInsts() || !Ty->isDoubleTy()))
       return AtomicExpansionKind::CmpXChg;

-    if (AMDGPU::isFlatGlobalAddrSpace(AS) &&
+    if ((AMDGPU::isFlatGlobalAddrSpace(AS) ||
+         AS == AMDGPUAS::BUFFER_FAT_POINTER) &&
         Subtarget->hasAtomicFaddNoRtnInsts()) {
       if (Subtarget->hasGFX940Insts())
         return AtomicExpansionKind::None;

@@ -16025,11 +16026,13 @@ SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
       if (HasSystemScope)
         return AtomicExpansionKind::CmpXChg;

-      if (AS == AMDGPUAS::GLOBAL_ADDRESS && Ty->isFloatTy()) {
-        // global atomic fadd f32 no-rtn: gfx908, gfx90a, gfx940, gfx11+.
+      if ((AS == AMDGPUAS::GLOBAL_ADDRESS ||
+           AS == AMDGPUAS::BUFFER_FAT_POINTER) &&
+          Ty->isFloatTy()) {
+        // global/buffer atomic fadd f32 no-rtn: gfx908, gfx90a, gfx940, gfx11+.
         if (RMW->use_empty() && Subtarget->hasAtomicFaddNoRtnInsts())
           return ReportUnsafeHWInst(AtomicExpansionKind::None);
-        // global atomic fadd f32 rtn: gfx90a, gfx940, gfx11+.
+        // global/buffer atomic fadd f32 rtn: gfx90a, gfx940, gfx11+.
         if (!RMW->use_empty() && Subtarget->hasAtomicFaddRtnInsts())
           return ReportUnsafeHWInst(AtomicExpansionKind::None);
       }

@@ -16084,7 +16087,8 @@ SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
   case AtomicRMWInst::Max:
   case AtomicRMWInst::UMin:
   case AtomicRMWInst::UMax: {
-    if (AMDGPU::isFlatGlobalAddrSpace(AS)) {
+    if (AMDGPU::isFlatGlobalAddrSpace(AS) ||
+        AS == AMDGPUAS::BUFFER_FAT_POINTER) {
       if (RMW->getType()->isFloatTy() &&
           unsafeFPAtomicsDisabled(RMW->getFunction()))
         return AtomicExpansionKind::CmpXChg;
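The gist of the SIISelLowering change is that the buffer fat pointer address space now takes the same atomic-expansion path as flat/global address spaces. The sketch below is a simplified model of just the address-space predicate, not the actual C++: the subtarget feature checks are omitted, and treating isFlatGlobalAddrSpace as exactly {flat, global} is a simplification (upstream it covers more spaces). Address-space numbers follow AMDGPUAS (0 = flat, 1 = global, 7 = buffer fat pointer).

```python
# Simplified model of the address-space test in shouldExpandAtomicRMWInIR
# after this commit. Feature checks (hasAtomicFaddNoRtnInsts etc.) omitted.

FLAT_ADDRESS = 0
GLOBAL_ADDRESS = 1
BUFFER_FAT_POINTER = 7

def is_flat_global(addrspace: int) -> bool:
    # Simplification of AMDGPU::isFlatGlobalAddrSpace.
    return addrspace in (FLAT_ADDRESS, GLOBAL_ADDRESS)

def eligible_for_hw_atomic(addrspace: int) -> bool:
    """After this commit, buffer fat pointers are candidates for native
    atomicrmw lowering alongside flat/global address spaces."""
    return is_flat_global(addrspace) or addrspace == BUFFER_FAT_POINTER

print(eligible_for_hw_atomic(BUFFER_FAT_POINTER))  # now eligible
```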
Lines changed: 63 additions & 5 deletions
@@ -1,15 +1,73 @@
-; RUN: not --crash llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - -stop-after=irtranslator < %s
-; REQUIRES: asserts
-
-; Confirm that no one's gotten vectors of addrspace(7) pointers to go through the
-; IR translater incidentally.
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - -stop-after=irtranslator < %s | FileCheck %s

 define <2 x ptr addrspace(7)> @no_auto_constfold_gep_vector() {
+; CHECK-LABEL: name: no_auto_constfold_gep_vector
+; CHECK: bb.1 (%ir-block.0):
+; CHECK-NEXT: [[C:%[0-9]+]]:_(p8) = G_CONSTANT i128 0
+; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x p8>) = G_BUILD_VECTOR [[C]](p8), [[C]](p8)
+; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 123
+; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32)
+; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32), [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<2 x p8>)
+; CHECK-NEXT: [[UV8:%[0-9]+]]:_(s32), [[UV9:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[BUILD_VECTOR1]](<2 x s32>)
+; CHECK-NEXT: $vgpr0 = COPY [[UV]](s32)
+; CHECK-NEXT: $vgpr1 = COPY [[UV1]](s32)
+; CHECK-NEXT: $vgpr2 = COPY [[UV2]](s32)
+; CHECK-NEXT: $vgpr3 = COPY [[UV3]](s32)
+; CHECK-NEXT: $vgpr4 = COPY [[UV4]](s32)
+; CHECK-NEXT: $vgpr5 = COPY [[UV5]](s32)
+; CHECK-NEXT: $vgpr6 = COPY [[UV6]](s32)
+; CHECK-NEXT: $vgpr7 = COPY [[UV7]](s32)
+; CHECK-NEXT: $vgpr8 = COPY [[UV8]](s32)
+; CHECK-NEXT: $vgpr9 = COPY [[UV9]](s32)
+; CHECK-NEXT: SI_RETURN implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3, implicit $vgpr4, implicit $vgpr5, implicit $vgpr6, implicit $vgpr7, implicit $vgpr8, implicit $vgpr9
   %gep = getelementptr i8, <2 x ptr addrspace(7)> zeroinitializer, <2 x i32> <i32 123, i32 123>
   ret <2 x ptr addrspace(7)> %gep
 }

 define <2 x ptr addrspace(7)> @gep_vector_splat(<2 x ptr addrspace(7)> %ptrs, i64 %idx) {
+; CHECK-LABEL: name: gep_vector_splat
+; CHECK: bb.1 (%ir-block.0):
+; CHECK-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11
+; CHECK-NEXT: {{ $}}
+; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4
+; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5
+; CHECK-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY $vgpr6
+; CHECK-NEXT: [[COPY7:%[0-9]+]]:_(s32) = COPY $vgpr7
+; CHECK-NEXT: [[MV:%[0-9]+]]:_(p8) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32)
+; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p8) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
+; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x p8>) = G_BUILD_VECTOR [[MV]](p8), [[MV1]](p8)
+; CHECK-NEXT: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr8
+; CHECK-NEXT: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr9
+; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32)
+; CHECK-NEXT: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr10
+; CHECK-NEXT: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr11
+; CHECK-NEXT: [[MV2:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY10]](s32), [[COPY11]](s32)
+; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<2 x s64>) = G_IMPLICIT_DEF
+; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
+; CHECK-NEXT: [[DEF1:%[0-9]+]]:_(<2 x p8>) = G_IMPLICIT_DEF
+; CHECK-NEXT: [[DEF2:%[0-9]+]]:_(<2 x s32>) = G_IMPLICIT_DEF
+; CHECK-NEXT: [[IVEC:%[0-9]+]]:_(<2 x s64>) = G_INSERT_VECTOR_ELT [[DEF]], [[MV2]](s64), [[C]](s64)
+; CHECK-NEXT: [[SHUF:%[0-9]+]]:_(<2 x s64>) = G_SHUFFLE_VECTOR [[IVEC]](<2 x s64>), [[DEF]], shufflemask(0, 0)
+; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<2 x s32>) = G_TRUNC [[SHUF]](<2 x s64>)
+; CHECK-NEXT: [[ADD:%[0-9]+]]:_(<2 x s32>) = G_ADD [[BUILD_VECTOR1]], [[TRUNC]]
+; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32), [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[BUILD_VECTOR]](<2 x p8>)
+; CHECK-NEXT: [[UV8:%[0-9]+]]:_(s32), [[UV9:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[ADD]](<2 x s32>)
+; CHECK-NEXT: $vgpr0 = COPY [[UV]](s32)
+; CHECK-NEXT: $vgpr1 = COPY [[UV1]](s32)
+; CHECK-NEXT: $vgpr2 = COPY [[UV2]](s32)
+; CHECK-NEXT: $vgpr3 = COPY [[UV3]](s32)
+; CHECK-NEXT: $vgpr4 = COPY [[UV4]](s32)
+; CHECK-NEXT: $vgpr5 = COPY [[UV5]](s32)
+; CHECK-NEXT: $vgpr6 = COPY [[UV6]](s32)
+; CHECK-NEXT: $vgpr7 = COPY [[UV7]](s32)
+; CHECK-NEXT: $vgpr8 = COPY [[UV8]](s32)
+; CHECK-NEXT: $vgpr9 = COPY [[UV9]](s32)
+; CHECK-NEXT: SI_RETURN implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3, implicit $vgpr4, implicit $vgpr5, implicit $vgpr6, implicit $vgpr7, implicit $vgpr8, implicit $vgpr9
   %gep = getelementptr i8, <2 x ptr addrspace(7)> %ptrs, i64 %idx
   ret <2 x ptr addrspace(7)> %gep
 }

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces.ll

Lines changed: 3 additions & 4 deletions
@@ -5,15 +5,14 @@
 define ptr addrspace(7) @no_auto_constfold_gep() {
 ; CHECK-LABEL: name: no_auto_constfold_gep
 ; CHECK: bb.1 (%ir-block.0):
-; CHECK-NEXT: [[C:%[0-9]+]]:_(p7) = G_CONSTANT i160 0
+; CHECK-NEXT: [[C:%[0-9]+]]:_(p8) = G_CONSTANT i128 0
 ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 123
-; CHECK-NEXT: [[PTR_ADD:%[0-9]+]]:_(p7) = G_PTR_ADD [[C]], [[C1]](s32)
-; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[PTR_ADD]](p7)
+; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[C]](p8)
 ; CHECK-NEXT: $vgpr0 = COPY [[UV]](s32)
 ; CHECK-NEXT: $vgpr1 = COPY [[UV1]](s32)
 ; CHECK-NEXT: $vgpr2 = COPY [[UV2]](s32)
 ; CHECK-NEXT: $vgpr3 = COPY [[UV3]](s32)
-; CHECK-NEXT: $vgpr4 = COPY [[UV4]](s32)
+; CHECK-NEXT: $vgpr4 = COPY [[C1]](s32)
 ; CHECK-NEXT: SI_RETURN implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3, implicit $vgpr4
   %gep = getelementptr i8, ptr addrspace(7) null, i32 123
   ret ptr addrspace(7) %gep

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Lines changed: 25 additions & 0 deletions
@@ -51,6 +51,11 @@
 ; GCN-O0-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O0-NEXT: FunctionPass Manager
 ; GCN-O0-NEXT: AMDGPU Lower Kernel Arguments
+; GCN-O0-NEXT: Lower buffer fat pointer operations to buffer resources
+; GCN-O0-NEXT: CallGraph Construction
+; GCN-O0-NEXT: Call Graph SCC Pass Manager
+; GCN-O0-NEXT: DummyCGSCCPass
+; GCN-O0-NEXT: FunctionPass Manager
 ; GCN-O0-NEXT: Lazy Value Information Analysis
 ; GCN-O0-NEXT: Lower SwitchInst's to branches
 ; GCN-O0-NEXT: Lower invoke and unwind, for unwindless code generators

@@ -229,6 +234,11 @@
 ; GCN-O1-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: AMDGPU Lower Kernel Arguments
+; GCN-O1-NEXT: Lower buffer fat pointer operations to buffer resources
+; GCN-O1-NEXT: CallGraph Construction
+; GCN-O1-NEXT: Call Graph SCC Pass Manager
+; GCN-O1-NEXT: DummyCGSCCPass
+; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: CodeGen Prepare

@@ -513,6 +523,11 @@
 ; GCN-O1-OPTS-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: AMDGPU Lower Kernel Arguments
+; GCN-O1-OPTS-NEXT: Lower buffer fat pointer operations to buffer resources
+; GCN-O1-OPTS-NEXT: CallGraph Construction
+; GCN-O1-OPTS-NEXT: Call Graph SCC Pass Manager
+; GCN-O1-OPTS-NEXT: DummyCGSCCPass
+; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: CodeGen Prepare

@@ -815,6 +830,11 @@
 ; GCN-O2-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: AMDGPU Lower Kernel Arguments
+; GCN-O2-NEXT: Lower buffer fat pointer operations to buffer resources
+; GCN-O2-NEXT: CallGraph Construction
+; GCN-O2-NEXT: Call Graph SCC Pass Manager
+; GCN-O2-NEXT: DummyCGSCCPass
+; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: CodeGen Prepare

@@ -1131,6 +1151,11 @@
 ; GCN-O3-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: AMDGPU Lower Kernel Arguments
+; GCN-O3-NEXT: Lower buffer fat pointer operations to buffer resources
+; GCN-O3-NEXT: CallGraph Construction
+; GCN-O3-NEXT: Call Graph SCC Pass Manager
+; GCN-O3-NEXT: DummyCGSCCPass
+; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: CodeGen Prepare
