Commit 1cb0e20

[AMDGPU] Strengthen preload intrinsics to noundef and nonnull
The various preloaded registers (workitem IDs, workgroup IDs, various implicit pointers) always have a finite, invariant, well-defined value throughout a well-defined program.

In cases where the compiler infers or the user declares that some implicit input will not be used (ex. via amdgpu-no-workitem-id-y) but the program reads that input anyway, the behavior of the entire program is undefined, since that misdeclaration can cause arbitrary other preloaded-register intrinsics to access the wrong register. This case is not expected to arise in practice, but could occur if the no-implicit-argument attributes were not cleared correctly in the presence of external functions, indirect calls, or other means of executing un-analyzable code. Failure to detect that case would be a bug in the attributor.

This commit updates the documentation to reflect this long-standing reality.

Then, on the basis that all implicit arguments are defined in all correct programs, the intrinsics that return those values are annotated with `noundef` and, in the case of implicit pointers, `nonnull`.

This will prevent spurious calls to `freeze` in front-end optimizations that destroy user-provided ranges on built-in IDs.

(While I'm here, this commit adds a test for `noundef` on kernel arguments, which is currently unimplemented.)
1 parent 83de21d commit 1cb0e20
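
The `freeze` argument in the commit message can be illustrated with a toy model. The sketch below is not LLVM code: `ToyValue` and `freezeValue` are hypothetical names, and the model only assumes that freezing a value that may be undef yields a fresh value with no known range, while freezing a `noundef` value is a no-op.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an SSA value (hypothetical, not an LLVM class): an
// optional known range [lo, hi) plus a flag saying the value is known not
// to be undef or poison.
struct ToyValue {
  bool hasRange;   // true if [lo, hi) below is known
  uint32_t lo, hi; // range bounds, meaningful only when hasRange is true
  bool noundef;    // true if the value is known to be well-defined
};

// Models `freeze`: a no-op on a noundef operand; otherwise it yields a
// well-defined value about which no range is known any more.
ToyValue freezeValue(const ToyValue &v) {
  assert(!v.hasRange || v.lo < v.hi);
  if (v.noundef)
    return v; // freeze(x) == x, so the range fact survives
  ToyValue frozen = {false, 0, 0, true};
  return frozen; // defined, but the range is lost
}
```

In this model, a value carrying the range [0, 256) loses it when frozen unless it is flagged `noundef`, which is the effect the new attributes are meant to avoid for built-in IDs.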

7 files changed: 269 additions and 239 deletions

llvm/docs/AMDGPUUsage.rst

Lines changed: 5 additions & 2 deletions
@@ -1395,8 +1395,11 @@ The AMDGPU backend supports the following LLVM IR attributes.

 "amdgpu-no-workitem-id-x" Indicates the function does not depend on the value of the
 llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
-attribute, or reached through a call site marked with this attribute,
-the value returned by the intrinsic is undefined. The backend can
+attribute, or reached through a call site marked with this attribute, and
+that intrinsic is called, the behavior of the program is undefined. (Whole-program
+undefined behavior is used here because, for example, the absence of a required workitem
+ID in the preloaded register set can mean that all other preloaded registers
+are earlier than the compilation assumed they would be.) The backend can
 generally infer this during code generation, so typically there is no
 benefit to frontends marking functions with this.
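
The "earlier than the compilation assumed" effect described in the documentation change can be sketched with a toy packing model. Everything here is an illustrative assumption: `InputA`/`InputB`/`InputC`, `kNumRegs`, `kStale`, and the rule that enabled inputs are packed into consecutive registers are hypothetical and do not describe the actual AMDGPU register layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical implicit inputs, listed in packing order.
enum Input { InputA = 0, InputB, InputC, NumInputs };

const int kStale = -1;          // marks a register nobody initialized
const std::size_t kNumRegs = 4; // size of the toy register file

// Register slot of `wanted` when only the inputs flagged in `enabled` are
// packed into consecutive registers. `wanted` must itself be enabled.
std::size_t slotOf(const std::vector<bool> &enabled, Input wanted) {
  assert(enabled.size() == NumInputs && enabled[wanted]);
  std::size_t slot = 0;
  for (int i = 0; i < wanted; ++i)
    if (enabled[i])
      ++slot;
  return slot;
}

// Launch-side view: pack the values of the enabled inputs into registers.
std::vector<int> preload(const std::vector<bool> &enabled,
                         const std::vector<int> &values) {
  assert(enabled.size() == NumInputs && values.size() == NumInputs);
  std::vector<int> regs(kNumRegs, kStale);
  std::size_t next = 0;
  for (int i = 0; i < NumInputs; ++i)
    if (enabled[i])
      regs[next++] = values[i];
  return regs;
}

// Code-side view: read `wanted` using the layout the code assumes.
int readInput(const std::vector<int> &regs, const std::vector<bool> &assumed,
              Input wanted) {
  return regs[slotOf(assumed, wanted)];
}
```

If the launch side omits `InputB` because it was declared unused while the code still assumes all three inputs are present, a read of `InputB` returns the value of `InputC`, and a read of `InputC` lands on an uninitialized register. This is why the violation is treated as undefined behavior for the whole program and not only for the one intrinsic.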

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

Lines changed: 27 additions & 17 deletions
@@ -12,11 +12,16 @@

 def global_ptr_ty : LLVMQualPointerType<1>;

+// The amdgpu-no-* attributes (ex amdgpu-no-workitem-id-z) typically inferred
+// by the backend cause whole-program undefined behavior when violated, such as
+// by causing all other preload register intrinsics to return arbitrarily incorrect
+// values. Outside of such IR-level UB, these preloaded registers are always set
+// to a well-defined value and are thus `noundef`.
 class AMDGPUReadPreloadRegisterIntrinsic
-  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 class AMDGPUReadPreloadRegisterIntrinsicNamed<string name>
-  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;
+  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;

 // Used to tag image and resource intrinsics with information used to generate
 // mem operands.
@@ -56,7 +61,7 @@ def int_r600_group_barrier : ClangBuiltin<"__builtin_r600_group_barrier">,
 def int_r600_implicitarg_ptr :
   ClangBuiltin<"__builtin_r600_implicitarg_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<7>], [],
-  [IntrNoMem, IntrSpeculatable]>;
+  [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_r600_rat_store_typed :
   // 1st parameter: Data
@@ -144,39 +149,44 @@ defm int_amdgcn_workgroup_id : AMDGPUReadPreloadRegisterIntrinsic_xyz_named

 def int_amdgcn_dispatch_ptr :
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_queue_ptr :
   ClangBuiltin<"__builtin_amdgcn_queue_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_kernarg_segment_ptr :
   ClangBuiltin<"__builtin_amdgcn_kernarg_segment_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_implicitarg_ptr :
   ClangBuiltin<"__builtin_amdgcn_implicitarg_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

+// Returns the amount of LDS statically allocated for this program.
+// This is no longer guaranteed to be a compile-time constant due to linking
+// support.
 def int_amdgcn_groupstaticsize :
   ClangBuiltin<"__builtin_amdgcn_groupstaticsize">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_dispatch_id :
   ClangBuiltin<"__builtin_amdgcn_dispatch_id">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 // For internal use. Coordinates LDS lowering between IR transform and backend.
 def int_amdgcn_lds_kernel_id :
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_implicit_buffer_ptr :
   ClangBuiltin<"__builtin_amdgcn_implicit_buffer_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, Dereferenceable<RetIndex, 16>,
+  NoUndef<RetIndex>, NonNull<RetIndex>,
+  IntrNoMem, IntrSpeculatable]>;

 // Set EXEC to the 64-bit value given.
 // This is always moved to the beginning of the basic block.
@@ -199,7 +209,7 @@ def int_amdgcn_init_exec_from_input : Intrinsic<[],

 def int_amdgcn_wavefrontsize :
   ClangBuiltin<"__builtin_amdgcn_wavefrontsize">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 // Represent a relocation constant.
 def int_amdgcn_reloc_constant : DefaultAttrsIntrinsic<
@@ -1923,8 +1933,8 @@ def int_amdgcn_s_setreg :
 // s_getpc_b64 instruction returns a zero-extended value.
 def int_amdgcn_s_getpc :
   ClangBuiltin<"__builtin_amdgcn_s_getpc">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable,
-  IntrWillReturn]>;
+  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem,
+  IntrSpeculatable, IntrWillReturn]>;

 // __builtin_amdgcn_interp_mov <param>, <attr_chan>, <attr>, <m0>
 // param values: 0 = P10, 1 = P20, 2 = P0
@@ -2044,7 +2054,7 @@ def int_amdgcn_ps_live : DefaultAttrsIntrinsic <
 // Query currently live lanes.
 // Returns true if lane is live (and not a helper lane).
 def int_amdgcn_live_mask : DefaultAttrsIntrinsic <[llvm_i1_ty],
-  [], [IntrReadMem, IntrInaccessibleMemOnly]
+  [], [NoUndef<RetIndex>, IntrReadMem, IntrInaccessibleMemOnly]
 >;

 def int_amdgcn_mbcnt_lo :
@@ -2517,7 +2527,7 @@ def int_amdgcn_mov_dpp8 :
 def int_amdgcn_s_get_waveid_in_workgroup :
   ClangBuiltin<"__builtin_amdgcn_s_get_waveid_in_workgroup">,
   Intrinsic<[llvm_i32_ty], [],
-  [IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
+  [NoUndef<RetIndex>, IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

 class AMDGPUAtomicRtn<LLVMType vt, LLVMType pt = llvm_anyptr_ty> : Intrinsic <
   [vt],
@@ -2751,7 +2761,7 @@ def int_amdgcn_global_load_tr_b128 : AMDGPULoadIntrinsic<global_ptr_ty>;

 // i32 @llvm.amdgcn.wave.id()
 def int_amdgcn_wave_id :
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 //===----------------------------------------------------------------------===//
 // Deep learning intrinsics.

llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp

Lines changed: 0 additions & 2 deletions
@@ -120,8 +120,6 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
   CallInst *KernArgSegment =
       Builder.CreateIntrinsic(Intrinsic::amdgcn_kernarg_segment_ptr, {}, {},
                               nullptr, F.getName() + ".kernarg.segment");
-
-  KernArgSegment->addRetAttr(Attribute::NonNull);
   KernArgSegment->addRetAttr(
       Attribute::getWithDereferenceableBytes(Ctx, TotalKernArgSize));
