[AMDGPU] Strengthen preload intrinsics to noundef and nonnull #92801

Merged: 3 commits, Jun 3, 2024
7 changes: 5 additions & 2 deletions llvm/docs/AMDGPUUsage.rst
@@ -1395,8 +1395,11 @@ The AMDGPU backend supports the following LLVM IR attributes.

"amdgpu-no-workitem-id-x" Indicates the function does not depend on the value of the
llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
attribute, or reached through a call site marked with this attribute,
the value returned by the intrinsic is undefined. The backend can
attribute, or reached through a call site marked with this attribute, and
that intrinsic is called, the behavior of the program is undefined. (Whole-program
undefined behavior is used here because, for example, the absence of a required workitem
ID in the preloaded register set can mean that all other preloaded registers
are earlier than the compilation assumed they would be.) The backend can
generally infer this during code generation, so typically there is no
benefit to frontends marking functions with this.
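
For illustration, a minimal IR sketch of the whole-program-UB reading (hypothetical, not from this patch): the kernel promises via the attribute that workitem.id.x is never needed, yet a callee it reaches still asks for it.

declare i32 @llvm.amdgcn.workitem.id.x()

define internal void @uses_id(ptr addrspace(1) %out) {
  %x = call i32 @llvm.amdgcn.workitem.id.x() ; reached from @kern, which promised no ID: whole-program UB
  store i32 %x, ptr addrspace(1) %out
  ret void
}

define amdgpu_kernel void @kern(ptr addrspace(1) %out) #0 {
  call void @uses_id(ptr addrspace(1) %out)  ; call site inherits the kernel's promise
  ret void
}

attributes #0 = { "amdgpu-no-workitem-id-x" }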

45 changes: 28 additions & 17 deletions llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -12,11 +12,18 @@

def global_ptr_ty : LLVMQualPointerType<1>;

+// The amdgpu-no-* attributes (ex amdgpu-no-workitem-id-z) typically inferred
+// by the backend cause whole-program undefined behavior when violated, such as
+// by causing all other preload register intrinsics to return arbitrarily incorrect
+// values. In non-entry-point functions, attempting to call a function that needs
+// some preloaded register from a function that is known to not need it is a violation
+// of the calling convention and also program-level UB. Outside of such IR-level UB,
+// these preloaded registers are always set to a well-defined value and are thus `noundef`.
class AMDGPUReadPreloadRegisterIntrinsic
-  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

class AMDGPUReadPreloadRegisterIntrinsicNamed<string name>
-  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;
+  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;

// Used to tag image and resource intrinsics with information used to generate
// mem operands.
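
As a hedged sketch of what NoUndef<RetIndex> buys at the IR level: the generated declarations now carry noundef on the result, so defensive front-end patterns can fold away.

; Declaration emitted for the strengthened intrinsic:
declare noundef i32 @llvm.amdgcn.workitem.id.x()

define i32 @example() {
  %id = call i32 @llvm.amdgcn.workitem.id.x()
  %safe = freeze i32 %id   ; foldable to %id: a noundef value is never undef/poison
  ret i32 %safe
}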
@@ -56,7 +63,7 @@ def int_r600_group_barrier : ClangBuiltin<"__builtin_r600_group_barrier">,
def int_r600_implicitarg_ptr :
ClangBuiltin<"__builtin_r600_implicitarg_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<7>], [],
-                        [IntrNoMem, IntrSpeculatable]>;
+                        [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_r600_rat_store_typed :
// 1st parameter: Data
@@ -144,39 +151,43 @@ defm int_amdgcn_workgroup_id : AMDGPUReadPreloadRegisterIntrinsic_xyz_named

def int_amdgcn_dispatch_ptr :
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-                        [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+                        [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_queue_ptr :
ClangBuiltin<"__builtin_amdgcn_queue_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-                        [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+                        [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_kernarg_segment_ptr :
ClangBuiltin<"__builtin_amdgcn_kernarg_segment_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-                        [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+                        [Align<RetIndex, 4>, NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_implicitarg_ptr :
ClangBuiltin<"__builtin_amdgcn_implicitarg_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-                        [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+                        [Align<RetIndex, 4>, NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

// Returns the amount of LDS statically allocated for this program.
// This is no longer guaranteed to be a compile-time constant due to linking
// support.
def int_amdgcn_groupstaticsize :
ClangBuiltin<"__builtin_amdgcn_groupstaticsize">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_dispatch_id :
ClangBuiltin<"__builtin_amdgcn_dispatch_id">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

// For internal use. Coordinates LDS lowering between IR transform and backend.
def int_amdgcn_lds_kernel_id :
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

def int_amdgcn_implicit_buffer_ptr :
ClangBuiltin<"__builtin_amdgcn_implicit_buffer_ptr">,
DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-                        [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+                        [Align<RetIndex, 4>, NoUndef<RetIndex>,
+                         IntrNoMem, IntrSpeculatable]>;
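
A hedged before/after sketch of what NonNull<RetIndex> plus NoUndef<RetIndex> enables for the pointer preloads above (the byte-4 workgroup_size_x offset is an assumption taken from the HSA dispatch-packet layout, not from this patch): a null guard on the dispatch pointer becomes provably dead.

define i32 @workgroup_size_x() {
  %p = call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
  %isnull = icmp eq ptr addrspace(4) %p, null  ; provably false: result is nonnull noundef
  br i1 %isnull, label %bail, label %ok
ok:
  %gep = getelementptr inbounds i8, ptr addrspace(4) %p, i64 4
  %v = load i16, ptr addrspace(4) %gep, align 4 ; assumed packet field: u16 workgroup_size_x
  %ext = zext i16 %v to i32
  ret i32 %ext
bail:                                           ; dead branch once the guard folds
  ret i32 0
}

declare ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()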

// Set EXEC to the 64-bit value given.
// This is always moved to the beginning of the basic block.
@@ -199,7 +210,7 @@ def int_amdgcn_init_exec_from_input : Intrinsic<[],

def int_amdgcn_wavefrontsize :
ClangBuiltin<"__builtin_amdgcn_wavefrontsize">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

// Represent a relocation constant.
def int_amdgcn_reloc_constant : DefaultAttrsIntrinsic<
@@ -1923,8 +1934,8 @@ def int_amdgcn_s_setreg :
// s_getpc_b64 instruction returns a zero-extended value.
def int_amdgcn_s_getpc :
ClangBuiltin<"__builtin_amdgcn_s_getpc">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable,
-                                            IntrWillReturn]>;
+  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem,
+                                            IntrSpeculatable, IntrWillReturn]>;
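
A small sketch of why this one qualifies: the hardware always materializes a well-defined, zero-extended program counter, so the result can never be undef.

define i64 @next_inst_addr() {
  %pc = call i64 @llvm.amdgcn.s.getpc()  ; now known noundef: a real, zero-extended address
  %addr = add i64 %pc, 16                ; e.g. usable in PC-relative address arithmetic
  ret i64 %addr
}

declare i64 @llvm.amdgcn.s.getpc()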

// __builtin_amdgcn_interp_mov <param>, <attr_chan>, <attr>, <m0>
// param values: 0 = P10, 1 = P20, 2 = P0
@@ -2044,7 +2055,7 @@ def int_amdgcn_ps_live : DefaultAttrsIntrinsic <
// Query currently live lanes.
// Returns true if lane is live (and not a helper lane).
def int_amdgcn_live_mask : DefaultAttrsIntrinsic <[llvm_i1_ty],
-  [], [IntrReadMem, IntrInaccessibleMemOnly]
+  [], [NoUndef<RetIndex>, IntrReadMem, IntrInaccessibleMemOnly]
>;

def int_amdgcn_mbcnt_lo :
@@ -2517,7 +2528,7 @@ def int_amdgcn_mov_dpp8 :
def int_amdgcn_s_get_waveid_in_workgroup :
ClangBuiltin<"__builtin_amdgcn_s_get_waveid_in_workgroup">,
Intrinsic<[llvm_i32_ty], [],
-            [IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
+            [NoUndef<RetIndex>, IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

class AMDGPUAtomicRtn<LLVMType vt, LLVMType pt = llvm_anyptr_ty> : Intrinsic <
[vt],
@@ -2751,7 +2762,7 @@ def int_amdgcn_global_load_tr_b128 : AMDGPULoadIntrinsic<global_ptr_ty>;

// i32 @llvm.amdgcn.wave.id()
def int_amdgcn_wave_id :
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

//===----------------------------------------------------------------------===//
// Deep learning intrinsics.
1 change: 0 additions & 1 deletion llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
@@ -120,7 +120,6 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
CallInst *KernArgSegment =
Builder.CreateIntrinsic(Intrinsic::amdgcn_kernarg_segment_ptr, {}, {},
                             nullptr, F.getName() + ".kernarg.segment");
-
  KernArgSegment->addRetAttr(Attribute::NonNull);
KernArgSegment->addRetAttr(
Attribute::getWithDereferenceableBytes(Ctx, TotalKernArgSize));
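
For orientation, the call this pass emits then carries both the per-kernel attributes added here and the intrinsic's new ones; a sketch mirroring the autogenerated checks below (the dereferenceable byte count comes from TotalKernArgSize and varies per kernel):

%f.kernarg.segment = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()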
29 changes: 24 additions & 5 deletions llvm/test/CodeGen/AMDGPU/lower-kernargs.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -o - -passes=amdgpu-lower-kernel-arguments %s | FileCheck -check-prefixes=GCN,HSA %s
; RUN: opt -mtriple=amdgcn-- -S -o - -passes=amdgpu-lower-kernel-arguments %s | FileCheck -check-prefixes=GCN,MESA %s

@@ -1041,14 +1041,14 @@ define amdgpu_kernel void @kern_global_ptr_dereferencable_or_null(ptr addrspace(
; HSA-LABEL: @kern_global_ptr_dereferencable_or_null(
; HSA-NEXT: [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
; HSA-NEXT: [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT]], i64 0
-; HSA-NEXT:    [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 16, !invariant.load [[META1]], !dereferenceable_or_null !3
+; HSA-NEXT:    [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 16, !invariant.load [[META1]], !dereferenceable_or_null [[META3:![0-9]+]]
; HSA-NEXT: store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) undef, align 8
; HSA-NEXT: ret void
;
; MESA-LABEL: @kern_global_ptr_dereferencable_or_null(
; MESA-NEXT: [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
; MESA-NEXT: [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT]], i64 36
-; MESA-NEXT:    [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 4, !invariant.load [[META1]], !dereferenceable_or_null !3
+; MESA-NEXT:    [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 4, !invariant.load [[META1]], !dereferenceable_or_null [[META3:![0-9]+]]
; MESA-NEXT: store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) undef, align 8
; MESA-NEXT: ret void
;
@@ -1116,6 +1116,25 @@ define amdgpu_kernel void @kern_noalias_global_ptr_x2(ptr addrspace(1) noalias %
ret void
}

+define amdgpu_kernel void @kern_noundef_global_ptr(ptr addrspace(1) noundef %ptr) #0 {
+; HSA-LABEL: @kern_noundef_global_ptr(
+; HSA-NEXT:    [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
+; HSA-NEXT:    [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT]], i64 0
+; HSA-NEXT:    [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 16, !invariant.load [[META1]]
+; HSA-NEXT:    store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) null, align 8
+; HSA-NEXT:    ret void
+;
+; MESA-LABEL: @kern_noundef_global_ptr(
+; MESA-NEXT:    [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
+; MESA-NEXT:    [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT]], i64 36
+; MESA-NEXT:    [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 4, !invariant.load [[META1]]
+; MESA-NEXT:    store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) null, align 8
+; MESA-NEXT:    ret void
+;
+  store volatile ptr addrspace(1) %ptr, ptr addrspace(1) null
+  ret void
+}
+
define amdgpu_kernel void @struct_i8_i8_arg({i8, i8} %in) #0 {
; HSA-LABEL: @struct_i8_i8_arg(
; HSA-NEXT: entry:
@@ -1711,12 +1730,12 @@ attributes #2 = { nounwind "target-cpu"="tahiti" }
; HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
; HSA: [[META1]] = !{}
; HSA: [[META2]] = !{i64 42}
-; HSA: [[META3:![0-9]+]] = !{i64 128}
+; HSA: [[META3]] = !{i64 128}
; HSA: [[META4]] = !{i64 1024}
;.
; MESA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
; MESA: [[META1]] = !{}
; MESA: [[META2]] = !{i64 42}
-; MESA: [[META3:![0-9]+]] = !{i64 128}
+; MESA: [[META3]] = !{i64 128}
; MESA: [[META4]] = !{i64 1024}
;.
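
The META3 churn above is FileCheck's capture-then-reference convention at work: the first [[META3:![0-9]+]] binds the variable to whatever metadata id the output actually uses, and each later [[META3]] must match that same id. A minimal sketch:

; CHECK: !dereferenceable_or_null [[META3:![0-9]+]]   ; binds META3 at first use
; CHECK: [[META3]] = !{i64 128}                       ; must be the same node id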
@@ -181,7 +181,7 @@ attributes #0 = { noinline }
; CHECK: declare void @llvm.donothing() #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
-; CHECK: declare i32 @llvm.amdgcn.lds.kernel.id() #3
+; CHECK: declare noundef i32 @llvm.amdgcn.lds.kernel.id() #3

; CHECK: attributes #0 = { noinline }
; CHECK: attributes #1 = { "amdgpu-lds-size"="4,4" }