Skip to content

Commit fbab374

Browse files
[SYCL] Add new kernel-arg-runtime-aligned metadata. (#5111)
In this PR, if a kernel pointer argument comes from a global accessor, we generate a new metadata(kernel_arg_runtime_aligned) to the kernel to indicate that this pointer has runtime allocated alignment. If this information is available to the FPGA backend and if the accessor has no offset (e.g. through the user applying the [no_offset](#4920) property to their accessor), improvements to area of loads and stores can be made by using aligned LSUs. Without this enhancement we will continue to experience excess area. The SYCL spec already guarantees that accessors are aligned to some runtime-specific alignment. So the user's source doesn't need to change to provide the backend with this guarantee, we simply need to allow this information to propagate to the backend. Current IR implementation for kernel pointer argument from accessor looks like: `define dso_local spir_kernel void @_ZTSZ4mainE8kernel_A(i32 addrspace(1)* %_arg_, %"struct.cl::sycl::range"* byval(%"struct.cl::sycl::range") align 4 %_arg_1, %"struct.cl::sycl::range"* byval(%"struct.cl::sycl::range") align 4 %_arg_2, %"struct.cl::sycl::id"* byval(%"struct.cl::sycl::id") align 4 %_arg_3) #0 !kernel_arg_buffer_location !4 {` The new implementation will look like: `define dso_local spir_kernel void @_ZTSZ4mainE8kernel_A(i32 addrspace(1)* %_arg_, %"struct.cl::sycl::range"* byval(%"struct.cl::sycl::range") align 4 %_arg_1, %"struct.cl::sycl::range"* byval(%"struct.cl::sycl::range") align 4 %_arg_2, %"struct.cl::sycl::id"* byval(%"struct.cl::sycl::id") align 4 %_arg_3) #0 !kernel_arg_buffer_location !4 !kernel_arg_runtime_aligned !5 {` `!5 = !{i1 true, i1 false, i1 false, i1 false}` The metadata is applied to the kernel but really carries data about the kernel’s arguments. The first element of the metadata maps to the first kernel argument, the second to the second and so on. For this particular metadata the request is that the value of any metadata element is 'true' for any kernel arguments that corresponds to the base pointer of an accessor and 'false' otherwise. Accessors are handled specially by the frontend (because they are marked with sycl_special_class) and when a user captures an accessor in their SYCL kernel the FE splits up the single accessor into 4 separate kernel arguments. The first of those 4 arguments is a pointer and is the base pointer of the accessor. That pointer is known to have runtime-specific alignment and thus the element of the kernel-arg-runtime metadata that corresponds to that argument will have a value of “true”.
1 parent 2eed402 commit fbab374

File tree

5 files changed

+199
-15
lines changed

5 files changed

+199
-15
lines changed

clang/include/clang/Basic/Attr.td

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1280,15 +1280,17 @@ def SYCLRegisterNum : InheritableAttr {
12801280
let Documentation = [SYCLRegisterNumDocs];
12811281
}
12821282

1283-
// Used by FE to mark ESIMD kernel pointer parameters which correspond to the
1283+
// Used by FE to mark SYCL kernel pointer parameters which correspond to the
12841284
// original lambda's captured accessors. FE turns the attribute to some metadata
1285-
// required by the ESIMD Back-End.
1286-
// Not supposed to be used directly in the source - SYCL device compiler FE
1287-
// automatically adds it for ESIMD kernels, hence undocumented.
1288-
def SYCLSimdAccessorPtr : InheritableAttr {
1289-
// No spelling, as this attribute can't be created in the source code.
1285+
// required by the device back-end.
1286+
// This attribute does not require custom semantic handling
1287+
// hence we set the SemaHandler field to 0.
1288+
// The attribute is not for public consumption, and is an implicitly-created attribute
1289+
// that has no visible spelling, hence undocumented.
1290+
def SYCLAccessorPtr : Attr {
1291+
// This attribute has no spellings as it is only ever created implicitly.
12901292
let Spellings = [];
1291-
let Subjects = SubjectList<[ParmVar]>;
1293+
let SemaHandler = 0;
12921294
let Documentation = [Undocumented];
12931295
}
12941296

clang/lib/CodeGen/CodeGenModule.cpp

Lines changed: 31 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1728,10 +1728,16 @@ void CodeGenModule::GenOpenCLArgMetadata(llvm::Function *Fn,
17281728
// MDNode for the intel_buffer_location attribute.
17291729
SmallVector<llvm::Metadata *, 8> argSYCLBufferLocationAttr;
17301730

1731+
// MDNode for listing SYCL kernel pointer arguments originating from
1732+
// accessors.
1733+
SmallVector<llvm::Metadata *, 8> argSYCLKernelRuntimeAligned;
1734+
17311735
// MDNode for listing ESIMD kernel pointer arguments originating from
1732-
// accessors
1736+
// accessors.
17331737
SmallVector<llvm::Metadata *, 8> argESIMDAccPtrs;
17341738

1739+
bool isKernelArgAnAccessor = false;
1740+
17351741
if (FD && CGF)
17361742
for (unsigned i = 0, e = FD->getNumParams(); i != e; ++i) {
17371743
const ParmVarDecl *parm = FD->getParamDecl(i);
@@ -1835,17 +1841,38 @@ void CodeGenModule::GenOpenCLArgMetadata(llvm::Function *Fn,
18351841
SYCLBufferLocationAttr->getLocationID()))
18361842
: llvm::ConstantAsMetadata::get(CGF->Builder.getInt32(-1)));
18371843

1844+
// If a kernel pointer argument comes from an accessor, we generate
1845+
// a new metadata(kernel_arg_runtime_aligned) to the kernel to indicate
1846+
// that this pointer has runtime allocated alignment. The value of any
1847+
// "kernel_arg_runtime_aligned" metadata element is 'true' for any kernel
1848+
// arguments that corresponds to the base pointer of an accessor and
1849+
// 'false' otherwise.
1850+
if (parm->hasAttr<SYCLAccessorPtrAttr>()) {
1851+
isKernelArgAnAccessor = true;
1852+
argSYCLKernelRuntimeAligned.push_back(
1853+
llvm::ConstantAsMetadata::get(CGF->Builder.getTrue()));
1854+
} else {
1855+
argSYCLKernelRuntimeAligned.push_back(
1856+
llvm::ConstantAsMetadata::get(CGF->Builder.getFalse()));
1857+
}
1858+
18381859
if (FD->hasAttr<SYCLSimdAttr>())
18391860
argESIMDAccPtrs.push_back(llvm::ConstantAsMetadata::get(
1840-
CGF->Builder.getInt1(parm->hasAttr<SYCLSimdAccessorPtrAttr>())));
1861+
CGF->Builder.getInt1(parm->hasAttr<SYCLAccessorPtrAttr>())));
18411862
}
18421863

18431864
bool IsEsimdFunction = FD && FD->hasAttr<SYCLSimdAttr>();
18441865

1845-
if (LangOpts.SYCLIsDevice && !IsEsimdFunction)
1866+
if (LangOpts.SYCLIsDevice && !IsEsimdFunction) {
18461867
Fn->setMetadata("kernel_arg_buffer_location",
18471868
llvm::MDNode::get(VMContext, argSYCLBufferLocationAttr));
1848-
else {
1869+
// Generate this metadata only if atleast one kernel argument is an
1870+
// accessor.
1871+
if (isKernelArgAnAccessor)
1872+
Fn->setMetadata(
1873+
"kernel_arg_runtime_aligned",
1874+
llvm::MDNode::get(VMContext, argSYCLKernelRuntimeAligned));
1875+
} else {
18491876
Fn->setMetadata("kernel_arg_addr_space",
18501877
llvm::MDNode::get(VMContext, addressQuals));
18511878
Fn->setMetadata("kernel_arg_access_qual",

clang/lib/Sema/SemaSYCL.cpp

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1962,10 +1962,6 @@ class SyclKernelDeclCreator : public SyclKernelFieldHandler {
19621962
// Additional processing is required for accessor type.
19631963
void handleAccessorType(const CXXRecordDecl *RecordDecl, SourceLocation Loc) {
19641964
handleAccessorPropertyList(Params.back(), RecordDecl, Loc);
1965-
if (KernelDecl->hasAttr<SYCLSimdAttr>())
1966-
// In ESIMD, the kernels accessor's pointer argument needs to be marked.
1967-
Params.back()->addAttr(
1968-
SYCLSimdAccessorPtrAttr::CreateImplicit(SemaRef.getASTContext()));
19691965
// Get access mode of accessor.
19701966
const auto *AccessorSpecializationDecl =
19711967
cast<ClassTemplateSpecializationDecl>(RecordDecl);
@@ -1977,6 +1973,8 @@ class SyclKernelDeclCreator : public SyclKernelFieldHandler {
19771973
if (isReadOnlyAccessor(AccessModeArg))
19781974
Params.back()->addAttr(
19791975
SYCLAccessorReadonlyAttr::CreateImplicit(SemaRef.getASTContext()));
1976+
Params.back()->addAttr(
1977+
SYCLAccessorPtrAttr::CreateImplicit(SemaRef.getASTContext()));
19801978
}
19811979

19821980
// All special SYCL objects must have __init method. We extract types for
Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
// RUN: %clang_cc1 -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
2+
3+
// This test checks if the metadata "kernel-arg-runtime-aligned"
4+
// is generated if the kernel captures an accessor.
5+
6+
#include "sycl.hpp"
7+
8+
using namespace cl::sycl;
9+
10+
queue q;
11+
12+
int main() {
13+
14+
using Accessor =
15+
accessor<int, 1, access::mode::read_write, access::target::global_buffer>;
16+
Accessor acc[2];
17+
18+
accessor<int, 1, access::mode::read, access::target::global_buffer> readOnlyAccessor;
19+
20+
accessor<float, 2, access::mode::write,
21+
access::target::local,
22+
access::placeholder::true_t>
23+
acc3;
24+
25+
// kernel_A parameters : int*, sycl::range<1>, sycl::range<1>, sycl::id<1>,
26+
// int*, sycl::range<1>, sycl::range<1>,sycl::id<1>.
27+
q.submit([&](handler &h) {
28+
h.single_task<class kernel_A>([=]() {
29+
acc[1].use();
30+
});
31+
});
32+
33+
// kernel_readOnlyAcc parameters : int*, sycl::range<1>, sycl::range<1>, sycl::id<1>.
34+
q.submit([&](handler &h) {
35+
h.single_task<class kernel_readOnlyAcc>([=]() {
36+
readOnlyAccessor.use();
37+
});
38+
});
39+
40+
// kernel_B parameters : none.
41+
q.submit([&](handler &h) {
42+
h.single_task<class kernel_B>([=]() {
43+
int result = 5;
44+
});
45+
});
46+
47+
int a = 10;
48+
49+
// kernel_C parameters : int.
50+
q.submit([&](handler &h) {
51+
h.single_task<class kernel_C>([=]() {
52+
int x = a;
53+
});
54+
});
55+
56+
// Using raw pointers to represent USM pointers.
57+
// kernel_arg_runtime_aligned is not generated for raw pointers.
58+
int *x;
59+
float *y;
60+
q.submit([&](handler &h) {
61+
h.single_task<class usm_ptr>([=]() {
62+
*x = 42;
63+
*y = 3.14;
64+
});
65+
});
66+
67+
// Using local accessor as a kernel parameter.
68+
// kernel_arg_runtime_aligned is generated for pointers from local accessors.
69+
q.submit([&](handler &h) {
70+
h.single_task<class localAccessor>([=]() {
71+
acc3.use();
72+
});
73+
});
74+
75+
// kernel_acc_raw_ptr parameters : int*, sycl::range<1>, sycl::range<1>, sycl::id<1>, int*.
76+
int *rawPtr;
77+
q.submit([&](handler &h) {
78+
h.single_task<class kernel_acc_raw_ptr>([=]() {
79+
readOnlyAccessor.use();
80+
*rawPtr = 10;
81+
});
82+
});
83+
84+
// Check if kernel_arg_accessor_ptr metadata is generated for ESIMD kernels that capture
85+
// an accessor.
86+
q.submit([&](handler &h) {
87+
h.single_task<class esimd_kernel_with_acc>([=]() __attribute__((sycl_explicit_simd)) {
88+
readOnlyAccessor.use();
89+
});
90+
});
91+
}
92+
93+
// Check kernel_A parameters
94+
// CHECK: define {{.*}}spir_kernel void @{{.*}}kernel_A
95+
// CHECK-SAME: i32 addrspace(1)* [[MEM_ARG1:%[a-zA-Z0-9_]+]],
96+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[ACC_RANGE1:%[a-zA-Z0-9_]+_1]],
97+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[MEM_RANGE1:%[a-zA-Z0-9_]+_2]],
98+
// CHECK-SAME: %"struct.cl::sycl::id"* byval{{.*}}align 4 [[OFFSET1:%[a-zA-Z0-9_]+_3]],
99+
// CHECK-SAME: i32 addrspace(1)* [[MEM_ARG2:%[a-zA-Z0-9_]+_4]],
100+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[ACC_RANGE2:%[a-zA-Z0-9_]+_6]],
101+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[MEM_RANGE2:%[a-zA-Z0-9_]+_7]],
102+
// CHECK-SAME: %"struct.cl::sycl::id"* byval{{.*}}align 4 [[OFFSET2:%[a-zA-Z0-9_]+_8]])
103+
// CHECK-SAME: !kernel_arg_runtime_aligned !5
104+
105+
// Check kernel_readOnlyAcc parameters
106+
// CHECK: define {{.*}}spir_kernel void @{{.*}}kernel_readOnlyAcc
107+
// CHECK-SAME: i32 addrspace(1)* readonly [[MEM_ARG1:%[a-zA-Z0-9_]+]],
108+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[ACC_RANGE1:%[a-zA-Z0-9_]+_1]],
109+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[MEM_RANGE1:%[a-zA-Z0-9_]+_2]],
110+
// CHECK-SAME: %"struct.cl::sycl::id"* byval{{.*}}align 4 [[OFFSET1:%[a-zA-Z0-9_]+_3]]
111+
// CHECK-SAME: !kernel_arg_runtime_aligned !14
112+
113+
// Check kernel_B parameters
114+
// CHECK: define {{.*}}spir_kernel void @{{.*}}kernel_B
115+
// CHECK-NOT: kernel_arg_runtime_aligned
116+
117+
// Check kernel_C parameters
118+
// CHECK: define {{.*}}spir_kernel void @{{.*}}kernel_C
119+
// CHECK-SAME: i32 [[MEM_ARG1:%[a-zA-Z0-9_]+]]
120+
// CHECK-NOT: kernel_arg_runtime_aligned
121+
122+
// Check usm_ptr parameters
123+
// CHECK: define {{.*}}spir_kernel void @{{.*}}usm_ptr
124+
// CHECK-SAME: i32 addrspace(1)* [[MEM_ARG1:%[a-zA-Z0-9_]+]],
125+
// CHECK-SAME: float addrspace(1)* [[MEM_ARG1:%[a-zA-Z0-9_]+]]
126+
// CHECK-NOT: kernel_arg_runtime_aligned
127+
128+
// CHECK: define {{.*}}spir_kernel void @{{.*}}localAccessor
129+
// CHECK-SAME: float addrspace(1)* [[MEM_ARG1:%[a-zA-Z0-9_]+]],
130+
// CHECK-SAME: %"struct.cl::sycl::range.5"* byval{{.*}}align 4 [[ACC_RANGE1:%[a-zA-Z0-9_]+_1]],
131+
// CHECK-SAME: %"struct.cl::sycl::range.5"* byval{{.*}}align 4 [[MEM_RANGE1:%[a-zA-Z0-9_]+_2]],
132+
// CHECK-SAME: %"struct.cl::sycl::id.6"* byval{{.*}}align 4 [[OFFSET1:%[a-zA-Z0-9_]+_3]]
133+
// CHECK-SAME: !kernel_arg_runtime_aligned !14
134+
135+
// Check kernel_acc_raw_ptr parameters
136+
// CHECK: define {{.*}}spir_kernel void @{{.*}}kernel_acc_raw_ptr
137+
// CHECK-SAME: i32 addrspace(1)* readonly [[MEM_ARG1:%[a-zA-Z0-9_]+]],
138+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[ACC_RANGE1:%[a-zA-Z0-9_]+_1]],
139+
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[MEM_RANGE1:%[a-zA-Z0-9_]+_2]],
140+
// CHECK-SAME: %"struct.cl::sycl::id"* byval{{.*}}align 4 [[OFFSET1:%[a-zA-Z0-9_]+_3]]
141+
// CHECK-SAME: i32 addrspace(1)* [[MEM_ARG1:%[a-zA-Z0-9_]+]]
142+
// CHECK-SAME: !kernel_arg_runtime_aligned !26
143+
144+
// Check esimd_kernel_with_acc parameters
145+
// CHECK: define {{.*}}spir_kernel void @{{.*}}esimd_kernel_with_acc
146+
// CHECK-SAME: !kernel_arg_accessor_ptr
147+
148+
// Check kernel-arg-runtime-aligned metadata.
149+
// The value of any metadata element is 1 for any kernel arguments
150+
// that corresponds to the base pointer of an accessor and 0 otherwise.
151+
// CHECK: !5 = !{i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false}
152+
// CHECK: !14 = !{i1 true, i1 false, i1 false, i1 false}
153+
// CHECK: !26 = !{i1 true, i1 false, i1 false, i1 false, i1 false}

clang/test/SemaSYCL/array-kernel-param.cpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -114,10 +114,12 @@ int main() {
114114
// Check Kernel_Accessor parameters
115115
// CHECK: FunctionDecl {{.*}}Kernel_Accessor{{.*}} 'void (__global int *, sycl::range<1>, sycl::range<1>, sycl::id<1>, __global int *, sycl::range<1>, sycl::range<1>, sycl::id<1>)'
116116
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ '__global int *'
117+
// CHECK-NEXT: SYCLAccessorPtrAttr
117118
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ 'sycl::range<1>'
118119
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ 'sycl::range<1>'
119120
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ 'sycl::id<1>'
120121
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ '__global int *'
122+
// CHECK-NEXT: SYCLAccessorPtrAttr
121123
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ 'sycl::range<1>'
122124
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ 'sycl::range<1>'
123125
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_ 'sycl::id<1>'
@@ -165,10 +167,12 @@ int main() {
165167
// Check Kernel_StructAccArray parameters
166168
// CHECK: FunctionDecl {{.*}}Kernel_StructAccArray{{.*}} 'void (__global int *, sycl::range<1>, sycl::range<1>, sycl::id<1>, __global int *, sycl::range<1>, sycl::range<1>, sycl::id<1>)'
167169
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc '__global int *'
170+
// CHECK-NEXT: SYCLAccessorPtrAttr
168171
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc 'sycl::range<1>'
169172
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc 'sycl::range<1>'
170173
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc 'sycl::id<1>'
171174
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc '__global int *'
175+
// CHECK-NEXT: SYCLAccessorPtrAttr
172176
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc 'sycl::range<1>'
173177
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc 'sycl::range<1>'
174178
// CHECK-NEXT: ParmVarDecl {{.*}} used _arg_member_acc 'sycl::id<1>'

0 commit comments

Comments
 (0)