Skip to content

[SYCL] Enable useful (not random) output from stream #737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 22, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions clang/lib/Sema/SemaSYCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,10 @@ class Util {
/// sampler class.
static bool isSyclSamplerType(const QualType &Ty);

/// Checks whether given clang type is a full specialization of the SYCL
/// stream class.
static bool isSyclStreamType(const QualType &Ty);

/// Checks whether given clang type is a standard SYCL API class with given
/// name.
/// \param Ty the clang type being checked
Expand Down Expand Up @@ -770,7 +774,7 @@ static CompoundStmt *CreateOpenCLKernelBody(Sema &S,
// All special SYCL objects must have __init method
CXXMethodDecl *InitMethod = getInitMethod(CRD);
assert(InitMethod &&
"The accessor/sampler must have the __init method");
"The accessor/sampler/stream must have the __init method");
unsigned NumParams = InitMethod->getNumParams();
llvm::SmallVector<Expr *, 4> ParamDREs(NumParams);
auto KFP = KernelFuncParam;
Expand All @@ -780,7 +784,9 @@ static CompoundStmt *CreateOpenCLKernelBody(Sema &S,
S.Context, NestedNameSpecifierLoc(), SourceLocation(), *KFP,
false, DeclarationNameInfo(), ParamType, VK_LValue);
}
std::advance(KernelFuncParam, NumParams - 1);

if (NumParams)
std::advance(KernelFuncParam, NumParams - 1);

DeclAccessPair FieldDAP = DeclAccessPair::make(Field, AS_none);
// [kernel_obj or wrapper object].special_obj
Expand Down Expand Up @@ -909,6 +915,11 @@ static CompoundStmt *CreateOpenCLKernelBody(Sema &S,
DeclarationNameInfo(Field->getDeclName(), SourceLocation()),
nullptr, Field->getType(), VK_LValue, OK_Ordinary, NOUR_None);
getExprForWrappedAccessorInit(CRD, Lhs);
if (Util::isSyclStreamType(FieldType)) {
// Generate call to the __init method of the stream class after
// initializing accessors wrapped by this stream object
getExprForSpecialSYCLObj(FieldType, Field, CRD, KernelObjCloneRef);
}
}
} else {
llvm_unreachable("Unsupported field type");
Expand Down Expand Up @@ -1714,6 +1725,10 @@ bool Util::isSyclSamplerType(const QualType &Ty) {
return isSyclType(Ty, "sampler");
}

bool Util::isSyclStreamType(const QualType &Ty) {
return isSyclType(Ty, "stream");
}

bool Util::isSyclType(const QualType &Ty, StringRef Name, bool Tmpl) {
Decl::Kind ClassDeclKind =
Tmpl ? Decl::Kind::ClassTemplateSpecialization : Decl::Kind::CXXRecord;
Expand Down
24 changes: 12 additions & 12 deletions sycl/include/CL/sycl/accessor.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1111,26 +1111,26 @@ class accessor<DataT, Dimensions, AccessMode, access::target::local,
return getQualifiedPtr()[Index];
}

template <int Dims = Dimensions,
typename = detail::enable_if_t<Dims == 0 &&
AccessMode == access::mode::atomic>>
operator atomic<DataT, AS>() const {
template <int Dims = Dimensions>
operator typename detail::enable_if_t<
Dims == 0 && AccessMode == access::mode::atomic, atomic<DataT, AS>>()
const {
return atomic<DataT, AS>(multi_ptr<DataT, AS>(getQualifiedPtr()));
}

template <int Dims = Dimensions,
typename = detail::enable_if_t<(Dims > 0) &&
AccessMode == access::mode::atomic>>
atomic<DataT, AS> operator[](id<Dimensions> Index) const {
template <int Dims = Dimensions>
typename detail::enable_if_t<(Dims > 0) && AccessMode == access::mode::atomic,
atomic<DataT, AS>>
operator[](id<Dimensions> Index) const {
const size_t LinearIndex = getLinearIndex(Index);
return atomic<DataT, AS>(
multi_ptr<DataT, AS>(getQualifiedPtr() + LinearIndex));
}

template <int Dims = Dimensions,
typename = detail::enable_if_t<Dims == 1 &&
AccessMode == access::mode::atomic>>
atomic<DataT, AS> operator[](size_t Index) const {
template <int Dims = Dimensions>
typename detail::enable_if_t<Dims == 1 && AccessMode == access::mode::atomic,
atomic<DataT, AS>>
operator[](size_t Index) const {
return atomic<DataT, AS>(multi_ptr<DataT, AS>(getQualifiedPtr() + Index));
}

Expand Down
17 changes: 17 additions & 0 deletions sycl/include/CL/sycl/detail/accessor_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,23 @@ class LocalAccessorImplHost {
int MDims;
int MElemSize;
std::vector<char> MMem;

bool PerWI = false;
size_t LocalMemSize;
size_t MaxWGSize;
void resize(size_t LocalSize, size_t GlobalSize) {
if (GlobalSize != 1 && LocalSize != 1) {
// If local size is not specified then work group size is chosen by
// runtime. That is why try to allocate based on max work group size or
// global size. In the worst case allocate 80% of local memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is weird.. Why do we allocate 80% of local size? In what situation we can exceed 80% of local size and what happens in this case? If we lose some data, can a perfectly legal SYCL code hit this?

Copy link
Contributor Author

@againull againull Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I think it is not good to allocate 100% of local mem, then it will not be possible to use local mem for other purposes inside the kernel. If [# work items running in parallel] * [flush buffer size provided by user] exceeds 80% of local memory then data for work items that don't have flush buffer will not be printed. But as far as I know maximum number of work items in work group for each device is defined, so this number could not be too big. Theoretically user can provide any flush buffer size, but I am not sure that any implementation can handle any input provided by user. Probably we can write an error message if user provided too big flush buffer size ( if allocated local memory is not enough).

This design (to use local memory) was discussed with @rolandschulz and @dbabokin.
@rolandschulz @dbabokin Could you please also provide your input regarding this question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you defer this computation somehow till after the run-time has chosen the wg-size? Or do we always choose max WG-size/GlobalSize so this matches?

We should throw an exception if the flush buffer is too big.

size_t MinEstWGSize = LocalSize ? LocalSize : GlobalSize;
MinEstWGSize = MinEstWGSize > MaxWGSize ? MaxWGSize : MinEstWGSize;
size_t NewSize = MinEstWGSize * MSize[0];
MSize[0] =
NewSize > 8 * LocalMemSize / 10 ? 8 * LocalMemSize / 10 : NewSize;
MMem.resize(NewSize * MElemSize);
}
}
};

class LocalAccessorBaseHost {
Expand Down
Loading