Skip to content

[SYCL][USM][PI] Initial Commit to move all USM implementation detail inside the PI in… #937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jan 9, 2020

Conversation

jbrodman
Copy link
Contributor

…terface.

Removes need for USM Dispatcher. CLUSM is also now unused as CL USM is supported on CPU.

Signed-off-by: James Brodman [email protected]

Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, just a few code style comments, but I'd like @smaslov-intel to review and maybe address some questions in the code comments.

@jbrodman jbrodman requested a review from bader December 16, 2019 16:27
@jbrodman
Copy link
Contributor Author

@smaslov-intel Ping.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just few requests mostly for more comments

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 33 files reviewed, 10 unresolved discussions (waiting on @bader, @jbrodman, @MichalMrozek, @romanovvlad, and @smaslov-intel)


sycl/include/CL/sycl/detail/pi.h, line 989 at r3 (raw file):

Previously, jbrodman wrote…
  1. USM allocations aren't reference counted, so I'm not sure what you're asking.

I am asking if we should make them reference counted like other objects in PI.

No - they're raw pointers. We can't shove a wrapper around them. Users can use a std::shared_ptr or std::unique_ptr on the host if they want. Won't work across host and device.

OK, please make a note of this in the documentation comment of the alloc/free interfaces, because this is really different from the rest of PI API.


sycl/include/CL/sycl/detail/pi.h, line 1037 at r4 (raw file):

Previously, jbrodman wrote…

Mirrors the OpenCL extension.

I am OK for now.


sycl/plugins/opencl/pi_opencl.cpp, line 511 at r4 (raw file):

Previously, jbrodman wrote…

Just pushed a new commit that seems to handle this in a thread safe way.

OK


sycl/plugins/opencl/pi_opencl.cpp, line 591 at r4 (raw file):

Previously, jbrodman wrote…

Yes. It's doing extra indirection b/c OpenCL, in all its infinite wisdom, decided that pointer args should be set by value instead of by address + size.

I think the OpenCL runtime might also be doing something special on its end, not sure.

OK. Please make it more general than for USM, e.g. name it piextKernelSetArgPointer


sycl/plugins/opencl/pi_opencl.cpp, line 756 at r4 (raw file):

Previously, jbrodman wrote…

Device defined, which is why they're not enumerated.

No, because it's mean to dynamically change behavior during program execution.

OK, but let's define the enumeration of the hints (OK to be ignored by devices that don't know or don't care)


sycl/plugins/opencl/pi_opencl.cpp, line 36 at r5 (raw file):

template <typename T>
pi_result getExtFuncFromContext(pi_context context, const char *func, T *fptr) {
  thread_local static std::map<pi_context, T> FuncPtrs;

I think this is good for now, but there is still redundancy: every context in every thread will repeat it while it is only really needed once for each PI plugin. Please add a TODO to fix this once we settle the pi_plugin infrastructure.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 12 files at r4, 1 of 3 files at r5.
Reviewable status: 1 of 33 files reviewed, 10 unresolved discussions (waiting on @bader, @jbrodman, @MichalMrozek, and @romanovvlad)


sycl/include/CL/sycl/detail/pi.h, line 1054 at r4 (raw file):

Previously, smaslov-intel wrote…

Please do add doxygen comments (at least something short initially) so that the new interfaces are clearer.
The exsiting PI interfaces are mostly taken from OpenCL so not having doc for them is kind of OK for now.
But we shall do that for them too.

Please copy doxygen comment to the header too

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 22 files at r3, 1 of 12 files at r4, 1 of 3 files at r5.
Reviewable status: 5 of 33 files reviewed, 10 unresolved discussions (waiting on @bader, @jbrodman, @MichalMrozek, and @romanovvlad)

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 33 files reviewed, 10 unresolved discussions (waiting on @bader, @jbrodman, @MichalMrozek, and @romanovvlad)


sycl/plugins/opencl/pi_opencl.cpp, line 794 at r4 (raw file):

Previously, smaslov-intel wrote…

Please describe different types of info that can be queried

Is this the full list? You should also specify the type of the data that is returned

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 11 files at r1, 18 of 22 files at r3, 8 of 12 files at r4, 1 of 3 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @bader, @jbrodman, @MichalMrozek, and @romanovvlad)

@jbrodman
Copy link
Contributor Author

I think this is good for now, but there is still redundancy: every context in every thread will repeat it while it is only really needed once for each PI plugin. Please add a TODO to fix this once we settle the pi_plugin infrastructure.

How does it work now? Say we have two different opencl platforms on a system - How many pi opencl's get loaded? We need to to this at a minimum once per context.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 6 files at r6, 4 of 4 files at r7.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @bader, @jbrodman, @MichalMrozek, and @romanovvlad)


sycl/include/CL/sycl/detail/pi.def, line 102 at r7 (raw file):

_PI_API(piextUSMSharedAlloc)
_PI_API(piextUSMFree)
_PI_API(piextKernelSetArgPointer)

Please move this to the where piKernelSetArg is (and also in the header)


sycl/plugins/opencl/pi_opencl.cpp, line 622 at r4 (raw file):

clSetKernelExecInfo
Yes, that's what I think we need. Could you change the piextUSMKernelSetIndirectAccess into piKernelSetExecInfo ?
Also we'd move that to where other piKernel* API are.


sycl/plugins/opencl/pi_opencl.cpp, line 794 at r4 (raw file):

Previously, smaslov-intel wrote…

Is this the full list? You should also specify the type of the data that is returned

would you add more info about supported parameters?

@jbrodman
Copy link
Contributor Author

jbrodman commented Jan 6, 2020

_PI_API(piextUSMSharedAlloc)
_PI_API(piextUSMFree)
_PI_API(piextKernelSetArgPointer)

Please move this to the where piKernelSetArg is (and also in the header)

Ok.

sycl/plugins/opencl/pi_opencl.cpp, line 622 at r4 (raw file):

clSetKernelExecInfo
Yes, that's what I think we need. Could you change the piextUSMKernelSetIndirectAccess into piKernelSetExecInfo ?
Also we'd move that to where other piKernel* API are.

No - you're missing my point. KernelSetIndirectAccess is USM-specific behavior. It is not general I cannot make it into piKernelSetExecInfo. You could make a piKernelSetExecInfo by wiring up clSetKernelExecInfo straight up, but this has nothing to do with USM. I could implement KernelSetIndirectAccess by calling KernelSetExeciInfo inside a lot but there's no need.

sycl/plugins/opencl/pi_opencl.cpp, line 794 at r4 (raw file):

Previously, smaslov-intel wrote…
would you add more info about supported parameters?

I mean - they're all listed - what do you want? A link to the Spec in the comments?

Also can we please just use Github comments instead of some 3rd party site for comments? This "reviewable" site is a PITA.

@smaslov-intel
Copy link
Contributor

sycl/plugins/opencl/pi_opencl.cpp, line 622 at r4 (raw file):

clSetKernelExecInfo
Yes, that's what I think we need. Could you change the piextUSMKernelSetIndirectAccess into piKernelSetExecInfo ?
Also we'd move that to where other piKernel* API are.

No - you're missing my point. KernelSetIndirectAccess is USM-specific behavior. It is not general I cannot make it into piKernelSetExecInfo. You could make a piKernelSetExecInfo by wiring up clSetKernelExecInfo straight up, but this has nothing to do with USM. I could implement KernelSetIndirectAccess by calling KernelSetExeciInfo inside a lot but there's no need.

I am asking for a general interface "piKernelSetExecInfo" to be added instead with an extension parameter (not supported by plugins not supporting USM), e.g. PIEXT_USM_INDIRECT_POINTERS.

sycl/plugins/opencl/pi_opencl.cpp, line 794 at r4 (raw file):
Previously, smaslov-intel wrote…
would you add more info about supported parameters?

I mean - they're all listed - what do you want? A link to the Spec in the comments?

The comment currently says "examples", not the full list, which it should be.
The PI spec does not quite exists, so comments is where we should have full info.
Please add information about types of these parameters.

@jbrodman jbrodman requested a review from smaslov-intel January 6, 2020 20:00
Signed-off-by: James Brodman <[email protected]>
Signed-off-by: James Brodman <[email protected]>
smaslov-intel
smaslov-intel previously approved these changes Jan 7, 2020
Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few test improvement suggestions.

for (int i = 0; i < numNodes; i++) {
s_cur->Num = i * 2;
if (dev.get_info<info::device::usm_shared_allocations>()) {
Node *s_head = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Node *s_head = nullptr;
Node *s_head = (Node *)aligned_alloc_shared(alignof(Node), sizeof(Node), dev, ctxt);

}
s_cur = s_head;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s_cur = s_head;
Node *s_cur = s_head;

s_cur->Num = i * 2;
if (dev.get_info<info::device::usm_shared_allocations>()) {
Node *s_head = nullptr;
Node *s_cur = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Node *s_cur = nullptr;

if (s_cur->Num != want) {
return -1;
s_cur = s_head;
int mismatches = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mismatches is declared, but not used.

if (s_cur->Num != want) {
return -1;
s_cur = s_head;
int mismatches = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mismatches is declared, but not used.


event e0 = q.memcpy(d_cur, &h_cur, sizeof(Node));
e0.wait();
Node *d_head = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Node *d_head = nullptr;
Node *d_head = alloc.allocate(1);

event e0 = q.memcpy(d_cur, &h_cur, sizeof(Node));
e0.wait();
Node *d_head = nullptr;
Node *d_cur = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Node *d_cur = nullptr;
Node *d_cur = d_head;


d_cur = h_cur.pNext;
}
d_head = alloc.allocate(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d_head = alloc.allocate(1);

d_cur = h_cur.pNext;
}
d_head = alloc.allocate(1);
d_cur = d_head;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d_cur = d_head;

@@ -25,32 +25,33 @@ int main() {
auto dev = q.get_device();
auto ctxt = q.get_context();

usm_allocator<int, usm::alloc::host> alloc(ctxt, dev);
if (dev.get_info<info::device::usm_host_allocations>()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of wrapping test code with if (device supports USM) { ... }, it's better to use early exit.

if (!dev.get_info<info::device::usm_host_allocations>())
  return 0;
... // tests using USM

Signed-off-by: James Brodman <[email protected]>
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bader bader merged commit 807bc9b into intel:sycl Jan 9, 2020
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request May 13, 2025
…ptors

I tracked the appearance of `_nocheck` down to
intel#937 and James doesn't remember
anything specific requiring that vs "checked" version. Also, some
similar USM-related aspects (those that don't just delegate to
`get_info`) do use "checked" APIs, so it seems logical to unify this
processing.

The main reason I do this is that it would be a bit easier to cache the
values of "checked" interfaces by pre-computing USM support in
`device_impl`'s ctor. Note that we perform at least some of those every
time we allocate USM memory, so doing the caching is desirable (even for
the sake of cleaning up our traces).
aelovikov-intel added a commit that referenced this pull request May 14, 2025
…ptors (#18453)

I tracked the appearance of `_nocheck` down to
#937 and James doesn't remember
anything specific requiring that vs "checked" version. Also, some
similar USM-related aspects (those that don't just delegate to
`get_info`) do use "checked" APIs, so it seems logical to unify this
processing.

The main reason I do this is that it would be a bit easier to cache the
values of "checked" interfaces by pre-computing USM support in
`device_impl`'s ctor. Note that we perform at least some of those every
time we allocate USM memory, so doing the caching is desirable (even for
the sake of cleaning up our traces).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants