Skip to content

[SYCL] Add ONEAPI_DEVICE_SELECTOR implementation #6779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cperkinsintel
Copy link
Contributor

Functionally complete. Needs tests and possible reorganization.

@gmlueck
Copy link
Contributor

gmlueck commented Sep 14, 2022

I assume we should change EnvironmentVariables.md to document the new envvar?

@cperkinsintel
Copy link
Contributor Author

@gmlueck - good catch. I've updated the EnvironmentVariables.md file.

@cperkinsintel cperkinsintel marked this pull request as ready for review September 16, 2022 22:30
@cperkinsintel cperkinsintel requested review from a team as code owners September 16, 2022 22:30
@cperkinsintel
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1270

@cperkinsintel cperkinsintel changed the title [WIP] ONEAPI_DEVICE_SELECTOR implementation [SYCL]ONEAPI_DEVICE_SELECTOR implementation Sep 20, 2022
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some minor comments.

@cperkinsintel cperkinsintel marked this pull request as draft September 20, 2022 23:10
@cperkinsintel cperkinsintel changed the title [SYCL]ONEAPI_DEVICE_SELECTOR implementation [WIP]ONEAPI_DEVICE_SELECTOR implementation Sep 20, 2022
@cperkinsintel cperkinsintel marked this pull request as ready for review September 21, 2022 21:57
@cperkinsintel cperkinsintel changed the title [WIP]ONEAPI_DEVICE_SELECTOR implementation [SYCL] ONEAPI_DEVICE_SELECTOR implementation Sep 21, 2022
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cperkinsintel
Copy link
Contributor Author

@gmlueck - I think this is all ready. Needs your ok for any furtherance.

| `ONEAPI_DEVICE_SELECTOR=hip:0,2` | Devices with numeric identifier of 0 and 2 are chosen from the HIP platform. |
| `ONEAPI_DEVICE_SELECTOR=opencl:0.*` | All the sub-devices (aka tiles) from the OpenCL device identified by 0 will be available. |
| `ONEAPI_DEVICE_SELECTOR=opencl:0.2` | The third tile (2 in 0 based counting) of the OpenCL device with numeric identifier 0 will be the sole device available. |
| `ONEAPI_DEVICE_SELECTOR=level_zero:gpu,gpu.*,gpu.*.*` | A wide net is cast. All level_zero gpus will be availabe in the device list, along with all their child partitions (if any), as well as the next level of partitions below them. It would be up to the developer to figure out which device is which, if that is important. (This can be done with `info::device::parent_device`)|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided against the gpu.* syntax in the end in favor of *.*.

The part about "This can be done with info::device::parent_device" seems wrong to me. All the devices selected by ONEAPI_DEVICE_SELECTOR will be SYCL root devices, so none of them will have a parent device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpu.* selects not all the GPUs, but all partitions of any partitionable GPU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the devices selected by ONEAPI_DEVICE_SELECTOR will be SYCL root devices, so none of them will have a parent device.

That is definitely not what is happening. The ONEAPI_DEVICE_SELECTOR limits the available choice of devices, and in the case of sub-devices, exposes partitions in the listing of devices gotten. They are not modified in any way. So while a partition of some parent device may now be gotten via device::get_devices() or platform::get_devices(), which is new, the get_info() applied against those devices would not be modified. If a device is a partition, I presume it'll have a parent, which the users code could get.

It's cool and all that ONEAPI_DEVICE_SELECTOR can help free users from the burden of having to go and root around partitioning before they can even run a kernel, but pretending that a partition is not actually a partition seems ill-advised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a gist that demos this (needs this PR for the ONEAPI_DEVICE_SELECTOR support, and a machine with a partitionable GPU): https://gist.github.com/cperkinsintel/a14cdf1666c474f428cbdbc5ab98272b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpu.* selects not all the GPUs, but all partitions of any partitionable GPU.

This is also what *.* does, which is why we dropped gpu.*.

That is definitely not what is happening. The ONEAPI_DEVICE_SELECTOR limits the available choice of devices, and in the case of sub-devices, exposes partitions in the listing of devices gotten. They are not modified in any way. So while a partition of some parent device may now be gotten via device::get_devices() or platform::get_devices(), which is new, the get_info() applied against those devices would not be modified. If a device is a partition, I presume it'll have a parent, which the users code could get.

This is contrary to the SYCL specification. The specification of device::get_devices() is:

Returns a std::vector containing all the root devices from all SYCL backends available in the system
which have the device type encapsulated by deviceType.

Note that it specifically says "root devices", which is defined in the glossary as:

A device that is not a sub-device. The function device::get_devices() returns a vector of all the root devices.

And the spec says that devices that are not a sub-device do not have any parent. Here is the specification for info::device::parent_device:

Returns the parent SYCL device to which this sub-device is a child if this is a sub-device. Must throw an exception with the errc::invalid error code if this SYCL device is not a sub device.

The intention of the "dot" syntax is to allow programmers pretend that each tile is an independent root device. Therefore, we should maintain this illusion and not expose any parent device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck *.* is any partitionable device ( caveat affinity doman, next_partitionable) ,whereas gpu.* is any partitionable GPU (same caveats), correct? The fact may be that today those two sets are equal, but in the future, they might not be. For example, it is conceivable that a CPU device could be support such partitioning.

In any case, both are consistent with the BNF, no?

Copy link
Contributor Author

@cperkinsintel cperkinsintel Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of the "dot" syntax is to allow programmers pretend that each tile is an independent root device. Therefore, we should maintain this illusion and not expose any parent device.

So you are saying that if the user specifies ONEAPI_DEVICE_SELECTOR=level_zero:0,0.0 which would select two exact devices (the root device and one partition of it). That we should NOT let them disambiguate those two devices from each other? Something they would be able to do if they didn't use ONEAPI_DEVICE_SELECTOR ? To my mind, we are treading out of the "enable the programmer" zone by helping them into the "limit the programmer" zone by pretending we know better than them. How on earth are we helping anyone by changing the value of get_info<info::device::parent>() from it's real value to a pretend one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as a tension between the spec and ONEAPI_DEVICE_SELECTOR which is, obviously, outside the specification. In the strictest sense, what ONEAPI_DEVICE_SELECTOR proposes to do via sub-devices is a violation of the spec. To my mind, that's is it's utility, it's why we are developing it. The boundaries the specification put up now make it difficult and tedious for users to work with partitions. ONEAPI_DEVICE_SELECTOR eases that, but its use exceeds the spec as it exists now, and that seems OK, to me at least.

Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.
Let's wait for C++20 ranges to generalize to any level of sub-devices. :-)

if (SubSubDeviceStr[0] == '*') {
Target.HasSubSubDeviceWildCard = true;
} else {
std::string SSDS(SubSubDeviceStr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear why this intermediate copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't do stoi on string_view. But I removed the stringstream for string concatenation.

@bader bader changed the title [SYCL] ONEAPI_DEVICE_SELECTOR implementation [SYCL] Add ONEAPI_DEVICE_SELECTOR implementation Sep 27, 2022
@bader bader requested a review from keryell October 2, 2022 09:12
Signed-off-by: Chris Perkins <[email protected]>
@cperkinsintel cperkinsintel requested review from gmlueck and removed request for keryell October 5, 2022 20:56
Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just one minor thing. The rest of the doc changes LGTM.

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc changes LGTM.

@cperkinsintel
Copy link
Contributor Author

The ESIMD failures are not from this PR. They are occurring with the main sycl branch as well.

@steffenlarsen steffenlarsen merged commit 28d0cd3 into intel:sycl Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants