Skip to content

[SYCL] SYCL 2020 callable device selectors #6486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

cperkinsintel
Copy link
Contributor

@cperkinsintel cperkinsintel commented Jul 28, 2022

In SYCL 2020 device selectors switch from subclasses to simply being any callable. This allows user defined device selector lambdas and more. This implements them for the queue, platform and device classes.

Tests are updated here: intel/llvm-test-suite#1119

@cperkinsintel cperkinsintel marked this pull request as ready for review July 28, 2022 04:05
@cperkinsintel cperkinsintel requested a review from a team as a code owner July 28, 2022 04:05
@cperkinsintel cperkinsintel requested a review from againull July 28, 2022 04:05
@cperkinsintel
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1119

@cperkinsintel
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1119

Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see at last some real SYCL 2020 device selectors. \o/


} // namespace detail

device device_selector::select_device() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to implement the old SYCL 1.2.1 ugly Java-factory style API ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no fan either, but this is still used if compiling with C++14. Just preserving the functionality that was here before.

explicit queue(const DeviceSelector &deviceSelector,
const async_handler &AsyncHandler,
const property_list &PropList = {})
: queue(detail::select_device(deviceSelector), AsyncHandler, PropList) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: queue(detail::select_device(deviceSelector), AsyncHandler, PropList) {}
: queue(device { deviceSelector }, AsyncHandler, PropList) {}

?

Copy link
Contributor Author

@cperkinsintel cperkinsintel Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell I like this suggestion, and if you prefer it I'll change it.

However, consider that of the four constructors we are adding here, only two are amenable to the change. The other two queue constructors use continue to use detail::select_device because they have the additional context argument. Personally, I prefer all four to be consistent with each other, each calling detail::select_device directly, rather than just passing it to the device constructor in two cases. This may simply be a matter of taste. Let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting you need this detail::select_device at the first place.
Reading again the specification on queue, queue creation throws if the device picked by the selector is not in the context. The device is not the best one picked from the devices in the context but it is picked independently. So it is unclear why this detail::select_device takes all these parametersn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading was that the device selector would be used to select the best device from the context (assuming an acceptable one could be found). It makes little sense to even provide that constructor otherwise.

FWIW, that is how the analog SYCL 1.2.1 queue constructor is implemented right now, and I used its behavior to model this one ( https://github.com/intel/llvm/blob/sycl/sycl/source/queue.cpp#L23-L30 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting feature, but this is not how the SYCL specification is defined:

Constructs a SYCL queue instance that is associated with the syclContext provided, using the device returned by the device selector provided. Must throw an exception with the errc::invalid error code if syclContext does not encapsulate the SYCL device returned by deviceSelector.

So either the specification has to be changed or DPC++ has to be updated. @bader any direction?
But anyway, this is outside of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @keryell that device selection is supposed to be "context-free". I don't know if logic currently implemented by DPC++ is used, but it can be implemented outside of the library.
To honest, I don't have enough data to recommend any direction here.
At first glace, DPC++ behavior sounds like a user friendly approach. The runtime emits less exceptions for the user to handle and tries to do it best to select the right device. On the other hand, it's just a syntax sugar, which might add an overhead and not sure if it's really used in the real word.
@intel/dpcpp-specification-reviewers, do you have any thoughts on that subject?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at all the places in the SYCL 2020 spec where a device selector is passed. In all other cases, it's clear that the implementation should pass every root device to the device selector, and the selector ranks each one.

This queue constructor is the only place where the set of valid device is limited by a context that the user passes in. It could make sense to redefine the spec to say that the implementation only passes devices from the context to the selector in this case. However, that is not how the spec is worded now, and we have the same wording in SYCL 1.2.1.

I actually wonder if this queue constructor is even useful. If the user is passing a custom context, maybe we should also force them to pass a specific device (and not a device selector). This avoids any ambiguity about which devices the implementation passes to the selector. We already have a form of the queue constructor like this (taking a context and a device). If we were doing this from scratch, I think I would not have added a form of the queue constructor that takes a context and a selector.

My feeling is that DPC++ should implement the existing spec, and maybe the committee should consider deprecating this form of the queue constructor.

// see [filter_selectors not "Callable"] in device_selectors.cpp
explicit queue(const DeviceSelector &deviceSelector,
const property_list &PropList = {})
: queue(detail::select_device(deviceSelector), async_handler{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem?

Comment on lines +71 to +74
<< "SYCL_PI_TRACE[all]: "
<< " platform: " << PlatformName << std::endl
<< "SYCL_PI_TRACE[all]: "
<< " device: " << DeviceName << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear why you need these strings to be split.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's copy/pasta. This is the trace code we had before. I just wrapped it in a function, but preserved it otherwise.

<< " device: " << DeviceName << std::endl;
}
for (const auto &dev : Devices) {
int dev_score = std::invoke(DeviceSelectorInvocable, dev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int dev_score = std::invoke(DeviceSelectorInvocable, dev);
int dev_score = DeviceSelectorInvocable(dev);

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell Since we are using std::is_invocable_r_v when we screen it, I actually prefer the symmetry of using std::invoke to invoke it. Maybe I'm in the minority on that. Would you mind if we kept it as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you kidding me or are you a member of the Church of the C++ Discouragement and Obfuscation? :-)
Why not rewriting all the code with Operator function objects from https://en.cppreference.com/w/cpp/utility/functional instead of using just + or -? ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to start a holy war. I'll change it.
The hallowed symmetrical beauty of std::is_invocable to std::invoke is perhaps not cherished by the unenlightened.

@cperkinsintel
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1119

Signed-off-by: Chris Perkins <[email protected]>
@againull againull self-requested a review August 5, 2022 05:05
againull
againull previously approved these changes Aug 5, 2022
@againull againull self-requested a review August 5, 2022 20:29
@againull againull merged commit 64f0db7 into intel:sycl Aug 5, 2022
@pvchupin
Copy link
Contributor

pvchupin commented Aug 5, 2022

@cperkinsintel, shared libs build fails in post-commit: https://github.com/intel/llvm/runs/7698490682?check_suite_focus=true

steffenlarsen added a commit to steffenlarsen/llvm that referenced this pull request Aug 8, 2022
intel#6486 added new queue constructors for
the use of callable device selectors. In accordance with SYCL 2020 these
constructors are marked explicit and as such implicit conversion from
initializer lists does not work. These changes fixes the Wait.cpp
unittest to use the explicit constructor.

Signed-off-by: Larsen, Steffen <[email protected]>
@steffenlarsen
Copy link
Contributor

#6540 should fix the build issue.

bader pushed a commit that referenced this pull request Aug 8, 2022
#6486 added new queue constructors for
the use of callable device selectors. In accordance with SYCL 2020 these
constructors are marked explicit and as such implicit conversion from
initializer lists does not work. These changes fixes the Wait.cpp
unittest to use the explicit constructor.

Signed-off-by: Larsen, Steffen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants