-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Added ext_oneapi_non_uniform_groups
aspect
#10902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for Non uniform groups. Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: Jack Kirk <[email protected]`>
@@ -150,6 +151,10 @@ get_ballot_group(Group group, bool predicate) { | |||
} else { | |||
return ballot_group<sycl::sub_group>(~mask, predicate); | |||
} | |||
#else | |||
static_assert(false, | |||
"ballot_group is not currently supported on this platform."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a static assertion our best option here? I worry it may be a little to harsh on users that don't intend to run such kernels on other platforms.
@Pennycook - What are your thoughts here? My thoughts are that ideally if we want to restrict the platforms of these, we would want to have an aspect for it and mark related features as requiring it so we can report failure at runtime instead if the user tried to launch a kernel with these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @steffenlarsen. Furthermore, won't these asserts trigger any time the functions are compiled, regardless of whether they're actually used? I don't think we want to be in a position where we can't compile for a specific backend just because there's a usage of one of these non-uniform groups somewhere in a header.
I think something like an aspect, optional feature, or other device queries makes sense here. We didn't include anything like this in the initial proposal because it wasn't clear what should and shouldn't be allowed... For example, should a device be allowed to support only certain fixed_size_group
sizes, or certain scopes? I think we need to get more implementation experience -- across more than just SPIR-V and NVPTX -- before we can properly answer those questions.
I'm not opposed to adding some broad aspects/queries for whether the groups are supported at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we can add the aspect then tell programmers it is their responsibility to use the aspect as a check in code that is using non uniform groups?
RE implementations on other backends:
For HIP AMD I'm not sure that that any of these groups can be fully implemented on existing hardware. The issue is that amd cards don't have independent forward progress (as it is defined here https://developer.nvidia.com/blog/inside-volta/), and they don't support __syncwarp(mask);
HIP does support some subsets of these group features, (ballot,any, or taking a mask), such that I think it would be possible to implement basically all of ballot_group
and fixed_size_group
, minus barrier
. Although due to the lack of independent forward progress guarantees I guess could mean that even then certain code might not be portable in the sense it doesn't hang etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we can add the aspect then tell programmers it is their responsibility to use the aspect as a check in code that is using non uniform groups?
Exactly.
For HIP AMD I'm not sure that that any of these groups can be fully implemented on existing hardware. The issue is that amd cards don't have independent forward progress (as it is defined here https://developer.nvidia.com/blog/inside-volta/), and they don't support
__syncwarp(mask);
Intel GPUs don't support "independent forward progress" either, and the extension doesn't require it. Working around the lack of __syncwarp(mask)
probably requires some assumptions and/or compiler smarts... since we only need to be able to synchronize the work-items in the active branch, it might be sufficient to run the equivalent of an unmasked __syncwarp
, or to do nothing at all. For SPIR-V targets we currently just issue a memory barrier, which seems to work (see here).
I'm still not comfortable with this and I think we need more test-cases to prove definitively whether things are working as expected. But that's why the extension is experimental. 😄
Although due to the lack of independent forward progress guarantees I guess could mean that even then certain code might not be portable in the sense it doesn't hang etc.
I agree, but this is always going to be true. I don't think the presence of non-uniform groups makes this any worse than it is already. Any code that makes assumptions about forward progress will be non-portable, and that's why we're defining the sycl_ext_oneapi_forward_progress extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we can add the aspect then tell programmers it is their responsibility to use the aspect as a check in code that is using non uniform groups?
Exactly.
For HIP AMD I'm not sure that that any of these groups can be fully implemented on existing hardware. The issue is that amd cards don't have independent forward progress (as it is defined here https://developer.nvidia.com/blog/inside-volta/), and they don't support
__syncwarp(mask);
Intel GPUs don't support "independent forward progress" either, and the extension doesn't require it. Working around the lack of
__syncwarp(mask)
probably requires some assumptions and/or compiler smarts... since we only need to be able to synchronize the work-items in the active branch, it might be sufficient to run the equivalent of an unmasked__syncwarp
, or to do nothing at all. For SPIR-V targets we currently just issue a memory barrier, which seems to work (see here).I'm still not comfortable with this and I think we need more test-cases to prove definitively whether things are working as expected. But that's why the extension is experimental. 😄
I don't know for sure how amd devices would behave - we'd just have to experiment and see what happens. I guess that we don't want to support e.g. ballot_group partially in a backend if it can be helped. There is also the point that amd could start supporting these features in the not too distant future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to defer to @gmlueck on this one, because I'm not sure how optional features and aspects actually work in the compiler.
Are you asking me whether all the APIs in "sycl_ext_oneapi_non_uniform_groups" should be "optional kernel features" that are tied to some aspect? Obviously, the extension is easier to use if we can guarantee that it is available for all devices. Do we think that it can be implemented for all devices, and we just haven't completed the implementation on certain backends? Or, do we think these APIs can never be implemented on certain devices?
If it's the former, it would be better not to burden application developers by requiring them to check an aspect. Triggering a static-assert might be reasonable as a short-term solution if we can add the missing support soon.
If it's the later, then we should change the specification to add the aspect and document the APIs as optional device features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure, and I was hoping we could get more implementation experience before having to answer this question. But my guess is that there will be certain devices that cannot implement all of the functionality in the specification as it's written today (or that there might be certain compilers and/or library-only implementations that choose not to do so).
How hard is it for us to add new optional device features? I ask because it's still not clear to me if we would want one aspect (i.e. for non-uniform groups) or multiple aspects (e.g. for each type of non-uniform group, or each combination of non-uniform group and scope). I suppose what I'm asking is: If we started with one aspect today, how angry would people be if we need to change that later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this extension is still experimental, we can make API breaking changes in the future. For example, the extension could require all devices to support the APIs now, and we can still add an aspect later if necessary. If we decide that DPC++ can support these APIs on all devices, the extension can expose the APIs as "required device features". When it comes time to adopt them into SYCL-Next, we can still adopt them as "optional features" if other vendors think they will be hard to support.
That said, it is not too hard to add support for an optional feature / aspect. There are two main parts to the implementation:
-
The header file needs to add
[[__sycl_detail__::__uses_aspects__(aspect::foo)]]
to either the declaration of a function or to the definition of a type as described in the design. -
We need some backend-specific code that can query a device and decide whether the device supports the aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JackAKirk Given Greg's response above, would something like an aspect::non_uniform_groups
work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it sounds sensible. I can try adding [[__sycl_detail__::__uses_aspects__(aspect::foo)]]
to the header and add the aspect and then check the behavior.
Signed-off-by: Jack Kirk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest changes with aspects look good to me. Thanks for doing the extra work here, @JackAKirk!
Thanks, give me a minute, I have a few questions! |
Signed-off-by: Jack Kirk <[email protected]>
So it seems like this statement in the doc is false? "This extension also depends on the following other SYCL extensions: Nothing introduced in the extension document introduces anything new to root_group I think? I think that this extension document should be updated to just remove the above, so there is not confusion that the part I've added applies also to root_group: that it requires the new aspect. The other issue is that I have done something unusual for this aspect. I did not call PI/UR to get the aspect, because this aspect is something specific to the oneapi model rather than the adapters; and also I did not add a device info corresponding to the aspect, since atm device infos currently always map to a UR device info (except for the sycl-fusion aspect which defines the device info in experimental namespace in a different way), so it wouldn't I think be immediately possible to do this (But I don't see that it is required). |
@@ -546,6 +546,11 @@ bool device_impl::has(aspect Aspect) const { | |||
sizeof(pi_bool), &support, nullptr) == PI_SUCCESS; | |||
return call_successful && support; | |||
} | |||
case aspect::ext_oneapi_non_uniform_groups: { | |||
return (this->getBackend() == backend::ext_oneapi_level_zero) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pennycook I also wanted to check that this correctly covers all the SPIRV cases: Are
(this->getBackend() == backend::ext_oneapi_level_zero) ||
(this->getBackend() == backend::opencl)
The only two backends supporting the spirv non uniform groups?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are the only two that we've tested, at least. I'm happy to leave this as it is for now -- if we can prove to ourselves later that things should run elsewhere, we can update the aspect.
We should clarify this. The dependency is there because this extension defines I think we should remove the link from the dependency section, and move it down to the "Group Taxonomy" section, where there's already a reference to the extension that says "
I have no preference here, and defer to @gmlueck. |
This aims to make it clearer that root_group is not part of the extension. Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: Jack Kirk <[email protected]>
OK how are the changes I just made? |
ext_oneapi_non_uniform_groups
aspect
Changes look great, thanks.
You mean from proposed to experimental? I think if we want to move this to experimental, we need to add a "Backend support status" section (see the template). This should say 1) that developers need to check the aspect before using the feature; and 2) that the implementation for the CUDA backend is currently incomplete (because it doesn't support Alternatively, we could leave this in proposed until we have
I missed this. I think
|
Signed-off-by: Jack Kirk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeviceConfigFile.td
changes LGTM. @maarquitos14: FYI
Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: Jack Kirk <[email protected]>
|
@JackAKirk: A random thought just occurred to me... Do you think it would be possible to configure some/all SYCL kernels such that they're built with the Pascal scheduling model? NVIDIA's documentation shows how to do this with their toolchain here. I believe that if we constrain things to the Pascal scheduling model, your current implementation of What do you think? |
This sounds like an interesting idea that would maybe for I think it would require some testing to see what the behavior is when we get round to doing this. |
sycl/doc/extensions/proposed/sycl_ext_oneapi_non_uniform_groups.asciidoc
Show resolved
Hide resolved
@@ -30,7 +31,13 @@ template <typename Group> | |||
inline std::enable_if_t<sycl::is_group_v<std::decay_t<Group>> && | |||
std::is_same_v<Group, sycl::sub_group>, | |||
ballot_group<Group>> | |||
#ifdef __SYCL_DEVICE_ONLY__ | |||
get_ballot_group [[__sycl_detail__::__uses_aspects__( | |||
sycl::aspect::ext_oneapi_non_uniform_groups)]] (Group group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right location to place an attribute for a function declaration? I thought the attribute went before the return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the docs suggest this, but if I put it before the return type : i.e before std::enable_if_t..
I get:
error: '__uses_aspects__' attribute cannot be applied to types
33 | [[__sycl_detail__::__uses_aspects__(
If I put it before the inline
too then it works, but I can't have any variation like this that doesn't then give a warning when building the compiler like:
warning: extra tokens at end of #endif directive
I have tried putting it in a bunch of places, and the only way I can compile and run without any warnings/errors and get it working correctly is like how I've done it in the PR!
I could make it superficially different so it also avoids the #else
statement but it looks a mess, so I decided this way was best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to cppreference, the attribute is allowed in either place: before the inline
or directly after the function name. So, I think what you have is fine.
(https://en.cppreference.com/w/cpp/language/attributes)
FWIW, I presume you could do this if you wanted to place the attribute before the inline
:
template <typename Group>
#ifdef __SYCL_DEVICE_ONLY__
[[__sycl_detail__::__uses_aspects__(sycl::aspect::ext_oneapi_non_uniform_groups)]]
#endif
inline std::enable_if_t<sycl::is_group_v<std::decay_t<Group>> &&
std::is_same_v<Group, sycl::sub_group>,
ballot_group<Group>>
get_ballot_group(Group group, bool predicate);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right this does work, I thought I tried it and it didn't. It is better like that. I'll change it.
Signed-off-by: Jack Kirk <[email protected]>
sycl/doc/extensions/proposed/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
Signed-off-by: Jack Kirk <[email protected]>
…s.asciidoc format Co-authored-by: Greg Lueck <[email protected]>
…s.asciidoc format Co-authored-by: Greg Lueck <[email protected]>
Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: Jack Kirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
51d80fc
to
2be3514
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 👍
The errors that you get if you try to e.g. create a
ballot_group
in e.g. HIP backend are not instructive. This PR introducesext_oneapi_non_uniform_groups
and uses the[[__sycl_detail__::__uses_aspects__(aspect::foo)]]
annotations detailed in:https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#changes-to-dpc-headers
to give a clear runtime error in the case that a programmer tries to execute a kernel that uses non-uniform groups on a device that does not supported the non-uniform group extension.