Skip to content

[SYCL] Implement forward_progress extension API #13389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
May 15, 2024

Conversation

lbushi25
Copy link
Contributor

@lbushi25 lbushi25 commented Apr 12, 2024

This PR implements the API for the forward progress guarantees extension defined here: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_forward_progress.asciidoc
It is complete except that it does not verify that the forward progress guarantees requested by a kernel submission are actually supported by the device to which the kernel is submitted. That will be done in a later PR.

@lbushi25 lbushi25 requested review from steffenlarsen, bader, AlexeySachkov, Pennycook and gmlueck and removed request for Pennycook April 12, 2024 23:35
return forward_progress_guarantee::weakly_parallel;
default:
throw sycl::exception(sycl::errc::invalid,
"Work item is not a valid coordination scope!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for an initial implementation, but optimally UR would have interfaces for this so we don't need heuristics for picking the right gurantees.

@gmlueck
Copy link
Contributor

gmlueck commented Apr 15, 2024

It is complete except that it does not verify that the forward progress guarantees requested by a kernel submission are actually supported by the device to which the kernel is submitted. That will be done in a later PR.

Can you elaborate on what part of the spec is not yet implemented? Is it just this statement about throwing an exception that is not yet implemented?

New kernel properties are introduced to allow developers to declare that a given kernel requires specific forward progress guarantees for correctness. If a kernel is submitted to a device that cannot satisfy the request for specific progress guarantees, the implementation must throw an exception with the errc::feature_not_supported error code.

Is there an ETA for when the remaining parts will be implemented? I'm wondering when we want to declare the extension "implemented" and move the extension document to the "experimental" directory.

Also, does this PR also implement sycl_ext_oneapi_launch_queries? Those queries are required in order to correctly use the forward progress guarantee extension.

Copy link
Contributor

@KseniyaTikhomirova KseniyaTikhomirova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lbushi25
Copy link
Contributor Author

lbushi25 commented May 7, 2024

@intel/llvm-gatekeepers This is ready for merge.

@againull
Copy link
Contributor

againull commented May 7, 2024

@intel/llvm-gatekeepers This is ready for merge.

Merge button is blocked because of requested changes, explicit approval from @AlexeySachkov is needed.

@lbushi25 lbushi25 temporarily deployed to WindowsCILock May 8, 2024 01:44 — with GitHub Actions Inactive
@lbushi25 lbushi25 temporarily deployed to WindowsCILock May 8, 2024 02:16 — with GitHub Actions Inactive
@lbushi25
Copy link
Contributor Author

lbushi25 commented May 8, 2024

@intel/llvm-gatekeepers The PR is now approved. The AWS failures are unrelated to my PR.

@lbushi25
Copy link
Contributor Author

lbushi25 commented May 8, 2024

@intel/llvm-gatekeepers ping

@@ -76,6 +76,7 @@
// CHECK-NEXT: range.hpp
// CHECK-NEXT: info/info_desc.hpp
// CHECK-NEXT: ext/oneapi/experimental/device_architecture.hpp
// CHECK-NEXT: ext/oneapi/experimental/forward_progress.hpp
Copy link
Contributor

@aelovikov-intel aelovikov-intel May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is necessary? Or, in other words, how much of the new header code is needlessly included by sycl/detail/core.hpp even if the feature is unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info/info_desc.hpp needs info about forward progress enums so it includes forward_progress.hpp which means it will appear as a transitive dependecy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that the entire extensions is necessary as part of "core" SYCL include? I can't agree with that, because then our compile time will be skyrocketing going forward with any new PR and it's already terrible.

Copy link
Contributor Author

@lbushi25 lbushi25 May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether it is necessary as part of "core" SYCL but I do know that it is necessary in the files where I have included it, at least to the best of my knowledge. During pre-commit testing, some tests were failing because they were expecting to see my header file as a (direct or indirect) dependency for certain headers. So I added these check lines to make these tests pass. Take, for example, line 78 which checks for device_architecture which is an experimental extension just like forward_progress. Somebody added it here to make certain tests pass. Perhaps this points to a larger systemic issue with the way we've designed our tests/code but that will need to be handled separately in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that will need to be handled separately in my opinion.

How do you suggest we find a person who would be refactoring the entire SYCL implementation if all other people would be contributing to the constantly growing compile time using that excuse?

Copy link
Contributor Author

@lbushi25 lbushi25 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel Do we have your approval on this?
Otherwise, we can put this on hold while an outline of a standard approach on how to deal with these concerns is documented somewhere. I don't want to resort to ad hoc approaches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I am wrong, @aelovikov-intel, but your point here is not that including into core.hpp affects users, but rather that we keep core.hpp as bare-bones as possible to help reduce compilation overhead in our tests, as the compiler has to re-parse our headers for even small tests and the less overhead we can have the better.

From a user's perspective, all they know exists from the SYCL spec is <sycl/sycl.hpp> and we make no exception for extensions currently. I would personally love to see that change in the future, but for now sycl.hpp must include everything and core.hpp should include only the essentials.

Copy link
Contributor Author

@lbushi25 lbushi25 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. To summarize, the changes I've made to core header files in this PR, such as handler.hpp naturally belong to these headers so I cannot make them any more bare-bones than they already are. Conditional inclusion of these changes would be the best way to deal with this, using a command line switch for example, but Greg said in a comment above that this has already been considered and rejected.

Copy link
Contributor

@aelovikov-intel aelovikov-intel May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I am wrong, @aelovikov-intel, but your point here is not that including into core.hpp affects users, but rather that we keep core.hpp as bare-bones as possible to help reduce compilation overhead in our tests

Kind of, see below.

Perhaps this points to a larger systemic issue with the way we've designed our tests/code but that will need to be handled separately in my opinion.

Yes, there is one, but we do work on it. #12890 introduced <sycl/detail/core.hpp> and later #13210 introduced an in-tree test supposed to keep the growth of the newly introduced header file under control. As such, saying

For now, however, I'm just respecting the status quo as previous extensions have done.

and implying that nothing is getting fixed outside of this PR is incorrect. Note, that it would be impossible for 1-2 people to try to gradually improve things not working full time on that task if other dozens of contributors would just keep bloating handler.hpp/queue.hpp (that are part of detail/core.hpp for obvious reasons) as being done in this PR.

Now, switching to what can be done here. I think you can use some kind of visitor pattern. <oneapi_properties>::has_property has to have a loop inside its implementation. What you can do is this:

handler.hpp:
// SFINAE-visitor
template <Property>
struct ProcessForwardProgressProperty {
  void operator() {
    // do nothing for non-forward-progress property
  }
};
...
 loop over all the properties passed to the method and call the above `operator()` with it.

forward_progress_property:
// define property and provide partial specialization for the above helper.

Also note that it is not the way to address my concerns, just potentially "a" way. It shows, however, that that the includes you're making/code you're bringing into handler.hpp are not as mandatory as you claimed in your early comments.

Copy link
Contributor Author

@lbushi25 lbushi25 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel
I discussed this with @AlexeySachkov and we decided to open an issue regarding this: #13774. I've assigned myself to it and will start working on it immediately.
In the meantime, let's proceed with getting this PR merged @intel/llvm-gatekeepers .

@lbushi25
Copy link
Contributor Author

This has introduced post-commit errors which are being addressed by #13793

steffenlarsen added a commit that referenced this pull request May 16, 2024
… extension (#13793)

The merge of #13389 has caused
post-commit failures due to three unused variable warnings.
This PR fixes that. One of these warnings also revealed a bug in my
original PR.

---------

Co-authored-by: Steffen Larsen <[email protected]>
Co-authored-by: Alexey Sachkov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants