Skip to content

[SYCL] Improve error reporting for kernel enqueue #860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexeySachkov
Copy link
Contributor

@AlexeySachkov AlexeySachkov commented Nov 21, 2019

The idea is to handle error code returned from PI and based on input
arguments, emit meaningful exception with error explanation.
One of the side-effects of this is that this patch effectively allows to
use non-uniform work-groups if underlying OpenCL supports this
functionality.

@s-kanaev
Copy link
Contributor

Is it even OK to put two different functionalities within single pull-request?

@bader
Copy link
Contributor

bader commented Nov 21, 2019

Is it even OK to put two different functionalities within single pull-request?

No. It's not okay. It's recommended to keep pull requests simple and self-contained.

@AlexeySachkov
Copy link
Contributor Author

Is it even OK to put two different functionalities within single pull-request?

No. It's not okay. It's recommended to keep pull requests simple and self-contained.

Could you please point me to where I introduce several new functionalities?

This patch restores support of non-uniform work-groups if underlying
runtime supports them.

This is done by removing part of existing error-checking: I don't see any value to check for the same error twice. Moreover, it will break new test if I leave the old code

@s-kanaev
Copy link
Contributor

s-kanaev commented Nov 21, 2019

This is done by removing part of existing error-checking: I don't see any value to check for the same error twice. Moreover, it will break new test if I leave the old code

If support of non-uniform work-groups is required by specification then no support of it is a bug. This bug may obviously be reported and should have a distinct fix. Otherwise, if there must be no such support according to the specification I don't see any obvious reasons to let this support pass to source code.

@alexbatashev
Copy link
Contributor

@AlexeySachkov I have a more general question. This WG size logic is valid for OpenCL. But PI does not require you to have an OpenCL backend. And these checks may become invalid. Should we delegate such logic to plugins, so that every plugin implements its very own rules?

@AlexeySachkov
Copy link
Contributor Author

AlexeySachkov commented Nov 21, 2019

@AlexeySachkov I have a more general question. This WG size logic is valid for OpenCL. But PI does not require you to have an OpenCL backend. And these checks may become invalid. Should we delegate such logic to plugins, so that every plugin implements its very own rules?

Fair enough, but at this point I don't see a clear way how this can be achieved:

  • There is no special spec for PI: I was told to assume that each PI call that has OpenCL equivalent should behave in this manner. (If this is not true, please correct me)
  • If I understand correctly, PI has C interface, not C++. So, verbose exceptions I generated, cannot cross the border between PI plugin and SYCL RT
  • Any PI plugin routine might be called from different places and contexts and it might be not always clear which error to report or whenever to report error at all: some unusual error checking might be required. (speculating here, don't have actual examples)

Tagging @smaslov-intel here

@AlexeySachkov
Copy link
Contributor Author

This is done by removing part of existing error-checking: I don't see any value to check for the same error twice. Moreover, it will break new test if I leave the old code

If support of non-uniform work-groups is required by specification (and it really is as for method parallel_for_work_item) then no support of it is a bug. This bug may obviously be reported and should have a distinct fix.

This is not true, non-uniform work-groups are prohibited by the SYCL spec for any parallel_for variant

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from a1659eb to 044e3da Compare November 21, 2019 15:35
@smaslov-intel
Copy link
Contributor

This WG size logic is valid for OpenCL. But PI does not require you to have an OpenCL backend. And these checks may become invalid.

Why is the claim that some checks are invalid in other back-ends? Is it because 1) you don't know it is possible to check through L0 (fair concern), or because 2) you know it is impossible? In any case, from now on when adding new PI calls (even new type of arguments to existing PI API) we have to ask ourselves if that's going to work with major other back-ends (L0 and CUDA).

The plan is to:

  1. cover all of SYCL functionality with L0/plugin. If something cannot be covered then we have to have a discussion.
  2. extend DPC++ (over SYCL) with new features and feed them into L0/plugin extending them if needed (not necessarily extending in OpenCL)

@alexbatashev
Copy link
Contributor

Why is the claim that some checks are invalid in other back-ends?

This patch checks OpenCL version. Another backend will have another versioning and may not support non-uniform WG size at all. An easy solution would be to check for OpenCL 2.0+ first, and emit error about non-uniform WG size in all other cases. This way we will still be unable to handle situation when user uses custom plugin. The error message may become invalid (or should I say unexpected?), but at least it will conform to the spec.

https://github.com/intel/llvm/pull/860/files#diff-70a2b4f7b47aee28e73b8411d6d7186cR59-R94

@AlexeySachkov
Copy link
Contributor Author

@alexbatashev

This patch checks OpenCL version. Another backend will have another versioning and may not support non-uniform WG size at all.

SYCL 1.2.1 spec explicitly built on top of OpenCL, so, any PI plugin for SYCL 1.2.1 have to emulate/mimic OpenCL (probably even better to say OpenCL 1.2).

An easy solution would be to check for OpenCL 2.0+ first, and emit error about non-uniform WG size in all other cases.

I guess this is the only solution for now. Am I missing other ways?

This way we will still be unable to handle situation when user uses custom plugin.

In any case we don't have any API to detect which PI plugin is being used and which features it supports/which limitations it has. (or at least I'm unaware of such API).

The error message may become invalid (or should I say unexpected?), but at least it will conform to the spec.

https://github.com/intel/llvm/pull/860/files#diff-70a2b4f7b47aee28e73b8411d6d7186cR59-R94

BTW, this one will be good with any PI plugin, I think. But other messages explicitly mentioning OpenCL apparently might look strange if other PI plugins are used.

@alexbatashev
Copy link
Contributor

I guess this is the only solution for now. Am I missing other ways?

You're probably right. Until we have a way to report errors in a user-friendly manner from PI plugins, this is the only way.

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 044e3da to 49b074e Compare November 22, 2019 11:11
@AlexeySachkov
Copy link
Contributor Author

Applied most of the comments from @romanovvlad

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 49b074e to 6478a0b Compare November 22, 2019 19:25
@AlexeySachkov
Copy link
Contributor Author

Applied rest of the comments from @romanovvlad, fixed some things in test and error handling code

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 6478a0b to 8a36626 Compare November 22, 2019 20:44
@AlexeySachkov
Copy link
Contributor Author

Improved test to fail if anything unexpected has been caught. Fixed fail on accelerator devices

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 8a36626 to 81ccea5 Compare November 23, 2019 10:50
@AlexeySachkov
Copy link
Contributor Author

Looks like I used incorrect runtime for accelerator device. Adjusted the test to account all known issues in underlying runtimes used in CI.

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 6777882 to f7ae7f6 Compare November 24, 2019 20:37
@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from f7ae7f6 to 22740f7 Compare November 25, 2019 11:26
s-kanaev
s-kanaev previously approved these changes Nov 25, 2019
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me

romanovvlad
romanovvlad previously approved these changes Nov 25, 2019
@AlexeySachkov AlexeySachkov dismissed stale reviews from romanovvlad and s-kanaev via aad421e November 25, 2019 14:49
@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 22740f7 to aad421e Compare November 25, 2019 14:49
@AlexeySachkov
Copy link
Contributor Author

Rebased to incorporate the latest PI changes

s-kanaev
s-kanaev previously approved these changes Nov 25, 2019
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from aad421e to 9eec123 Compare November 25, 2019 19:22
The idea is to handle error code returned from PI and based on input
arguments, emit meaningful exception with error explanation.
One of the side-effects of this is that this patch effectively allows to
use non-uniform work-groups if underlying OpenCL supports this
functionality.

Signed-off-by: Alexey Sachkov <[email protected]>
@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 9eec123 to ff116fc Compare November 25, 2019 19:24
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexeySachkov, please, avoid force-pushing. Use merge/push instead. It's easier to track changes this way.

@bader bader merged commit d27cff2 into intel:sycl Nov 25, 2019
@AlexeySachkov AlexeySachkov deleted the private/asachkov/improve-enqueue-nd-range-error-reporting branch December 4, 2019 15:07
AlexeySachkov added a commit to AlexeySachkov/llvm that referenced this pull request Dec 24, 2019
Fixed inconsistency introduced by intel#860

In particular:
- `PI_DEVICE_NAME` -> `PI_DEVICE_INFO_NAME`
- `PI_DEVICE_MAX_WORK_GROUP_SIZE` -> `PI_DEVICE_INFO_MAX_WORK_GROUP_SIZE`

Signed-off-by: Alexey Sachkov <[email protected]>
romanovvlad pushed a commit that referenced this pull request Dec 25, 2019
Fixed inconsistency introduced by #860

In particular:
- `PI_DEVICE_NAME` -> `PI_DEVICE_INFO_NAME`
- `PI_DEVICE_MAX_WORK_GROUP_SIZE` -> `PI_DEVICE_INFO_MAX_WORK_GROUP_SIZE`

Signed-off-by: Alexey Sachkov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants