[SYCL] Improve error reporting for kernel enqueue #860

AlexeySachkov · 2019-11-21T12:19:05Z

The idea is to handle error code returned from PI and based on input
arguments, emit meaningful exception with error explanation.
One of the side-effects of this is that this patch effectively allows to
use non-uniform work-groups if underlying OpenCL supports this
functionality.

s-kanaev · 2019-11-21T12:39:45Z

Is it even OK to put two different functionalities within single pull-request?

bader · 2019-11-21T12:43:00Z

Is it even OK to put two different functionalities within single pull-request?

No. It's not okay. It's recommended to keep pull requests simple and self-contained.

AlexeySachkov · 2019-11-21T12:45:33Z

Is it even OK to put two different functionalities within single pull-request?

No. It's not okay. It's recommended to keep pull requests simple and self-contained.

Could you please point me to where I introduce several new functionalities?

This patch restores support of non-uniform work-groups if underlying
runtime supports them.

This is done by removing part of existing error-checking: I don't see any value to check for the same error twice. Moreover, it will break new test if I leave the old code

s-kanaev · 2019-11-21T12:53:14Z

This is done by removing part of existing error-checking: I don't see any value to check for the same error twice. Moreover, it will break new test if I leave the old code

If support of non-uniform work-groups is required by specification then no support of it is a bug. This bug may obviously be reported and should have a distinct fix. Otherwise, if there must be no such support according to the specification I don't see any obvious reasons to let this support pass to source code.

alexbatashev · 2019-11-21T13:25:47Z

@AlexeySachkov I have a more general question. This WG size logic is valid for OpenCL. But PI does not require you to have an OpenCL backend. And these checks may become invalid. Should we delegate such logic to plugins, so that every plugin implements its very own rules?

AlexeySachkov · 2019-11-21T14:14:41Z

@AlexeySachkov I have a more general question. This WG size logic is valid for OpenCL. But PI does not require you to have an OpenCL backend. And these checks may become invalid. Should we delegate such logic to plugins, so that every plugin implements its very own rules?

Fair enough, but at this point I don't see a clear way how this can be achieved:

There is no special spec for PI: I was told to assume that each PI call that has OpenCL equivalent should behave in this manner. (If this is not true, please correct me)
If I understand correctly, PI has C interface, not C++. So, verbose exceptions I generated, cannot cross the border between PI plugin and SYCL RT
Any PI plugin routine might be called from different places and contexts and it might be not always clear which error to report or whenever to report error at all: some unusual error checking might be required. (speculating here, don't have actual examples)

Tagging @smaslov-intel here

AlexeySachkov · 2019-11-21T14:18:08Z

This is done by removing part of existing error-checking: I don't see any value to check for the same error twice. Moreover, it will break new test if I leave the old code

If support of non-uniform work-groups is required by specification (and it really is as for method parallel_for_work_item) then no support of it is a bug. This bug may obviously be reported and should have a distinct fix.

This is not true, non-uniform work-groups are prohibited by the SYCL spec for any parallel_for variant

smaslov-intel · 2019-11-22T02:33:06Z

This WG size logic is valid for OpenCL. But PI does not require you to have an OpenCL backend. And these checks may become invalid.

Why is the claim that some checks are invalid in other back-ends? Is it because 1) you don't know it is possible to check through L0 (fair concern), or because 2) you know it is impossible? In any case, from now on when adding new PI calls (even new type of arguments to existing PI API) we have to ask ourselves if that's going to work with major other back-ends (L0 and CUDA).

The plan is to:

cover all of SYCL functionality with L0/plugin. If something cannot be covered then we have to have a discussion.
extend DPC++ (over SYCL) with new features and feed them into L0/plugin extending them if needed (not necessarily extending in OpenCL)

alexbatashev · 2019-11-22T05:36:53Z

Why is the claim that some checks are invalid in other back-ends?

This patch checks OpenCL version. Another backend will have another versioning and may not support non-uniform WG size at all. An easy solution would be to check for OpenCL 2.0+ first, and emit error about non-uniform WG size in all other cases. This way we will still be unable to handle situation when user uses custom plugin. The error message may become invalid (or should I say unexpected?), but at least it will conform to the spec.

https://github.com/intel/llvm/pull/860/files#diff-70a2b4f7b47aee28e73b8411d6d7186cR59-R94

AlexeySachkov · 2019-11-22T07:46:26Z

@alexbatashev

This patch checks OpenCL version. Another backend will have another versioning and may not support non-uniform WG size at all.

SYCL 1.2.1 spec explicitly built on top of OpenCL, so, any PI plugin for SYCL 1.2.1 have to emulate/mimic OpenCL (probably even better to say OpenCL 1.2).

An easy solution would be to check for OpenCL 2.0+ first, and emit error about non-uniform WG size in all other cases.

I guess this is the only solution for now. Am I missing other ways?

This way we will still be unable to handle situation when user uses custom plugin.

In any case we don't have any API to detect which PI plugin is being used and which features it supports/which limitations it has. (or at least I'm unaware of such API).

The error message may become invalid (or should I say unexpected?), but at least it will conform to the spec.

https://github.com/intel/llvm/pull/860/files#diff-70a2b4f7b47aee28e73b8411d6d7186cR59-R94

BTW, this one will be good with any PI plugin, I think. But other messages explicitly mentioning OpenCL apparently might look strange if other PI plugins are used.

alexbatashev · 2019-11-22T08:28:01Z

I guess this is the only solution for now. Am I missing other ways?

You're probably right. Until we have a way to report errors in a user-friendly manner from PI plugins, this is the only way.

sycl/source/detail/error_handling/enqueue.cpp

sycl/source/detail/scheduler/commands.cpp

sycl/test/basic_tests/parallel_for_range_host.cpp

sycl/source/detail/error_handling/enqueue.cpp

sycl/source/detail/scheduler/commands.cpp

sycl/test/basic_tests/parallel_for_range.cpp

AlexeySachkov · 2019-11-22T11:16:49Z

Applied most of the comments from @romanovvlad

AlexeySachkov · 2019-11-22T19:27:45Z

Applied rest of the comments from @romanovvlad, fixed some things in test and error handling code

AlexeySachkov · 2019-11-22T20:49:29Z

Improved test to fail if anything unexpected has been caught. Fixed fail on accelerator devices

AlexeySachkov · 2019-11-23T10:52:57Z

Looks like I used incorrect runtime for accelerator device. Adjusted the test to account all known issues in underlying runtimes used in CI.

sycl/source/detail/error_handling/enqueue_kernel.cpp

sycl/source/detail/scheduler/commands.cpp

sycl/test/basic_tests/parallel_for_range_host.cpp

sycl/test/basic_tests/parallel_for_range.cpp

sycl/test/basic_tests/parallel_for_range_host.cpp

sycl/source/detail/scheduler/commands.cpp

s-kanaev

Seems good to me

AlexeySachkov · 2019-11-25T14:55:17Z

Rebased to incorporate the latest PI changes

s-kanaev

Looks good to me.

The idea is to handle error code returned from PI and based on input arguments, emit meaningful exception with error explanation. One of the side-effects of this is that this patch effectively allows to use non-uniform work-groups if underlying OpenCL supports this functionality. Signed-off-by: Alexey Sachkov <[email protected]>

bader

@AlexeySachkov, please, avoid force-pushing. Use merge/push instead. It's easier to track changes this way.

Fixed inconsistency introduced by intel#860 In particular: - `PI_DEVICE_NAME` -> `PI_DEVICE_INFO_NAME` - `PI_DEVICE_MAX_WORK_GROUP_SIZE` -> `PI_DEVICE_INFO_MAX_WORK_GROUP_SIZE` Signed-off-by: Alexey Sachkov <[email protected]>

Fixed inconsistency introduced by #860 In particular: - `PI_DEVICE_NAME` -> `PI_DEVICE_INFO_NAME` - `PI_DEVICE_MAX_WORK_GROUP_SIZE` -> `PI_DEVICE_INFO_MAX_WORK_GROUP_SIZE` Signed-off-by: Alexey Sachkov <[email protected]>

AlexeySachkov requested review from alexbatashev, romanovvlad, sergey-semenov and jcranmer-intel November 21, 2019 12:19

AlexeySachkov assigned sergey-semenov Nov 21, 2019

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 23093ab to a1659eb Compare November 21, 2019 12:30

AlexeySachkov requested a review from s-kanaev November 21, 2019 12:35

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from a1659eb to 044e3da Compare November 21, 2019 15:35

s-kanaev suggested changes Nov 22, 2019

View reviewed changes

romanovvlad reviewed Nov 22, 2019

View reviewed changes

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 044e3da to 49b074e Compare November 22, 2019 11:11

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 49b074e to 6478a0b Compare November 22, 2019 19:25

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 6478a0b to 8a36626 Compare November 22, 2019 20:44

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 8a36626 to 81ccea5 Compare November 23, 2019 10:50

romanovvlad requested changes Nov 23, 2019

View reviewed changes

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 6777882 to f7ae7f6 Compare November 24, 2019 20:37

s-kanaev suggested changes Nov 25, 2019

View reviewed changes

sycl/source/detail/scheduler/commands.cpp Outdated Show resolved Hide resolved

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from f7ae7f6 to 22740f7 Compare November 25, 2019 11:26

s-kanaev approved these changes Nov 25, 2019

View reviewed changes

s-kanaev previously approved these changes Nov 25, 2019

View reviewed changes

romanovvlad previously approved these changes Nov 25, 2019

View reviewed changes

AlexeySachkov dismissed stale reviews from romanovvlad and s-kanaev via aad421e November 25, 2019 14:49

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 22740f7 to aad421e Compare November 25, 2019 14:49

AlexeySachkov requested review from romanovvlad and s-kanaev November 25, 2019 14:55

s-kanaev previously approved these changes Nov 25, 2019

View reviewed changes

AlexeySachkov dismissed s-kanaev’s stale review via 9eec123 November 25, 2019 19:22

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from aad421e to 9eec123 Compare November 25, 2019 19:22

AlexeySachkov force-pushed the private/asachkov/improve-enqueue-nd-range-error-reporting branch from 9eec123 to ff116fc Compare November 25, 2019 19:24

bader approved these changes Nov 25, 2019

View reviewed changes

bader merged commit d27cff2 into intel:sycl Nov 25, 2019

AlexeySachkov deleted the private/asachkov/improve-enqueue-nd-range-error-reporting branch December 4, 2019 15:07

sergey-semenov mentioned this pull request Dec 4, 2019

[SYCL] Add runtime support for split device binaries #759

Merged

AlexeySachkov mentioned this pull request Dec 16, 2019

Improve error handling in enqueue_kernel_launch::handleError #935

Closed

AlexeySachkov mentioned this pull request Dec 23, 2019

[SYCL][NFC] align names of pi_device_info enumerators #966

Merged

AlexeySachkov mentioned this pull request Mar 17, 2020

[SYCL] Split invalid work-group size error handling #1216

Merged

AlexeySachkov mentioned this pull request Mar 15, 2023

[SYCL] Throw right errc for work_group_size query #8645

Merged

[SYCL] Improve error reporting for kernel enqueue #860

[SYCL] Improve error reporting for kernel enqueue #860

Uh oh!

Conversation

AlexeySachkov commented Nov 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

s-kanaev commented Nov 21, 2019

Uh oh!

bader commented Nov 21, 2019

Uh oh!

AlexeySachkov commented Nov 21, 2019

Uh oh!

s-kanaev commented Nov 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexbatashev commented Nov 21, 2019

Uh oh!

AlexeySachkov commented Nov 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexeySachkov commented Nov 21, 2019

Uh oh!

smaslov-intel commented Nov 22, 2019

Uh oh!

alexbatashev commented Nov 22, 2019

Uh oh!

AlexeySachkov commented Nov 22, 2019

Uh oh!

alexbatashev commented Nov 22, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexeySachkov commented Nov 22, 2019

Uh oh!

AlexeySachkov commented Nov 22, 2019

Uh oh!

AlexeySachkov commented Nov 22, 2019

Uh oh!

AlexeySachkov commented Nov 23, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

s-kanaev left a comment

Choose a reason for hiding this comment

Uh oh!

AlexeySachkov commented Nov 25, 2019

Uh oh!

s-kanaev left a comment

Choose a reason for hiding this comment

Uh oh!

bader left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

AlexeySachkov commented Nov 21, 2019 •

edited

Loading

s-kanaev commented Nov 21, 2019 •

edited

Loading

AlexeySachkov commented Nov 21, 2019 •

edited

Loading