Skip to content

[SYCL] Fix SIGKILL after setting large WG sizes #2641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 15, 2020
Merged

[SYCL] Fix SIGKILL after setting large WG sizes #2641

merged 1 commit into from
Oct 15, 2020

Conversation

dm-vodopyanov
Copy link
Contributor

This patch fixes SIGKILL (out of memory error) caused by large number
of global work size. Now, if work group size wasn't specified in the
kernel source or IL, WGSize sets to {1, 1, 1}.

This patch fixes SIGKILL (out of memory error) caused by large number
of global work size. Now, if work group size wasn't specified in the
kernel source or IL, `WGSize` sets to `{1, 1, 1}`.
@dm-vodopyanov
Copy link
Contributor Author

/summary:run

Copy link
Contributor

@alexbatashev alexbatashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit 4d76de4 into intel:sycl Oct 15, 2020
@kbobrovs
Copy link
Contributor

@bader, @dm-vodopyanov
I don't agree with the fix, as users frequently let RT figure out the WG size. Which means performance will likely suffer.

@bader
Copy link
Contributor

bader commented Oct 15, 2020

If I understand correctly, this patch fixes functional problem. In general it's better to have slowly working application than broken application.

Do you have better solution in mind?

@kbobrovs
Copy link
Contributor

In general it's better to have slowly working application than broken application.

it is hard to disagree. But the fixes can different. To me it seems the problem is not in the removed code, but in the device/kernel property APIs reporting incorrect values.
If some urgent W/A is needed, then I'd suggest to add hard WG size limit - e.g. 64. Or smaller value if 64 still causes the problem.
Then continue to investigate.

@bader
Copy link
Contributor

bader commented Oct 15, 2020

If some urgent W/A is needed, then I'd suggest to add hard WG size limit - e.g. 64. Or smaller value if 64 still causes the problem.
Then continue to investigate.

That's exactly what Dmitry did, isn't it? He used 1 WG size limit instead of 64, which seems to be most portable value.

@dm-vodopyanov
Copy link
Contributor Author

@kbobrovs I agree that this is controversial patch and performance may be lower. It reproduces only on OpenCL FPGA Emu on the specific machine. The minimum value from WGSize1D and MaxWGSizes[2] (in our case it is WGSize1D) is correct but we are running out of memory. It could happen on any machine someday. I find 1 the safest way to get rid of the problem. OpenCL FPGA Emu developers don't expect such huge number in WG size when calling clEnqueueNDRangeKernel. Anyway, I can contact with them again and double-check can OpenCL runtime be improved to handle such situation.

@kbobrovs
Copy link
Contributor

That's exactly what Dmitry did, isn't it? He used 1 WG size limit instead of 64, which seems to be most portable value.

It definitely isn't. And it will likely make a difference for performance.

I agree that this is controversial patch and performance may be lower. It reproduces only on OpenCL FPGA Emu on the specific machine.

It seems that are we sacrificing performance on e.g. GPU hacking around potential FPGA Emu bug. Can at least device be checked if it is accelerator before using {1,1,1}? A TODO be added as well?

@alexbatashev
Copy link
Contributor

We can move responsibility of picking the right WG size from SYCL RT to particular plugins. IIRC, OpenCL spec allows for such behavior. Other plugins may implement more platform-specific logic to achieve better performance.

kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
[CTS] add simple test that combines kernel launch and memcpy
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
[CTS] add simple test that combines kernel launch and memcpy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants