[Libomptarget] Fix JIT on the NVPTX target by calling ptx manually #77801

jhuber6 · 2024-01-11T17:23:52Z

Summary:
A recent patch added an assertion in the GlobalHandler that fires when
the device image is not an ELF. It began to fire whenever NVPTX JIT was
used, because the JIT pass outputs a PTX file instead of an ELF. The
CUModuleLoad method accepts the PTX (`.s`) and compiles it to a cubin
internally, but that is too late: we inspect the ELF directly to check
for the presence of certain symbols and to read some necessary
constants. The result is inconsistent behaviour.
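
To illustrate why a PTX image cannot pass checks that expect an ELF, here is a minimal sketch of a magic-number test. The helper name `looksLikeELF` is hypothetical and is not the check used by the GlobalHandler. It relies only on the fact that every ELF file begins with the bytes `0x7F 'E' 'L' 'F'`, whereas PTX is plain text.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of libomptarget): returns true when the
// buffer begins with the 4-byte ELF magic number 0x7F 'E' 'L' 'F'.
bool looksLikeELF(const char *Data, std::size_t Size) {
  static const char Magic[4] = {0x7F, 'E', 'L', 'F'};
  // A buffer shorter than the magic cannot be an ELF image.
  return Size >= sizeof(Magic) && std::memcmp(Data, Magic, sizeof(Magic)) == 0;
}
```

A PTX image is text that begins with comments or directives such as `.version`, so a test like this rejects it, while a cubin produced by `ptxas` is an ELF and passes.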

To address this, this patch calls `ptxas` manually, similar to how
`lld` is called for the AMDGPU JIT pass. This is inevitably slower than
passing the PTX straight to the CUDA routine, because of the overhead of
file IO and a fork call, but it is necessary for correctness.
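
As a rough sketch of what calling `ptxas` manually involves, the function below only assembles an argument vector. The name `buildPtxasArgs`, its parameters, and the default optimization level are assumptions made for illustration and are not taken from the patch. The flags `-m64`, `-O<N>`, `--gpu-name`, and `--output-file` are standard `ptxas` options. Writing the PTX to a temporary file and spawning the process are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the command line for compiling a PTX file
// into a cubin with ptxas. The paths and architecture come from the caller;
// the default optimization level is an assumption of this sketch.
std::vector<std::string> buildPtxasArgs(const std::string &PtxasPath,
                                        const std::string &Arch,
                                        const std::string &InputPtx,
                                        const std::string &OutputCubin,
                                        unsigned OptLevel = 2) {
  return {PtxasPath,
          "-m64",                           // 64-bit device code
          "-O" + std::to_string(OptLevel),  // optimization level
          "--gpu-name", Arch,               // target, e.g. "sm_70"
          "--output-file", OutputCubin,     // resulting ELF (cubin)
          InputPtx};                        // PTX emitted by the JIT pass
}
```

For example, `buildPtxasArgs("ptxas", "sm_70", "in.s", "out.cubin")` yields the equivalent of `ptxas -m64 -O2 --gpu-name sm_70 --output-file out.cubin in.s`. The temporary files and the process launch are where the file IO and fork overhead come from.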

CUDA provides an API for compiling PTX manually. However, it first
appeared in CUDA 11.1 and is only provided "officially" as a static
library. The `libnvidia-ptxjitcompiler.so` next to the CUDA driver has
the same symbols and can likely be used as a replacement. That would be
the faster solution, but because it is undocumented it may have some
issues.


doru1004

LGTM


jhuber6 requested review from dhruvachak, doru1004, jdoerfert, JonChesterfield, shiltian and ye-luo January 11, 2024 17:23

llvmbot added the openmp:libomptarget OpenMP offload runtime label Jan 11, 2024

doru1004 approved these changes Jan 11, 2024


jhuber6 merged commit 3ede817 into llvm:main Jan 11, 2024
