[GHA] Uplift Linux GPU RT version to 24.35.30872.22 #15481

Closed
wants to merge 1 commit into from


@bb-sycl bb-sycl commented Sep 24, 2024

Scheduled drivers uplift

@bb-sycl bb-sycl requested a review from a team as a code owner September 24, 2024 03:08
@sarnex

sarnex commented Sep 24, 2024

@jsji Any idea what's going on here? Note this is not dev-igc.

@jsji

jsji commented Sep 24, 2024

> @jsji Any idea what's going on here? Note this is not dev-igc.

https://github.com/intel/llvm/actions/runs/11006256785/job/30592910147?pr=15481

sudo -E bash devops/scripts/install_drivers.sh llvm/devops/dependencies.json --all
shell: sh -e {0}
env:
GITHUB_TOKEN: ***
sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
sudo: a password is required

Looks like we failed to install the driver because sudo asked for a password. Were there any recent changes related to sudo/password setup or the Docker image?
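The log above is what sudo prints when it needs a password but has no terminal to ask on. A minimal sketch of a pre-flight check that would surface this earlier, using sudo's non-interactive `-n` flag, is below. The function name and the `SUDO_PROBE` override are made up for illustration and are not part of the project's scripts.

```shell
#!/bin/sh
# Sketch: fail fast with a clear message when passwordless sudo is unavailable.
# SUDO_PROBE is a hypothetical override so the check can be exercised without
# real sudo; when unset, the probe is `sudo -n true`, which never prompts.
check_passwordless_sudo() {
    if ${SUDO_PROBE:-sudo -n true} >/dev/null 2>&1; then
        echo "ok"
        return 0
    fi
    echo "passwordless sudo is not available on this runner" >&2
    return 1
}

# Usage, assuming the install command from the log above:
# check_passwordless_sudo && sudo -E bash devops/scripts/install_drivers.sh llvm/devops/dependencies.json --all
```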

@sarnex

sarnex commented Sep 24, 2024

let me msg you

@sarnex

sarnex commented Sep 26, 2024

The problem is actually that we're incorrectly trying to install Intel drivers on the NVIDIA runner. We didn't do that before; it was an unintended side effect of a recent CI change. Udit is working on it.

@uditagarwal97

> The problem is actually that we're incorrectly trying to install Intel drivers on the NVIDIA runner. We didn't do that before; it was an unintended side effect of a recent CI change. Udit is working on it.

PR for fix: #15528
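
The actual fix is in #15528 and is not shown in this thread. Purely as an illustration of the kind of guard that avoids installing Intel drivers on a non-Intel runner, here is a sketch. It assumes the job knows its target devices as a string such as `level_zero:gpu` or `cuda:gpu`; the helper name and that convention are assumptions, not the project's real logic.

```shell
#!/bin/sh
# Hypothetical helper: decide from a target-device string whether the Intel
# GPU driver install step should run. The backend names checked here are an
# assumption based on the targets mentioned in this thread.
needs_intel_drivers() {
    case "$1" in
        *level_zero:*|*opencl:*) return 0 ;;  # Intel backends used in this CI
        *) return 1 ;;                        # e.g. cuda:gpu, hip:gpu
    esac
}

# Usage (TARGET_DEVICES is a placeholder variable):
# if needs_intel_drivers "$TARGET_DEVICES"; then
#     sudo -E bash devops/scripts/install_drivers.sh llvm/devops/dependencies.json --all
# fi
```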

@sarnex

sarnex commented Sep 27, 2024

@AllanZyne Any ideas about the many asan failures on DG2 here? Note this is using a new GPU driver, newer than the one in normal CI.

********************
Failed Tests (39):
  SYCL :: AddressSanitizer/bad-free/bad-free-host.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-minus1.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-plus1.cpp
  SYCL :: AddressSanitizer/common/demangle-kernel-name.cpp
  SYCL :: AddressSanitizer/common/kernel-debug.cpp
  SYCL :: AddressSanitizer/common/option-redzone-size.cpp
  SYCL :: AddressSanitizer/common/options-invalid.cpp
  SYCL :: AddressSanitizer/double-free/double-free.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-int.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-long.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-short.cpp
  SYCL :: AddressSanitizer/multiple-reports/multiple_kernels.cpp
  SYCL :: AddressSanitizer/multiple-reports/one_kernel.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope_unaligned.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/multi_device_images.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/large_group_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_char.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_func.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_int.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_short.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_no_local_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/unaligned_shadow_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_2d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_3d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_copy_fill.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/subbuffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_basic.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_function.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_multiargs.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/multiple_source.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/multiple_private.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/single_private.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-free.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-no-free.cpp
  SYCL :: AddressSanitizer/use-after-free/use-after-free.cpp
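
A list like this is easier to read when grouped by test directory. The following is a small sketch, assuming the lit summary has been saved to a file; the function name is made up for illustration.

```shell
#!/bin/sh
# Count failed tests per second-level directory (e.g. "out-of-bounds").
# Reads a lit "Failed Tests" listing on stdin; lines without a
# "SYCL :: suite/group/test" shape are ignored.
count_failures_by_group() {
    awk -F'/' '/SYCL ::/ && NF >= 3 { count[$2]++ }
               END { for (g in count) print g, count[g] }' | sort
}

# Usage (failed_tests.txt is a placeholder file name):
# count_failures_by_group < failed_tests.txt
```

Every failure in the list is under AddressSanitizer, and the out-of-bounds group accounts for 23 of the 39.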

@AllanZyne

@sarnex, I can't reproduce this.

wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17537.20/intel-igc-core_1.0.17537.20_amd64.deb
wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17537.20/intel-igc-opencl_1.0.17537.20_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-level-zero-gpu-dbgsym_1.3.30872.22_amd64.ddeb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-level-zero-gpu-legacy1-dbgsym_1.3.30872.22_amd64.ddeb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-level-zero-gpu-legacy1_1.3.30872.22_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-level-zero-gpu_1.3.30872.22_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-opencl-icd-dbgsym_24.35.30872.22_amd64.ddeb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-opencl-icd-legacy1-dbgsym_24.35.30872.22_amd64.ddeb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-opencl-icd-legacy1_24.35.30872.22_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/intel-opencl-icd_24.35.30872.22_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/24.35.30872.22/libigdgmm12_22.5.0_amd64.deb
wget https://github.com/oneapi-src/level-zero/releases/download/v1.17.44/level-zero-devel_1.17.44+u22.04_amd64.deb
wget https://github.com/oneapi-src/level-zero/releases/download/v1.17.44/level-zero_1.17.44+u22.04_amd64.deb

I installed the above packages, but IGC failed to build device code (even without enabling the sanitizer).

Compilation from IR - skipping loading of FCL
Error! Incompatible interface in IGC: GT_SYS_INFO
Error! IGC initialization failure. Error code = -6
Build failed with error code: -6
Command was: ocloc -file /tmp/parallel_for_char-pvc-6694e1-76748f.spv -output_no_suffix -spirv_input -device pvc -device_options pvc -ze-intel-enable-auto-large-GRF-mode

Do you know why this might happen? Thank you!
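
When installing a set of packages by hand like this, it can help to list the version each file carries before comparing it against what is already on the machine. Debian package file names follow the `name_version_arch` pattern, so the version can be pulled out with plain parameter expansion. This is only a sketch, and the helper name is made up for illustration.

```shell
#!/bin/sh
# Print the version field of a Debian package file name or download URL.
# Relies on the standard name_version_arch.deb naming convention.
deb_version() {
    file=${1##*/}                # strip any directory or URL prefix
    rest=${file#*_}              # drop the leading "name_"
    printf '%s\n' "${rest%%_*}"  # keep everything up to the next "_"
}

# Usage:
# deb_version intel-igc-core_1.0.17537.20_amd64.deb    # prints 1.0.17537.20
```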

@AllanZyne

Also, according to the CI log, "UR_RESULT_ERROR_UNSUPPORTED_FEATURE" means we are using some APIs that Level Zero doesn't support.

Indeed, we use some L0 APIs that other e2e tests are unlikely to use (e.g., the virtual memory and global variable APIs). Maybe the released gfx package has issues with these APIs.

@sarnex

sarnex commented Oct 1, 2024

I can't repro it locally either, let me try running again.

@sarnex sarnex reopened this Oct 3, 2024
@sarnex

sarnex commented Oct 4, 2024

@AllanZyne Sorry, I finally investigated this and found the problem. It's only failing for the opencl:gpu run; level_zero:gpu works. It fails on urVirtualMemReserve, which seems to be unimplemented in UR for the OpenCL adapter, based on here.

So probably we should disable all these tests on opencl:gpu. Does that seem reasonable to you?

Thanks
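
A backend-specific failure like this can be narrowed down locally by running the same test binary once per backend. The sketch below uses the DPC++ runtime's `ONEAPI_DEVICE_SELECTOR` environment variable with the two selector values named in this thread. The function name and the binary name in the usage line are placeholders.

```shell
#!/bin/sh
# Run the given command once per backend and report which selectors fail.
# ONEAPI_DEVICE_SELECTOR restricts which devices the SYCL runtime exposes.
run_per_backend() {
    for selector in level_zero:gpu opencl:gpu; do
        if ONEAPI_DEVICE_SELECTOR="$selector" "$@" >/dev/null 2>&1; then
            echo "$selector: pass"
        else
            echo "$selector: fail"
        fi
    done
}

# Usage (placeholder name for a locally built ASan test binary):
# run_per_backend ./parallel_for_char.out
```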

@AllanZyne

> So probably we should disable all these tests on opencl:gpu. Does that seem reasonable to you?

Yes, it makes sense. We don't support opencl gpu.

@sarnex

sarnex commented Oct 7, 2024

@AllanZyne PR to disable these tests on OpenCL GPU is here: #15620

@bader bader closed this Oct 8, 2024
@bader bader deleted the ci/update_gpu_driver-linux-24.35.30872.22 branch October 8, 2024 03:48
sarnex added a commit that referenced this pull request Oct 8, 2024
It's not supported as per
[here](#15481 (comment)).

Signed-off-by: Sarnie, Nick <[email protected]>