Commit 122f221

[CI] Limit AMD E2E tests to 1 thread (#17422)
There are recurring instabilities in the AMD pre-commit runs. Every time they fail, two things happen:

* 1 or more tests fail with a memory access fault
* 1 or more tests hang and end up timing out

This seemingly only happens when running the pre-built E2E tests in parallel. It is quite difficult to debug and could potentially be an issue in the AMD drivers.

So as a workaround until we can figure out what's going on, this patch switches the AMD E2E prebuilt tests to run in a single thread.

This is obviously slower than running the tests in parallel, but because the instability causes hangs that end up hitting the 10-minute timeout, a single-thread run is faster than a failing multi-thread run. So we get consistent runs that are slower but may actually go through the job queue faster, as they won't hit timeouts so often.

On a local setup using the same AMD GPU as the CI:

* Successful multi-thread run: ~73s
* Successful single-thread run: ~255s
* Failed multi-thread run: 600s+
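The trade-off described above can be made concrete with a small back-of-the-envelope calculation. The sketch below is not part of the patch: it assumes a simple expected-value model in which a multi-thread attempt fails with some probability `p_fail` (a hypothetical parameter, since the commit message gives no failure rate), treats 600s as the cost of a failed run even though the message says "600s+", and ignores the cost of any retry.

```python
# Timings quoted in the commit message (seconds, local setup).
MULTI_OK = 73.0     # successful multi-thread run
SINGLE_OK = 255.0   # successful single-thread run
MULTI_FAIL = 600.0  # failed multi-thread run (lower bound: the 10-minute timeout)


def expected_multi_thread_time(p_fail: float) -> float:
    """Expected duration of one multi-thread attempt (assumed model, no retries)."""
    return (1.0 - p_fail) * MULTI_OK + p_fail * MULTI_FAIL


def break_even_failure_rate() -> float:
    """Failure rate at which a multi-thread attempt costs as much as a single-thread run."""
    return (SINGLE_OK - MULTI_OK) / (MULTI_FAIL - MULTI_OK)


if __name__ == "__main__":
    p = break_even_failure_rate()
    print(f"break-even failure rate: {p:.1%}")
    for p_fail in (0.1, 0.3, 0.5):
        t = expected_multi_thread_time(p_fail)
        print(f"p_fail={p_fail:.1f}: expected multi-thread time {t:.0f}s vs {SINGLE_OK:.0f}s single-thread")
```

Under these assumptions the break-even point is a failure rate of 182/527, a little over one third. Because a failed job usually has to be rerun and 600s is only a lower bound, the real break-even rate is likely lower than this estimate.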
1 parent 21ccf55 commit 122f221

1 file changed: +1 -0 lines changed

.github/workflows/sycl-linux-precommit.yml

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ jobs:
             image_options: -u 1001 --device=/dev/dri --device=/dev/kfd
             target_devices: hip:gpu
             reset_intel_gpu: false
+            extra_lit_opts: -j 1
           - name: Intel Arc A-Series Graphics
             runner: '["Linux", "arc"]'
             image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
