[CI] Automatically detect AMD architecture #16071

sarnex · 2024-11-13T14:57:14Z

We can figure it out from the sycl-ls output. Confirmed working here

Signed-off-by: Sarnie, Nick <[email protected]>
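The idea in the description, reading the architecture from the `sycl-ls` output, could look roughly like the sketch below. The sample text and the `Architecture:` line format are assumptions made for illustration, and `find_architectures` is a hypothetical helper, not a function from the repository; the real verbose output may be laid out differently.

```python
import re

# Hypothetical sample of a verbose device listing. The exact layout printed
# by a real `sycl-ls --verbose` run is an assumption here.
SAMPLE_OUTPUT = """\
Platform [#1]:
    Vendor : Advanced Micro Devices, Inc.
    Devices : 1
        Device [#0]:
        Type : gpu
        Architecture: amd_gpu_gfx90a
"""


def find_architectures(output):
    """Collect the value of every `Architecture:` line found in the text."""
    archs = set()
    for line in output.splitlines():
        match = re.match(r"\s*Architecture\s*:\s*(\S+)", line)
        if match:
            archs.add(match.group(1))
    return archs


print(find_architectures(SAMPLE_OUTPUT))
```

Because the command has to run on the machine that owns the GPU, a helper like this only makes sense where the toolchain is already available.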

uditagarwal97 · 2024-11-13T17:08:21Z

.github/workflows/sycl-linux-run-tests.yml

@@ -284,11 +284,8 @@ jobs:
          echo "opts=$CMAKE_EXTRA_ARGS" >> $GITHUB_OUTPUT
        else
          if [ "${{ contains(inputs.target_devices, 'ext_oneapi_hip')  }}" == "true" ]; then


I am of the opinion that we should move all the runner-specific code to the caller instead. For post-commit, that would be in sycl-post-commit: https://github.com/intel/llvm/blob/sycl/.github/workflows/sycl-post-commit.yml#L85, just like what we do for docker image options. In this case, we can pass CMake build options to sycl-linux-run-tests using extra_cmake_args.

This isn't really runner-specific code, it's target-device-specific code. Any time this workflow is called targeting HIP we will need this to run. Another downside of moving it up is we would have to duplicate this code in the pre- and post-commit callsites, or make another workflow that implements this script and call it in both places.

My initial thoughts were the same as @uditagarwal97 suggested. I'm not sure if that duplication is enough to convince me to do things differently, but I don't have a strong opinion about it anyway.

Looking now, even if we wanted to do it at the callsite, I don't think it will work if I understand what's happening correctly. So like here or below here, I don't think this runs on the runner that will run the test, I think it runs on the machine running the larger workflow. We need this command to run on the machine with the AMD GPU, so I'm not sure how we could move it up a layer. Also, since we need to run sycl-ls, we need the toolchain downloaded, which I don't think is available at the callsites either.

uditagarwal97

LGTM

ayylol · 2024-11-13T18:09:36Z

Should this not be done in the e2e Python scripts? That way this issue is resolved not only for the CI, but also when running the e2e tests locally.

Also note we have this comment suggesting that the way we are setting amd_arch is a temporary solution, as well as this strange bit that seems to do autodetection for amd_arch, but only for adding the "gpu-amd-gfx90a" feature.

sarnex · 2024-11-13T18:14:15Z

Sorry, I should have thought about it more rather than do the minimum to get it working :)

I'll move it to the py script and update this PR, thanks for the idea.

Signed-off-by: Sarnie, Nick <[email protected]>

gonna completely rework pr

sarnex · 2024-11-13T21:42:47Z

sycl/test-e2e/Matrix/joint_matrix_hip_gfx90a.cpp

@@ -6,10 +6,10 @@
 //
 //===----------------------------------------------------------------------===//

-// RUN: %{build} -fsycl -fsycl-targets=amd_gpu_gfx90a %s -o %t.out
+// RUN: %clangxx -fsycl -fsycl-targets=amd_gpu_gfx90a %s -o %t.out


This change is a bug fix: we shouldn't use %{build} if we're setting -fsycl-targets or choosing the file and output ourselves.

sarnex · 2024-11-13T21:43:04Z

sycl/test-e2e/Matrix/joint_matrix_hip_gfx90a.cpp

 // RUN: %{run} %t.out

-// REQUIRES: gpu-amd-gfx90a
+// REQUIRES: arch-amd_gpu_gfx90a


this change is just to remove a hack in the testing infra, should be exactly the same functionally

hey, just wondering if you were able to build and run these gfx90a tests. AFAIK we don't have these in our CI, so I don't think they ever ran.

yeah i actually tested on the one AMD machine i know not in our CI and it happens to be gfx90a (MI250)

sarnex · 2024-11-13T21:43:35Z

sycl/test-e2e/lit.cfg.py

+        amd_arch_prefix = "arch-amd_gpu_"
+        amd_device_arch = [i for i in config.sycl_dev_features["hip:gpu"] if amd_arch_prefix in i]
+        if len(amd_device_arch) == 0:
+            lit_config.error(


i had to move this block down so the sycl-ls parsing we already do is available, but that meant i had to add a loop at the end to update the features
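As a rough illustration of the check in this hunk, the sketch below filters a device's feature names by the `arch-amd_gpu_` prefix and fails unless exactly one matches. `detect_amd_arch` and the sample feature set are hypothetical, and a plain `ValueError` stands in for the `lit_config.error` call used in the actual config.

```python
AMD_ARCH_PREFIX = "arch-amd_gpu_"  # prefix taken from the diff above


def detect_amd_arch(device_features):
    """Return the AMD arch suffix (for example "gfx90a") from feature names.

    Raises ValueError unless exactly one feature carries the prefix.
    """
    matches = [f for f in device_features if f.startswith(AMD_ARCH_PREFIX)]
    if len(matches) != 1:
        raise ValueError(
            "Cannot detect architecture for AMD HIP device, specify it explicitly"
        )
    return matches[0][len(AMD_ARCH_PREFIX):]


# Hypothetical feature set for a "hip:gpu" device.
features = {"gpu", "hip", "arch-amd_gpu_gfx90a"}
print(detect_amd_arch(features))
```

Using `startswith` rather than `in` avoids matching a feature that merely contains the prefix somewhere in the middle.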

YuriPlyakhin · 2024-11-13T21:48:40Z

sycl/test-e2e/Matrix/runtime_query_hip_gfx90a.cpp

-// REQUIRES: gpu-amd-gfx90a
-// RUN: %{build} -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a -o %t.out
+// REQUIRES: arch-amd_gpu_gfx90a
+// RUN: %clangxx -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a -o %t.out


should we add -fsycl?

good catch, fixed in latest commit

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex · 2024-11-13T23:44:03Z

@YuriPlyakhin Do you mind re-reviewing, I found a missed test fix.

sarnex · 2024-11-13T23:44:22Z

@aelovikov-intel Just pushed the commit moving it into the existing loop, it's not too bad I don't think

aelovikov-intel · 2024-11-14T00:47:05Z

sycl/test-e2e/lit.cfg.py

+            for a in architecture_feature:
+                arch = a
+            amd_arch_prefix = "arch-amd_gpu_"
+            if amd_arch_prefix not in arch or len(architecture_feature) != 1:
+                lit_config.error(
+                    "Cannot detect architecture for AMD HIP device, specify it explicitly"
+                )
+            config.amd_arch = arch.replace(amd_arch_prefix, "")


Is there a bug with indentation here? Otherwise I don't see a reason behind the loop at line 767...

It's because architecture_feature is a set and there's no easy way to get the first (and only, because of the error check) element out I think, indexing doesn't work. Another option is next(iter(architecture_feature)) I think.

But I'm not good at python so maybe there's something I missed

@uditagarwal97 @ayylol might know also

# Guaranteed to be a single element in the set
arch = [x for x in architecture_feature][0]

Or better yet not use set to start with :)

thanks, ill fix it.

the existing code is using a set for this existing var, probably for some reason, and i dont want to touch it :P

Yeah, I think it shouldn't have been a set at all. Someone probably just copied aspects/sg_sizes but that wasn't a good idea.

tale as old as time
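For reference, the options discussed in this thread for pulling the only element out of a one-element set can be compared side by side. This is a small sketch using a hypothetical set value, not the code that landed in the PR.

```python
# Hypothetical one-element set, standing in for architecture_feature.
architecture_feature = {"arch-amd_gpu_gfx90a"}

# Option 1: build an iterator over the set and take its first item.
arch_a = next(iter(architecture_feature))

# Option 2: tuple unpacking. This raises ValueError if the set does not
# hold exactly one element, so it doubles as a size check.
(arch_b,) = architecture_feature

# Option 3: copy into a list and index, as suggested in the thread.
arch_c = list(architecture_feature)[0]

assert arch_a == arch_b == arch_c
print(arch_a.replace("arch-amd_gpu_", ""))
```

Sets are not subscriptable, which is why plain indexing fails; all three options work without changing the variable's type.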

Signed-off-by: Sarnie, Nick <[email protected]>

YuriPlyakhin

LGTM

We can figure it out from the `sycl-ls` output. Confirmed working [here](https://github.com/intel/llvm/actions/runs/11841045635/job/32998817316?pr=16071)

Closes: #16057

---------

Signed-off-by: Sarnie, Nick <[email protected]>

[CI] Automatically detect AMD architecture

2d28503

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex force-pushed the sycl-devops-pr/sarnex/amd branch from bfdaf4e to 2d28503 Compare November 13, 2024 15:32

sarnex marked this pull request as ready for review November 13, 2024 16:56

sarnex requested a review from a team as a code owner November 13, 2024 16:56

sarnex requested a review from uditagarwal97 November 13, 2024 16:56

uditagarwal97 reviewed Nov 13, 2024

uditagarwal97 previously approved these changes Nov 13, 2024

sarnex added 3 commits November 13, 2024 12:09

do it in the testing infra instead

9f48bd6

Signed-off-by: Sarnie, Nick <[email protected]>

Merge remote-tracking branch 'upstream/sycl' into sycl

22641d9

fix UNSUPPORTED

a01f872

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex requested review from a team as code owners November 13, 2024 21:41

sarnex requested a review from cperkinsintel November 13, 2024 21:41

sarnex requested review from ayylol and uditagarwal97 November 13, 2024 21:41

sarnex commented Nov 13, 2024

YuriPlyakhin reviewed Nov 13, 2024

format

8494ca5

Signed-off-by: Sarnie, Nick <[email protected]>

move to existing loop

7db9bff

Signed-off-by: Sarnie, Nick <[email protected]>

Merge remote-tracking branch 'upstream/sycl' into sycl

0061488

format

b666f7b

Signed-off-by: Sarnie, Nick <[email protected]>

aelovikov-intel reviewed Nov 14, 2024

sarnex requested a review from YuriPlyakhin November 14, 2024 15:22

aelovikov-intel approved these changes Nov 14, 2024

clean up set extraction

75f90f9

Signed-off-by: Sarnie, Nick <[email protected]>

YuriPlyakhin approved these changes Nov 14, 2024

cperkinsintel approved these changes Nov 14, 2024

sarnex merged commit 73b0775 into sycl Nov 14, 2024
22 of 27 checks passed

sarnex deleted the sycl-devops-pr/sarnex/amd branch November 14, 2024 20:30

aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Jan 17, 2025

[CI] Remove always empty matrix.extra_cmake_args

20e238c

Unused after intel#16071.

aelovikov-intel mentioned this pull request Jan 17, 2025

[CI] Remove always empty matrix.extra_cmake_args #16674

Merged

aelovikov-intel added a commit that referenced this pull request Jan 17, 2025

[CI] Remove always empty matrix.extra_cmake_args (#16674)

98c0d5d

Unused after #16071.
