
Get rid of post-commit testing #14383


Closed · wants to merge 2 commits

Conversation

ldrumm (Contributor) commented Jul 2, 2024:

This is an experiment to try and find a good balance of reliable test infrastructure that can't be ignored during merge, tests enough, and runs quickly.

It's based on the following principle:
Pull request testing should ask and answer the question

If merged, would the proposed change represent a regression to the
project?

It should answer it to a high confidence, in as short a time as possible. It should not:

  • test things that are unrelated to the user's code
  • rely on dynamic data or unstable data beyond the author's control (e.g. last night's build results)
  • be obviously incomplete

With the above principles in mind, the jobs here are optimized to maximize diversity of coverage on the available runners, while also minimizing the turnaround time to know whether a merge request introduces a regression. There is a conflict here: high coverage relies on lots of hardware and takes lots of time, but since a long turnaround time discourages contribution we have to make a very fine judgment of what is reasonable. Here we do this by building once per native platform toolchain (with potentially different configurations to broaden build coverage), and afterwards running subsets of tests on each host's available adaptors.

Mac OS = Apple Clang + libc++; DSO-linkage release build + assertions
Win32 = MSVC; release build, no assertions
Linux = gcc + libstdc++; release build + assertions

This allows us to avoid building with both clang and gcc on a Linux platform - where the coverage gained is marginal (clang has slightly different diagnostics but otherwise doesn't really exercise the input code in a different manner to gcc) - but building with clang on Mac and gcc on Linux does carry a more significant chance of identifying a build regression. Since by default we enable -Werror in configuration, this allows us also to minimize the number of warnings merged, but still avoid doing multiple builds per platform.

In summary, we're not really interested in testing the system compiler, but instead ensuring that a reasonable matrix of configurations (static vs dynamic, noassert vs assert) does not regress.

We hopefully then answer the original question with reasonable confidence in correctness without the overhead of a brute-force combinations matrix.

We make the following assumptions:

  • the system compiler and standard library are sane. It's tempting to run the end-to-end tests with multiple host compilers, but divergence in results there most likely indicates a host compiler bug, or bad behaviour in the runtime, which would show up regardless of host codegen. So when the end-to-end tests fail, they fail because of the user's code, rather than acting as a test of the system compiler.

Dogfooding is a noble goal, but it's not suitable for merge request testing, since it introduces an instability for which the author of the PR is not responsible. Thus: we hoist dogfooding into the nightly testing which doesn't affect a user's chance of reasonably getting a patch merged before the heat death of the universe.
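The per-platform build matrix described above could be sketched as a GitHub Actions strategy matrix. This is an illustrative sketch only: the job layout, runner labels, and CMake flags are assumptions for illustration, not the actual intel/llvm workflow definitions.

```yaml
# Sketch only: runner labels and flags are assumptions, not the real workflow.
jobs:
  build:
    strategy:
      matrix:
        include:
          - name: Linux (gcc + libstdc++; release + assertions)
            os: ubuntu-22.04
            cmake_args: -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON
          - name: macOS (Apple Clang + libc++; DSO-linkage release + assertions)
            os: macos-latest
            cmake_args: -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DBUILD_SHARED_LIBS=ON
          - name: Windows (MSVC; release, no assertions)
            os: windows-latest
            cmake_args: -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=OFF
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # One build per native platform toolchain; test subsets run afterwards.
      - run: cmake -B build ${{ matrix.cmake_args }}
      - run: cmake --build build
```

The point of the sketch is that each platform builds exactly once, with the configuration axes (assertions, DSO linkage) spread across platforms rather than multiplied within them.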

ldrumm (Contributor Author) commented Jul 2, 2024:

Thus: we hoist dogfooding into the nightly testing

Looks like I missed that step. TODO: Add a self-built run to the nightly

sarnex (Contributor) left a comment:

Initial comments, will do more in-depth review once the discussed changes are added

@@ -15,7 +15,14 @@ jobs:
with:
build_cache_root: "/__w/"
build_artifact_suffix: default
build_configure_extra_args: '--hip --cuda'
build_configure_extra_args:
Contributor:

Should we build precommit linux with the same flags (It looks like we don't but maybe I can't read)

Contributor Author:

I added this for more diversity, i.e. pre-commit is a static build (the llvm config default), and then nightly testing gets exercised with a shared config. I'm not married to it, but I figured it was an easy way to increase coverage.

Contributor:

One issue we actually do see relatively frequently is someone breaking the shared library build (usually a missing library dep in the cmake file), and today that is usually caught in post-commit because that does a shared-lib build. With this, we would only catch it in the nightly.

However I think it's reasonable to assume that most developers are using the default config (configure.py), and assuming we have someone guaranteed to be checking the nightly and to quickly report and/or fix issues, I don't have a huge problem with this.

Contributor:

We could swap it round, so pre-commit testing catches the shared-lib issues?

I wonder if it's possible to break static library build and not shared library build. If that's extremely unlikely such that either both are broken or shlib is broken, doing shlib in precommit would be fine with me.

Contributor:

We probably should just drop the "sharedlib" build configuration. The primary use case for this configuration is to speed up incremental builds of LLVM libraries. This build configuration is not used in production environments - it's used only by LLVM developers. If everyone uses the standard build, there is no point in building the "sharedlib" configuration in CI.

Contributor:

Personally I use it rather frequently and I would be annoyed if it was failing and not caught in CI.

aelovikov-intel (Contributor) commented Jul 3, 2024:

Our downstream uses it a lot. It would be very bad if the issues were only discovered when syncing downstream with upstream and not in upstream CI.

- name: Perf tests on Intel Arc A-Series Graphics system
runner: '["Linux", "arc"]'
env: '{"LIT_FILTER":"PerformanceTests/"}'
extra_lit_opts: -a -j 1 --param enable-perf-tests=True
Contributor:

I think we need --param gpu-intel-dg2=True

Contributor Author:

Gotcha. Will fix
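With the suggested parameter added, the job entry quoted above would presumably read as follows (a sketch based on the quoted snippet and the reviewer's suggestion, not the final committed version):

```yaml
# Sketch: the quoted perf-test entry with the suggested dg2 param appended.
- name: Perf tests on Intel Arc A-Series Graphics system
  runner: '["Linux", "arc"]'
  env: '{"LIT_FILTER":"PerformanceTests/"}'
  extra_lit_opts: -a -j 1 --param enable-perf-tests=True --param gpu-intel-dg2=True
```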

- name: E2E on Intel GEN12 Graphics; OpenCL GPU, OpenCL FPGA
runner: '["Linux", "gen12"]'
extra_lit_opts: --param gpu-intel-gen12=True
target_devices: opencl:gpu;opencl:fpga
Contributor:

This might not be an issue introduced by this PR and may already exist, but I think having gpu-intel-gen12=True and multiple target devices means the variable is true for all target devices (unless we somehow set it to false in the lit python scripts when it's not a gpu). Probably we only want that variable to be true for GPU targets.

@aelovikov-intel Any idea?
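One way to keep the GPU-specific param from applying to the FPGA target would be to split the quoted entry per target device. This is a sketch of that idea only, not a change proposed in the PR:

```yaml
# Sketch: split the combined entry so gpu-intel-gen12 applies only to the GPU run.
- name: E2E on Intel GEN12 Graphics; OpenCL GPU
  runner: '["Linux", "gen12"]'
  extra_lit_opts: --param gpu-intel-gen12=True
  target_devices: opencl:gpu

- name: E2E on Intel GEN12 Graphics; OpenCL FPGA
  runner: '["Linux", "gen12"]'
  target_devices: opencl:fpga
```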

Contributor:

There is a parallel activity to auto-detect architecture. I think that would solve the issue, but I don't remember the status of it.

Contributor:

It should be already available: #13976

sarnex (Contributor) commented Jul 2, 2024:

Cool, thanks! Probably we need to either map the results there to the gpu-intel-* we have been using, or the other way around. Having two ways to specify the gpu seems bad.

Contributor:

Cool, thanks! Probably we need to either map the results there to the gpu-intel-* we have been using, or the other way around. Having two ways to specify the gpu seems bad.

That's the "in progress" part that I'm unaware about the status of. @AlexeySachkov , someone from your team is looking into it, right?

@@ -80,6 +87,37 @@ jobs:
image_options: -u 1001 --device=/dev/dri --privileged --cap-add SYS_ADMIN
target_devices: opencl:cpu
tests_selector: cts

- name: E2E tests with dev igc on Intel Arc A-Series Graphics
Contributor:

Please update your commit message with the summary of changes related to Arc GPU testing. @sarnex and @YuriPlyakhin would have to approve the changes to the devigc usage.

sarnex (Contributor) commented Jul 2, 2024:

It seems as part of this PR we stop doing arc dev igc testing in precommit and only do it in the nightly. This is fine with me but we need @jsji's approval, he worked on the dev igc CI and is more familiar with the use case.

Contributor:

@ldrumm , why do you suggest to remove dev igc testing from precommit?

Contributor:

I think the time spent running the dev igc testing is fairly small; we should try to keep it in pre-commit if possible -- moving it from pre-commit to post-commit (nightly) would make things complicated unless we can identify the culprit commit and ping the author automatically in post-commit (nightly).

Contributor:

+1 to what @jsji said

Contributor Author:

The time spent testing the dev driver might be small, but it introduces an instability. The principle here is to have a stable testing platform where the user's PR is tested against a stable config, not where a user's PR is tested against unstable GPU drivers. If we're to have any unstable test platforms available in pre-commit, they should be explicitly opt-in only.
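An opt-in gate of this kind already appears elsewhere in this PR's diff, keying dev-IGC usage off a path filter. A minimal sketch of that pattern follows; the job name is hypothetical, but the gating expression is quoted from the diff shown later in this conversation:

```yaml
# Hypothetical job name; the `contains(...)` expression is from this PR's diff.
# The dev-IGC job runs only when the PR touches the dev-IGC configuration,
# so the unstable driver is opt-in rather than part of every pre-commit run.
test_dev_igc:
  if: ${{ contains(needs.detect_changes.outputs.filters, 'devigccfg') }}
```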

Contributor Author:

It basically boils down to "GPU driver developers should do their own testing, not experiment on dpc++ contributors"

Contributor:

Thanks @ldrumm, I understand the principle. However, the dev igc testing is not trying to test the GPU driver, but rather doing continuous integration tests for the Joint Matrix/ESIMD features, which require IGC changes from time to time. The testing is already restricted to Joint Matrix and ESIMD only.

Contributor:

Also, @ldrumm, the dev igc driver is only updated after the Joint Matrix/ESIMD E2E tests have passed with it, so it is stable in the context of pre-commit testing.

@@ -14,7 +14,7 @@ on:
build_image:
type: string
required: false
default: "ghcr.io/intel/llvm/ubuntu2204_build:latest"
default: "ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab"
Contributor:

Why this? At the very least, it should have a comment in the code explaining it.

Contributor Author:

This is the same as the other places that use this tag.

 $ ag 0300ac92462 -A 1 -B 1
workflows/sycl-linux-build.yml
16-        required: false
17:        default: "ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab"
18-      build_ref:

workflows/sycl-precommit.yml
81-            runner: '["Linux", "amdgpu"]'
82:            image: ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab
83-            image_options: -u 1001 --device=/dev/dri --device=/dev/kfd

workflows/sycl-linux-precommit-aws.yml
66-      runner: '["aws_cuda-${{ github.event.workflow_run.id }}-${{ github.event.workflow_run.run_attempt }}"]'
67:      image: ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab
68-      image_options: -u 1001 --gpus all --cap-add SYS_ADMIN --env NVIDIA_DISABLE_REQUIRE=1

What should the comment say? "Use stable tag 0300ac9"? Seems redundant to me
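For what it's worth, the kind of comment being requested might look like the following; the wording is a suggestion only, and the tag value is quoted from the diff above:

```yaml
# Pin the build image to a known-good tag so pre-commit results are
# reproducible and not affected by unrelated updates to 'latest'.
default: "ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab"
```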

use_dev_igc: ${{ contains(needs.detect_changes.outputs.filters, 'devigccfg') }}
extra_lit_opts: --param matrix-xmx8=True --param gpu-intel-dg2=True

# Performance tests below. Specifics:
Contributor:

These tests are very dumb for now, and the idea behind them is to be able to access the logs to look through performance numbers on a per-commit basis. I don't see a reason to move this into the nightly...

Contributor Author:

Sure. I'll move them back

sycl_toolchain_archive: ${{ needs.build_linux.outputs.artifact_archive_name }}
sycl_toolchain_decompress_command: ${{ needs.build_linux.outputs.artifact_decompress_command }}

build_win:
Contributor:

Why are you removing the split between lin/win tasks? I would have expected you to link the commit that introduced the split in the PR's description together with your arguments on why you think that was wrong.

Also, you're not removing sycl-windows-precommit.yml, why?

Contributor:

Also, even if we decide to keep that change, it should be done in a separate PR and not here.

Contributor Author:

Why are you removing the split between lin/win tasks? I would have expected you to link the commit that introduced the split in the PR's description together with your arguments on why you think that was wrong.

So people can see all precommit tests together. It never occurred to me that they might have been together at one point and then separated out, so I didn't consider that something had been done "wrong" and saw no need to address it.

Also, you're not removing sycl-windows-precommit.yml, why?

That's a mistake. Will fix.

Also, even if we decide to keep that change, it should be done in a separate PR and not here.

Which change? Why does it need a separate merge request?

Contributor:

Which change?

Merging win/lin into a single YML, obviously...

Why does it need a separate merge request?

Because that would be an atomic change and per good software development practices PRs have to represent atomic changes, not a bunch of different fixes at once.

It never occurred to me that they might have been together at one point and then separated out

They were. If you decide to merge them back, please make it clear in your new atomic PR why the original reasoning no longer applies.

Contributor Author:

obviously

Not obviously. Don't patronise me. Your description was unclear and I'm not a mind reader

Contributor:

My first comment in this thread was attributed to a single change:

Why are you removing the split between lin/win tasks?

The second (and the only other comment at that time) in this thread was

if we decide to keep that change...

What else could it possibly refer to, in your opinion?

extra_lit_opts: --param gpu-intel-gen12=True
target_devices: opencl:gpu;opencl:fpga

- name: E2E on Intel Arc A-Series Graphics; Level zero GPU, OpenCL CPU
Contributor:

Suggested change:
- - name: E2E on Intel Arc A-Series Graphics; Level zero GPU, OpenCL CPU
+ - name: E2E on Intel Arc A-Series Graphics; Level zero GPU, OpenCL GPU

Contributor Author:

I'm confused. Isn't this FPGA and not GPU?

YuriPlyakhin (Contributor) commented Jul 3, 2024:

Originally on ARC we tested with 2 GPU backends: level 0 and OpenCL: target_devices: ext_oneapi_level_zero:gpu;opencl:gpu
Not sure where FPGA or CPU is coming from...
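Per that original configuration, the Arc entry would presumably look something like this. The target_devices value is quoted from the comment above and the runner label from the perf-test snippet earlier in this conversation; the rest is a sketch:

```yaml
# Sketch of the Arc entry with the two GPU backends originally tested.
- name: E2E on Intel Arc A-Series Graphics; Level Zero GPU, OpenCL GPU
  runner: '["Linux", "arc"]'
  target_devices: ext_oneapi_level_zero:gpu;opencl:gpu
```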

@@ -98,23 +105,16 @@ jobs:
install_drivers: ${{ contains(needs.detect_changes.outputs.filters, 'drivers') }}
extra_lit_opts: --param matrix-xmx8=True --param gpu-intel-dg2=True
env: '{"LIT_FILTER":${{ needs.determine_arc_tests.outputs.arc_tests }} }'
- name: E2E tests with dev igc on Intel Arc A-Series Graphics

- name: E2E on Intel Arc A-Series Graphics
YuriPlyakhin (Contributor) commented Jul 2, 2024:

Should the name be more specific about what targets are being tested, for example: OpenCL FPGA, OpenCL GPU, etc.?

Contributor Author:

Probably, I'll have a look

extra_lit_opts: --param matrix-xmx8=True --param gpu-intel-dg2=True
env: '{"LIT_FILTER":${{ needs.determine_arc_tests.outputs.arc_tests }} }'
target_devices: opencl:gpu;opencl:opencl:fpga
Contributor:

OpenCL GPU is being tested by the previous "E2E on Intel Arc A-Series Graphics" run, right? If so, we probably should not repeat it here.

Also, what is the point of testing OpenCL FPGA on hardware with an Intel Arc? I would use a more powerful system for that, and a GPU is not necessary for it.

Contributor:

+1 on this, I think FPGA testing on any system is the same; it just needs to be powerful enough to not time out, etc.

@bader bader removed their assignment Jul 2, 2024
@ldrumm ldrumm temporarily deployed to WindowsCILock July 3, 2024 11:14 — with GitHub Actions Inactive
runner: '["Linux", "amdgpu"]'
image: ghcr.io/intel/llvm/ubuntu2204_build:latest-0300ac924620a51f76c4929794637b82790f12ab
image_options: -u 1001 --device=/dev/dri --device=/dev/kfd
target_devices: ext_oneapi_hip:gpu
- name: Intel

- name: E2E on Intel GEN12; level zero GPU, OpenCL CPU
Contributor:

Everywhere in the code: Level Zero - both words should start with capital letters

ldrumm (Contributor Author) commented Jul 3, 2024 via email


This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Dec 31, 2024

This pull request was closed because it has been stalled for 30 days with no activity.

@github-actions github-actions bot closed this Jan 31, 2025