[CI] Tune nightly benchmarking job for better reliability #17122

Merged: 25 commits, Mar 13, 2025

Commits (25)
8f45038
Add PVC_PERF runner as sole option
ianayl Feb 21, 2025
0a8b0d2
Turn down fail threshold
ianayl Feb 21, 2025
e672cda
Restrict number of cores used
ianayl Feb 28, 2025
684f94e
Fix missing shell directive
ianayl Feb 28, 2025
57e530e
Tweak thresholds
ianayl Feb 28, 2025
aaa19f0
Bump up tolerance
ianayl Feb 28, 2025
bdf68d3
Bump iterations for more consistent results
ianayl Feb 28, 2025
b495154
Bump iterations
ianayl Feb 28, 2025
384c4d6
Reduce number of iterations
ianayl Feb 28, 2025
dd3f861
Lower number of iterations
ianayl Feb 28, 2025
0ee8a9a
Temporarily lower min_threshold to test results
ianayl Feb 28, 2025
5cab659
Require more samples before comparison
ianayl Mar 3, 2025
e556368
Tweak tolerance and add check for intel/llvm in nightly
ianayl Mar 5, 2025
84eda44
Increase iterations to 5000 again
ianayl Mar 5, 2025
565bfbd
Reduce tolerance to 5%
ianayl Mar 5, 2025
247ed16
Merge branch 'sycl' of https://github.com/intel/llvm into ianayl/tune…
ianayl Mar 5, 2025
04d5220
Bump tolerance to 7%
ianayl Mar 5, 2025
7637810
Bump tolerance up to 8%
ianayl Mar 6, 2025
98a9b3d
Test 10000 iterations
ianayl Mar 6, 2025
13f86ec
Revert "Test 10000 iterations"
ianayl Mar 6, 2025
fedb018
Do not reset GPU
ianayl Mar 10, 2025
26180cd
Readd back /dev/dri/by-path to docker image arguments
ianayl Mar 12, 2025
302d6b0
Re-enable resetting the GPU
ianayl Mar 12, 2025
fdfbb9a
Switch out manual triggers for installing igc drivers for resetting i…
ianayl Mar 12, 2025
08fb37a
Fix capitalization
ianayl Mar 12, 2025
10 changes: 4 additions & 6 deletions .github/workflows/sycl-linux-run-tests.yml
@@ -126,6 +126,7 @@ on:
- '["cts-cpu"]'
- '["Linux", "build"]'
- '["cuda"]'
- '["PVC_PERF"]'
image:
type: choice
options:
@@ -170,17 +171,14 @@ on:
Extra options to be added to LIT_OPTS.
default: ''

install_igc_driver:
reset_intel_gpu:
description: |
Reset Intel GPUs
type: choice
options:
- false
- true

install_dev_igc_driver:
type: choice
options:
- false
- true
e2e_testing_mode:
type: choice
options:
9 changes: 2 additions & 7 deletions .github/workflows/sycl-nightly.yml
@@ -247,7 +247,7 @@ jobs:
sycl_cts_artifact: sycl_cts_bin

aggregate_benchmark_results:
if: always() && !cancelled()
if: github.repository == 'intel/llvm' && !cancelled()
name: Aggregate benchmark results and produce historical averages
uses: ./.github/workflows/sycl-benchmark-aggregate.yml
secrets:
@@ -262,13 +262,8 @@ jobs:
fail-fast: false
matrix:
include:
- name: Run compute-benchmarks on L0 Gen12
runner: '["Linux", "gen12"]'
image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
target_devices: level_zero:gpu
reset_intel_gpu: true
- name: Run compute-benchmarks on L0 PVC
runner: '["Linux", "pvc"]'
runner: '["PVC_PERF"]'
Contributor
IMO we should include the OS here

Contributor Author
We'll need to add a "Linux" tag to the UR runner then. It's worth noting that the UR folks would prefer not to have other jobs run on their runners, so we'll need to make sure whatever tags we add to that runner do not result in other workflows picking it up.

For now, I'll add "Linux" somewhere in the name.

Contributor
@sarnex Feb 27, 2025
Oh wow, the runner doesn't have the Linux tag automatically; that's weird.

Contributor Author
I think that was intentional, though; I would check with @pbalcer.

Contributor
@pbalcer Feb 28, 2025
@lukaszstolarczuk was the one who set it up ;-) I'm not an expert on runners.

But yeah, the original idea behind this system was that it'd just have one runner script instance, used exclusively for the benchmarks. But right now this system isn't very busy, so...

Contributor
I explicitly used --no-default-labels when setting this runner up. You can update the labels to whatever you wish; please just remember to update ur-benchmarks-reusable.yml afterwards.

On a general note, as Piotr said: we wanted this runner to be used mostly for performance work, as it's also used for measuring UMF performance, and too much traffic may influence the results.
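For context, registering a dedicated self-hosted runner with only a custom label might look roughly like the sketch below. Only --no-default-labels is confirmed by this thread; the URL, token, runner name, and the PVC_PERF label value are placeholders based on the matrix change above.

```bash
# Hypothetical registration of a dedicated benchmarking runner: give it only a
# custom label so that generic jobs (which request labels like "Linux" or "X64")
# are never scheduled onto it. URL, token, and name below are placeholders.
./config.sh \
  --url https://github.com/intel/llvm \
  --token "$RUNNER_REGISTRATION_TOKEN" \
  --name pvc-perf-1 \
  --labels PVC_PERF \
  --no-default-labels
```

A job then only lands on this machine when its runner labels explicitly request PVC_PERF, which is what the new matrix entry relies on.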

Contributor
> we wanted this runner to be used mostly for performance work, as it's also used for measuring UMF performance, and too much traffic may influence the results.

+1 Yeah, I also wanted to reserve this runner for performance benchmarking exclusively. @lukaszstolarczuk there's only one GHA process running on this runner, right? If so, we can be sure that only one benchmarking job will be executing at a given time.

Contributor
At the moment it is not one runner. We want to move UMF into the intel org; only then can we have a single, shared runner. For now we have two runners (one for SYCL, one for UMF), each bound to a different NUMA node.
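As a purely illustrative sketch of that split (the actual service configuration is not part of this PR), two runner processes could be pinned to separate NUMA nodes with numactl; the installation paths are placeholders.

```bash
# Hypothetical: pin each self-hosted runner to its own NUMA node so the SYCL and
# UMF benchmarking runners do not compete for the same cores or memory.
numactl --cpunodebind=0 --membind=0 /opt/runners/sycl-perf/run.sh &
numactl --cpunodebind=1 --membind=1 /opt/runners/umf-perf/run.sh &
wait
```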

image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
target_devices: level_zero:gpu
reset_intel_gpu: true
23 changes: 22 additions & 1 deletion devops/actions/run-tests/benchmark/action.yml
@@ -46,6 +46,27 @@ runs:
echo "# This workflow is not guaranteed to work with other backends."
echo "#" ;;
esac
- name: Compute CPU core range to run benchmarks on
shell: bash
run: |
# Taken from ur-benchmark-reusable.yml:
Contributor
Should we make it a shared action in devops/actions if it's also called by UR CI?

Contributor Author
When we combine both CIs there should only be one invocation of this, but that's once the dust settles.


# Compute the core range for the first NUMA node; second node is used by
# UMF. Skip the first 4 cores as the kernel is likely to schedule more
# work on these.
CORES="$(lscpu | awk '
/NUMA node0 CPU|On-line CPU/ {line=$0}
END {
split(line, a, " ")
split(a[4], b, ",")
sub(/^0/, "4", b[1])
print b[1]
}')"
echo "CPU core range to use: $CORES"
echo "CORES=$CORES" >> $GITHUB_ENV

ZE_AFFINITY_MASK=0
echo "ZE_AFFINITY_MASK=$ZE_AFFINITY_MASK" >> $GITHUB_ENV
- name: Run compute-benchmarks
shell: bash
run: |
@@ -69,7 +90,7 @@ runs:
echo "-----"
sycl-ls
echo "-----"
./devops/scripts/benchmarking/benchmark.sh -n '${{ runner.name }}' -s || exit 1
taskset -c "$CORES" ./devops/scripts/benchmarking/benchmark.sh -n '${{ runner.name }}' -s || exit 1
- name: Push compute-benchmarks results
if: always()
shell: bash
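For readers skimming the diff, the new pinning logic boils down to the standalone sketch below. It assumes lscpu prints a line such as "NUMA node0 CPU(s): 0-23,48-71" (actual ranges vary by machine), and the runner name passed to benchmark.sh is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch of the CPU/GPU pinning added by this PR, outside of the GitHub action.

# Prefer the NUMA node0 CPU list (it appears after the On-line CPU list in lscpu
# output, so the last match wins); keep only the first comma-separated range and
# start it at core 4, leaving cores 0-3 for kernel/background work.
CORES="$(lscpu | awk '
  /NUMA node0 CPU|On-line CPU/ {line=$0}
  END {
    split(line, a, " ")   # a[4] is the CPU list, e.g. 0-23,48-71
    split(a[4], b, ",")   # b[1] is the first range, e.g. 0-23
    sub(/^0/, "4", b[1])  # skip cores 0-3 -> 4-23
    print b[1]
  }')"
echo "CPU core range to use: $CORES"

# Expose only the first GPU to Level Zero, then pin the benchmark run to the
# selected cores. "my-runner" stands in for the runner name used by the action.
export ZE_AFFINITY_MASK=0
taskset -c "$CORES" ./devops/scripts/benchmarking/benchmark.sh -n my-runner -s
```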
8 changes: 4 additions & 4 deletions devops/benchmarking/config.ini
@@ -10,9 +10,9 @@
; Compute-benchmark compile/run options
[compute_bench]
; Value for -j during compilation of compute-benchmarks
compile_jobs = 2
compile_jobs = 40
; Number of iterations to run compute-benchmark tests
iterations = 100
iterations = 5000
Contributor
How does this affect the Nightly, i.e., how much overhead does it add?

Contributor Author
The run itself takes about 45 minutes, but since it's on PVC_PERF it should run in parallel with the other E2E tests, so I suspect the added overhead shouldn't be much. However, I've started a nightly run to confirm this.


; Options for benchmark result metrics (to record/compare against)
[metrics]
@@ -23,15 +23,15 @@ recorded = Median,StdDev
; the historical average. Metrics not included here are not compared against
; when passing/failing benchmark results.
; Format: comma-separated list of <metric>:<deviation percentage in decimals>
tolerances = Median:0.5
tolerances = Median:0.08

; Options for computing historical averages
[average]
; Number of days (from today) to look back for results when computing historical
; average
cutoff_range = 7
; Minimum number of samples required to compute a historical average
min_threshold = 3
min_threshold = 10

; ONEAPI_DEVICE_SELECTOR linting/options
[device_selector]
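To tie the tuned values together: with tolerances = Median:0.08 and min_threshold = 10, a result's Median may deviate at most 8% from the historical average, and no comparison happens until at least 10 samples exist within the 7-day cutoff. The sketch below illustrates that rule only; it is not the actual benchmark.sh implementation, and the symmetric (two-sided) deviation check is an assumption.

```bash
#!/usr/bin/env bash
# Illustrative pass/fail check implied by devops/benchmarking/config.ini.
# Usage: ./check_median.sh <new_median> <historical_average> <num_samples>
new_median="$1"
hist_avg="$2"
num_samples="$3"

tolerance=0.08    # [metrics] tolerances = Median:0.08
min_threshold=10  # [average] min_threshold = 10

if [ "$num_samples" -lt "$min_threshold" ]; then
  echo "Only $num_samples samples (< $min_threshold): no historical average yet, skipping comparison."
  exit 0
fi

# Fail (non-zero exit) if the new median deviates from the historical average
# by more than the allowed tolerance.
awk -v new="$new_median" -v avg="$hist_avg" -v tol="$tolerance" 'BEGIN {
  dev = (new - avg) / avg
  if (dev < 0) dev = -dev
  printf "Deviation: %.1f%% (allowed: %.0f%%)\n", dev * 100, tol * 100
  exit (dev > tol) ? 1 : 0
}'
```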