
Commit 5250c0e

[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs (#14454)
This PR:

- adds a "benchmark" mode to sycl-linux-run-tests.yml, which benchmarks a given SYCL branch/build using [compute-benchmarks](https://github.com/intel/compute-benchmarks/),
- stores benchmark results in a git repo, and
- aggregates benchmark results in order to produce a median, which is used to pass or fail the benchmark workflow.

The current plan is to enable this benchmark to run nightly in order to catch regressions, although there is potential for this workflow to be used in precommit. As a result, many components of this workflow are either separate reusable components or written directly with precommit in mind.

The current benchmarking workflow functions as follows:

1. An "aggregate" workflow is run, which aggregates historic benchmark results in the aforementioned git repo and produces a historical median.
   - This calls upon aggregate.py to handle the computational heavy lifting.
2. The core benchmarking workflow is run:
   - This calls upon benchmark.sh, which handles the logic for building and running compute-benchmarks.
   - Then, compare.py is called upon to compare the benchmark data against the historical median generated earlier.

The workflows are fully configurable via benchmark-ci.conf; enabled compute-benchmarks tests can be configured via enabled_tests.conf.

Feel free to test out the workflow via manual dispatches of sycl-linux-run-tests.yml on branch benchmarking-workflow, but be aware that the run currently will always fail, as GitHub repository secrets are not yet added.

---------

Co-authored-by: aelovikov-intel <[email protected]>
1 parent 8a9e847 commit 5250c0e
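To make the pass/fail criterion above concrete: each recorded metric from a fresh benchmark run is compared against the historical median, within the tolerance configured in devops/benchmarking/config.ini (tolerances = Median:0.5, i.e. a 50% allowed deviation). The sketch below illustrates that idea in Python; the function names and sample numbers are illustrative, not the actual aggregate.py/compare.py code.

```python
import statistics

def historical_median(samples: list[float]) -> float:
    # aggregate.py's job, in essence: reduce historical samples to a median.
    return statistics.median(samples)

def within_tolerance(current: float, median: float, tolerance: float) -> bool:
    # compare.py's job, in essence: a run fails if a new value deviates from
    # the historical median by more than the allowed fraction.
    return abs(current - median) <= tolerance * median

# Hypothetical samples of one benchmark's "Median" metric.
history = [103.0, 98.5, 101.2, 99.8]
median = historical_median(history)           # 100.5
print(within_tolerance(150.0, median, 0.5))   # True: within 50% of the median
print(within_tolerance(160.0, median, 0.5))   # False: too slow, the run would fail
```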

File tree

13 files changed, +1237 / -1 lines changed

Lines changed: 52 additions & 0 deletions (new file; presumably .github/workflows/sycl-benchmark-aggregate.yml, given the `uses:` reference from sycl-nightly.yml below)
```yaml
name: Aggregate compute-benchmark averages from historical data

# The benchmarking workflow in sycl-linux-run-tests.yml passes or fails based on
# how the benchmark results compare to a historical average: This historical
# average is calculated in this workflow, which aggregates historical data and
# produces measures of central tendency (median in this case) used for this
# purpose.

on:
  workflow_dispatch:
    inputs:
      lookback_days:
        description: |
          Number of days from today to look back in historical results for:
          This sets the age limit of data used in average calculation: Any
          benchmark results created before `lookback_days` from today are
          excluded from being aggregated in the historical average.
        type: number
        required: true
  workflow_call:
    inputs:
      lookback_days:
        type: number
        required: true
    secrets:
      LLVM_SYCL_BENCHMARK_TOKEN:
        description: |
          Github token used by the faceless account to push newly calculated
          medians.
        required: true

permissions:
  contents: read

jobs:
  aggregate:
    name: Aggregate average (median) value for all metrics
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          sparse-checkout: |
            devops/scripts/benchmarking
            devops/benchmarking
            devops/actions/benchmarking
      - name: Aggregate benchmark results and produce historical average
        uses: ./devops/actions/benchmarking/aggregate
        with:
          lookback_days: ${{ inputs.lookback_days }}
        env:
          GITHUB_TOKEN: ${{ secrets.LLVM_SYCL_BENCHMARK_TOKEN }}
```

.github/workflows/sycl-linux-run-tests.yml

Lines changed: 11 additions & 1 deletion
```diff
@@ -25,7 +25,7 @@ on:
         required: False
       tests_selector:
         description: |
-          Two possible options: "e2e" and "cts".
+          Three possible options: "e2e", "cts", and "compute-benchmarks".
         type: string
         default: "e2e"

@@ -152,6 +152,7 @@ on:
         options:
           - e2e
           - cts
+          - compute-benchmarks

       env:
         description: |
@@ -314,3 +315,12 @@ jobs:
           sycl_cts_artifact: ${{ inputs.sycl_cts_artifact }}
           target_devices: ${{ inputs.target_devices }}
           retention-days: ${{ inputs.retention-days }}
+
+      - name: Run compute-benchmarks on SYCL
+        if: inputs.tests_selector == 'compute-benchmarks'
+        uses: ./devops/actions/run-tests/benchmark
+        with:
+          target_devices: ${{ inputs.target_devices }}
+        env:
+          RUNNER_TAG: ${{ inputs.runner }}
+          GITHUB_TOKEN: ${{ secrets.LLVM_SYCL_BENCHMARK_TOKEN }}
```

.github/workflows/sycl-nightly.yml

Lines changed: 40 additions & 0 deletions
```diff
@@ -243,6 +243,46 @@ jobs:
       sycl_toolchain_decompress_command: ${{ needs.ubuntu2204_build.outputs.artifact_decompress_command }}
       sycl_cts_artifact: sycl_cts_bin

+  aggregate_benchmark_results:
+    if: always() && !cancelled()
+    name: Aggregate benchmark results and produce historical averages
+    uses: ./.github/workflows/sycl-benchmark-aggregate.yml
+    secrets:
+      LLVM_SYCL_BENCHMARK_TOKEN: ${{ secrets.LLVM_SYCL_BENCHMARK_TOKEN }}
+    with:
+      lookback_days: 100
+
+  run-sycl-benchmarks:
+    needs: [ubuntu2204_build, aggregate_benchmark_results]
+    if: ${{ always() && !cancelled() && needs.ubuntu2204_build.outputs.build_conclusion == 'success' }}
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - name: Run compute-benchmarks on L0 Gen12
+            runner: '["Linux", "gen12"]'
+            image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
+            target_devices: level_zero:gpu
+            reset_intel_gpu: true
+          - name: Run compute-benchmarks on L0 PVC
+            runner: '["Linux", "pvc"]'
+            image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
+            target_devices: level_zero:gpu
+            reset_intel_gpu: false
+    uses: ./.github/workflows/sycl-linux-run-tests.yml
+    secrets: inherit
+    with:
+      name: ${{ matrix.name }}
+      runner: ${{ matrix.runner }}
+      image_options: ${{ matrix.image_options }}
+      target_devices: ${{ matrix.target_devices }}
+      tests_selector: compute-benchmarks
+      reset_intel_gpu: ${{ matrix.reset_intel_gpu }}
+      ref: ${{ github.sha }}
+      sycl_toolchain_artifact: sycl_linux_default
+      sycl_toolchain_archive: ${{ needs.ubuntu2204_build.outputs.artifact_archive_name }}
+      sycl_toolchain_decompress_command: ${{ needs.ubuntu2204_build.outputs.artifact_decompress_command }}
+
   nightly_build_upload:
     name: Nightly Build Upload
     if: ${{ github.ref_name == 'sycl' }}
```
Lines changed: 95 additions & 0 deletions (new file; presumably the composite action behind ./devops/actions/benchmarking/aggregate, referenced above)
```yaml
name: 'Aggregate compute-benchmark results and produce historical averages'

# The benchmarking workflow in sycl-linux-run-tests.yml passes or fails based on
# how the benchmark results compare to a historical average: This historical
# average is calculated in this composite workflow, which aggregates historical
# data and produces measures of central tendency (median in this case) used for
# this purpose.
#
# This action assumes that /devops has been checked out in ./devops. This action
# also assumes that GITHUB_TOKEN was properly set in env, because according to
# Github, that's apparently the recommended way to pass a secret into a github
# action:
#
# https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions#accessing-your-secrets
#

inputs:
  lookback_days:
    type: number
    required: true

runs:
  using: "composite"
  steps:
    - name: Obtain oldest timestamp allowed for data in aggregation
      shell: bash
      run: |
        # DO NOT use inputs.lookback_days directly, only use SANITIZED_TIMESTAMP.
        SANITIZED_LOOKBACK_DAYS="$(echo '${{ inputs.lookback_days }}' | grep -oE '^[0-9]+$')"
        if [ -z "$SANITIZED_LOOKBACK_DAYS" ]; then
          echo "Please ensure inputs.lookback_days is a number."
          exit 1
        fi
        SANITIZED_TIMESTAMP="$(date -d "$SANITIZED_LOOKBACK_DAYS days ago" +%Y%m%d_%H%M%S)"
        if [ -z "$(echo "$SANITIZED_TIMESTAMP" | grep -oE '^[0-9]{8}_[0-9]{6}$' )" ]; then
          echo "Invalid timestamp generated: is inputs.lookback_days valid?"
          exit 1
        fi
        echo "SANITIZED_TIMESTAMP=$SANITIZED_TIMESTAMP" >> $GITHUB_ENV
    - name: Load benchmarking configuration
      shell: bash
      run: |
        $(python ./devops/scripts/benchmarking/load_config.py ./devops constants)
        echo "SANITIZED_PERF_RES_GIT_REPO=$SANITIZED_PERF_RES_GIT_REPO" >> $GITHUB_ENV
        echo "SANITIZED_PERF_RES_GIT_BRANCH=$SANITIZED_PERF_RES_GIT_BRANCH" >> $GITHUB_ENV
    - name: Checkout historical performance results repository
      shell: bash
      run: |
        if [ ! -d ./llvm-ci-perf-results ]; then
          git clone -b "$SANITIZED_PERF_RES_GIT_BRANCH" "https://github.com/$SANITIZED_PERF_RES_GIT_REPO" ./llvm-ci-perf-results
        fi
    - name: Run aggregator on historical results
      shell: bash
      run: |
        # The current format of the historical results repository is:
        #
        # /<ONEAPI_DEVICE_SELECTOR>/<runner>/<test name>
        #
        # Thus, a min/max depth of 3 is used to enumerate all test cases in the
        # repository. Test name is also derived from here.
        find ./llvm-ci-perf-results -mindepth 3 -maxdepth 3 -type d ! -path '*.git*' |
        while read -r dir; do
          test_name="$(basename "$dir")"
          python ./devops/scripts/benchmarking/aggregate.py ./devops "$test_name" "$dir" "$SANITIZED_TIMESTAMP"
        done
    - name: Upload average to the repo
      shell: bash
      run: |
        cd ./llvm-ci-perf-results
        git config user.name "SYCL Benchmarking Bot"
        git config user.email "[email protected]"
        git pull
        # Make sure changes have been made
        if git diff --quiet && git diff --cached --quiet; then
          echo "No changes to median, skipping push."
        else
          git add .
          git commit -m "[GHA] Aggregate median data from $SANITIZED_TIMESTAMP to $(date +%Y%m%d_%H%M%S)"
          git push "https://[email protected]/$SANITIZED_PERF_RES_GIT_REPO.git" "$SANITIZED_PERF_RES_GIT_BRANCH"
        fi
    - name: Find aggregated average results artifact here
      if: always()
      shell: bash
      run: |
        cat << EOF
        #
        # Artifact link for aggregated averages here:
        #
        EOF
    - name: Archive new medians
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: llvm-ci-perf-results new medians
        path: ./llvm-ci-perf-results/**/*-median.csv
```
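aggregate.py itself is not among the files shown on this page, but the steps above pin down its contract: it is invoked once per test directory (laid out as /&lt;ONEAPI_DEVICE_SELECTOR&gt;/&lt;runner&gt;/&lt;test name&gt;), receives a cutoff timestamp in YYYYmmdd_HHMMSS format, and must leave behind the *-median.csv files that the final step archives. A minimal sketch under those assumptions; the result-file naming and CSV layout are guesses for illustration only.

```python
import csv
import statistics
import sys
from pathlib import Path

def aggregate(test_dir: str, cutoff: str, metrics=("Median", "StdDev")) -> None:
    """Reduce historical result CSVs for one test to a per-metric median.

    Assumes (illustratively) that result files are named <YYYYmmdd_HHMMSS>.csv,
    so a lexicographic compare against the cutoff filters out stale data.
    """
    samples: dict[str, list[float]] = {m: [] for m in metrics}
    for csv_file in Path(test_dir).glob("*.csv"):
        if csv_file.stem.endswith("-median") or csv_file.stem < cutoff:
            continue  # skip previously computed medians and results older than cutoff
        with csv_file.open() as f:
            for row in csv.DictReader(f):
                for m in metrics:
                    if m in row:
                        samples[m].append(float(row[m]))
    medians = {m: statistics.median(v) for m, v in samples.items() if v}
    out = Path(test_dir) / f"{Path(test_dir).name}-median.csv"
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(medians))
        writer.writeheader()
        writer.writerow(medians)

if __name__ == "__main__":
    # Mirrors the call above: aggregate.py <devops dir> <test name> <dir> <cutoff>
    aggregate(sys.argv[3], sys.argv[4])
```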
Lines changed: 107 additions & 0 deletions (new file; presumably the composite action behind ./devops/actions/run-tests/benchmark, referenced in sycl-linux-run-tests.yml above)
```yaml
name: 'Run compute-benchmarks'

# Run compute-benchmarks on SYCL
#
# This action assumes SYCL is in ./toolchain, and that /devops has been
# checked out in ./devops. This action also assumes that GITHUB_TOKEN
# was properly set in env, because according to Github, that's apparently the
# recommended way to pass a secret into a github action:
#
# https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions#accessing-your-secrets
#
# This action also expects a RUNNER_TAG environment variable to be set to the
# runner tag used to run this workflow: Currently, only gen12 and pvc on Linux
# are fully supported. Although this workflow won't stop you from running other
# devices, note that only gen12 and pvc have been tested to work.
#

inputs:
  target_devices:
    type: string
    required: True

runs:
  using: "composite"
  steps:
    - name: Check specified runner type / target backend
      shell: bash
      env:
        TARGET_DEVICE: ${{ inputs.target_devices }}
      run: |
        case "$RUNNER_TAG" in
          '["Linux", "gen12"]' | '["Linux", "pvc"]') ;;
          *)
            echo "#"
            echo "# WARNING: Only gen12/pvc on Linux is fully supported."
            echo "# This workflow is not guaranteed to work with other runners."
            echo "#" ;;
        esac

        # inputs.target_devices is not used directly, as doing so would allow code injection
        case "$TARGET_DEVICE" in
          level_zero:*) ;;
          *)
            echo "#"
            echo "# WARNING: Only level_zero backend is fully supported."
            echo "# This workflow is not guaranteed to work with other backends."
            echo "#" ;;
        esac
    - name: Run compute-benchmarks
      shell: bash
      run: |
        cat << EOF
        #
        # NOTE TO DEVELOPERS:
        #

        Check latter steps of the workflow: This job produces an artifact with:
        - benchmark results from passing/failing tests
        - log containing all failing (too slow) benchmarks
        - log containing all erroring benchmarks

        While this step in the workflow provides debugging output describing this
        information, it might be easier to inspect the logs from the artifact
        instead.

        EOF
        export ONEAPI_DEVICE_SELECTOR="${{ inputs.target_devices }}"
        export CMPLR_ROOT=./toolchain
        echo "-----"
        sycl-ls
        echo "-----"
        ./devops/scripts/benchmarking/benchmark.sh -n '${{ runner.name }}' -s || exit 1
    - name: Push compute-benchmarks results
      if: always()
      shell: bash
      run: |
        # TODO -- waiting on security clearance
        # Load configuration values
        $(python ./devops/scripts/benchmarking/load_config.py ./devops constants)

        cd "./llvm-ci-perf-results"
        git config user.name "SYCL Benchmarking Bot"
        git config user.email "[email protected]"
        git pull
        git add .
        # Make sure changes have been made
        if git diff --quiet && git diff --cached --quiet; then
          echo "No new results added, skipping push."
        else
          git commit -m "[GHA] Upload compute-benchmarks results from https://github.com/intel/llvm/actions/runs/${{ github.run_id }}"
          git push "https://[email protected]/$SANITIZED_PERF_RES_GIT_REPO.git" "$SANITIZED_PERF_RES_GIT_BRANCH"
        fi
    - name: Find benchmark result artifact here
      if: always()
      shell: bash
      run: |
        cat << EOF
        #
        # Artifact link for benchmark results here:
        #
        EOF
    - name: Archive compute-benchmark results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: Compute-benchmark run ${{ github.run_id }} (${{ runner.name }})
        path: ./artifact
```
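load_config.py (and the common.py sanitization it relies on) is likewise not visible on this page. Judging from how its output is consumed above, eval'd via $(python ./devops/scripts/benchmarking/load_config.py ./devops constants) and then read back as SANITIZED_PERF_RES_GIT_REPO and SANITIZED_PERF_RES_GIT_BRANCH, it presumably prints shell variable assignments after validating values read from devops/benchmarking/config.ini. A rough sketch under that assumption; the [git] section and key names are hypothetical.

```python
# Illustrative sketch only; the real load_config.py is not part of this page.
import configparser
import re
import sys

# Values are allow-listed before being echoed, since the output is eval'd
# by the calling shell step and later exported to GITHUB_ENV.
SAFE = re.compile(r"^[A-Za-z0-9_./:-]+$")

def emit_constants(devops_dir: str) -> None:
    config = configparser.ConfigParser()
    config.read(f"{devops_dir}/benchmarking/config.ini")
    # Hypothetical section/keys naming the repo and branch for llvm-ci-perf-results.
    constants = {
        "SANITIZED_PERF_RES_GIT_REPO": config.get("git", "perf_res_repo", fallback=""),
        "SANITIZED_PERF_RES_GIT_BRANCH": config.get("git", "perf_res_branch", fallback=""),
    }
    for name, value in constants.items():
        if not SAFE.fullmatch(value):
            sys.exit(f"Refusing to export unsafe or empty value for {name}")
        print(f"{name}={value}")  # consumed by the surrounding $(...) in the action

if __name__ == "__main__":
    devops_dir, mode = sys.argv[1], sys.argv[2]
    if mode == "constants":
        emit_constants(devops_dir)
```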

devops/benchmarking/config.ini

Lines changed: 44 additions & 0 deletions
```ini
;
; This file contains configuration options to change the behaviour of the
; benchmarking workflow in sycl-linux-run-tests.yml.
;
; DO NOT USE THE CONTENTS OF THIS FILE DIRECTLY -- Due to security concerns, the
; contents of this file must be sanitized first before use.
; See: /devops/scripts/benchmarking/common.py
;

; Compute-benchmark compile/run options
[compute_bench]
; Value for -j during compilation of compute-benchmarks
compile_jobs = 2
; Number of iterations to run compute-benchmark tests
iterations = 100

; Options for benchmark result metrics (to record/compare against)
[metrics]
; Sets the metrics to record/aggregate in the historical average.
; Format: comma-separated list of column names in compute-benchmark results
recorded = Median,StdDev
; Sets the tolerance for each recorded metric and their allowed deviation from
; the historical average. Metrics not included here are not compared against
; when passing/failing benchmark results.
; Format: comma-separated list of <metric>:<deviation percentage in decimals>
tolerances = Median:0.5

; Options for computing historical averages
[average]
; Number of days (from today) to look back for results when computing historical
; average
cutoff_range = 7
; Minimum number of samples required to compute a historical average
min_threshold = 3

; ONEAPI_DEVICE_SELECTOR linting/options
[device_selector]
; Backends to allow in device_selector
enabled_backends = level_zero,opencl,cuda,hip
; native_cpu is disabled

; Devices to allow in device_selector
enabled_devices = cpu,gpu
; fpga is disabled
```
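The tolerances and device_selector entries are compact strings that the benchmarking scripts have to unpack before use. A small illustrative parser for both formats follows; these helpers are hypothetical and stand in for whatever common.py actually does.

```python
# Hypothetical helpers for the config formats above; not the actual common.py.

def parse_tolerances(raw: str) -> dict[str, float]:
    """'Median:0.5' -> {'Median': 0.5}; comma-separated <metric>:<deviation> pairs."""
    result = {}
    for entry in raw.split(","):
        metric, deviation = entry.split(":")
        result[metric.strip()] = float(deviation)
    return result

def selector_is_allowed(selector: str, backends: set[str], devices: set[str]) -> bool:
    """Lint an ONEAPI_DEVICE_SELECTOR value such as 'level_zero:gpu'."""
    backend, _, device = selector.partition(":")
    return backend in backends and device in devices

print(parse_tolerances("Median:0.5"))  # {'Median': 0.5}
allowed_backends = {"level_zero", "opencl", "cuda", "hip"}
allowed_devices = {"cpu", "gpu"}
print(selector_is_allowed("level_zero:gpu", allowed_backends, allowed_devices))   # True
print(selector_is_allowed("level_zero:fpga", allowed_backends, allowed_devices))  # False: fpga is disabled
```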
