-
Notifications
You must be signed in to change notification settings - Fork 787
[CI] Update GPU driver uplift process #6581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Adding abilility to self test task on pull_request made * Adding commit description containing driver version * Removing linux_staging drivers update. linux_staging versions are used to produce "unstable" containers in SYCL nightly workflow, but in fact "unstable" matching "latest" currently and there are no testing processes in the CI currently which use "unstable" containers. Not sure what the original intention was, but until we have any *separate* testing processes for "unstable" containers it doesn't make sense to update the versions. * Use personal access token to kick off pre-commit testing automatically
branches: | ||
- sycl | ||
paths: | ||
- .github/workflows/sycl_update_gpu_driver.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the purpose of this change. I read it as we run .github/workflows/sycl_update_gpu_driver.yml
workflow on for open PR changing .github/workflows/sycl_update_gpu_driver.yml file
event.
Shouldn't we trigger workflows that test the driver instead of updating it?
See https://github.com/intel/llvm/runs/7815213825?check_suite_focus=true <--- running this job doesn't make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change enforce sycl_update_gpu_driver workflow run in pre-commit when it's getting updated. Like on the link you added. Otherwise pre-commit won't run sycl_update_gpu_driver workflow at all.
Workflow itself by its purpose creates new PRs. The reason why last run (the one you referred to) was useless is that there've been no new driver version. I need to re-run pre-commit task (this sycl_update_gpu_driver workflow) as soon as new driver is available. That will create a completely new separate PR with driver version update, and I expect that will kick off standard pre-commit in new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated sycl_precommit job to do not trigger full sycl_precommit testing when sycl_update_gpu_driver is changed.
However in this specific PR it is still triggered ... because I changed sycl_precommit in this PR :)
run: | | ||
cd $GITHUB_WORKSPACE | ||
# Set fake identity to fulfil git requirements | ||
git config --global user.name "GitHub Actions" | ||
git config --global user.email "[email protected]" | ||
git checkout -B $BRANCH | ||
git add -u | ||
git commit -m "[GHA] Uplift GPU RT version for Linux CI" || exit 0 # exit if commit is empty | ||
git commit -m "[GHA] Uplift GPU RT version for Linux CI" -m "Uplift GPU RT version for Linux to $NEW_DRIVER_VERSION" || exit 0 # exit if commit is empty | ||
git push https://[email protected]/${{ github.repository }} ${BRANCH} | ||
gh pr create --head $BRANCH --title "[GHA] Uplift GPU RT version for Linux CI" --body "Uplift GPU RT version for Linux to $NEW_DRIVER_VERSION" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove linux_staging
update job? How do we get CI results with the latest GPU driver, which breaks pre-commit tests?
The idea here is that we would like to know the status on the latest available GPU driver as there were cases in the past when GPU uplift introduces a functional regression. In this case we would like to report it to driver team and wait for a new fixed driver version before we recommend uplifted driver for our users.
AFAIK, we build a two docker images: one with the latest driver and another one with the tested driver. If you remove the update for the latest driver, you should probably remove the job building corresponding docker image.
llvm/.github/workflows/sycl_containers.yaml
Lines 102 to 139 in 677e2e2
# This job produces a Docker container with the latest versions of Intel | |
# drivers, that can be found on GitHub. | |
drivers_image_ubuntu2004_unstable: | |
if: github.repository == 'intel/llvm' | |
name: Intel Drivers (unstable) Ubuntu 20.04 Docker image | |
runs-on: ubuntu-latest | |
needs: base_image_ubuntu2004 | |
steps: | |
- name: Checkout | |
uses: actions/checkout@v2 | |
with: | |
fetch-depth: 2 | |
- name: Get dependencies configuration | |
id: deps | |
run: | | |
DEPS=`cat devops/dependencies.json` | |
DEPS="${DEPS//'%'/'%25'}" | |
DEPS="${DEPS//$'\n'/'%0A'}" | |
DEPS="${DEPS//$'\r'/'%0D'}" | |
echo $DEPS | |
echo "::set-output name=deps::$DEPS" | |
- name: Build and Push Container | |
uses: ./devops/actions/build_container | |
with: | |
push: ${{ github.event_name != 'pull_request' }} | |
file: ubuntu2004_intel_drivers | |
username: ${{ github.repository_owner }} | |
password: ${{ secrets.GITHUB_TOKEN }} | |
tags: | | |
ghcr.io/${{ github.repository }}/ubuntu2004_intel_drivers:unstable-${{ github.sha }} | |
ghcr.io/${{ github.repository }}/ubuntu2004_intel_drivers:unstable | |
build-args: | | |
compute_runtime_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.compute_runtime.github_tag}} | |
igc_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.igc.github_tag}} | |
tbb_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.tbb.github_tag}} | |
fpgaemu_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.fpgaemu.github_tag}} | |
cpu_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.oclcpu.github_tag}} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is explanation in PR description. Let me know if it's not very clear.
We don't use "unstable" anywhere today, except "unstable" containers build. There are 3 areas if we'd like to do a full cleanup (removal) of "unstable" images:
- Update GPU driver (this specific PR is doing that)
- dependencies.json
- sycl_containers.yaml (code you referred to)
I was hesitating on making a full cleanup, made only (1) in this patch limiting patch only to GPU driver update.
As soon as (1) is done (2) and (3) will become effectively a dead code, I expect new unstable builds won't be triggered. I can remove it or can keep it, if we ever want to have some "unstable" containers in the future with some real purpose.
Note that driver version uplift process will keep working through the "latest" images.
In other words after this change:
- [GHA] Uplift GPU RT version for Linux CI #6543 - this will stay
- [GHA] Uplift GPU RT version for Nightly Builds #6542 - this will stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree that we should remove "unstable" containers. From time to time the regressions in GPU driver, which block updates. Having unstable container is useful for getting latest status and for those who is not affected by the GPU driver regressions.
Co-authored-by: Alexey Bader <[email protected]>
…hange This is not required on the workflow change. Workflow sycl_update_gpu_driver.yml when run will produce a new PR for driver uplift, that's where full testing is expected in fact.
This reverts commit 386f822.
It's appeared that secrets are not passed in pull_request workflow trigger. That means that such types of jobs that require secrets (to push changes or create PRs) cannot be effectively tested in pre-commit. It's appeared that the only way to test it is post-commit, also can be tested in temporary branch where change is getting committed. This change is ready to land. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@bader, if you have further comments I'll be happy to address in a follow up PRs. |
Triggered job manually, works well it seems: #6608 |
It doesn't seem to work as expected. I'm looking at recent PR created to update GPU driver and I see that it [tests old version](https://github.com/intel/llvm/runs/8084525204?check_suite_focus=true of the driver instead of the version it tries to update to. UPD: the same problem can be observed in #6608 mentioned your comment above. |
Thanks. It's been addressed at #6723 |
to produce "unstable" containers in SYCL nightly workflow, but in
fact "unstable" matching "latest" currently and there are no testing
processes in the CI currently which use "unstable" containers. Not sure what
the original intention was, but until we have any separate testing processes
for "unstable" containers it doesn't make sense to update the versions.