Skip to content

[CI] Update GPU driver uplift process #6581

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 18, 2022
Merged

Conversation

pvchupin
Copy link
Contributor

@pvchupin pvchupin commented Aug 12, 2022

  • Ignore sycl_update_gpu_driver job change in sycl_precommit. Full pre-commit is expected to be done in PRs generated by sycl_update_gpu_driver, not the change in sycl_update_gpu_driver.
  • Add driver version to commit title.
  • Remove linux_staging drivers update. linux_staging versions are used
    to produce "unstable" containers in SYCL nightly workflow, but in
    fact "unstable" matching "latest" currently and there are no testing
    processes in the CI currently which use "unstable" containers. Not sure what
    the original intention was, but until we have any separate testing processes
    for "unstable" containers it doesn't make sense to update the versions.
  • Use personal access token to kick off pre-commit testing automatically in PR created by this job.

* Adding abilility to self test task on pull_request made
* Adding commit description containing driver version
* Removing linux_staging drivers update. linux_staging versions are used
to produce "unstable" containers in SYCL nightly workflow, but in
fact "unstable" matching "latest" currently and there are no testing
processes in the CI currently which use "unstable" containers. Not sure what
the original intention was, but until we have any *separate* testing processes
for "unstable" containers it doesn't make sense to update the versions.
* Use personal access token to kick off pre-commit testing automatically
@pvchupin pvchupin requested a review from a team as a code owner August 12, 2022 23:41
@pvchupin pvchupin requested a review from cperkinsintel August 12, 2022 23:48
branches:
- sycl
paths:
- .github/workflows/sycl_update_gpu_driver.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get the purpose of this change. I read it as we run .github/workflows/sycl_update_gpu_driver.yml workflow on for open PR changing .github/workflows/sycl_update_gpu_driver.yml file event.
Shouldn't we trigger workflows that test the driver instead of updating it?

See https://github.com/intel/llvm/runs/7815213825?check_suite_focus=true <--- running this job doesn't make sense to me.

Copy link
Contributor Author

@pvchupin pvchupin Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change enforce sycl_update_gpu_driver workflow run in pre-commit when it's getting updated. Like on the link you added. Otherwise pre-commit won't run sycl_update_gpu_driver workflow at all.

Workflow itself by its purpose creates new PRs. The reason why last run (the one you referred to) was useless is that there've been no new driver version. I need to re-run pre-commit task (this sycl_update_gpu_driver workflow) as soon as new driver is available. That will create a completely new separate PR with driver version update, and I expect that will kick off standard pre-commit in new PR.

Copy link
Contributor Author

@pvchupin pvchupin Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated sycl_precommit job to do not trigger full sycl_precommit testing when sycl_update_gpu_driver is changed.
However in this specific PR it is still triggered ... because I changed sycl_precommit in this PR :)

run: |
cd $GITHUB_WORKSPACE
# Set fake identity to fulfil git requirements
git config --global user.name "GitHub Actions"
git config --global user.email "[email protected]"
git checkout -B $BRANCH
git add -u
git commit -m "[GHA] Uplift GPU RT version for Linux CI" || exit 0 # exit if commit is empty
git commit -m "[GHA] Uplift GPU RT version for Linux CI" -m "Uplift GPU RT version for Linux to $NEW_DRIVER_VERSION" || exit 0 # exit if commit is empty
git push https://[email protected]/${{ github.repository }} ${BRANCH}
gh pr create --head $BRANCH --title "[GHA] Uplift GPU RT version for Linux CI" --body "Uplift GPU RT version for Linux to $NEW_DRIVER_VERSION"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove linux_staging update job? How do we get CI results with the latest GPU driver, which breaks pre-commit tests?

The idea here is that we would like to know the status on the latest available GPU driver as there were cases in the past when GPU uplift introduces a functional regression. In this case we would like to report it to driver team and wait for a new fixed driver version before we recommend uplifted driver for our users.

AFAIK, we build a two docker images: one with the latest driver and another one with the tested driver. If you remove the update for the latest driver, you should probably remove the job building corresponding docker image.

# This job produces a Docker container with the latest versions of Intel
# drivers, that can be found on GitHub.
drivers_image_ubuntu2004_unstable:
if: github.repository == 'intel/llvm'
name: Intel Drivers (unstable) Ubuntu 20.04 Docker image
runs-on: ubuntu-latest
needs: base_image_ubuntu2004
steps:
- name: Checkout
uses: actions/checkout@v2
with:
fetch-depth: 2
- name: Get dependencies configuration
id: deps
run: |
DEPS=`cat devops/dependencies.json`
DEPS="${DEPS//'%'/'%25'}"
DEPS="${DEPS//$'\n'/'%0A'}"
DEPS="${DEPS//$'\r'/'%0D'}"
echo $DEPS
echo "::set-output name=deps::$DEPS"
- name: Build and Push Container
uses: ./devops/actions/build_container
with:
push: ${{ github.event_name != 'pull_request' }}
file: ubuntu2004_intel_drivers
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
tags: |
ghcr.io/${{ github.repository }}/ubuntu2004_intel_drivers:unstable-${{ github.sha }}
ghcr.io/${{ github.repository }}/ubuntu2004_intel_drivers:unstable
build-args: |
compute_runtime_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.compute_runtime.github_tag}}
igc_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.igc.github_tag}}
tbb_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.tbb.github_tag}}
fpgaemu_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.fpgaemu.github_tag}}
cpu_tag=${{fromJson(steps.deps.outputs.deps).linux_staging.oclcpu.github_tag}}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is explanation in PR description. Let me know if it's not very clear.

We don't use "unstable" anywhere today, except "unstable" containers build. There are 3 areas if we'd like to do a full cleanup (removal) of "unstable" images:

  1. Update GPU driver (this specific PR is doing that)
  2. dependencies.json
  3. sycl_containers.yaml (code you referred to)

I was hesitating on making a full cleanup, made only (1) in this patch limiting patch only to GPU driver update.
As soon as (1) is done (2) and (3) will become effectively a dead code, I expect new unstable builds won't be triggered. I can remove it or can keep it, if we ever want to have some "unstable" containers in the future with some real purpose.

Note that driver version uplift process will keep working through the "latest" images.
In other words after this change:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that we should remove "unstable" containers. From time to time the regressions in GPU driver, which block updates. Having unstable container is useful for getting latest status and for those who is not affected by the GPU driver regressions.

Pavel Chupin and others added 8 commits August 15, 2022 14:52
…hange

This is not required on the workflow change. Workflow
sycl_update_gpu_driver.yml when run will produce a new PR for driver
uplift, that's where full testing is expected in fact.
@pvchupin pvchupin temporarily deployed to deps_aaaaa August 17, 2022 23:36 Inactive
@pvchupin pvchupin temporarily deployed to deps_uplift August 17, 2022 23:40 Inactive
@pvchupin pvchupin temporarily deployed to deps_uplift August 18, 2022 00:05 Inactive
@pvchupin
Copy link
Contributor Author

It's appeared that secrets are not passed in pull_request workflow trigger. That means that such types of jobs that require secrets (to push changes or create PRs) cannot be effectively tested in pre-commit. It's appeared that the only way to test it is post-commit, also can be tested in temporary branch where change is getting committed.
Ran such testing and PR has been generated at #6602

This change is ready to land.

@pvchupin pvchupin requested a review from bader August 18, 2022 00:40
Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pvchupin pvchupin merged commit 635e246 into intel:sycl Aug 18, 2022
@pvchupin pvchupin deleted the driver-upgrade branch August 18, 2022 18:23
@pvchupin
Copy link
Contributor Author

@bader, if you have further comments I'll be happy to address in a follow up PRs.

@pvchupin
Copy link
Contributor Author

Triggered job manually, works well it seems: #6608

@bader
Copy link
Contributor

bader commented Sep 3, 2022

  • Use personal access token to kick off pre-commit testing automatically in PR created by this job.

It doesn't seem to work as expected. I'm looking at recent PR created to update GPU driver and I see that it [tests old version](https://github.com/intel/llvm/runs/8084525204?check_suite_focus=true of the driver instead of the version it tries to update to.
@pvchupin, could you take a look, please?

UPD: the same problem can be observed in #6608 mentioned your comment above.

@pvchupin
Copy link
Contributor Author

pvchupin commented Sep 8, 2022

  • Use personal access token to kick off pre-commit testing automatically in PR created by this job.

It doesn't seem to work as expected. I'm looking at recent PR created to update GPU driver and I see that it [tests old version](https://github.com/intel/llvm/runs/8084525204?check_suite_focus=true of the driver instead of the version it tries to update to. @pvchupin, could you take a look, please?

UPD: the same problem can be observed in #6608 mentioned your comment above.

Thanks. It's been addressed at #6723

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants