Skip to content

[Github] Fetch all commits in PR for code formatting checks #69766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

boomanaiden154
Copy link
Contributor

@boomanaiden154 boomanaiden154 commented Oct 20, 2023

This patch makes a couple changes to the PR code formatting check:

  • Moves the changed-files action to before the checkout to make sure that it pulls
    information from the Github API rather than by running git diff to alleviate some
    performance problems.
  • Checkout the head of the pull request head instead of the base of the pull request
    to ensure that we have the PR commits inside the checkout.
  • Add an additional sparse checkout of the necessary LLVM tools to run the action
    to alleviate security problems introduced by checking out the head of the pull
    request. Only code from the base of the pull request runs.
  • Adjust the commit references to be based on HEAD as Github doesn't give
    exact commit SHAs for the first commit in the PR.

@boomanaiden154
Copy link
Contributor Author

Needs some work to figure out how much history we should clone and the best way to do so.

@boomanaiden154 boomanaiden154 force-pushed the code-format-changed-files-api branch from c76ed5d to a898e07 Compare October 20, 2023 20:37
This patch ensures that all commits from a PR are fetched before running
the rest of the job. This allows the job to scale to any number of
commits per PR. In addition, this significantly reduces the time taken
in the files-changed step where the job would itself try to unshallow
the clone, but do so in some inefficient way. That step which previously
took ~10 minutes now takes 3 seconds after this patch.
@boomanaiden154 boomanaiden154 force-pushed the code-format-changed-files-api branch from a898e07 to f44998a Compare October 20, 2023 21:25
@boomanaiden154 boomanaiden154 changed the title [Github] Force changed files step to use GH API for code formatting [Github] Fetch all commits in PR for code formatting checks Oct 20, 2023
@boomanaiden154 boomanaiden154 marked this pull request as ready for review October 20, 2023 21:26
@llvmbot
Copy link
Member

llvmbot commented Oct 20, 2023

@llvm/pr-subscribers-github-workflow

Author: Aiden Grossman (boomanaiden154)

Changes

This patch ensures that all commits from a PR are fetched before running
the rest of the job. This allows the job to scale to any number of
commits per PR. In addition, this significantly reduces the time taken
in the files-changed step where the job would itself try to unshallow
the clone, but do so in some inefficient way. That step which previously
took ~10 minutes now takes 3 seconds after this patch.


Full diff: https://github.com/llvm/llvm-project/pull/69766.diff

1 Files Affected:

  • (modified) .github/workflows/pr-code-format.yml (+1-2)
diff --git a/.github/workflows/pr-code-format.yml b/.github/workflows/pr-code-format.yml
index 3a91ffb0b1ad9a7..8ca27c2f6363b16 100644
--- a/.github/workflows/pr-code-format.yml
+++ b/.github/workflows/pr-code-format.yml
@@ -10,14 +10,13 @@ jobs:
       - name: Fetch LLVM sources
         uses: actions/checkout@v4
         with:
-          fetch-depth: 2
+          fetch-depth: ${{ github.event.pull_request.commits }}
 
       - name: Get changed files
         id: changed-files
         uses: tj-actions/changed-files@v39
         with:
           separator: ","
-          fetch_depth: 100 # Fetches only the last 10 commits
 
       - name: "Listed files"
         run: |

@boomanaiden154
Copy link
Contributor Author

Now just fetching the number of commits in the PR. This makes the changed files step fast (couple of seconds) and the changed files step still ensures the relevant history is there so that the script can run git clang-format properly.

Tested against my fork and everything was working well there.

@boomanaiden154
Copy link
Contributor Author

This needs quite a bit more thought. Will try and hack on it once I get a chance.

@boomanaiden154 boomanaiden154 force-pushed the code-format-changed-files-api branch from 92ef4b7 to b2eb425 Compare October 29, 2023 01:22
@boomanaiden154
Copy link
Contributor Author

@tstellar @tru This should be approximately ready for review at this point. There are a couple key changes made in this PR:

  • Moved the changed files action to before the checkout to ensure that it grabs information from the Github API.
  • Switch the checkout ref to the head of the PR rather than the base.
  • Add an additional sparse LLVM checkout from the PR target ref with the files necessary to run the workflow to prevent security concerns.
  • Adjust commit references to use values relative to HEAD rather than exact commit SHAs as there is no way to get the SHA of the first commit from the Github variables as far as I can tell.

I've tested this against a couple LLVM PRs on my fork and everything seems to be working well, including a couple cherry-picked PRs that didn't run before due to the changed-files action failing. This also significantly improves the speed of the workflow.

It's a little bit hacky doing the multiple checkouts. This can probably be alleviaated by performing the build inside a container (which would also make this faster), but I'll leave that to a future change for the moment.

@tru
Copy link
Collaborator

tru commented Oct 29, 2023

Nicely done! It looks like a good approach. If I understand the spare checkout correctly it's only to get the scripts to run. Can't we fetch these files with curl instead of a sparse checkout? Should be faster and it's what we do in other places.

@boomanaiden154
Copy link
Contributor Author

Nicely done! It looks like a good approach. If I understand the spare checkout correctly it's only to get the scripts to run. Can't we fetch these files with curl instead of a sparse checkout? Should be faster and it's what we do in other places.

Yes, the sparse checkout is only to get the scripts that need to run. I don't think the time difference is significant. The sparse checkout takes about two seconds from my testing and it has a couple nice properties like using the scripts from the target branch rather than an arbitrary ref (probably main) in the curl invocations which makes future testing/adaptation quite a bit easier.

@tru
Copy link
Collaborator

tru commented Oct 29, 2023

Oh in my previous testing sparse checkout was really really slow. But that sounds good then! LGTM feel free to merge, but do it when you can monitor the actions tab and see if the jobs don't fail in an unexpected way.

@boomanaiden154
Copy link
Contributor Author

Weird that the sparse checkout was slow in your previous experience. I'll definitely make sure to keep an eye out for weird failures. There's gotta be at least something. I'll keep an eye on the timing for the sparse checkouts and make sure nothing blows up there too.

I'll let @tstellar take a look to make sure I'm not doing anything too crazy, update the PR message, and then merge.

@boomanaiden154 boomanaiden154 merged commit 4aa12af into llvm:main Oct 30, 2023
boomanaiden154 added a commit that referenced this pull request Oct 30, 2023
…69766)"

This reverts commit 4aa12af.

This change introduced failures upon checking out the PR source code.
Pulling this out of tree while I investigate further.
@boomanaiden154
Copy link
Contributor Author

Well that didn't last long.

This was causing fetch failures. Need to do some more debugging and figure out why. Even with debug logging, the error messages are incredibly unhelpful.

https://github.com/llvm/llvm-project/actions/runs/6697812983/job/18198489209

boomanaiden154 added a commit to boomanaiden154/llvm-project that referenced this pull request Nov 2, 2023
…lvm#69766)"

This commit relands 4aa12af. This was
originally reverted as it caused fetch failures due to the usage of the
wrong ref. I didn't catch this in testing as it was attempting to check
the branch out from origin, where it didn't exist, but did on my fork as
I never tested any cross-repo PRs.
boomanaiden154 added a commit to boomanaiden154/llvm-project that referenced this pull request Nov 4, 2023
…lvm#69766)"

This commit relands 4aa12af. This was
originally reverted as it caused fetch failures due to the usage of the
wrong ref. I didn't catch this in testing as it was attempting to check
the branch out from origin, where it didn't exist, but did on my fork as
I never tested any cross-repo PRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants