-
Notifications
You must be signed in to change notification settings - Fork 14
Optionally keep self-hosted-runners allocated after creating them #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
109dee3
to
00accaf
Compare
I tested this in conjunction with https://github.com/git-for-windows/git-for-windows-automation/actions/runs/3825495631/jobs/6508527516 and was surprised that the runner did not come online immediately... turns out that I had forgotten about the post-deploy script! I'm not so much of a PowerShell expert, so I want to use the next deployment of an aarch64 package as an excuse to test this before I mark this PR as ready for review. |
6ece1a0
to
838f771
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dscho I updated two things in your code:
In the .yml
file, I replaced stopService=$env:DEALLOCATE_IMMEDIATELY
with stopService={{ env.DEALLOCATE_IMMEDIATELY }}
.
In the azure-arm-template.json
file, I added the stopService
parameter to the parameters
object on top of the file. It needs to be defined there before you can actually use it as parameters('stopService')
.
I also took the opportunity to use the D:\
drive if possible for faster Actions runs. Hope that's OK!
838f771
to
59ed548
Compare
I think there is a But I have a more fundamental concern with using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
@@ -135,9 +141,10 @@ jobs: | |||
with: | |||
resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP }} | |||
template: ./azure-self-hosted-runners/azure-arm-template.json | |||
parameters: ./azure-self-hosted-runners/azure-arm-template-example-parameters.json githubActionsRunnerRegistrationUrl="${{ env.ACTIONS_RUNNER_REGISTRATION_URL }}" githubActionsRunnerToken="${{ env.ACTIONS_RUNNER_TOKEN }}" postDeploymentPsScriptUrl="${{ env.POST_DEPLOYMENT_SCRIPT_URL }}" virtualMachineName=${{ steps.generate-vm-name.outputs.vm_name }} virtualMachineSize=Standard_D8pls_v5 publicIpAddressName1=${{ steps.generate-vm-name.outputs.vm_name }}-ip adminUsername=${{ secrets.AZURE_VM_USERNAME }} adminPassword=${{ secrets.AZURE_VM_PASSWORD }} | |||
parameters: ./azure-self-hosted-runners/azure-arm-template-example-parameters.json githubActionsRunnerRegistrationUrl="${{ env.ACTIONS_RUNNER_REGISTRATION_URL }}" githubActionsRunnerToken="${{ env.ACTIONS_RUNNER_TOKEN }}" postDeploymentPsScriptUrl="${{ env.POST_DEPLOYMENT_SCRIPT_URL }}" virtualMachineName=${{ steps.generate-vm-name.outputs.vm_name }} virtualMachineSize=Standard_D8pls_v5 publicIpAddressName1=${{ steps.generate-vm-name.outputs.vm_name }}-ip adminUsername=${{ secrets.AZURE_VM_USERNAME }} adminPassword=${{ secrets.AZURE_VM_PASSWORD }} stopService={{ env.DEALLOCATE_IMMEDIATELY }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the {{
needs to be replaced with ${{
. But as I said, I'd like to avoid that construct in favor of native PowerShell environment variable expansion here (and elsewhere in this part of the workflow definition, in a separate commit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the
{{
needs to be replaced with${{
.
Oops! Good catch. Will fix.
But as I said, I'd like to avoid that construct in favor of native PowerShell environment variable expansion here (and elsewhere in this part of the workflow definition, in a separate commit).
Unfortunately, that won't work. The DEALLOCATE_IMMEDIATELY
env variable is only available in the Actions Runner context. When the VM is created, it does the following:
- Download the post-deployment script from https://raw.githubusercontent.com/git-for-windows/git-for-windows-automation/main/azure-self-hosted-runners/post-deployment-script.ps1
- Execute it from within the VM (which doesn't have env variables like
DEALLOCATE_IMMEDIATELY
)
So we need to explicitly pass these parameters with their actual values to the azure/arm-deploy
action, which in turn passes it to the Microsoft.CustomScriptExtension
, which in turn will execute the two steps above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think I misunderstood actually:
In general, I think we should always use the native environment variable expansion in scripts, i.e.
$env:XYZ
in PowerShell,$XYZ
in Bash.
That makes sense to me. Just replacing ${{ env.XYZ }}
everywhere in the workflow. Will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this look?
It looks good! I did have to change the ACTIONS_RUNNER_REGISTRATION_URL
lines because they only set the environment variable for subsequent steps, not the current one, but we no longer use it in a subsequent step, but in the current one.
59ed548
to
52bab03
Compare
52bab03
to
ca6646b
Compare
END | ||
) | ||
|
||
echo "AZURE_ARM_PARAMETERS=$AZURE_ARM_PARAMETERS" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really possible to put multi-line values into GITHUB_ENV
? I was under the impression that that's impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you need a <<EOF
trick: see e.g. this blog post.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it locally, and AZURE_ARM_PARAMETERS=$(cat <<-END
actually lets Bash turn it into a single-line, space-separated string, so then it can be written into the $GITHUB_ENV
without having to do the <<EOF
trick you mentioned 🙏🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that the $(...)
aren't enclosed in double-quotes. Excellent!
801cf79
to
9044953
Compare
We need to couple the post-deployment script URL more tightly to the current revision, otherwise the workflow run might pick up an unintended version of that script. Signed-off-by: Johannes Schindelin <[email protected]>
For now, we manually spin up the runners. It makes more sense to keep them running so that they can pick up the job immediately instead of deallocating them and then manually restart them. Signed-off-by: Johannes Schindelin <[email protected]>
Let's switch to Standard_D8plds_v5. The "d" stands for: d – Diskfull (local temp disk is present) This means that we get an additional drive on the VM (D:\) that is blaziingly fast. It leads to a ~25% speed increase when doing heavy IO operations. Ref: https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview#temporary-disk Ref: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources Ref: https://azureprice.net/vm/Standard_D8plds_v5 Signed-off-by: Dennis Ameling <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
Some Azure VM types have a temporary disk which is local to the VM and therefore provides _much_ faster disk IO than the OS Disk (or any other attached disk). Hosted GitHub Actions runners also leverage this disk and do their work in D:/a/_work, so let's use it too if we can. It leads to a ~25% speed increase when doing heavy IO operations. Ref: https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview#temporary-disk Ref: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources Signed-off-by: Dennis Ameling <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
We will soon be adding logic that automatically spins up and removes runners on-demand, so there's no need to deallocate in that case. An additional benefit of not deallocating is that we can leverage the fast D:\ drive to install the Runner on. This drive would otherwise be destroyed as part of the deallocation process. Signed-off-by: Dennis Ameling <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
9044953
to
027d7e3
Compare
# For a convenient overview of all arm64 VM types, see e.g. https://azureprice.net/?_cpuArchitecture=Arm64 | ||
AZURE_VM_TYPE: Standard_D8plds_v5 | ||
# At the time of writing, "centralindia" was the cheapest region for the VM type we're using. https://azureprice.net/vm/Standard_D8plds_v5 | ||
AZURE_VM_REGION: centralindia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It does not look as if that region has the VM type we require.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. AzurePrice does list it as being available in that region. Probably "eastus" would do the trick then 🤷🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://azureprice.net/content/about says that this site is not affiliated with Microsoft, and I fail to see any equivalent prices in the Azure Portal (probably because the former tries to factor in Windows license costs?)
In any case, I think I solved the riddle about West India:
Unfortunately, I do not find any official documentation about pricing at https://learn.microsoft.com/en-us/azure/virtual-machines/dplsv5-dpldsv5-series, at least not anything helpful (the calculator seems to either be unaware of that series or calls it by a different name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with "westus2" and started a new workflow run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?products=virtual-machines®ions=central-india,south-india,west-india claims that availability of these machines in India is projected for Q1 2023.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!! 😅 🤦🏼 https://github.com/git-for-windows/git-for-windows-automation/actions/runs/3881130673/jobs/6625764356
==> Verifying source file signatures with gpg...
xpdf-4.00.tar.gz ... FAILED (unknown public key F4825F5397271342)
==> ERROR: One or more PGP signatures could not be verified!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I actually had that fixed in git-for-windows/MINGW-packages@c8ccd71 already, but then had a brain fart when I re-ran the failed jobs, not remembering that a re-run would fail to target the newly-pushed commit.
This makes it easier to configure the Azure VM region. Let's default to "westus2" while we're at it, as it's one of the cheaper options available at the time of writing. Signed-off-by: Dennis Ameling <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
The string we used to provide parameters to Azure ARM was getting way too long. Let's use a different approach by preparing the string in the previous step. An additional benefit to this is that we can use Bash's environment variable expansion, which should be more robust than the one GitHub provides through ${{ env.XYZ }}. Signed-off-by: Dennis Ameling <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
The network interface and the public IP do not need to exist without the VM. Therefore, let's let them be deleted when the VM is deleted. Signed-off-by: Johannes Schindelin <[email protected]>
We register the new runner as "ephemereal", i.e. it runs one job and gets automatically removed from the self-hosted runner group. To avoid incurring unnecessary Azure cost, let's shut down the VM after running the job. Signed-off-by: Johannes Schindelin <[email protected]>
It's not like we want to create tens of runners at the same time. One is sufficient. Signed-off-by: Johannes Schindelin <[email protected]>
404997c
to
9f692f6
Compare
with: | ||
azcliversion: 2.43.0 | ||
inlineScript: | | ||
az vm delete -n "$ACTIONS_RUNNER_NAME" -g ${{ secrets.AZURE_RESOURCE_GROUP }} --yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, kind of. But it leaves two resources behind, the network security group and the virtual network:
@dennisameling you mentioned previously that we could avoid that by reusing a single resource? Or is there a way to pass one or two deleteOption
s so that these resources are deleted, too, when the VM is deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see whether 4c2ba65 works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is a known issue: Azure/azure-cli#22158
We might need 1-2 additional az
CLI calls to find the NSG/Vnet that are attached to the VM and delete them in follow-up calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also looks like deleteOption
isn't a supported property on Microsoft.Network/networkSecurityGroups
and Microsoft.Network/virtualNetworks
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it does not work.
We might need 1-2 additional
az
CLI calls to find the NSG/Vnet that are attached to the VM and delete them in follow-up calls.
I guess we need 2 additional CLI calls, but we know the exact name i.e. we do not have to identify the resources via a separate call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need 2 additional CLI calls, but we know the exact name i.e. we do not have to identify the resources via a separate call.
Yep - looks like that should be pretty straightforward:
https://learn.microsoft.com/en-us/cli/azure/network/nsg?view=azure-cli-latest#az-network-nsg-delete
az network nsg delete -g MyResourceGroup -n MyNsg
az network vnet delete -g MyResourceGroup -n myVNet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually need three CLI calls, for vnet, nsg and public-ip
855256e
to
4c2ba65
Compare
Since we're using ephemeral runners, the VMs are no longer used after they completed a workflow job, and need to be deleted. This GitHub workflow is designed to perform that action, and it is intended to be triggered by the GitForWindowsHelper GitHub App in response to receiving a `workflow_job.completed` event. Signed-off-by: Johannes Schindelin <[email protected]>
4c2ba65
to
7d3ef90
Compare
I'm declaring this ready to be merged, after it successfully created several runners in the attempt to build |
Thanks for your work on this!! 🔥 |
For now, we manually spin up the runners. It makes more sense to keep them running so that they can pick up the job immediately instead of deallocating them and then manually restart them.