Skip to content

Optionally keep self-hosted-runners allocated after creating them #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 10, 2023

Conversation

dscho
Copy link
Member

@dscho dscho commented Jan 2, 2023

For now, we manually spin up the runners. It makes more sense to keep them running so that they can pick up the job immediately instead of deallocating them and then manually restart them.

@dscho dscho requested a review from dennisameling January 2, 2023 22:42
@dscho dscho marked this pull request as draft January 2, 2023 22:56
@dscho dscho force-pushed the make-self-hosted-runner-deallocation-optional branch from 109dee3 to 00accaf Compare January 2, 2023 23:11
@dscho
Copy link
Member Author

dscho commented Jan 2, 2023

I tested this in conjunction with https://github.com/git-for-windows/git-for-windows-automation/actions/runs/3825495631/jobs/6508527516 and was surprised that the runner did not come online immediately... turns out that I had forgotten about the post-deploy script! I'm not so much of a PowerShell expert, so I want to use the next deployment of an aarch64 package as an excuse to test this before I mark this PR as ready for review.

@dennisameling dennisameling force-pushed the make-self-hosted-runner-deallocation-optional branch 2 times, most recently from 6ece1a0 to 838f771 Compare January 3, 2023 07:19
Copy link
Contributor

@dennisameling dennisameling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dscho I updated two things in your code:

In the .yml file, I replaced stopService=$env:DEALLOCATE_IMMEDIATELY with stopService={{ env.DEALLOCATE_IMMEDIATELY }}.

In the azure-arm-template.json file, I added the stopService parameter to the parameters object on top of the file. It needs to be defined there before you can actually use it as parameters('stopService').

I also took the opportunity to use the D:\ drive if possible for faster Actions runs. Hope that's OK!

@dennisameling dennisameling force-pushed the make-self-hosted-runner-deallocation-optional branch from 838f771 to 59ed548 Compare January 3, 2023 07:22
@dscho
Copy link
Member Author

dscho commented Jan 3, 2023

In the .yml file, I replaced stopService=$env:DEALLOCATE_IMMEDIATELY with stopService={{ env.DEALLOCATE_IMMEDIATELY }}.

I think there is a $ missing before the {{.

But I have a more fundamental concern with using ${{ env.XYZ }} instead of $env:XYZ: the latter is a lot more robust with respect to spaces, quotes and other special characters in the value of the environment variable. In general, I think we should always use the native environment variable expansion in scripts, i.e. $env:XYZ in PowerShell, $XYZ in Bash.

Copy link
Member Author

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@@ -135,9 +141,10 @@ jobs:
with:
resourceGroupName: ${{ secrets.AZURE_RESOURCE_GROUP }}
template: ./azure-self-hosted-runners/azure-arm-template.json
parameters: ./azure-self-hosted-runners/azure-arm-template-example-parameters.json githubActionsRunnerRegistrationUrl="${{ env.ACTIONS_RUNNER_REGISTRATION_URL }}" githubActionsRunnerToken="${{ env.ACTIONS_RUNNER_TOKEN }}" postDeploymentPsScriptUrl="${{ env.POST_DEPLOYMENT_SCRIPT_URL }}" virtualMachineName=${{ steps.generate-vm-name.outputs.vm_name }} virtualMachineSize=Standard_D8pls_v5 publicIpAddressName1=${{ steps.generate-vm-name.outputs.vm_name }}-ip adminUsername=${{ secrets.AZURE_VM_USERNAME }} adminPassword=${{ secrets.AZURE_VM_PASSWORD }}
parameters: ./azure-self-hosted-runners/azure-arm-template-example-parameters.json githubActionsRunnerRegistrationUrl="${{ env.ACTIONS_RUNNER_REGISTRATION_URL }}" githubActionsRunnerToken="${{ env.ACTIONS_RUNNER_TOKEN }}" postDeploymentPsScriptUrl="${{ env.POST_DEPLOYMENT_SCRIPT_URL }}" virtualMachineName=${{ steps.generate-vm-name.outputs.vm_name }} virtualMachineSize=Standard_D8pls_v5 publicIpAddressName1=${{ steps.generate-vm-name.outputs.vm_name }}-ip adminUsername=${{ secrets.AZURE_VM_USERNAME }} adminPassword=${{ secrets.AZURE_VM_PASSWORD }} stopService={{ env.DEALLOCATE_IMMEDIATELY }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the {{ needs to be replaced with ${{. But as I said, I'd like to avoid that construct in favor of native PowerShell environment variable expansion here (and elsewhere in this part of the workflow definition, in a separate commit).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the {{ needs to be replaced with ${{.

Oops! Good catch. Will fix.

But as I said, I'd like to avoid that construct in favor of native PowerShell environment variable expansion here (and elsewhere in this part of the workflow definition, in a separate commit).

Unfortunately, that won't work. The DEALLOCATE_IMMEDIATELY env variable is only available in the Actions Runner context. When the VM is created, it does the following:

  1. Download the post-deployment script from https://raw.githubusercontent.com/git-for-windows/git-for-windows-automation/main/azure-self-hosted-runners/post-deployment-script.ps1
  2. Execute it from within the VM (which doesn't have env variables like DEALLOCATE_IMMEDIATELY)

So we need to explicitly pass these parameters with their actual values to the azure/arm-deploy action, which in turn passes it to the Microsoft.CustomScriptExtension, which in turn will execute the two steps above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I misunderstood actually:

In general, I think we should always use the native environment variable expansion in scripts, i.e. $env:XYZ in PowerShell, $XYZ in Bash.

That makes sense to me. Just replacing ${{ env.XYZ }} everywhere in the workflow. Will do!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look?

It looks good! I did have to change the ACTIONS_RUNNER_REGISTRATION_URL lines because they only set the environment variable for subsequent steps, not the current one, but we no longer use it in a subsequent step, but in the current one.

@dennisameling dennisameling force-pushed the make-self-hosted-runner-deallocation-optional branch from 59ed548 to 52bab03 Compare January 3, 2023 09:25
@dennisameling dennisameling marked this pull request as ready for review January 3, 2023 09:25
@dennisameling dennisameling force-pushed the make-self-hosted-runner-deallocation-optional branch from 52bab03 to ca6646b Compare January 3, 2023 09:45
END
)

echo "AZURE_ARM_PARAMETERS=$AZURE_ARM_PARAMETERS" >> $GITHUB_ENV
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really possible to put multi-line values into GITHUB_ENV? I was under the impression that that's impossible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you need a <<EOF trick: see e.g. this blog post.

Copy link
Contributor

@dennisameling dennisameling Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it locally, and AZURE_ARM_PARAMETERS=$(cat <<-END actually lets Bash turn it into a single-line, space-separated string, so then it can be written into the $GITHUB_ENV without having to do the <<EOF trick you mentioned 🙏🏼

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that the $(...) aren't enclosed in double-quotes. Excellent!

@dscho dscho force-pushed the make-self-hosted-runner-deallocation-optional branch 2 times, most recently from 801cf79 to 9044953 Compare January 10, 2023 08:10
dscho and others added 5 commits January 10, 2023 11:31
We need to couple the post-deployment script URL more tightly to the
current revision, otherwise the workflow run might pick up an unintended
version of that script.

Signed-off-by: Johannes Schindelin <[email protected]>
For now, we manually spin up the runners. It makes more sense to keep
them running so that they can pick up the job immediately instead of
deallocating them and then manually restart them.

Signed-off-by: Johannes Schindelin <[email protected]>
Let's switch to Standard_D8plds_v5. The "d" stands for:

d – Diskfull (local temp disk is present)

This means that we get an additional drive on the VM (D:\) that is
blaziingly fast. It leads to a ~25% speed increase when doing heavy
IO operations.

Ref: https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview#temporary-disk
Ref:
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
Ref: https://azureprice.net/vm/Standard_D8plds_v5
Signed-off-by: Dennis Ameling <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Some Azure VM types have a temporary disk which is local to the VM and
therefore provides _much_ faster disk IO than the OS Disk (or any other
attached disk).

Hosted GitHub Actions runners also leverage this disk and do their work
in D:/a/_work, so let's use it too if we can. It leads to a ~25% speed
increase when doing heavy IO operations.

Ref: https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview#temporary-disk
Ref: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Signed-off-by: Dennis Ameling <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
We will soon be adding logic that automatically spins up and removes
runners on-demand, so there's no need to deallocate in that case. An
additional benefit of not deallocating is that we can leverage the fast
D:\ drive to install the Runner on. This drive would otherwise be
destroyed as part of the deallocation process.

Signed-off-by: Dennis Ameling <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
# For a convenient overview of all arm64 VM types, see e.g. https://azureprice.net/?_cpuArchitecture=Arm64
AZURE_VM_TYPE: Standard_D8plds_v5
# At the time of writing, "centralindia" was the cheapest region for the VM type we're using. https://azureprice.net/vm/Standard_D8plds_v5
AZURE_VM_REGION: centralindia
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. AzurePrice does list it as being available in that region. Probably "eastus" would do the trick then 🤷🏼

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://azureprice.net/content/about says that this site is not affiliated with Microsoft, and I fail to see any equivalent prices in the Azure Portal (probably because the former tries to factor in Windows license costs?)

In any case, I think I solved the riddle about West India:

image

Unfortunately, I do not find any official documentation about pricing at https://learn.microsoft.com/en-us/azure/virtual-machines/dplsv5-dpldsv5-series, at least not anything helpful (the calculator seems to either be unaware of that series or calls it by a different name).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with "westus2" and started a new workflow run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!! 😅 🤦🏼 https://github.com/git-for-windows/git-for-windows-automation/actions/runs/3881130673/jobs/6625764356

 ==> Verifying source file signatures with gpg...
    xpdf-4.00.tar.gz ... FAILED (unknown public key F4825F5397271342)
==> ERROR: One or more PGP signatures could not be verified!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I actually had that fixed in git-for-windows/MINGW-packages@c8ccd71 already, but then had a brain fart when I re-ran the failed jobs, not remembering that a re-run would fail to target the newly-pushed commit.

dennisameling and others added 4 commits January 10, 2023 16:17
This makes it easier to configure the Azure VM region. Let's default to
"westus2" while we're at it, as it's one of the cheaper options
available at the time of writing.

Signed-off-by: Dennis Ameling <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
The string we used to provide parameters to Azure ARM was getting
way too long. Let's use a different approach by preparing the string in
the previous step. An additional benefit to this is that we can use
Bash's environment variable expansion, which should be more robust
than the one GitHub provides through ${{ env.XYZ }}.

Signed-off-by: Dennis Ameling <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
The network interface and the public IP do not need to exist without the
VM. Therefore, let's let them be deleted when the VM is deleted.

Signed-off-by: Johannes Schindelin <[email protected]>
We register the new runner as "ephemereal", i.e. it runs one job and
gets automatically removed from the self-hosted runner group.

To avoid incurring unnecessary Azure cost, let's shut down the VM
after running the job.

Signed-off-by: Johannes Schindelin <[email protected]>
It's not like we want to create tens of runners at the same time. One is
sufficient.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the make-self-hosted-runner-deallocation-optional branch from 404997c to 9f692f6 Compare January 10, 2023 15:17
with:
azcliversion: 2.43.0
inlineScript: |
az vm delete -n "$ACTIONS_RUNNER_NAME" -g ${{ secrets.AZURE_RESOURCE_GROUP }} --yes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, kind of. But it leaves two resources behind, the network security group and the virtual network:

image

@dennisameling you mentioned previously that we could avoid that by reusing a single resource? Or is there a way to pass one or two deleteOptions so that these resources are deleted, too, when the VM is deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see whether 4c2ba65 works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a known issue: Azure/azure-cli#22158

We might need 1-2 additional az CLI calls to find the NSG/Vnet that are attached to the VM and delete them in follow-up calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it does not work.

We might need 1-2 additional az CLI calls to find the NSG/Vnet that are attached to the VM and delete them in follow-up calls.

I guess we need 2 additional CLI calls, but we know the exact name i.e. we do not have to identify the resources via a separate call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need 2 additional CLI calls, but we know the exact name i.e. we do not have to identify the resources via a separate call.

Yep - looks like that should be pretty straightforward:

https://learn.microsoft.com/en-us/cli/azure/network/nsg?view=azure-cli-latest#az-network-nsg-delete

az network nsg delete -g MyResourceGroup -n MyNsg

https://learn.microsoft.com/en-us/cli/azure/network/vnet?view=azure-cli-latest#az-network-vnet-delete

az network vnet delete -g MyResourceGroup -n myVNet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dscho dscho force-pushed the make-self-hosted-runner-deallocation-optional branch from 855256e to 4c2ba65 Compare January 10, 2023 16:25
Since we're using ephemeral runners, the VMs are no longer used after
they completed a workflow job, and need to be deleted. This GitHub
workflow is designed to perform that action, and it is intended to be
triggered by the GitForWindowsHelper GitHub App in response to receiving
a `workflow_job.completed` event.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the make-self-hosted-runner-deallocation-optional branch from 4c2ba65 to 7d3ef90 Compare January 10, 2023 16:36
@dscho
Copy link
Member Author

dscho commented Jan 10, 2023

I'm declaring this ready to be merged, after it successfully created several runners in the attempt to build mingw-w64-clang-aarch64-xpdf-tools 😉

@dscho dscho merged commit fb9dda1 into main Jan 10, 2023
@dscho dscho deleted the make-self-hosted-runner-deallocation-optional branch January 10, 2023 19:40
@dennisameling
Copy link
Contributor

Thanks for your work on this!! 🔥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants