|
1 | 1 | # Packer-based image build
|
2 | 2 |
|
3 |
| -The appliance contains code and configuration to use Packer with the [OpenStack builder](https://www.packer.io/plugins/builders/openstack) to build images. |
| 3 | +The appliance contains code and configuration to use [Packer](https://developer.hashicorp.com/packer) with the [OpenStack builder](https://www.packer.io/plugins/builders/openstack) to build images. |
4 | 4 |
|
5 |
| -The image built is referred to as a "fat" image as it contains binaries for all nodes, but no configuration. Using a "fat" image: |
| 5 | +The Packer configuration defined here builds "fat images" which contain binaries for all nodes, but no cluster-specific configuration. Using these: |
6 | 6 | - Enables the image to be tested in CI before production use.
|
7 | 7 | - Ensures re-deployment of the cluster or deployment of additional nodes can be completed even if packages are changed in upstream repositories (e.g. due to RockyLinux or OpenHPC updates).
|
8 | 8 | - Improves deployment speed by reducing the number of package downloads.
|
9 | 9 |
|
10 |
| -A default fat image is built in StackHPC's CI workflow and made available to clients. However it is possible to build site-specific fat images if required. |
| 10 | +By default, a fat image build starts from a RockyLinux GenericCloud image and updates all DNF packages already present. |
11 | 11 |
|
12 |
| -A fat image build starts from a RockyLinux GenericCloud image and (by default) updates all dnf packages in that image. |
| 12 | +The fat images StackHPC builds and tests in CI are available from [GitHub releases](https://github.com/stackhpc/ansible-slurm-appliance/releases). However, with some additional configuration it is also possible to: |
| 13 | +1. Build site-specific fat images from scratch. |
| 14 | +2. Extend an existing fat image with additional software. |
13 | 15 |
|
14 |
| -# Build Process |
15 |
| -- Ensure the current OpenStack credentials have sufficient authorisation to upload images (this may or may not require the `member` role for an application credential, depending on your OpenStack configuration). |
16 |
| -- Create a file `environments/<environment>/builder.pkrvars.hcl` containing at a minimum e.g.: |
17 |
| - |
18 |
| - ```hcl |
19 |
| - flavor = "general.v1.small" # VM flavor to use for builder VMs |
20 |
| - networks = ["26023e3d-bc8e-459c-8def-dbd47ab01756"] # List of network UUIDs to attach the VM to |
21 |
| - source_image_name = "Rocky-8.9-GenericCloud" # Name of source image. This must exist in OpenStack and should be a Rocky Linux GenericCloud-based image. |
22 |
| - ``` |
23 |
| - |
24 |
| - This configuration will generate and use an ephemeral SSH key for communicating with the Packer VM. If this is undesirable, set `ssh_keypair_name` to the name of an existing keypair in OpenStack. The private key must be on the host running Packer, and its path can be set using `ssh_private_key_file`. |
25 | 16 |
|
26 |
| - The network used for the Packer VM must provide outbound internet access but does not need to provide access to resources which the final cluster nodes require (e.g. Slurm control node, network filesystem servers etc.). |
| 17 | +# Usage |
| 18 | + |
| 19 | +The steps for building site-specific fat images or extending an existing fat image are the same: |
| 20 | + |
| 21 | +1. Ensure the current OpenStack credentials have sufficient authorisation to upload images (this may or may not require the `member` role for an application credential, depending on your OpenStack configuration). |
| 22 | +2. Create a Packer [variable definition file](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/variables#assigning-values-to-input-variables) at e.g. `environments/<environment>/builder.pkrvars.hcl` containing at a minimum e.g.: |
27 | 23 |
|
28 |
| - For additional options such as non-default private key locations or jumphost configuration see the variable descriptions in `./openstack.pkr.hcl`. |
| 24 | + ```hcl |
| 25 | + flavor = "general.v1.small" # VM flavor to use for builder VMs |
| 26 | + networks = ["26023e3d-bc8e-459c-8def-dbd47ab01756"] # List of network UUIDs to attach the VM to |
| 27 | + ``` |
| 28 | + |
| 29 | + - The network used for the Packer VM must provide outbound internet access but does not need to provide access to resources which the final cluster nodes require (e.g. Slurm control node, network filesystem servers etc.). |
| 30 | + |
| 31 | + - For additional options such as non-default private key locations or jumphost configuration see the variable descriptions in `./openstack.pkr.hcl`. |
29 | 32 |
|
30 |
| -- Activate the venv and the relevant environment. |
| 33 | + - For an example of configuration for extending an existing fat image see below. |
31 | 34 |
|
32 |
| -- Build images using the relevant variable definition file: |
| 35 | +3. Activate the venv and the relevant environment. |
| 36 | +
|
| 37 | +4. Build images using the relevant variable definition file, e.g.: |
| 38 | +
|
| 39 | + cd packer/ |
| 40 | + PACKER_LOG=1 /usr/bin/packer build -only=openstack.openhpc --on-error=ask -var-file=$PKR_VAR_environment_root/builder.pkrvars.hcl openstack.pkr.hcl |
| 41 | +
|
| 42 | + Note that the `-only` flag here restricts the build to the non-OFED fat image "source" (in Packer terminology). Other |
| 43 | + source options are: |
| 44 | + - `-only=openstack.openhpc-ofed`: Build a fat image including Mellanox OFED |
| 45 | + - `-only=openstack.openhpc-extra`: Build an image which extends an existing fat image - in this case the variable `source_image` or `source_image_name` must also be set in the Packer variables file. |
| 46 | + |
| 47 | +5. The built image will be automatically uploaded to OpenStack with a name prefixed `openhpc-` and including a timestamp and a shortened git hash. |
| 48 | +
|
| 49 | +# Build Process |
33 | 50 |
|
34 |
| - cd packer |
35 |
| - PACKER_LOG=1 /usr/bin/packer build -only openstack.openhpc --on-error=ask -var-file=$PKR_VAR_environment_root/builder.pkrvars.hcl openstack.pkr.hcl |
| 51 | +In summary, Packer creates an OpenStack VM, runs Ansible on that, shuts it down, then creates an image from the root disk. |
36 | 52 |
|
37 |
| - Note the build VM is added to the `builder` group to differentiate them from "real" nodes - see developer notes below. |
| 53 | +Many of the Packer variables defined in `openstack.pkr.hcl` control the definition of the build VM and how to SSH to it to run Ansible, which are generic OpenStack builder options. Packer variables can be set in a file at any convenient path; the above |
| 54 | +example shows the use of the environment variable `$PKR_VAR_environment_root` (which itself sets the Packer variable |
| 55 | +`environment_root`) to automatically select a variable file from the current environment, but for site-specific builds |
| 56 | +using a path in a "parent" environment is likely to be more appropriate (as builds should not be environment-specific, to allow testing). |
38 | 57 |
|
39 |
| -- The built image will be automatically uploaded to OpenStack with a name prefixed `openhpc-` and including a timestamp and a shortened git hash. |
| 58 | +What is Slurm Appliance-specific are the details of how Ansible is run: |
| 59 | +- The build VM is always added to the `builder` inventory group, which differentiates it from "real" nodes. This allows |
| 60 | + variables to be set differently during Packer builds, e.g. to prevent services starting. The defaults for this are in `environments/common/inventory/group_vars/builder/`, which could be extended or overridden for site-specific fat image builds using `builder` groupvars for the relevant environment. It also runs some builder-specific code (e.g. to ensure Packer's SSH |
| 61 | + keys are removed from the image). |
| 62 | +- The default fat image build also adds the build VM to the "top-level" `compute`, `control` and `login` groups. This ensures |
| 63 | + the Ansible specific to all of these types of nodes runs (other inventory groups are constructed from these by the `environments/common/inventory/groups` file - this is not builder-specific). |
| 64 | +- Which groups the build VM is added to is controlled by the Packer `groups` variable. This can be redefined for builds using the `openhpc-extra` source to add the build VM into specific groups. E.g. with a Packer variable file: |
40 | 65 |
|
41 |
| -# Notes for developers |
| 66 | + source_image_name = { |
| 67 | + RL9 = "openhpc-ofed-RL9-240619-0949-66c0e540" |
| 68 | + } |
| 69 | + groups = { |
| 70 | + openhpc-extra = ["foo"] |
| 71 | + } |
42 | 72 |
|
43 |
| -Packer build VMs are added to both the `builder` group and the other top-level groups (e.g. `control`, `compute`, etc.). The former group allows `environments/common/inventory/group_vars/builder/defaults.yml` to set variables specifically for the Packer builds, e.g. for services which should not be started. |
| 73 | + the build VM uses an existing "fat image" (rather than a RockyLinux GenericCloud one) and is added to the `builder` and `foo` groups. This means only code targeting `builder` and `foo` groups runs. In this way an existing image can be extended with site-specific code, without modifying the part of the image which has already been tested in the StackHPC CI. |
44 | 74 |
|
45 |
| -Note that hostnames in the Packer VMs are not the same as the equivalent "real" hosts. Therefore variables required inside a Packer VM must be defined as group vars, not hostvars. |
| 75 | + - The playbook `ansible/fatimage.yml` is run, which is only a subset of `ansible/site.yml`. This allows restricting the code |
| 76 | + which runs during build for cases where setting `builder` groupvars is not sufficient (e.g. a role always attempts to configure or start services). This may eventually be removed. |
46 | 77 |
|
47 |
| -Ansible may need to proxy to compute nodes. If the Packer build should not use the same proxy to connect to the builder VMs, note that proxy configuration should not be added to the `all` group. |
| 78 | +There are some things to be aware of when developing Ansible to run in a Packer build VM: |
| 79 | + - Only some tasks make sense. E.g. any services with a reliance on the network cannot be started, and may not be able to be enabled if the remote service will not be immediately available when an instance is created from the resulting image. |
| 80 | + - Nothing should be written to the persistent state directory `appliances_state_dir`, as this is on the root filesystem rather than an OpenStack volume. |
| 81 | + - Care should be taken not to leave data on the root filesystem which is not wanted in the final image (e.g. secrets). |
| 82 | + - Build VM hostnames are not the same as for equivalent "real" hosts and do not contain `login`, `control` etc. Therefore variables used by the build VM must be defined as groupvars not hostvars. |
| 83 | + - Ansible may need to proxy to real compute nodes. If Packer should not use the same proxy to connect to the |
| 84 | + build VMs (e.g. build happens on a different network), proxy configuration should not be added to the `all` group. |
| 85 | + - Currently two fat image "sources" are defined, with and without OFED. This simplifies CI configuration by allowing the |
| 86 | + default source images to be defined in the `openstack.pkr.hcl` definition. |
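
For illustration, a complete variable file for an extension build might combine the minimum variables from the Usage section with the `source_image_name` and `groups` settings described above. This is a sketch only: every value is a placeholder reused from the examples earlier in this document, and `foo` is a hypothetical inventory group, so the flavor, network UUID, image name and group would all need replacing with site-specific values.

```hcl
# Build VM definition (placeholder values from the Usage example)
flavor   = "general.v1.small"                         # VM flavor to use for builder VMs
networks = ["26023e3d-bc8e-459c-8def-dbd47ab01756"]   # List of network UUIDs to attach the VM to

# Existing fat image to extend, instead of a RockyLinux GenericCloud image
source_image_name = {
  RL9 = "openhpc-ofed-RL9-240619-0949-66c0e540"
}

# Extra inventory groups for the build VM when using the openhpc-extra source.
# "foo" is a hypothetical group; the VM is always in the builder group as well.
groups = {
  openhpc-extra = ["foo"]
}
```

A file like this would be passed with `-var-file` to a build that selects the `openhpc-extra` source, so that only code targeting the `builder` and `foo` groups runs.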