Commit f02bb5e

Merge branch 'stackhpc/2024.1' into 2024.1-ansible-lint-alex
2 parents c81275f + fe96cb4 commit f02bb5e

29 files changed

+914
-88
lines changed

.automation

doc/source/configuration/release-train.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
.. _stackhpc_release_train:
2+
13
======================
24
StackHPC Release Train
35
======================

doc/source/contributor/package-updates.rst

Lines changed: 28 additions & 15 deletions
@@ -63,18 +63,20 @@ The following steps describe the process to test the new package and container r
6363
Creating the multinode environments
6464
-----------------------------------
6565

66-
There is a comprehensive guide to setting up a multinode environment with Terraform, found here: https://github.com/stackhpc/terraform-kayobe-multinode. There are some things to note:
66+
The `Multinode deployment workflow <https://github.com/stackhpc/stackhpc-kayobe-config/actions/workflows/stackhpc-multinode.yml>`_ can be used to automatically test changes.
67+
68+
To manually test the changes, there is a comprehensive guide to set up a Multinode environment with Terraform, found here: https://github.com/stackhpc/terraform-kayobe-multinode. There are some things to note:
6769

6870
* OVN is enabled by default. For the OVS multinode environment, override it by setting ``kolla_enable_ovn: false`` in ``etc/kayobe/environments/ci-multinode/kolla.yml``.
6971

70-
* Remember to set different vxlan_vnis for each.
72+
* Remember to set a different ``vxlan_vni`` for each.
7173

72-
* Before starting any tests, run ``dnf distro-sync`` on each host to ensure you are using the same snapshots as in the release train. You can do this using the following commands:
74+
* Before starting any tests, run ``dnf distro-sync -y`` on each host to ensure you are using the same snapshots as in the release train. The ``-y`` option prevents hosts from hanging while they wait for confirmation input. You can do this using the following commands:
7375

7476
.. code-block:: console
7577
76-
kayobe seed host command run -b --command "dnf distro-sync"
77-
kayobe overcloud host command run -b --command "dnf distro-sync"
78+
kayobe seed host command run -b --command "dnf distro-sync -y"
79+
kayobe overcloud host command run -b --command "dnf distro-sync -y"
7880
7981
* This may have installed a new kernel version. If so, you will need to reboot the overcloud hosts. You can check the installed kernels and the currently running kernel with the following commands. If the latest listed version is not running, you will need to reboot.
8082

@@ -85,7 +87,7 @@ There is a comprehensive guide to setting up a multinode environment with Terraf
8587
8688
kayobe playbook run --limit seed,overcloud $KAYOBE_CONFIG_PATH/ansible/reboot.yml
8789
88-
* The tempest tests run automatically at the end of deploy-openstack.sh. If you have the time, it is worth fixing any failing tests you can so that there is greater coverage for the package updates. (Also remember to propose these fixes in the relevant repos where applicable.)
90+
* The tempest tests run automatically at the end of the multinode deployment script. If you have the time, it is worth fixing any failing tests you can so that there is greater coverage for the package updates. (Also remember to propose these fixes in the relevant repos where applicable.)
8991

9092
Upgrading host packages
9193
-----------------------
@@ -102,6 +104,7 @@ For Rocky Linux 9, bump the snapshot versions in /etc/yum/repos.d with:
102104

103105
.. code-block:: console
104106
107+
kayobe seed host configure -t dnf
105108
kayobe overcloud host configure -t dnf
106109
107110
Install new packages:
@@ -112,22 +115,32 @@ Install new packages:
112115
113116
Perform a rolling reboot of hosts:
114117

118+
.. note::
119+
In the Multinode environment, the seed-hypervisor cannot access control
120+
plane instances with the OpenStack client. To use the OpenStack client, connect
121+
to the Seed instance via SSH first. For authentication, use scp to copy
122+
``public-openrc.sh`` to the Seed.
123+
115124
.. code-block:: console
116125
117-
export ANSIBLE_SERIAL=1
118-
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml --limit controllers
119-
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml --limit compute[0]
126+
# Check your hypervisor hostname
127+
(seed) openstack hypervisor list
128+
129+
# Reboot controller instances and zeroth compute instance
130+
(seed-hypervisor) export ANSIBLE_SERIAL=1
131+
(seed-hypervisor) kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml --limit controllers
132+
(seed-hypervisor) kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml --limit compute[0]
120133
121134
# Test live migration
122-
openstack server create --image cirros --flavor m1.tiny --network external --hypervisor-hostname antelope-pkg-refresh-ovs-compute-02.novalocal --os-compute-api-version 2.74 server1
123-
openstack server migrate --live-migration server1
124-
watch openstack server show server1
135+
(seed) openstack server create --image cirros --flavor m1.tiny --network external --hypervisor-hostname <Your Hypervisor Hostname> --os-compute-api-version 2.74 server1
136+
(seed) openstack server migrate --live-migration server1
137+
(seed) watch openstack server show server1
125138
126-
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml --limit compute[1]
139+
(seed-hypervisor) kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml --limit compute[1]
127140
128141
# Try and migrate back
129-
openstack server migrate --live-migration server1
130-
watch openstack server show server1
142+
(seed) openstack server migrate --live-migration server1
143+
(seed) watch openstack server show server1
131144
132145
Upgrading containers within a release
133146
-------------------------------------

doc/source/operations/upgrading-openstack.rst

Lines changed: 53 additions & 8 deletions
@@ -121,6 +121,13 @@ to ``default``. Whilst this does not have any negative impact on services
121121
that utilise Redis it will feature prominently in any preview of the overcloud
122122
configuration.
123123

124+
AvailabilityZoneFilter removal
125+
------------------------------
126+
127+
Support for the ``AvailabilityZoneFilter`` filter has been dropped in Nova.
128+
Remove it from any Nova config files before upgrading. It will cause errors in
129+
Caracal and halt the Nova scheduler.
130+
124131
Known issues
125132
============
126133

@@ -137,6 +144,31 @@ Known issues
137144
applying package updates. This will happen automatically as a post hook when
138145
running the ``kayobe overcloud host package update`` command.
139146

147+
* After upgrading OpenSearch to the latest 2023.1 container image, we have seen
148+
cluster routing allocation be disabled on some systems. See bug for details:
149+
https://bugs.launchpad.net/kolla-ansible/+bug/2085943.
150+
This will cause the "Perform a flush" handler to fail during the 2024.1
151+
OpenSearch upgrade. To work around this, you can run the following PUT request
152+
to enable allocation again:
153+
154+
.. code-block:: console
155+
156+
curl -X PUT "https://<kolla-vip>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } } '
157+
158+
* Cinder database migrations fail during the upgrade process when the
159+
``use_quota`` column is set to ``NULL``, which can be the case on deleted
160+
volumes and snapshots if OpenStack has been in operation for several
161+
releases. See `Launchpad bug 2070475
162+
<https://bugs.launchpad.net/cinder/+bug/2070475>`__ for details. Until the
163+
`database migrations are fixed
164+
<https://review.opendev.org/c/openstack/cinder/+/923635>`__, the data can be
165+
fixed with the following MySQL queries:
166+
167+
.. code-block:: mysql
168+
169+
UPDATE volumes SET use_quota = 1 WHERE use_quota IS NULL AND deleted_at IS NOT NULL;
170+
UPDATE snapshots SET use_quota = 1 WHERE use_quota IS NULL AND deleted_at IS NOT NULL;
171+
140172
Security baseline
141173
=================
142174

@@ -189,10 +221,14 @@ to 3.12, then to 3.13 on Antelope before the Caracal upgrade. This upgrade
189221
should not cause an API outage (though it should still be considered "at
190222
risk").
191223

192-
Some errors have been observed in testing when the upgrades are perfomed
224+
Some errors have been observed in testing when the upgrades are performed
193225
back-to-back. A 200s delay eliminates this issue. On particularly large or slow
194226
deployments, consider increasing this timeout.
195227

228+
Additionally, errors have been observed at sites with OVS networking where after
229+
the upgrade, tenant networking is broken and requires a reset of RabbitMQ. This
230+
can be done by running the rabbitmq-reset playbook.
231+
196232
.. code-block:: bash
197233
198234
kayobe overcloud service configuration generate --node-config-dir /tmp/ignore -kt none
@@ -413,9 +449,8 @@ To upgrade the Ansible control host:
413449
Syncing Release Train artifacts
414450
-------------------------------
415451

416-
New `StackHPC Release Train <../configuration/release-train>` content should be
417-
synced to the local Pulp server. This includes host packages (Deb/RPM) and
418-
container images.
452+
New :ref:`stackhpc_release_train` content should be synced to the local Pulp
453+
server. This includes host packages (Deb/RPM) and container images.
419454

420455
.. _sync-rt-package-repos:
421456

@@ -932,17 +967,27 @@ would be applied:
932967
kayobe overcloud host configure --check --diff
933968
934969
When ready to apply the changes, it may be advisable to do so in batches, or at
935-
least start with a small number of hosts.:
970+
least start with a small number of hosts:
936971

937972
.. code-block:: console
938973
939974
kayobe overcloud host configure --limit <host>
940975
941-
Alternatively, to apply the configuration to all hosts:
942976
943-
.. code-block:: console
977+
.. warning::
978+
979+
Take extra care when configuring Ceph hosts. Set the hosts to maintenance
980+
mode before reconfiguring them, and unset when done:
981+
982+
.. code-block:: console
983+
984+
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-enter-maintenance.yml --limit <host>
985+
kayobe overcloud host configure --limit <host>
986+
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-exit-maintenance.yml --limit <host>
944987
945-
kayobe overcloud host configure
988+
**Always** reconfigure hosts in small batches or one-by-one. Check the Ceph
989+
state after each host configuration. Ensure all warnings and errors are
990+
resolved before moving on.
946991

947992
.. _building_ironic_deployment_images:
948993
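
The OpenSearch known issue added in the diff above re-enables shard allocation with a PUT to the cluster settings endpoint. A minimal Python sketch of the same request is shown here, built but not sent. The host name `kolla-vip.example.com` stands in for `<kolla-vip>`, the function name `build_allocation_request` is illustrative only, and authentication and TLS options are left out because the source does not specify them.

```python
import json
import urllib.request


def build_allocation_request(base_url):
    """Build (but do not send) the PUT request that re-enables allocation."""
    # Same JSON body as the curl command in the known-issue text.
    body = {"transient": {"cluster.routing.allocation.enable": "all"}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder address: substitute the real Kolla VIP before use.
request = build_allocation_request("https://kolla-vip.example.com:9200")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it. Whether credentials or a custom CA bundle are needed depends on the deployment.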

etc/kayobe/ansible/deploy-os-capacity-exporter.yml

Lines changed: 53 additions & 51 deletions
@@ -15,59 +15,61 @@
1515
tags: os_capacity
1616
gather_facts: false
1717
tasks:
18-
- name: Create os-capacity directory
19-
ansible.builtin.file:
20-
path: /opt/kayobe/os-capacity/
21-
state: directory
22-
when: stackhpc_enable_os_capacity
23-
24-
- name: Read admin-openrc credential file
25-
ansible.builtin.command:
26-
cmd: cat {{ lookup('ansible.builtin.env', 'KOLLA_CONFIG_PATH') }}/admin-openrc.sh
18+
- name: Check if admin-openrc.sh exists
19+
ansible.builtin.stat:
20+
path: "{{ lookup('ansible.builtin.env', 'KOLLA_CONFIG_PATH') }}/admin-openrc.sh"
2721
delegate_to: localhost
28-
register: credential
29-
when: stackhpc_enable_os_capacity
30-
changed_when: false
22+
register: openrc_file_stat
23+
run_once: true
3124

32-
- name: Set facts for admin credentials
33-
ansible.builtin.set_fact:
34-
stackhpc_os_capacity_auth_url: "{{ credential.stdout_lines | select('match', '.*OS_AUTH_URL*.') | first | split('=') | last | replace(\"'\", '') }}"
35-
stackhpc_os_capacity_project_name: "{{ credential.stdout_lines | select('match', '.*OS_PROJECT_NAME*.') | first | split('=') | last | replace(\"'\", '') }}"
36-
stackhpc_os_capacity_domain_name: "{{ credential.stdout_lines | select('match', '.*OS_PROJECT_DOMAIN_NAME*.') | first | split('=') | last | replace(\"'\", '') }}"
37-
stackhpc_os_capacity_openstack_region_name: "{{ credential.stdout_lines | select('match', '.*OS_REGION_NAME*.') | first | split('=') | last | replace(\"'\", '') }}"
38-
stackhpc_os_capacity_username: "{{ credential.stdout_lines | select('match', '.*OS_USERNAME*.') | first | split('=') | last | replace(\"'\", '') }}"
39-
stackhpc_os_capacity_password: "{{ credential.stdout_lines | select('match', '.*OS_PASSWORD*.') | first | split('=') | last | replace(\"'\", '') }}"
40-
when: stackhpc_enable_os_capacity
25+
- block:
26+
- name: Ensure os-capacity directory exists
27+
ansible.builtin.file:
28+
path: /opt/kayobe/os-capacity/
29+
state: directory
4130

42-
- name: Template clouds.yml
43-
ansible.builtin.template:
44-
src: templates/os_capacity-clouds.yml.j2
45-
dest: /opt/kayobe/os-capacity/clouds.yaml
46-
when: stackhpc_enable_os_capacity
47-
register: clouds_yaml_result
31+
- name: Read admin-openrc credential file
32+
ansible.builtin.command:
33+
cmd: "cat {{ lookup('ansible.builtin.env', 'KOLLA_CONFIG_PATH') }}/admin-openrc.sh"
34+
delegate_to: localhost
35+
register: credential
36+
changed_when: false
4837

49-
- name: Copy CA certificate to OpenStack Capacity nodes
50-
ansible.builtin.copy:
51-
src: "{{ stackhpc_os_capacity_openstack_cacert }}"
52-
dest: /opt/kayobe/os-capacity/cacert.pem
53-
when:
54-
- stackhpc_enable_os_capacity
55-
- stackhpc_os_capacity_openstack_cacert | length > 0
56-
register: cacert_result
38+
- name: Set facts for admin credentials
39+
ansible.builtin.set_fact:
40+
stackhpc_os_capacity_auth_url: "{{ credential.stdout_lines | select('match', '.*OS_AUTH_URL*.') | first | split('=') | last | replace(\"'\",'') }}"
41+
stackhpc_os_capacity_project_name: "{{ credential.stdout_lines | select('match', '.*OS_PROJECT_NAME*.') | first | split('=') | last | replace(\"'\",'') }}"
42+
stackhpc_os_capacity_domain_name: "{{ credential.stdout_lines | select('match', '.*OS_PROJECT_DOMAIN_NAME*.') | first | split('=') | last | replace(\"'\",'') }}"
43+
stackhpc_os_capacity_openstack_region_name: "{{ credential.stdout_lines | select('match', '.*OS_REGION_NAME*.') | first | split('=') | last | replace(\"'\",'') }}"
44+
stackhpc_os_capacity_username: "{{ credential.stdout_lines | select('match', '.*OS_USERNAME*.') | first | split('=') | last | replace(\"'\",'') }}"
45+
stackhpc_os_capacity_password: "{{ credential.stdout_lines | select('match', '.*OS_PASSWORD*.') | first | split('=') | last | replace(\"'\",'') }}"
5746

58-
- name: Ensure os_capacity container is running
59-
community.docker.docker_container:
60-
name: os_capacity
61-
image: ghcr.io/stackhpc/os-capacity:master
62-
env:
63-
OS_CLOUD: openstack
64-
OS_CLIENT_CONFIG_FILE: /etc/openstack/clouds.yaml
65-
mounts:
66-
- type: bind
67-
source: /opt/kayobe/os-capacity/
68-
target: /etc/openstack/
69-
network_mode: host
70-
restart: "{{ clouds_yaml_result is changed or cacert_result is changed }}"
71-
restart_policy: unless-stopped
72-
become: true
73-
when: stackhpc_enable_os_capacity
47+
- name: Template clouds.yml
48+
ansible.builtin.template:
49+
src: templates/os_capacity-clouds.yml.j2
50+
dest: /opt/kayobe/os-capacity/clouds.yaml
51+
register: clouds_yaml_result
52+
53+
- name: Copy CA certificate to OpenStack Capacity nodes
54+
ansible.builtin.copy:
55+
src: "{{ stackhpc_os_capacity_openstack_cacert }}"
56+
dest: /opt/kayobe/os-capacity/cacert.pem
57+
when: stackhpc_os_capacity_openstack_cacert | length > 0
58+
register: cacert_result
59+
60+
- name: Ensure os_capacity container is running
61+
community.docker.docker_container:
62+
name: os_capacity
63+
image: ghcr.io/stackhpc/os-capacity:{{ stackhpc_os_capacity_version }}
64+
env:
65+
OS_CLOUD: openstack
66+
OS_CLIENT_CONFIG_FILE: /etc/openstack/clouds.yaml
67+
mounts:
68+
- type: bind
69+
source: /opt/kayobe/os-capacity/
70+
target: /etc/openstack/
71+
network_mode: host
72+
restart: "{{ clouds_yaml_result is changed or cacert_result is changed }}"
73+
restart_policy: unless-stopped
74+
become: true
75+
when: stackhpc_enable_os_capacity and openrc_file_stat.stat.exists
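
The `set_fact` task in this playbook takes each credential from the openrc output by selecting the first line that matches a pattern, splitting it on `=`, keeping the last piece, and removing single quotes. A rough Python equivalent of that filter chain follows, assuming lines of the form `export NAME='value'`. The helper name `openrc_value` and the sample values are illustrative and do not come from the commit.

```python
import re


def openrc_value(lines, name):
    """Mimic: select('match', '.*NAME*.') | first | split('=') | last | replace("'", '')."""
    # Same pattern shape as the playbook; re.match anchors at the line start.
    pattern = f".*{name}*."
    matching = [line for line in lines if re.match(pattern, line)]
    # Like Jinja's `first`, this fails if no line matches.
    first = matching[0]
    return first.split("=")[-1].replace("'", "")


# Hypothetical openrc content for illustration only.
sample = [
    "export OS_AUTH_URL='http://192.0.2.10:5000'",
    "export OS_USERNAME='admin'",
]

print(openrc_value(sample, "OS_USERNAME"))
print(openrc_value(sample, "OS_AUTH_URL"))
```

Because only the last `=`-separated piece is kept, a value that itself contains `=` would be truncated by this approach.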
