2024.1: 2023.1 merge #1052

Merged: 234 commits, Apr 25, 2024

Conversation

markgoddard
Contributor

  • kayobe-env: Unstick KOLLA_SOURCE_PATH and KOLLA_VENV_PATH
  • Fix OSD summary pie chart
  • Add a note about network interfaces changes
  • Synchronise with kayobe stable/yoga
  • Bump seed Pulp to 3.43.1
  • Skip docker registry login by default, login only when pulp is deployed
  • Update .gitreview for unmaintained/yoga
  • Bump kolla image tags for ubuntu jammy 2023.1 except for neutron
  • Deprecate network configuration in environments
  • Upgrade neutron kolla tag but ovs is not included
  • CI: Fail aio job when Terraform reaches max attempts
  • Reduce Elasticsearch/OpenSearch heap size to 200m in ci-aio env
  • Disable Heat in ci-aio env
  • Reduce aio flavor to en1.medium
  • CI: Add tags to aio VMs
  • CI: Add a workflow to clean up stale instances
  • CI: Use skc-ci-aio user for aio jobs
  • Ubuntu Focal to Jammy migration support (#902)
  • Stop warning about invalid group name characters
  • Fix growroot playbook for NVMe devices
  • CI: Clean up instances in BUILD state, list by ID
  • CI: Use correct URL for upper constraints
  • bumping ansible vault
  • docs: Update network interface note to mention group_vars
  • Enable AiO jobs to be cancelled even if they're underway.
  • CI: Add cancellation support to check-tags job
  • Adds alerts for software raid failures (#935)
  • docs: guide for migrating to containerized libvirt in R8/R9 migration
  • Remove Ubuntu Jammy upgrade release note
  • Various os_capacity fixes
  • Update Kolla container images for Ubuntu Jammy Zed (#904)
  • Ensure cron service is started for smartmon
  • Bump RL9 host image to RL9.3 (#897)
  • Add Ubuntu Jammy upgrade doc
  • Update neutron tag to include OVS images
  • Add missing haproxy and letsencrypt images
  • Set kolla_build_neutron_ovs to true if regex empty
  • Add missing quotes
  • Change the variable definition location
  • Fix query for top Ceph pools by capacity used
  • Default to RegionOne for os-capacity
  • Moving variable from play to group_vars. This lets us override it in case one of the machines has a different VG name; otherwise play variables take priority.
  • Document update of IPA kernel URL
  • Override kolla_base_distro_version
  • Add docs page for running Tempest
  • Update etc/kayobe/kolla/globals.yml
  • Bump magnum-capi-helm version to latest
  • Bump Magnum
  • bump magnum tag
  • reno
  • Fix os-capacity playbook crash on delegate_to
  • Improved AIO deployment script
  • magnum_tag
  • docs: Update microversions in tempest.conf example for Zed
  • Update magnum container images for Zed
  • docs: Update microversions in tempest.conf example for 2023.1
  • Update magnum container images for 2023.1
  • Correctly map kolla_base_distro_version
  • Replace references to CentOS with Rocky Linux
  • docs: Add BASE_IMAGE build-arg for kayobe image build
  • Fix libvirt error for tenks on Rocky Linux 9
  • Update etc/kayobe/environments/aufn-ceph/a-universe-from-nothing.sh
  • Switch ansible-modules-hashivault back to upstream
  • Switch ansible-modules-hashivault back to upstream
  • Remove kolla-limit from host configure example
  • Fix certificate path for os-capacity haproxy in Antelope
  • Update upgrading docs to include Opensearch issue
  • Move release note to correct place
  • Add Nova Compute Ironic failover procedure
  • Post service deploy hook for OpenStack Capacity
  • squash: Address comments from Alex
  • Expand notes on re-deploying
  • Add Trivy image scanning (#436)
  • bump magnum-capi-helm version
  • Add note about upstream bug
  • reno
  • Fix Ceph "Objects in the Cluster" dashboard panel
  • Fix tempest doc long line
  • CI: Support unmaintained branches in release determination
  • Expand Magnum Cluster API docs (#972)
  • Fix broken link
  • Fix Jinja templating in Barbican Vault config
  • bump magnum-capi-helm version
  • Bump container image tags
  • Use StackHPC downstream requirements fork
  • Add missing grafana plugins from upstream kolla
  • bump tag
  • Bump tags for grafana
  • Add release note
  • Fix releasenote location
  • hotfix: Fix setting containers_list and running without a command
  • hotfix: Fix failure message
  • Run OVN playbook without limit during upgrade
  • Merge pull request #981 from stackhpc/use-fork-requirements (Use StackHPC downstream requirements fork)
  • feature reno
  • Update cephadm collection version
  • Rebuild heat images with yaql 3.0.0 for 2023.1
  • Rebuild heat images with yaql 3.0.0 for zed
  • Rebuild heat images with yaql 3.0.0 for yoga
  • Update Magnum CAPI Helm driver version (#1007)
  • Fail on any unparsed Ansible inventory
  • Update docs to reflect upstream Magnum driver changes (#1000)
  • docs: Add an upgrade doc note about Glance show_multiple_locations
  • Fix host image builds on Arc runners
  • Fix AIO connectivity loss in automated script
  • Fix AIO deploy script
  • ci-multinode: Use skc-ci-aio user for ci-multinode env
  • ci-multinode: Use Ark package repositories to install packages
  • ci-multinode: Allow rebooting for SELinux state
  • ci-multinode: Add API FQDNs to /etc/hosts in fix-networking.yml
  • ci-multinode: Wait for connection in fix-networking.yml
  • ci-multinode: Use qemu virtualisation
  • ci-multinode: Set default Ceph release to Quincy on Rocky Linux 9
  • os_capacity: Add tags to playbook, update vault docs
  • Update Magnum driver from v0.12.0 to v0.13.0
  • Revert "docs: Add an upgrade doc note about Glance show_multiple_locations"
  • Update Magnum image tags
  • Fix tox whitespace warning
  • Add release note
  • Add retries to overcloud host image pulp tasks
  • Update Ubuntu horizon tag to fix CVE-2023-31122
  • Raise alert on degraded network bonds
  • docs: Remove prometheus and grafana config symlinks
  • docs: Add more context and links to vault docs
  • Magnum - removed appending to ca.crt
  • Add alert to detect bonds with a single link
  • add playbook with workaround for 'tc mirred to Houston'
  • Correct backup for seed images in RL9 migration

markgoddard and others added 30 commits September 21, 2023 17:53
The kayobe-env script does not update the KOLLA_SOURCE_PATH and
KOLLA_VENV_PATH variables if they are already set.  This can lead to
dangerous and difficult to diagnose issues where Kayobe uses a different
version of Kolla Ansible than expected.

This change updates these variables each time the kayobe-env script is
sourced.

Change-Id: I3b4b0b611750b9c7846ff5f74554aee2f14939e4
Closes-Bug: #2036711
(cherry picked from commit 651b8be)
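
The difference between the old and new behaviour can be modelled as follows. This is a minimal Python sketch, not the kayobe-env script itself (which is a shell script); the function names and the example paths are hypothetical, and the environment is represented as a plain dictionary.

```python
from typing import Dict


def source_env_sticky(env: Dict[str, str], source_path: str, venv_path: str) -> None:
    """Model of the old behaviour: values that are already set are kept."""
    env.setdefault("KOLLA_SOURCE_PATH", source_path)
    env.setdefault("KOLLA_VENV_PATH", venv_path)


def source_env_unstuck(env: Dict[str, str], source_path: str, venv_path: str) -> None:
    """Model of the new behaviour: values are updated on every call."""
    env["KOLLA_SOURCE_PATH"] = source_path
    env["KOLLA_VENV_PATH"] = venv_path


# Hypothetical paths: a shell that previously sourced an older environment.
old_shell = {"KOLLA_SOURCE_PATH": "/old/src", "KOLLA_VENV_PATH": "/old/venv"}

sticky = dict(old_shell)
source_env_sticky(sticky, "/new/src", "/new/venv")

unstuck = dict(old_shell)
source_env_unstuck(unstuck, "/new/src", "/new/venv")
```

In the sticky model the shell keeps pointing at the old Kolla Ansible checkout, which is the situation the commit message describes as difficult to diagnose.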
Synchronise with kayobe @ 32b12be953d0c2f60970a95b82d0ebf846d0f86f.

Change-Id: Ib4b75c084820f16ce245c1e1c50f10ccb2083537
Change-Id: Ibd47c684580d35f58b0b715eab2e5110d17bb69a
Previously the script exited 0 at the end of the loop, then failed in a later step.
CI: Fail aio job when Terraform reaches max attempts
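
The failure mode described above (a retry loop that falls through and still reports success) can be sketched as below. This is an illustrative Python model under stated assumptions, not the CI script: `run_with_retries` and the `step` callable are hypothetical names, and the real job runs Terraform rather than a Python function.

```python
import sys
from typing import Callable


def run_with_retries(step: Callable[[], bool], max_attempts: int) -> int:
    """Return exit status 0 if `step` succeeds within `max_attempts`, else 1."""
    for attempt in range(1, max_attempts + 1):
        if step():
            return 0
        print(f"attempt {attempt}/{max_attempts} failed", file=sys.stderr)
    # Falling out of the loop means every attempt failed. Report the failure
    # here rather than returning 0 and letting a later step fail instead.
    return 1
```

Returning a non-zero status at the point where the attempts are exhausted makes the job fail at the step that actually went wrong.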
This should free up some memory, allowing us to use a smaller flavor.
This should free up some memory, allowing us to use a smaller flavor.
This flavor has 8G memory and is cheaper than en1.large.
This will make it easier to identify them.
Sometimes we can end up with aio instances left running indefinitely.
This can lead to unnecessary cloud costs. This change adds a periodic
workflow that runs every 2 hours and deletes instances with the
skc-ci-aio tag that are over 3 hours old.
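
The selection rule for the clean-up workflow can be sketched as a small filter. This is a hedged Python illustration only: the dictionary keys (`id`, `tags`, `created`) and the function name are assumptions made for the example, and the actual workflow queries the cloud rather than an in-memory list.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_instance_ids(
    instances: List[Dict],
    now: datetime,
    tag: str = "skc-ci-aio",
    max_age: timedelta = timedelta(hours=3),
) -> List[str]:
    """Return IDs of instances carrying `tag` that are older than `max_age`."""
    return [
        inst["id"]
        for inst in instances
        if tag in inst["tags"] and now - inst["created"] > max_age
    ]


# Hypothetical data for illustration.
now = datetime(2024, 4, 25, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "a", "tags": ["skc-ci-aio"], "created": now - timedelta(hours=4)},
    {"id": "b", "tags": ["skc-ci-aio"], "created": now - timedelta(hours=1)},
    {"id": "c", "tags": ["other"], "created": now - timedelta(hours=9)},
]
stale = stale_instance_ids(instances, now)
```

Only instance `a` is selected: `b` is too recent and `c` does not carry the tag.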
This user only has read-only access to the package and container
repositories, so is safer than using the release-train-ci user which has
read/write permissions.

For the container image build job we can use the skc-ci-aio user to
access the package repositories, but must use the release-train-ci user
to push container images.
Add playbook to upgrade Ubuntu Focal hosts

Co-authored-by: Alex-Welsh <[email protected]>
This should silence the following Ansible warning [1]:

    [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

[1] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#transform-invalid-group-chars
The disk_tmp variable uses a device path rather than a device name.
This URL should not be affected by branch names etc.
CI: Reduce aio flavor to en1.medium (8G)
Stop warning about invalid group name characters
CI: Add tags to aio VMs, periodically clean up stale aio instances (yoga)
Alex-Welsh and others added 26 commits April 15, 2024 10:57
Update Magnum driver from v0.12.0 to v0.13.0
Retries have been added to the stackhpc.pulp collection to improve
reliability. Adding the same here.
Add retries to overcloud host image pulp tasks
This will raise an alert when at least one of the bond members is down.
Adapted from awesome-prometheus-alerts [1].

[1] https://samber.github.io/awesome-prometheus-alerts/rules.html#rule-host-and-hardware-1-34
Raise alert on degraded network bonds
These are no longer necessary due to support for kayobe multiple
environment merging being backported to Antelope.
docs: Remove prometheus and grafana config symlinks
docs: Add more context and links to vault docs
Appending to ca.crt in make-cert-client.sh (introduced in #724203) causes
multiple identical CA certificates to be added to /etc/kubernetes/certs/ca.crt,
which prevents kube-controller-manager from starting.
Magnum - removed appending to ca.crt
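
One way to check a bundle for the duplication described above is to count repeated PEM blocks. This is a minimal Python sketch and not part of the change itself: the function name is hypothetical, and it compares certificates as text rather than parsing them.

```python
import re
from collections import Counter

# Matches one PEM-encoded certificate block, including its markers.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def duplicate_cert_count(bundle: str) -> int:
    """Return how many certificate blocks in `bundle` repeat an earlier one."""
    counts = Counter(PEM_RE.findall(bundle))
    return sum(n - 1 for n in counts.values())


# Placeholder bodies stand in for real base64 certificate data.
cert_a = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
cert_b = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
dupes = duplicate_cert_count(cert_a + cert_b + cert_a + cert_a)
```

A non-zero result indicates that the same certificate has been appended more than once.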
This change adds a new Prometheus alert HostNetworkBondSingleLink which
will be raised when a bond is configured with only one member. This can
happen when NetworkManager detects that a bond member is down at boot
time. This would fail to be detected by the HostNetworkBondDegraded
alert.
Add alert to detect bonds with a single link
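
The relationship between the two alerts can be modelled as below. This is a simplified Python sketch of the logic described in the commit messages, not the Prometheus rules themselves; the function and its two integer inputs are assumptions for illustration, and the metrics the real rules use are not shown here.

```python
from typing import List


def bond_alerts(configured_members: int, active_members: int) -> List[str]:
    """Return the names of the alerts that would apply to one bond."""
    alerts = []
    # Degraded: at least one configured member is down.
    if active_members < configured_members:
        alerts.append("HostNetworkBondDegraded")
    # Single link: the bond came up with only one member configured, so it
    # can look healthy to the degraded check while having no redundancy.
    if configured_members == 1:
        alerts.append("HostNetworkBondSingleLink")
    return alerts
```

A bond with one configured and one active member raises only HostNetworkBondSingleLink, which is the case the degraded alert cannot detect.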
add playbook with workaround for 'tc mirred to Houston'
The current instructions copy a directory into itself, which fails with:

``cp: cannot copy a directory, '/var/lib/libvirt/images', into itself, '/var/lib/libvirt/images/backup/images'``
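
The condition that triggers this error can be checked ahead of time by testing whether the destination lies inside the source. This is a small Python sketch for illustration; the helper name is hypothetical, and the second path in the usage lines is an invented example rather than the location used in the corrected docs.

```python
from pathlib import PurePosixPath


def is_inside(dest: str, src: str) -> bool:
    """Return True if `dest` is `src` itself or a path underneath it."""
    d, s = PurePosixPath(dest), PurePosixPath(src)
    return d == s or s in d.parents


src = "/var/lib/libvirt/images"
bad = is_inside("/var/lib/libvirt/images/backup/images", src)
ok = is_inside("/var/lib/libvirt/backup", src)  # hypothetical destination
```

A backup destination for which `is_inside` returns True would reproduce the recursive copy.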
markgoddard requested a review from a team as a code owner April 25, 2024 11:02
markgoddard self-assigned this Apr 25, 2024
markgoddard merged commit ef27027 into stackhpc/2024.1 Apr 25, 2024
markgoddard deleted the 2024.1-2023.1-merge branch April 25, 2024 12:57