zed: yoga merge #1094

Merged
merged 17 commits into from
Jun 10, 2024
18 changes: 15 additions & 3 deletions doc/source/configuration/monitoring.rst
@@ -137,6 +137,8 @@ depending on your configuration, you may need set the
``kolla_enable_prometheus_ceph_mgr_exporter`` variable to ``true`` in order to
enable the ceph mgr exporter.

.. _os-capacity:

OpenStack Capacity
==================

@@ -160,9 +162,19 @@ project domain name in ``stackhpc-monitoring.yml``:
stackhpc_os_capacity_openstack_region_name: <openstack_region_name>

Additionally, you should ensure these credentials have the correct permissions
for the exporter. If you are deploying in a cloud with internal TLS, you may be required
to disable certificate verification for the OpenStack Capacity exporter
if your certificate is not signed by a trusted CA.
for the exporter.

If you are deploying in a cloud with internal TLS, you may be required
to provide a CA certificate for the OpenStack Capacity exporter if your
certificate is not signed by a trusted CA. For example, to use a CA certificate
named ``vault.crt`` that is also added to the Kolla containers:

.. code-block:: yaml

stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"

Alternatively, to disable certificate verification for the OpenStack Capacity
exporter:

.. code-block:: yaml

27 changes: 27 additions & 0 deletions doc/source/configuration/release-train.rst
@@ -147,6 +147,33 @@ By default, HashiCorp images (Consul and Vault) are not synced from Docker Hub
to the local Pulp. To sync these images, set ``stackhpc_sync_hashicorp_images``
to ``true``.

Custom container images
-----------------------

A custom list of container images can be synced to the local Pulp using the
``stackhpc_pulp_repository_container_repos_extra`` and
``stackhpc_pulp_distribution_container_extra`` variables.

.. code-block:: yaml

# List of extra container image repositories.
stackhpc_pulp_repository_container_repos_extra:
- name: "certbot/certbot"
url: "https://registry-1.docker.io"
policy: on_demand
proxy_url: "{{ pulp_proxy_url }}"
state: present
include_tags: "nightly"
required: True

# List of extra container image distributions.
stackhpc_pulp_distribution_container_extra:
- name: certbot
repository: certbot/certbot
base_path: certbot/certbot
state: present
required: True

Usage
=====

2 changes: 2 additions & 0 deletions doc/source/configuration/vault.rst
@@ -196,6 +196,8 @@ Enable the required TLS variables in kayobe and kolla
# Whether TLS is enabled for the internal API endpoints. Default is 'no'.
kolla_enable_tls_internal: yes

See :ref:`os-capacity` for information on adding CA certificates to the trust store when deploying the OpenStack Capacity exporter.

2. Set the following in etc/kayobe/kolla/globals.yml or if environments are being used etc/kayobe/environments/$KAYOBE_ENVIRONMENT/kolla/globals.yml

.. code-block::
2 changes: 1 addition & 1 deletion doc/source/operations/secret-rotation.rst
@@ -46,7 +46,7 @@ process easier.

This was previously mitigated with a change to the StackHPC fork of
Kolla-Ansible, which has since been reverted due to an unforeseen issue. See
`here <https://github.com/stackhpc/kolla-ansible/pull/503>` for more
`here <https://github.com/stackhpc/kolla-ansible/pull/503>`__ for more
details.

#. A change to Nova, to automate :ref:`this<nova-change>` step to change the
12 changes: 12 additions & 0 deletions etc/kayobe/ansible/deploy-os-capacity-exporter.yml
@@ -27,6 +27,7 @@
delegate_to: localhost
register: credential
when: stackhpc_enable_os_capacity
changed_when: false

- name: Set facts for admin credentials
ansible.builtin.set_fact:
@@ -43,6 +44,16 @@
src: templates/os_capacity-clouds.yml.j2
dest: /opt/kayobe/os-capacity/clouds.yaml
when: stackhpc_enable_os_capacity
register: clouds_yaml_result

- name: Copy CA certificate to OpenStack Capacity nodes
ansible.builtin.copy:
src: "{{ stackhpc_os_capacity_openstack_cacert }}"
dest: /opt/kayobe/os-capacity/cacert.pem
when:
- stackhpc_enable_os_capacity
- stackhpc_os_capacity_openstack_cacert | length > 0
register: cacert_result

- name: Ensure os_capacity container is running
community.docker.docker_container:
@@ -56,6 +67,7 @@
source: /opt/kayobe/os-capacity/
target: /etc/openstack/
network_mode: host
restart: "{{ clouds_yaml_result is changed or cacert_result is changed }}"
restart_policy: unless-stopped
become: true
when: stackhpc_enable_os_capacity
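
Review note: the new ``restart:`` expression restarts the container only when either registered result reports a change. The sketch below is a plain-Python analogue of that expression, not Ansible code. The helper names are hypothetical, and the shape of the skipped result (``changed`` false plus a ``skipped`` flag) is an assumption about what Ansible registers when the ``when:`` conditions on the CA certificate task are not met.

```python
def is_changed(result: dict) -> bool:
    """Simplified stand-in for Ansible's `is changed` test on one registered result."""
    return bool(result.get("changed", False))


def should_restart(clouds_yaml_result: dict, cacert_result: dict) -> bool:
    """Mirror of the `restart:` expression added to the docker_container task."""
    return is_changed(clouds_yaml_result) or is_changed(cacert_result)


# Hypothetical registered results, for illustration only.
unchanged = {"changed": False}
updated = {"changed": True}
skipped = {"changed": False, "skipped": True}  # assumed shape when no CA certificate is set

print(should_restart(unchanged, skipped))  # False: nothing changed, no restart
print(should_restart(updated, skipped))    # True: clouds.yaml was re-templated
print(should_restart(unchanged, updated))  # True: the CA certificate was replaced
```

Under these assumptions, leaving ``stackhpc_os_capacity_openstack_cacert`` empty does not by itself trigger a restart.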
3 changes: 3 additions & 0 deletions etc/kayobe/ansible/templates/os_capacity-clouds.yml.j2
@@ -10,6 +10,9 @@ clouds:
interface: "internal"
identity_api_version: 3
auth_type: "password"
{% if stackhpc_os_capacity_openstack_cacert | length > 0 %}
cacert: /etc/openstack/cacert.pem
{% endif %}
{% if not stackhpc_os_capacity_openstack_verify | bool %}
verify: False
{% endif %}
@@ -0,0 +1,3 @@
---
# Path to a CA certificate file to trust in the OpenStack Capacity exporter.
stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"
1 change: 1 addition & 0 deletions etc/kayobe/kolla.yml
@@ -339,6 +339,7 @@ kolla_build_blocks:
ARG prometheus_msteams_sha256sum=0f4df9ee31e655d1ec876ea2c53ab5ae5b07143ef21b9190e61b4d52839e135c
ARG prometheus_msteams_url=https://github.com/prometheus-msteams/prometheus-msteams/releases/download/v${prometheus_msteams_version}/prometheus-msteams-linux-{{debian_arch}}
{% endraw %}

# Dict mapping image customization variable names to their values.
# Each variable takes the form:
# <image name>_<customization>_<operation>
18 changes: 18 additions & 0 deletions etc/kayobe/kolla/config/prometheus/system.rules
@@ -24,6 +24,24 @@ groups:
summary: "Prometheus exporter at {{ $labels.instance }} reports low memory"
description: "Available memory is {{ $value }} GiB."

- alert: LowSwapSpace
expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_warning_threshold_ratio }}{% raw %}
for: 1m
labels:
severity: warning
annotations:
summary: "Swap space at {{ $labels.instance }} reports low memory"
description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."

- alert: LowSwapSpace
expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_critical_threshold_ratio }}{% raw %}
for: 1m
labels:
severity: critical
annotations:
summary: "Swap space at {{ $labels.instance }} reports low memory"
description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."

- alert: HostOomKillDetected
expr: increase(node_vmstat_oom_kill[5m]) > 0
for: 5m
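
Review note: both ``LowSwapSpace`` rules compare the same free-to-total swap ratio against different thresholds, so a host below the critical threshold satisfies the warning expression as well. The sketch below is a plain-Python analogue of the two expressions using the default thresholds from ``stackhpc-monitoring.yml``. The function name is hypothetical, and the handling of hosts with no swap (no alert, on the assumption that PromQL yields NaN for 0/0 and that the comparison then drops the series) is an assumption, not something stated in this change.

```python
WARNING_RATIO = 0.25   # alertmanager_node_free_swap_warning_threshold_ratio
CRITICAL_RATIO = 0.1   # alertmanager_node_free_swap_critical_threshold_ratio


def firing_severities(swap_free_bytes: float, swap_total_bytes: float) -> list:
    """Return the severities whose threshold expression would be true."""
    if swap_total_bytes == 0:
        # Assumption: no swap configured means neither expression matches.
        return []
    ratio = swap_free_bytes / swap_total_bytes
    severities = []
    if ratio < WARNING_RATIO:
        severities.append("warning")
    if ratio < CRITICAL_RATIO:
        severities.append("critical")
    return severities


print(firing_severities(2, 10))   # ['warning']
print(firing_severities(5, 100))  # ['warning', 'critical']
print(firing_severities(0, 0))    # []
```

Because the comparisons are strict, a ratio exactly equal to a threshold does not match. The ``for: 1m`` clause is not modelled here.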
12 changes: 10 additions & 2 deletions etc/kayobe/pulp.yml
@@ -651,14 +651,22 @@ stackhpc_pulp_distribution_container_hashicorp:
state: present
required: "{{ stackhpc_sync_hashicorp_images | bool }}"

# List of extra container image repositories.
stackhpc_pulp_repository_container_repos_extra: []

# List of extra container image distributions.
stackhpc_pulp_distribution_container_extra: []

# List of container image repositories.
stackhpc_pulp_repository_container_repos: >-
{{ (stackhpc_pulp_repository_container_repos_kolla +
stackhpc_pulp_repository_container_repos_ceph +
stackhpc_pulp_repository_container_repos_hashicorp) | selectattr('required') }}
stackhpc_pulp_repository_container_repos_hashicorp +
stackhpc_pulp_repository_container_repos_extra) | selectattr('required') }}

# List of container image distributions.
stackhpc_pulp_distribution_container: >-
{{ (stackhpc_pulp_distribution_container_kolla +
stackhpc_pulp_distribution_container_ceph +
stackhpc_pulp_distribution_container_hashicorp) | selectattr('required') }}
stackhpc_pulp_distribution_container_hashicorp +
stackhpc_pulp_distribution_container_extra) | selectattr('required') }}
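
Review note: the combined lists are built by concatenating the per-source lists and keeping only entries whose ``required`` attribute is truthy, which is why the documented extra entries set ``required: True``. The sketch below is a plain-Python analogue of that expression, not the Jinja2 filter itself. The function name and the sample entries are hypothetical.

```python
def select_required(*repo_lists):
    """Concatenate the lists, then keep entries with a truthy `required` value."""
    combined = [repo for repos in repo_lists for repo in repos]
    return [repo for repo in combined if repo.get("required")]


# Hypothetical sample data, for illustration only.
kolla = [{"name": "example/kolla-image", "required": True}]
hashicorp = [{"name": "example/vault", "required": False}]
extra = [{"name": "certbot/certbot", "required": True}]

selected = select_required(kolla, hashicorp, extra)
print([repo["name"] for repo in selected])  # ['example/kolla-image', 'certbot/certbot']
```

With the new default of an empty ``_extra`` list, the result is the same as before this change.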
9 changes: 9 additions & 0 deletions etc/kayobe/stackhpc-monitoring.yml
@@ -12,6 +12,12 @@ alertmanager_low_memory_threshold_gib: 5
# link. Change to false to disable this alert.
alertmanager_warn_network_bond_single_link: true

# Thresholds to trigger a LowSwapSpace alert on swap space depletion (ratio).
# When the ratio of free swap space is lower than each of these values, warning
# and critical alerts will be triggered respectively.
alertmanager_node_free_swap_warning_threshold_ratio: 0.25
alertmanager_node_free_swap_critical_threshold_ratio: 0.1

###############################################################################
# Exporter configuration

@@ -20,6 +26,9 @@ alertmanager_warn_network_bond_single_link: true
# targets being templated during deployment.
stackhpc_enable_os_capacity: true

# Path to a CA certificate file to trust in the OpenStack Capacity exporter.
stackhpc_os_capacity_openstack_cacert: ""

# Whether TLS certificate verification is enabled for the OpenStack Capacity
# exporter during Keystone authentication.
stackhpc_os_capacity_openstack_verify: true
@@ -0,0 +1,13 @@
---
features:
- |
Added two alerts (warning and critical) that are triggered when the ratio
of free swap space to total swap space falls below a threshold.
Each threshold can be modified by altering the value of
``alertmanager_node_free_swap_warning_threshold_ratio`` or
``alertmanager_node_free_swap_critical_threshold_ratio``.

Currently this solution is limited to a one-size-fits-all policy.
This can cause unwanted alerts for hosts which use swap heavily.
It is therefore recommended to tune the thresholds or apply silence rules
as needed.
4 changes: 4 additions & 0 deletions releasenotes/notes/os-capacity-cacert-8b800b22d84ae0b1.yaml
@@ -0,0 +1,4 @@
---
features:
- |
Adds support for providing a CA certificate for OpenStack Capacity exporter.
@@ -0,0 +1,6 @@
---
features:
- |
Allows synchronising a custom list of containers to Pulp using the
``stackhpc_pulp_repository_container_repos_extra`` and
``stackhpc_pulp_distribution_container_extra`` variables.