Commit 7e96cb3

Merge stackhpc/yoga into stackhpc/zed
2 parents e54a5f4 + 9a5cc9e commit 7e96cb3

14 files changed: 124 additions and 6 deletions

doc/source/configuration/monitoring.rst

Lines changed: 15 additions & 3 deletions
@@ -137,6 +137,8 @@ depending on your configuration, you may need set the
 ``kolla_enable_prometheus_ceph_mgr_exporter`` variable to ``true`` in order to
 enable the ceph mgr exporter.

+.. _os-capacity:
+
 OpenStack Capacity
 ==================

@@ -160,9 +162,19 @@ project domain name in ``stackhpc-monitoring.yml``:
     stackhpc_os_capacity_openstack_region_name: <openstack_region_name>

 Additionally, you should ensure these credentials have the correct permissions
-for the exporter. If you are deploying in a cloud with internal TLS, you may be required
-to disable certificate verification for the OpenStack Capacity exporter
-if your certificate is not signed by a trusted CA.
+for the exporter.
+
+If you are deploying in a cloud with internal TLS, you may be required
+to provide a CA certificate for the OpenStack Capacity exporter if your
+certificate is not signed by a trusted CA. For example, to use a CA certificate
+named ``vault.crt`` that is also added to the Kolla containers:
+
+.. code-block:: yaml
+
+    stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"
+
+Alternatively, to disable certificate verification for the OpenStack Capacity
+exporter:

 .. code-block:: yaml

doc/source/configuration/release-train.rst

Lines changed: 27 additions & 0 deletions
@@ -147,6 +147,33 @@ By default, HashiCorp images (Consul and Vault) are not synced from Docker Hub
 to the local Pulp. To sync these images, set ``stackhpc_sync_hashicorp_images``
 to ``true``.

+Custom container images
+-----------------------
+
+A custom list of container images can be synced to the local Pulp using the
+``stackhpc_pulp_repository_container_repos_extra`` and
+``stackhpc_pulp_distribution_container_extra`` variables.
+
+.. code-block:: yaml
+
+    # List of extra container image repositories.
+    stackhpc_pulp_repository_container_repos_extra:
+      - name: "certbot/certbot"
+        url: "https://registry-1.docker.io"
+        policy: on_demand
+        proxy_url: "{{ pulp_proxy_url }}"
+        state: present
+        include_tags: "nightly"
+        required: True
+
+    # List of extra container image distributions.
+    stackhpc_pulp_distribution_container_extra:
+      - name: certbot
+        repository: certbot/certbot
+        base_path: certbot/certbot
+        state: present
+        required: True
+
 Usage
 =====

doc/source/configuration/vault.rst

Lines changed: 2 additions & 0 deletions
@@ -196,6 +196,8 @@ Enable the required TLS variables in kayobe and kolla
     # Whether TLS is enabled for the internal API endpoints. Default is 'no'.
     kolla_enable_tls_internal: yes

+See :ref:`os-capacity` for information on adding CA certificates to the trust store when deploying the OpenStack Capacity exporter.
+
 2. Set the following in etc/kayobe/kolla/globals.yml or if environments are being used etc/kayobe/environments/$KAYOBE_ENVIRONMENT/kolla/globals.yml

 .. code-block::

doc/source/operations/secret-rotation.rst

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ process easier.

 This was previously mitigated with a change to the StackHPC fork of
 Kolla-Ansible, which has since been reverted due to an unforeseen issue. See
-`here <https://github.com/stackhpc/kolla-ansible/pull/503>` for more
+`here <https://github.com/stackhpc/kolla-ansible/pull/503>`__ for more
 details.

 #. A change to Nova, to automate :ref:`this<nova-change>` step to change the

etc/kayobe/ansible/deploy-os-capacity-exporter.yml

Lines changed: 12 additions & 0 deletions
@@ -27,6 +27,7 @@
       delegate_to: localhost
       register: credential
       when: stackhpc_enable_os_capacity
+      changed_when: false

     - name: Set facts for admin credentials
       ansible.builtin.set_fact:
@@ -43,6 +44,16 @@
         src: templates/os_capacity-clouds.yml.j2
         dest: /opt/kayobe/os-capacity/clouds.yaml
       when: stackhpc_enable_os_capacity
+      register: clouds_yaml_result
+
+    - name: Copy CA certificate to OpenStack Capacity nodes
+      ansible.builtin.copy:
+        src: "{{ stackhpc_os_capacity_openstack_cacert }}"
+        dest: /opt/kayobe/os-capacity/cacert.pem
+      when:
+        - stackhpc_enable_os_capacity
+        - stackhpc_os_capacity_openstack_cacert | length > 0
+      register: cacert_result

     - name: Ensure os_capacity container is running
       community.docker.docker_container:
@@ -56,6 +67,7 @@
             source: /opt/kayobe/os-capacity/
             target: /etc/openstack/
         network_mode: host
+        restart: "{{ clouds_yaml_result is changed or cacert_result is changed }}"
         restart_policy: unless-stopped
       become: true
       when: stackhpc_enable_os_capacity

etc/kayobe/ansible/templates/os_capacity-clouds.yml.j2

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ clouds:
     interface: "internal"
     identity_api_version: 3
     auth_type: "password"
+{% if stackhpc_os_capacity_openstack_cacert | length > 0 %}
+    cacert: /etc/openstack/cacert.pem
+{% endif %}
 {% if not stackhpc_os_capacity_openstack_verify | bool %}
     verify: False
 {% endif %}

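The two conditionals in the template above are independent of each other, so the rendered ``clouds.yaml`` may contain a ``cacert`` entry, a ``verify: False`` entry, both, or neither. The following is a minimal Python sketch of that selection logic only, not of the Jinja rendering itself. The function name ``tls_lines`` is hypothetical; the emitted strings simply mirror the template lines.

```python
def tls_lines(cacert: str, verify: bool) -> list[str]:
    """Return the TLS-related lines the template would emit (illustrative only)."""
    lines = []
    # Mirrors: {% if stackhpc_os_capacity_openstack_cacert | length > 0 %}
    if len(cacert) > 0:
        # The in-container path is fixed, whatever the source path on the host is.
        lines.append("cacert: /etc/openstack/cacert.pem")
    # Mirrors: {% if not stackhpc_os_capacity_openstack_verify | bool %}
    if not verify:
        lines.append("verify: False")
    return lines


if __name__ == "__main__":
    # Defaults from stackhpc-monitoring.yml: empty cacert, verification enabled.
    print(tls_lines("", True))
    # A CA certificate is provided and verification stays enabled.
    print(tls_lines("/path/to/vault.crt", True))
```

With the defaults (an empty ``stackhpc_os_capacity_openstack_cacert`` and verification enabled), neither line is emitted and the exporter relies on the system trust store.
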
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+---
+# Path to a CA certificate file to trust in the OpenStack Capacity exporter.
+stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"

etc/kayobe/kolla.yml

Lines changed: 1 addition & 0 deletions
@@ -339,6 +339,7 @@ kolla_build_blocks:
     ARG prometheus_msteams_sha256sum=0f4df9ee31e655d1ec876ea2c53ab5ae5b07143ef21b9190e61b4d52839e135c
     ARG prometheus_msteams_url=https://github.com/prometheus-msteams/prometheus-msteams/releases/download/v${prometheus_msteams_version}/prometheus-msteams-linux-{{debian_arch}}
     {% endraw %}
+
 # Dict mapping image customization variable names to their values.
 # Each variable takes the form:
 # <image name>_<customization>_<operation>

etc/kayobe/kolla/config/prometheus/system.rules

Lines changed: 18 additions & 0 deletions
@@ -24,6 +24,24 @@ groups:
       summary: "Prometheus exporter at {{ $labels.instance }} reports low memory"
       description: "Available memory is {{ $value }} GiB."

+  - alert: LowSwapSpace
+    expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_warning_threshold_ratio }}{% raw %}
+    for: 1m
+    labels:
+      severity: warning
+    annotations:
+      summary: "Swap space at {{ $labels.instance }} reports low memory"
+      description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."
+
+  - alert: LowSwapSpace
+    expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_critical_threshold_ratio }}{% raw %}
+    for: 1m
+    labels:
+      severity: critical
+    annotations:
+      summary: "Swap space at {{ $labels.instance }} reports low memory"
+      description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."
+
   - alert: HostOomKillDetected
     expr: increase(node_vmstat_oom_kill[5m]) > 0
     for: 5m
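
Both ``LowSwapSpace`` rules use the same free-to-total ratio and differ only in threshold, so a host below the critical threshold also satisfies the warning expression. The following is a minimal Python sketch of that comparison, using the default thresholds from ``stackhpc-monitoring.yml``. The function name ``firing_severities`` is hypothetical, and the ``for: 1m`` hold period is not modelled.

```python
import math

# Defaults of alertmanager_node_free_swap_warning_threshold_ratio and
# alertmanager_node_free_swap_critical_threshold_ratio.
WARNING_RATIO = 0.25
CRITICAL_RATIO = 0.1


def firing_severities(swap_free_bytes: float, swap_total_bytes: float) -> list[str]:
    """Return the severities whose expression would match (illustrative only)."""
    if swap_total_bytes == 0:
        # PromQL evaluates 0/0 to NaN, and NaN never satisfies "<",
        # so hosts without swap match neither rule.
        ratio = math.nan
    else:
        ratio = swap_free_bytes / swap_total_bytes

    severities = []
    if ratio < WARNING_RATIO:
        severities.append("warning")
    if ratio < CRITICAL_RATIO:
        severities.append("critical")
    return severities


if __name__ == "__main__":
    gib = 1024 ** 3
    print(firing_severities(2 * gib, 4 * gib))    # half of swap still free
    print(firing_severities(0.2 * gib, 4 * gib))  # 5% of swap free
```

Hosts that deliberately run with most of their swap in use would match the warning expression continuously, which is why the thresholds are exposed as variables.
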

etc/kayobe/pulp.yml

Lines changed: 10 additions & 2 deletions
@@ -651,14 +651,22 @@ stackhpc_pulp_distribution_container_hashicorp:
     state: present
     required: "{{ stackhpc_sync_hashicorp_images | bool }}"

+# List of extra container image repositories.
+stackhpc_pulp_repository_container_repos_extra: []
+
+# List of extra container image distributions.
+stackhpc_pulp_distribution_container_extra: []
+
 # List of container image repositories.
 stackhpc_pulp_repository_container_repos: >-
   {{ (stackhpc_pulp_repository_container_repos_kolla +
       stackhpc_pulp_repository_container_repos_ceph +
-      stackhpc_pulp_repository_container_repos_hashicorp) | selectattr('required') }}
+      stackhpc_pulp_repository_container_repos_hashicorp +
+      stackhpc_pulp_repository_container_repos_extra) | selectattr('required') }}

 # List of container image distributions.
 stackhpc_pulp_distribution_container: >-
   {{ (stackhpc_pulp_distribution_container_kolla +
       stackhpc_pulp_distribution_container_ceph +
-      stackhpc_pulp_distribution_container_hashicorp) | selectattr('required') }}
+      stackhpc_pulp_distribution_container_hashicorp +
+      stackhpc_pulp_distribution_container_extra) | selectattr('required') }}
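
The combined lists above are built by concatenating the per-source lists and then keeping only the entries whose ``required`` attribute is truthy. The following is a rough Python sketch of that concatenate-then-filter behaviour, assuming each entry is a plain dictionary. The function name ``select_required`` and the sample entries are hypothetical, and the sketch treats a missing ``required`` key as false, which may differ from how Ansible handles undefined attributes.

```python
def select_required(*groups: list[dict]) -> list[dict]:
    """Concatenate the groups and keep entries with a truthy 'required' value."""
    combined = [item for group in groups for item in group]
    return [item for item in combined if item.get("required")]


if __name__ == "__main__":
    # Hypothetical stand-ins for the kolla, hashicorp and extra lists.
    kolla = [{"name": "example/kolla-image", "required": True}]
    hashicorp = [{"name": "example/vault-image", "required": False}]
    extra = [{"name": "certbot/certbot", "required": True}]

    for repo in select_required(kolla, hashicorp, extra):
        print(repo["name"])
```

Because the ``_extra`` variables default to empty lists, deployments that do not set them get the same combined lists as before this change.
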

etc/kayobe/stackhpc-monitoring.yml

Lines changed: 9 additions & 0 deletions
@@ -12,6 +12,12 @@ alertmanager_low_memory_threshold_gib: 5
 # link. Change to false to disable this alert.
 alertmanager_warn_network_bond_single_link: true

+# Thresholds to trigger a LowSwapSpace alert on swap space depletion (ratio).
+# When the ratio of free swap space is lower than each of these values, warning
+# and critical alerts will be triggered respectively.
+alertmanager_node_free_swap_warning_threshold_ratio: 0.25
+alertmanager_node_free_swap_critical_threshold_ratio: 0.1
+
 ###############################################################################
 # Exporter configuration

@@ -20,6 +26,9 @@ alertmanager_warn_network_bond_single_link: true
 # targets being templated during deployment.
 stackhpc_enable_os_capacity: true

+# Path to a CA certificate file to trust in the OpenStack Capacity exporter.
+stackhpc_os_capacity_openstack_cacert: ""
+
 # Whether TLS certificate verification is enabled for the OpenStack Capacity
 # exporter during Keystone authentication.
 stackhpc_os_capacity_openstack_verify: true

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+features:
+  - |
+    Added two alerts (warning and critical) that are triggered when the ratio
+    of (free_swap_space / total_swap_space) is below thresholds.
+    Each threshold can be modified by altering the value of
+    ``alertmanager_node_free_swap_warning_threshold_ratio`` and
+    ``alertmanager_node_free_swap_critical_threshold_ratio``.
+
+    Currently this solution has the limitation of a one-size-fits-all policy.
+    This can cause unwanted alerts for hosts which utilise swap heavily.
+    Therefore it is recommended to tune the thresholds or apply silence rules
+    as needed.

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+features:
+  - |
+    Adds support for providing a CA certificate for the OpenStack Capacity exporter.

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
---
2+
features:
3+
- |
4+
Allows to synchronise a custom list of containers to Pulp using the
5+
``stackhpc_pulp_repository_container_repos_extra`` and
6+
``stackhpc_pulp_distribution_container_extra`` variables.
