Commit 9c2469f

update alertmanager comments
1 parent ce9ed5b commit 9c2469f

4 files changed: 8 additions & 12 deletions
ansible/roles/alertmanager/README.md

Lines changed: 2 additions & 1 deletion
@@ -21,7 +21,8 @@ Alertmanager is enabled by default on the `control` node in the
 In general usage may only require:
 - Adding the `control` node into the `alertmanager` group in `environments/site/groups`
 if upgrading an existing environment.
-- Enabling the Slack integration (see below).
+- Enabling the Slack integration (see section below).
+- Possibly setting `alertmanager_web_external_url`.
 
 ## Role variables

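For context on the new README bullet: the default `alertmanager_web_external_url` points at `localhost`, so links that Alertmanager puts in notifications would only resolve on the node itself. A minimal sketch of an override follows. The hostname is a placeholder, and placing it in an environment's inventory variables is an assumption, not something this commit specifies.

```yaml
# Hypothetical override, e.g. in an environment's inventory group_vars.
# "alertmanager.example.org" is a placeholder hostname. Reusing
# alertmanager_port assumes the service is reached directly, not via a proxy.
alertmanager_web_external_url: "http://alertmanager.example.org:{{ alertmanager_port }}/"
```
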
ansible/roles/alertmanager/defaults/main.yml

Lines changed: 2 additions & 8 deletions
@@ -15,13 +15,11 @@ alertmanager_web_listen_addresses:
 - ":{{ alertmanager_port }}"
 alertmanager_web_external_url: "http://localhost:{{ alertmanager_port}}/" # TODO: is this right??
 
-alertmanager_data_retention: '120h' # --data.retention # How long to keep data for
-alertmanager_data_maintenance_interval: '15m' # --data.maintenance-interval: Interval between garbage collection and snapshotting to disk of the silences and the notification logs
+alertmanager_data_retention: '120h'
+alertmanager_data_maintenance_interval: '15m'
 alertmanager_config_flags: {} # other command-line parameters as shown by `man alertmanager`
-# TODO: data retention?
 alertmanager_config_template: alertmanager.yml.j2
 
-
 # everything below here is interpolated into alertmanager_config_default:
 
 # Uncomment below and add Slack bot app creds for Slack integration
@@ -43,7 +41,3 @@ alertmanager_config_default:
 receivers: "{{ alertmanager_receivers }}"
 
 alertmanager_config_extra: {} # top-level only
-
-
-# TODO: routes??
-# TODO: see PR with additional alerts
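
Since the inline comments on the two data variables were removed here, the following sketch records what they control, based on the removed comments. The values are illustrative only; where a site would set them is an assumption.

```yaml
# Illustrative overrides; the values are examples, not recommendations.
# Per the comments removed in this commit:
#   alertmanager_data_retention            -> --data.retention
#     (how long to keep data for)
#   alertmanager_data_maintenance_interval -> --data.maintenance-interval
#     (interval between garbage collection and snapshotting to disk of the
#      silences and the notification logs)
alertmanager_data_retention: '240h'
alertmanager_data_maintenance_interval: '30m'
```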

ansible/roles/alertmanager/tasks/configure.yml

Lines changed: 1 addition & 3 deletions
@@ -9,8 +9,6 @@
 - "{{ alertmanager_config_file | dirname }}"
 - "{{ alertmanager_storage_path }}"
 
-# TODO: selinux?
-
 - name: Create alertmanager service file with immutable options
 template:
 src: alertmanager.service.j2
@@ -20,7 +18,7 @@
 mode: u=rw,go=r
 register: _alertmanager_service
 notify: Restart alertmanager
-# TODO: how do we cope with the binary changing?
+
 
 - name: Template alertmanager config
 ansible.builtin.template:

environments/common/files/prometheus/rules/slurm.rules

Lines changed: 3 additions & 0 deletions
@@ -14,3 +14,6 @@ groups:
 description: '{{ $value }} Slurm nodes are in fail status'
 summary: 'At least one Slurm node is failed.'
 expr: "slurm_nodes_fail > 0\n"
+
+# TODO: alert on slurm_scheduler_dbd_queue_size - see vpenso exporter, man sdiag, and MaxDBDMsgs
+# but node its dynamic
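
The TODO above could be sketched as a Prometheus rule along the following lines. This is only an illustration: the group name, alert name, `for` duration and the fixed threshold are all hypothetical. The TODO itself appears to note that the real limit (`MaxDBDMsgs`) is dynamic, so a constant threshold is at best a stopgap.

```yaml
groups:
  - name: slurm-dbd-sketch            # hypothetical group name
    rules:
      - alert: SlurmDBDQueueHigh      # hypothetical alert name
        # 5000 is a placeholder threshold. The actual limit (MaxDBDMsgs)
        # is site-dependent, which is the open question in the TODO.
        expr: slurm_scheduler_dbd_queue_size > 5000
        for: 10m                      # placeholder duration
        annotations:
          description: '{{ $value }} messages are queued for slurmdbd'
          summary: 'The Slurm DBD agent queue is growing.'
```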
