Commit 9cd31f2

Reword Health Checks intro paragraph + others
* mention exit code when ping succeeds
* add rabbitmqctl eval temporary alternative for Stage 5
* make 0 stand out more, the current font makes it look like a capital O

@michaelklishin I didn't deploy this, letting you reword or revert
1 parent b32aa6c commit 9cd31f2

1 file changed, +22 -16 lines changed

site/monitoring.md

Lines changed: 22 additions & 16 deletions
@@ -420,18 +420,20 @@ to most systems:
420420

421421
## <a id="health-checks" class="anchor" href="#health-checks">Health Checks</a>
422422

423-
A health check is a [periodically executed](#monitoring-frequency) command
424-
or set of commands that collect a few essential metrics of a RabbitMQ node or cluster.
425-
Just like with human or veterinary health checks, there's a variety of checks that
426-
can be performed and some are more intrusive than others. Different checks also have
427-
a different probability of reporting [false positives](https://en.wikipedia.org/wiki/False_positives_and_false_negatives)
428-
(a scenario when a node is reported as unhealthy even when it is actually healthy).
429-
430-
Health checks therefore should be thought of as a range of options, starting with the most
431-
basic and virtually never producing false positives to increasingly more comprehensive,
432-
intrusive, and opinionated checks that have a probability of false positives that should be
433-
taken into account. Health checks can verify the state of an individual node or the entire cluster. The former
434-
kind is known as node health checks and the latter as cluster health checks.
423+
A health check is a [periodically executed](#monitoring-frequency) command that
424+
tries to determine whether an aspect of the RabbitMQ service is operating
425+
normally.
426+
427+
There is a series of health checks that can be performed, starting
428+
with the most basic and virtually never producing [false
429+
positives](https://en.wikipedia.org/wiki/False_positives_and_false_negatives),
430+
to increasingly more comprehensive, intrusive, and opinionated that have a
431+
higher probability of false positives. In other words, the more comprehensive a
432+
health check is, the less conclusive the result will be.
433+
434+
Health checks can verify the state of an
435+
individual node (node health checks), or the entire cluster (cluster health
436+
checks).
435437

436438
### <a id="individual-checks" class="anchor" href="#individual-checks">Individual Node Checks</a>
437439

@@ -453,14 +455,14 @@ The most basic check ensures that the runtime is running
453455
and (indirectly) that CLI tools can authenticate to it.
454456

455457
Except for the CLI tool authentication
456-
part, the probability of false positives can be considered approaching 0
458+
part, the probability of false positives can be considered approaching `0`
457459
except for upgrades and maintenance windows.
458460

459461
[`rabbitmq-diganostics ping`](/rabbitmq-diagnostics.8.html) performs this check:
460462

461463
<pre class="lang-bash">
462464
rabbitmq-diagnostics ping -q
463-
# =&gt; Ping succeeded
465+
# =&gt; Ping succeeded if exit code is 0
464466
</pre>
465467

466468
#### Stage 2
@@ -477,7 +479,7 @@ rabbitmq-diagnostics -q status
477479
</pre>
478480

479481
This is a common way of sanity checking a node.
480-
The probability of false positives can be considered approaching 0
482+
The probability of false positives can be considered approaching `0`
481483
except for upgrades and maintenance windows.
482484

483485
#### Stage 3
@@ -610,7 +612,11 @@ maintenance windows can raise significantly.
610612

611613
Includes all checks in stage 3 plus checks that there are no failed [virtual hosts](/vhosts.html).
612614

613-
RabbitMQ CLI tools currently do not provide a dedicated command for this check.
615+
RabbitMQ CLI tools currently do not provide a dedicated command for this check, but here is an example that could be used in the meantime:
616+
<pre class="lang-bash">
617+
rabbitmqctl eval '[true = rabbit_vhost:is_running_on_all_nodes(VHost) || VHost <- rabbit_vhost:list()], all_vhosts_are_running_on_all_nodes.'
618+
all_vhosts_are_running_on_all_nodes
619+
</pre>
614620

615621
The probability of false positives is generally low except for systems that are under
616622
high CPU load.

0 commit comments

Comments
 (0)