
Commit 3d62845

DOCSP-17952 documents Health Observers (#824)
* adds Health Observer params section
* adds Health Observers params section
* concept
* nav
* format fix
* adds progressMonitor
* active fault
* subsections
* double backticks
* intro
* typo
* overview examples
* title update
* rename
* lexicographic ordering
* tidy
* tidy
* extra i
* review feedback
* remove Params section from overview
* backtick
* include-ify notes re ``values`` arrays
* one more time
* sets up variables for more consistent usage
* update toc
* address fact-progress-monitor-fields.rst build error
* partial
* incorporates tech review feedback
* updates setParameter config file examples
1 parent 349c764 commit 3d62845

17 files changed

+419
-0
lines changed

snooty.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ toc_landing_pages = [
1919
"/administration/backup-sharded-clusters",
2020
"/administration/configuration-and-maintenance",
2121
"/administration/connection-pool-overview",
22+
"/administration/health-managers",
2223
"/administration/install-community",
2324
"/administration/install-enterprise-linux",
2425
"/administration/install-enterprise",

source/administration/analyzing-mongodb-performance.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -262,4 +262,5 @@ analyzing or debugging issues with support from MongoDB Inc. engineers.
262262
/administration/connection-pool-overview
263263
/tutorial/manage-the-database-profiler
264264
/tutorial/transparent-huge-pages
265+
/administration/health-managers
265266
/reference/ulimit
Lines changed: 109 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,109 @@
1+
.. _health-managers-overview:
2+
3+
.. include:: /includes/health-manager-short-names.rst
4+
5+
==================================================
6+
Manage Sharded Cluster Health with Health Managers
7+
==================================================
8+
9+
.. default-domain:: mongodb
10+
11+
.. contents:: On this page
12+
:local:
13+
:backlinks: none
14+
:depth: 1
15+
:class: singlecol
16+
17+
This document describes how to use |HMS| to monitor and manage sharded
18+
cluster health issues.
19+
20+
Overview
21+
--------
22+
23+
A |HM| runs health checks on a :term:`health manager facet`
24+
at a specified :ref:`intensity level
25+
<health-managers-intensity-levels>`. |HM| checks
26+
run at specified time intervals. A |HM| can be configured to
27+
move a failing :ref:`mongos <mongos>` out of a cluster automatically.
28+
:ref:`Progress Monitor <health-managers-progress-monitor>` ensures
29+
that |HM| checks do not become stuck or unresponsive.
30+
31+
.. _health-managers-facets:
32+
33+
Health Manager Facets
34+
~~~~~~~~~~~~~~~~~~~~~
35+
36+
The following table shows the available |HM| facets:
37+
38+
.. include:: /includes/fact-health-manager-facets.rst
39+
40+
.. _health-managers-intensity-levels:
41+
42+
Health Manager Intensity Levels
43+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
44+
45+
The following table shows the available |HM| intensity levels:
46+
47+
.. include:: /includes/fact-health-manager-intensities.rst
48+
49+
.. _health-managers-active-fault:
50+
51+
Active Fault Duration
52+
---------------------
53+
54+
When a failure is detected and the |HM| intensity level
55+
is set to ``critical``, the |HM| waits the amount of time specified by
56+
:parameter:`activeFaultDurationSecs` before stopping and moving the
57+
:ref:`mongos <mongos>` out of the cluster automatically.
58+
59+
.. _health-managers-progress-monitor:
60+
61+
Progress Monitor
62+
----------------
63+
64+
.. include:: /includes/fact-progressMonitor.rst
65+
66+
``progressMonitor`` Fields
67+
~~~~~~~~~~~~~~~~~~~~~~~~~~
68+
69+
.. include:: /includes/fact-progress-monitor-fields.rst
70+
71+
Examples
72+
--------
73+
74+
The following examples show how |HMS| can be configured. For
75+
information on |HM| parameters, see :ref:`health-manager-parameters`.
76+
77+
Intensity
78+
~~~~~~~~~
79+
80+
.. include:: /includes/example-healthMonitoringIntensities.rst
81+
82+
.. include:: /includes/fact-healthMonitoringIntensities-values-array.rst
83+
84+
See :parameter:`healthMonitoringIntensities` for details.
85+
86+
Intervals
87+
~~~~~~~~~
88+
89+
.. include:: /includes/example-healthMonitoringIntervals.rst
90+
91+
.. include:: /includes/fact-healthMonitoringIntervals-values-array.rst
92+
93+
See :parameter:`healthMonitoringIntervals` for details.
94+
95+
Active Fault Duration
96+
~~~~~~~~~~~~~~~~~~~~~
97+
98+
.. include:: /includes/example-activeFaultDurationSecs.rst
99+
100+
See :parameter:`activeFaultDurationSecs` for details.
101+
102+
Progress Monitor
103+
~~~~~~~~~~~~~~~~
104+
105+
.. include:: /includes/fact-progressMonitor.rst
106+
107+
.. include:: /includes/example-progress-monitor.rst
108+
109+
See :parameter:`progressMonitor` for details.
Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
For example, to set the duration from failure to crash to five
2+
minutes, issue the following at startup:
3+
4+
.. code-block:: bash
5+
6+
mongos --setParameter activeFaultDurationSecs=300
7+
8+
Alternatively, use the :dbcommand:`setParameter` command in a
9+
:binary:`~bin.mongosh` session that is connected to a running
10+
:binary:`~bin.mongos`:
11+
12+
.. code-block:: javascript
13+
14+
db.adminCommand(
15+
{
16+
setParameter: 1,
17+
activeFaultDurationSecs: 300
18+
}
19+
)
20+
21+
22+
Parameters set with :dbcommand:`setParameter` do not persist across
23+
restarts. See the :ref:`setParameter page
24+
<setParameter-commands-not-persistent>` for details.
25+
26+
To make this setting persistent, set ``activeFaultDurationSecs``
27+
in your :ref:`mongos config file <configuration-options>` using the
28+
:setting:`setParameter` option as in the following example:
29+
30+
.. code-block:: yaml
31+
32+
setParameter:
33+
activeFaultDurationSecs: 300
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
For example, to set the ``dns`` |HM| facet to the
2+
``critical`` intensity level, issue the following at startup:
3+
4+
.. code-block:: bash
5+
6+
mongos --setParameter 'healthMonitoringIntensities={ values:[ { type:"dns", intensity: "critical"} ] }'
7+
8+
Alternatively, use the :dbcommand:`setParameter` command in a
9+
:binary:`~bin.mongosh` session that is connected to a running
10+
:binary:`~bin.mongos`:
11+
12+
.. code-block:: javascript
13+
14+
db.adminCommand(
15+
{
16+
setParameter: 1,
17+
healthMonitoringIntensities: { values: [ { type: "dns", intensity: "critical" } ] }
18+
}
19+
)
20+
21+
Parameters set with :dbcommand:`setParameter` do not persist across
22+
restarts. See the :ref:`setParameter page
23+
<setParameter-commands-not-persistent>` for details.
24+
25+
To make this setting persistent, set ``healthMonitoringIntensities``
26+
in your :ref:`mongos config file <configuration-options>` using the
27+
:setting:`setParameter` option as in the following example:
28+
29+
.. code-block:: yaml
30+
31+
setParameter:
32+
healthMonitoringIntensities: "{ values:[ { type:\"dns\", intensity: \"critical\"} ] }"
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
For example, to set the ``ldap`` |HM| facet to
2+
run health checks every 30 seconds, issue the following at startup:
3+
4+
.. code-block:: bash
5+
6+
mongos --setParameter 'healthMonitoringIntervals={ values:[ { type:"ldap", interval: "30000"} ] }'
7+
8+
Alternatively, use the :dbcommand:`setParameter` command in a
9+
:binary:`~bin.mongosh` session that is connected to a running
10+
:binary:`~bin.mongos`:
11+
12+
.. code-block:: javascript
13+
14+
db.adminCommand(
15+
{
16+
setParameter: 1,
17+
healthMonitoringIntervals: { values: [ { type: "ldap", interval: "30000" } ] }
18+
}
19+
)
20+
21+
Parameters set with :dbcommand:`setParameter` do not persist across
22+
restarts. See the :ref:`setParameter page
23+
<setParameter-commands-not-persistent>` for details.
24+
25+
To make this setting persistent, set ``healthMonitoringIntervals``
26+
in your :ref:`mongos config file <configuration-options>` using the
27+
:setting:`setParameter` option as in the following example:
28+
29+
.. code-block:: yaml
30+
31+
setParameter:
32+
healthMonitoringIntervals: "{ values: [{type: \"ldap\", interval: 30000}] }"
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
To set the ``interval`` to 1000 milliseconds and the ``deadline``
2+
to 300 seconds, issue the following at startup:
3+
4+
.. code-block:: bash
5+
6+
mongos --setParameter 'progressMonitor={"interval": 1000, "deadline": 300}'
7+
8+
Alternatively, use the :dbcommand:`setParameter` command in a
9+
:binary:`~bin.mongosh` session that is connected to a running
10+
:binary:`~bin.mongos`:
11+
12+
.. code-block:: javascript
13+
14+
db.adminCommand(
15+
{
16+
setParameter: 1,
17+
progressMonitor: { interval: 1000, deadline: 300 }
18+
}
19+
)
20+
21+
Parameters set with :dbcommand:`setParameter` do not persist across
22+
restarts. See the :ref:`setParameter page
23+
<setParameter-commands-not-persistent>` for details.
24+
25+
To make this setting persistent, set ``progressMonitor``
26+
in your :ref:`mongos config file <configuration-options>` using the
27+
:setting:`setParameter` option as in the following example:
28+
29+
.. code-block:: yaml
30+
31+
setParameter:
32+
progressMonitor: "{ interval: 1000, deadline: 300 }"
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
.. list-table::
2+
:header-rows: 1
3+
:widths: 25 75
4+
5+
* - Facet
6+
7+
- What the Health Observer Checks
8+
9+
* - ``configServer``
10+
11+
- Cluster health issues related to connectivity to the config server.
12+
13+
* - ``dns``
14+
15+
- Cluster health issues related to DNS availability and functionality.
16+
17+
* - ``ldap``
18+
19+
- Cluster health issues related to LDAP availability and functionality.
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
.. list-table::
2+
:header-rows: 1
3+
:widths: 25 75
4+
5+
* - Intensity Level
6+
7+
- Description
8+
9+
* - ``critical``
10+
11+
- The |HM| on this facet is enabled and has the ability to move the
12+
failing :ref:`mongos <mongos>` out of the cluster if an error
13+
occurs. The |HM| waits the amount of time specified by
14+
:parameter:`activeFaultDurationSecs` before stopping and moving
15+
the :ref:`mongos <mongos>` out of the cluster automatically.
16+
17+
* - ``non-critical``
18+
19+
- The |HM| on this facet is enabled and logs
20+
errors, but the :ref:`mongos <mongos>` remains in the cluster if
21+
errors are encountered.
22+
23+
* - ``off``
24+
25+
- The |HM| on this facet is disabled. The :ref:`mongos
26+
<mongos>` does not perform any health checks on this facet. This
27+
is the default intensity level.
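
The three intensity levels above can be summarized as a small decision function. The following is only an illustrative model of the documented behavior, not the server's implementation; ``plannedAction`` and its return shape are hypothetical names introduced for this sketch.

```javascript
// Illustrative model only: NOT how mongos implements health checks.
// plannedAction is a hypothetical helper summarizing the intensity table.
function plannedAction(intensity, failureAtMs, activeFaultDurationSecs) {
  switch (intensity) {
    case "critical":
      // The mongos is removed only after the active fault duration elapses.
      return {
        action: "remove-mongos",
        earliestAtMs: failureAtMs + activeFaultDurationSecs * 1000,
      };
    case "non-critical":
      // Errors are logged, but the mongos stays in the cluster.
      return { action: "log-only", earliestAtMs: null };
    case "off":
      // No health checks run on this facet (the default).
      return { action: "none", earliestAtMs: null };
    default:
      throw new Error(`unknown intensity level: ${intensity}`);
  }
}

// A failure detected at t = 0 with activeFaultDurationSecs = 300.
console.log(plannedAction("critical", 0, 300));
```

Under these assumptions, a ``critical`` facet with a 300-second active fault duration keeps the failing mongos in the cluster for at least five minutes after the failure is detected.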
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
``healthMonitoringIntensities`` accepts an array of documents,
2+
``values``. Each document in ``values`` takes two fields:
3+
4+
- ``type``, the |HM| facet
5+
- ``intensity``, the intensity level
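
As a minimal sketch of the ``values`` structure described above, the following builds the command document that could be passed to ``db.adminCommand()``. ``buildIntensitiesCommand`` is a hypothetical helper, not part of mongosh; the facet and intensity names are the ones listed in the tables in this commit.

```javascript
// Facet and intensity names as listed in this commit's tables.
const FACETS = ["configServer", "dns", "ldap"];
const INTENSITIES = ["critical", "non-critical", "off"];

// Hypothetical helper (not a mongosh API): validates each entry and
// returns a setParameter command document.
function buildIntensitiesCommand(entries) {
  const values = entries.map(({ type, intensity }) => {
    if (!FACETS.includes(type)) {
      throw new Error(`unknown facet: ${type}`);
    }
    if (!INTENSITIES.includes(intensity)) {
      throw new Error(`unknown intensity level: ${intensity}`);
    }
    return { type, intensity };
  });
  return { setParameter: 1, healthMonitoringIntensities: { values } };
}

// Build a command that configures two facets at once.
const intensitiesCmd = buildIntensitiesCommand([
  { type: "dns", intensity: "critical" },
  { type: "ldap", intensity: "non-critical" },
]);
console.log(JSON.stringify(intensitiesCmd));
```

In a mongosh session connected to a mongos, the resulting document would be passed as ``db.adminCommand(intensitiesCmd)``. Validating names on the client side is a design choice of this sketch; the document does not describe how the server handles unknown values.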
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
``healthMonitoringIntervals`` accepts an array of documents,
2+
``values``. Each document in ``values`` takes two fields:
3+
4+
- ``type``, the |HM| facet
5+
- ``interval``, how often the health check runs, in milliseconds
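
Because ``interval`` is given in milliseconds, it is easy to pass a value in the wrong unit. The following sketch converts per-facet intervals from seconds and builds the command document. ``buildIntervalsCommand`` is a hypothetical helper, and emitting ``interval`` as a number rather than a quoted string is an assumption of this sketch.

```javascript
// Facet names as listed in this commit's facet table.
const INTERVAL_FACETS = ["configServer", "dns", "ldap"];

// Hypothetical helper (not a mongosh API): takes { facet: seconds } and
// returns a setParameter command document with intervals in milliseconds.
function buildIntervalsCommand(secondsByFacet) {
  const values = Object.entries(secondsByFacet).map(([type, seconds]) => {
    if (!INTERVAL_FACETS.includes(type)) {
      throw new Error(`unknown facet: ${type}`);
    }
    if (!Number.isFinite(seconds) || seconds <= 0) {
      throw new Error(`interval must be a positive number of seconds: ${seconds}`);
    }
    // Assumption: the interval is emitted as a number, not a string.
    return { type, interval: Math.round(seconds * 1000) };
  });
  return { setParameter: 1, healthMonitoringIntervals: { values } };
}

// Run the ldap health check every 30 seconds.
const intervalsCmd = buildIntervalsCommand({ ldap: 30 });
console.log(JSON.stringify(intervalsCmd));
```

Here 30 seconds becomes an ``interval`` of 30000 milliseconds, matching the startup example for the ``ldap`` facet.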
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
.. list-table::
2+
:header-rows: 1
3+
:widths: 25 50 25
4+
5+
* - Field
6+
7+
- Description
8+
9+
- Units
10+
11+
* - ``interval``
12+
13+
- How often to ensure |HMS| are not stuck or unresponsive.
14+
15+
- Milliseconds
16+
17+
* - ``deadline``
18+
19+
- Timeout before automatically failing the :ref:`mongos <mongos>`
20+
if a |HM| check is not making progress.
21+
22+
- Seconds
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
:ref:`Progress Monitor <health-managers-progress-monitor>` runs tests
2+
to ensure that |HM| checks do not become stuck or
3+
unresponsive. Progress Monitor runs these tests at intervals specified
4+
by ``interval``. If a health check begins but does not complete within
5+
the timeout given by ``deadline``, Progress Monitor stops the
6+
:ref:`mongos <mongos>` and removes it from the cluster.
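
The ``deadline`` rule described above can be modeled as follows. This is an illustrative sketch only, not the server's implementation; ``findStuckChecks`` and the shape of the ``checks`` records are hypothetical.

```javascript
// Illustrative model only: NOT how mongos implements Progress Monitor.
// `checks` is a hypothetical list of { facet, startedAtMs, completedAtMs },
// where completedAtMs is null while a check is still running.
function findStuckChecks(checks, nowMs, deadlineSecs) {
  const deadlineMs = deadlineSecs * 1000; // deadline is given in seconds
  return checks
    .filter((c) => c.completedAtMs === null && nowMs - c.startedAtMs > deadlineMs)
    .map((c) => c.facet);
}

const sampleChecks = [
  { facet: "dns", startedAtMs: 0, completedAtMs: 120 },   // finished in time
  { facet: "ldap", startedAtMs: 0, completedAtMs: null }, // still running
];

// With a 300-second deadline, evaluate the checks at t = 301000 ms.
console.log(findStuckChecks(sampleChecks, 301000, 300));
```

In this model, each pass of the monitor (one per ``interval``) would call ``findStuckChecks``; any facet it returns corresponds to the case where Progress Monitor stops the mongos.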
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
.. |HM| replace:: Health Manager
2+
.. |HMS| replace:: Health Managers
3+
.. |HMREF| replace:: :ref:`health-managers-overview`

source/reference/command/setParameter.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,8 @@ Definition
2828
For the available parameters, including examples, see
2929
:doc:`/reference/parameters`.
3030

31+
.. _setParameter-commands-not-persistent:
32+
3133
Behavior
3234
--------
3335

source/reference/glossary.txt

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -394,6 +394,21 @@ Glossary
394394
"buckets" of objects grouped by a second criterion. See
395395
:doc:`/core/geohaystack`.
396396

397+
health manager
398+
A health manager runs health checks on a :term:`health manager
399+
facet` at a specified :ref:`intensity level
400+
<health-managers-intensity-levels>`. Health manager checks run at
401+
specified time intervals. A health manager can be configured to
402+
move a failing :ref:`mongos <mongos>` out of a cluster
403+
automatically.
404+
405+
health manager facet
406+
A specific set of features and functionality that a :term:`health
407+
manager` can be configured to run health checks against. For
408+
example, you can configure a health manager to monitor and
409+
manage DNS or LDAP cluster health issues automatically. See
410+
:ref:`health-managers-facets` for details.
411+
397412
hidden member
398413
A :term:`replica set` member that cannot become :term:`primary`
399414
and are invisible to client applications. See
