|
| 1 | +============== |
| 2 | +Upgrading Ceph |
| 3 | +============== |
| 4 | + |
| 5 | +This section describes how to upgrade from one version of Ceph to another. |
| 6 | +The Ceph upgrade procedure is described :ceph-doc:`here <cephadm/upgrade>`. |
| 7 | + |
| 8 | +The Ceph release series is not strictly dependent upon the StackHPC OpenStack |
| 9 | +release; however, this configuration does define a default Ceph release series |
| 10 | +and container image tag. The default release series is currently |ceph_series|. |
| 11 | + |
| 12 | +Prerequisites |
| 13 | +============= |
| 14 | + |
| 15 | +Before starting the upgrade, ensure any appropriate prerequisites are |
| 16 | +satisfied. These will be specific to each deployment, but here are some |
| 17 | +suggestions: |
| 18 | + |
| 19 | +* Ensure that expected test suites are passing, e.g. Tempest. |
| 20 | +* Resolve any Prometheus alerts. |
| 21 | +* Check for unexpected ``ERROR`` or ``CRITICAL`` messages in OpenSearch |
| 22 | + Dashboards. |
| 23 | +* Check Grafana dashboards. |
| 24 | + |
| 25 | +Consider whether the Ceph cluster needs to be upgraded within or outside of a |
| 26 | +maintenance/change window. |
| 27 | + |
| 28 | +Preparation |
| 29 | +=========== |
| 30 | + |
| 31 | +Ensure that the local Kayobe configuration environment is up to date. |
| 32 | + |
| 33 | +If you wish to use a different Ceph release series, set |
| 34 | +``cephadm_ceph_release``. |
| 35 | + |
| 36 | +If you wish to use different Ceph container image tags, set the following |
| 37 | +variables: |
| 38 | + |
| 39 | +* ``cephadm_image_tag`` (`tags <https://quay.io/repository/ceph/ceph?tab=tags&tag=latest>`__) |
| 40 | +* ``cephadm_haproxy_image_tag`` (`tags <https://quay.io/repository/ceph/haproxy?tab=tags&tag=latest>`__) |
| 41 | +* ``cephadm_keepalived_image_tag`` (`tags <https://quay.io/repository/ceph/keepalived?tab=tags&tag=latest>`__) |
| 42 | + |
| 43 | +Be sure to use a tag that `matches the release series |
| 44 | +<https://docs.ceph.com/en/latest/releases/>`__. |
| 45 | + |
| 46 | +Upgrading Host Packages |
| 47 | +======================= |
| 48 | + |
| 49 | +Prior to upgrading the Ceph storage cluster, it may be desirable to upgrade |
| 50 | +system packages on the hosts. |
| 51 | + |
| 52 | +Note that these commands do not affect packages installed in containers, only |
| 53 | +those installed on the host. |
| 54 | + |
| 55 | +In order to avoid downtime, it is important to control how package updates are |
| 56 | +rolled out. In general, Ceph monitor hosts should be updated *one by one*. For |
| 57 | +Ceph OSD hosts it may be possible to update packages in batches of hosts, |
| 58 | +provided there is sufficient capacity to maintain data availability. |
| 59 | + |
| 60 | +For each host or batch of hosts, perform the following steps. |
| 61 | + |
| 62 | +Place the host or batch of hosts into maintenance mode: |
| 63 | + |
| 64 | +.. code-block:: console |
| 65 | +
|
| 66 | + sudo cephadm shell -- ceph orch host maintenance enter <host> |
| 67 | +
|
| 68 | +To update all eligible packages, use ``*``, escaping if necessary: |
| 69 | + |
| 70 | +.. code-block:: console |
| 71 | +
|
| 72 | + kayobe overcloud host package update --packages "*" --limit <host> |
| 73 | +
|
| 74 | +If the kernel has been upgraded, reboot the host or batch of hosts to pick up |
| 75 | +the change: |
| 76 | + |
| 77 | +.. code-block:: console |
| 78 | +
|
| 79 | + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host> |
| 80 | +
|
| 81 | +Remove the host or batch of hosts from maintenance mode: |
| 82 | + |
| 83 | +.. code-block:: console |
| 84 | +
|
| 85 | + sudo cephadm shell -- ceph orch host maintenance exit <host> |
| 86 | +
|
| 87 | +Wait for Ceph health to return to ``HEALTH_OK``: |
| 88 | + |
| 89 | +.. code-block:: console |
| 90 | +
|
| 91 | + ceph -s |
| 92 | +
|
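The wait for ``HEALTH_OK`` can be scripted rather than checked by hand. The following is a minimal sketch, assuming ``ceph health`` can be run from the current shell (for example inside ``cephadm shell``). The names ``wait_for_health_ok``, ``HEALTH_CMD``, ``POLL_INTERVAL`` and ``TIMEOUT`` are hypothetical and are not part of Ceph or Kayobe.

```shell
#!/bin/sh
# Poll Ceph health until it reports HEALTH_OK or a timeout expires.
# HEALTH_CMD, POLL_INTERVAL and TIMEOUT are hypothetical knobs for this sketch.
HEALTH_CMD="${HEALTH_CMD:-ceph health}"
POLL_INTERVAL="${POLL_INTERVAL:-30}"   # seconds between checks
TIMEOUT="${TIMEOUT:-3600}"             # give up after this many seconds

wait_for_health_ok() {
    elapsed=0
    while [ "$elapsed" -le "$TIMEOUT" ]; do
        # The first word of the health output is the status, e.g. HEALTH_WARN.
        status=$($HEALTH_CMD 2>/dev/null | awk 'NR == 1 { print $1 }')
        if [ "$status" = "HEALTH_OK" ]; then
            echo "Ceph is healthy"
            return 0
        fi
        echo "Current status: ${status:-unknown}; retrying in ${POLL_INTERVAL}s"
        sleep "$POLL_INTERVAL"
        elapsed=$((elapsed + POLL_INTERVAL))
    done
    echo "Timed out waiting for HEALTH_OK" >&2
    return 1
}
```

Calling ``wait_for_health_ok`` after taking a host out of maintenance mode blocks until the cluster recovers. If ``ceph`` is only available through cephadm, ``HEALTH_CMD`` could be set to ``sudo cephadm shell -- ceph health``.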
| 93 | +Wait for Prometheus alerts and errors in OpenSearch Dashboards to resolve, or |
| 94 | +address them. |
| 95 | + |
| 96 | +Once happy that the system has been restored to full health, move on to the next |
| 97 | +host or batch of hosts. |
| 98 | + |
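When working through many hosts, it can help to print the per-host command sequence up front and review it before running anything. The sketch below only echoes the commands given in this section, in order. It does not execute them, since the ``cephadm`` and ``kayobe`` commands may need to be run from different hosts. The function name ``print_host_update_steps`` and the host names in the usage note are hypothetical.

```shell
#!/bin/sh
# Print the per-host package update commands from this section, in order.
# print_host_update_steps is a hypothetical helper; it prints, never executes.
print_host_update_steps() {
    for host in "$@"; do
        echo "# --- ${host} ---"
        echo "sudo cephadm shell -- ceph orch host maintenance enter ${host}"
        echo "kayobe overcloud host package update --packages \"*\" --limit ${host}"
        echo "# Only if the kernel was upgraded:"
        echo "kayobe playbook run \$KAYOBE_CONFIG_PATH/ansible/reboot.yml -l ${host}"
        echo "sudo cephadm shell -- ceph orch host maintenance exit ${host}"
        echo "ceph -s  # wait for HEALTH_OK before moving on"
    done
}
```

For example, ``print_host_update_steps mon1 mon2 mon3`` prints one block of commands per monitor host, which matches the one-by-one rollout recommended above.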
| 99 | +Sync container images |
| 100 | +===================== |
| 101 | + |
| 102 | +If using the local Pulp server to host Ceph images |
| 103 | +(``stackhpc_sync_ceph_images`` is ``true``), sync the new Ceph images into the |
| 104 | +local Pulp: |
| 105 | + |
| 106 | +.. code-block:: console |
| 107 | +
|
| 108 | + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-{sync,publish}.yml -e stackhpc_pulp_images_kolla_filter=none |
| 109 | +
|
| 110 | +Upgrade Ceph services |
| 111 | +===================== |
| 112 | + |
| 113 | +Start the upgrade. If using the local Pulp server to host Ceph images: |
| 114 | + |
| 115 | +.. code-block:: console |
| 116 | +
|
| 117 | + sudo cephadm shell -- ceph orch upgrade start --image <registry>/ceph/ceph:<tag> |
| 118 | +
|
| 119 | +Otherwise: |
| 120 | + |
| 121 | +.. code-block:: console |
| 122 | +
|
| 123 | + sudo cephadm shell -- ceph orch upgrade start --image quay.io/ceph/ceph:<tag> |
| 124 | +
|
| 125 | +The tag should match the ``cephadm_image_tag`` variable set in `preparation |
| 126 | +<#preparation>`_. The registry should be the address and port of the local Pulp |
| 127 | +server. |
| 128 | + |
| 129 | +Check the upgrade status: |
| 130 | + |
| 131 | +.. code-block:: console |
| 132 | +
|
| 133 | + ceph orch upgrade status |
| 134 | +
|
| 135 | +Wait for Ceph health to return to ``HEALTH_OK``: |
| 136 | + |
| 137 | +.. code-block:: console |
| 138 | +
|
| 139 | + ceph -s |
| 140 | +
|
| 141 | +Watch the cephadm logs: |
| 142 | + |
| 143 | +.. code-block:: console |
| 144 | +
|
| 145 | + ceph -W cephadm |
| 146 | +
|
| 147 | +Upgrade Cephadm |
| 148 | +=============== |
| 149 | + |
| 150 | +Update the Cephadm package: |
| 151 | + |
| 152 | +.. code-block:: console |
| 153 | +
|
| 154 | + kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml -e cephadm_package_update=true |
| 155 | +
|
| 156 | +Testing |
| 157 | +======= |
| 158 | + |
| 159 | +At this point it is recommended to perform a thorough test of the system to |
| 160 | +catch any unexpected issues. This may include: |
| 161 | + |
| 162 | +* Check Prometheus, OpenSearch Dashboards and Grafana |
| 163 | +* Smoke tests |
| 164 | +* All applicable Tempest tests |
| 165 | +* Horizon UI inspection |
| 166 | + |
| 167 | +Cleaning up |
| 168 | +=========== |
| 169 | + |
| 170 | +Prune unused container images: |
| 171 | + |
| 172 | +.. code-block:: console |
| 173 | +
|
| 174 | + kayobe overcloud host command run -b --command "docker image prune -a -f" -l ceph |