|
| 1 | +====================================== |
| 2 | +Bare Metal Compute Hardware Management |
| 3 | +====================================== |
| 4 | + |
| 5 | +Bare metal compute nodes are managed by the Ironic services. |
| 6 | +This section describes elements of the configuration of this service. |
| 7 | + |
| 8 | +.. _ironic-node-lifecycle: |
| 9 | + |
| 10 | +Ironic node life cycle |
| 11 | +---------------------- |
| 12 | + |
| 13 | +The deployment process is documented in the `Ironic User Guide <https://docs.openstack.org/ironic/latest/user/index.html>`__. |
| 14 | +OpenStack deployment uses the |
| 15 | +`direct deploy method <https://docs.openstack.org/ironic/latest/user/index.html#example-1-pxe-boot-and-direct-deploy-process>`__. |
| 16 | + |
| 17 | +The Ironic state machine is documented `here <https://docs.openstack.org/ironic/latest/user/states.html>`__. The rest of |
| 18 | +this documentation refers to these states and assumes that you are familiar with them. |
| 19 | + |
| 20 | +High level overview of state transitions |
| 21 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 22 | + |
| 23 | +This section describes, at a high level, the state transitions for various Ironic operations. |
| 24 | +It focuses on the steps where dynamic switch reconfiguration is triggered. |
| 25 | +For a more detailed overview, refer to the :ref:`ironic-node-lifecycle` section. |
| 26 | + |
| 27 | +Provisioning |
| 28 | +~~~~~~~~~~~~ |
| 29 | + |
| 30 | +Provisioning starts when an instance is created in Nova using a bare metal flavor. |
| 31 | + |
| 32 | +- Node starts in the available state (available) |
| 33 | +- User provisions an instance (deploying) |
| 34 | +- Ironic will switch the node onto the provisioning network (deploying) |
| 35 | +- Ironic will power on the node and will await a callback (wait-callback) |
| 36 | +- Ironic will image the node with an operating system using the image provided at creation (deploying) |
| 37 | +- Ironic switches the node onto the tenant network(s) via neutron (deploying) |
| 38 | +- Transition node to active state (active) |
| 39 | + |
| 40 | +.. _baremetal-management-deprovisioning: |
| 41 | + |
| 42 | +Deprovisioning |
| 43 | +~~~~~~~~~~~~~~ |
| 44 | + |
| 45 | +Deprovisioning starts when an instance created in Nova using a bare metal flavor is destroyed. |
| 46 | + |
| 47 | +If automated cleaning is enabled, it occurs when nodes are deprovisioned. |
| 48 | + |
| 49 | +- Node starts in active state (active) |
| 50 | +- User deletes instance (deleting) |
| 51 | +- Ironic will remove the node from any tenant network(s) (deleting) |
| 52 | +- Ironic will switch the node onto the cleaning network (deleting) |
| 53 | +- Ironic will power on the node and will await a callback (clean-wait) |
| 54 | +- Node boots into Ironic Python Agent and issues callback, Ironic starts cleaning (cleaning) |
| 55 | +- Ironic removes node from cleaning network (cleaning) |
| 56 | +- Node transitions to available (available) |
| 57 | + |
| 58 | +If automated cleaning is disabled: |
| 59 | + |
| 60 | +- Node starts in active state (active) |
| 61 | +- User deletes instance (deleting) |
| 62 | +- Ironic will remove the node from any tenant network(s) (deleting) |
| 63 | +- Node transitions to available (available) |
| 64 | + |
| 65 | +Cleaning |
| 66 | +~~~~~~~~ |
| 67 | + |
| 68 | +Manual cleaning is not part of the regular state transitions when using Nova. However, nodes can be manually cleaned by administrators. |
| 69 | + |
| 70 | +- Node starts in the manageable state (manageable) |
| 71 | +- User triggers cleaning with API (cleaning) |
| 72 | +- Ironic will switch the node onto the cleaning network (cleaning) |
| 73 | +- Ironic will power on the node and will await a callback (clean-wait) |
| 74 | +- Node boots into Ironic Python Agent and issues callback, Ironic starts cleaning (cleaning) |
| 75 | +- Ironic removes node from cleaning network (cleaning) |
| 76 | +- Node transitions back to the manageable state (manageable) |
| 77 | + |
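The provisioning and deprovisioning sequences above can also be written down as data, which makes it easier to see where the switch is touched. The following is a minimal sketch, not part of Ironic: the step descriptions and states are transcribed from the lists above, while the ``switch_reconfigured`` flag and the helper name are this sketch's own annotations.

```python
# Each step is (description, ironic_state, switch_reconfigured).
# Descriptions and states are copied from the lists above; the boolean is an
# annotation added by this sketch to mark steps that move the node between
# networks (and therefore trigger dynamic switch reconfiguration).
PROVISIONING = [
    ("node is available", "available", False),
    ("user provisions an instance", "deploying", False),
    ("switch node onto the provisioning network", "deploying", True),
    ("power on node and await callback", "wait-callback", False),
    ("image the node", "deploying", False),
    ("switch node onto the tenant network(s)", "deploying", True),
    ("node becomes active", "active", False),
]

DEPROVISIONING_WITH_CLEANING = [
    ("node is active", "active", False),
    ("user deletes instance", "deleting", False),
    ("remove node from tenant network(s)", "deleting", True),
    ("switch node onto the cleaning network", "deleting", True),
    ("power on node and await callback", "clean-wait", False),
    ("agent calls back, cleaning starts", "cleaning", False),
    ("remove node from the cleaning network", "cleaning", True),
    ("node becomes available", "available", False),
]


def switch_reconfigurations(steps):
    """Return the descriptions of steps that reconfigure the switch."""
    return [description for description, _state, touched in steps if touched]


if __name__ == "__main__":
    for name, steps in [
        ("provisioning", PROVISIONING),
        ("deprovisioning (automated cleaning)", DEPROVISIONING_WITH_CLEANING),
    ]:
        print(name)
        for description in switch_reconfigurations(steps):
            print("  -", description)
```

Under these annotations, provisioning reconfigures the switch twice and deprovisioning with automated cleaning does so three times.
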
| 78 | +Rescuing |
| 79 | +~~~~~~~~ |
| 80 | + |
| 81 | +This feature is not used: the required rescue network is not currently configured. |
| 82 | + |
| 83 | +Baremetal networking |
| 84 | +-------------------- |
| 85 | + |
| 86 | +Baremetal networking with the Neutron Networking Generic Switch ML2 driver requires a combination of static and dynamic switch configuration. |
| 87 | + |
| 88 | +.. _static-switch-config: |
| 89 | + |
| 90 | +Static switch configuration |
| 91 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 92 | + |
| 93 | +Static physical network configuration is managed via Kayobe. |
| 94 | + |
| 95 | +.. TODO: Fill in the switch configuration |
| 96 | +
|
| 97 | +- Some initial switch configuration is required before networking generic switch can take over the management of an interface. |
| 98 | + First, LACP must be configured on the switch ports attached to the baremetal node, e.g.: |
| 99 | + |
| 100 | + .. code-block:: shell |
| 101 | +
|
| 102 | + The interface is then partially configured: |
| 103 | +
|
| 104 | + .. code-block:: shell |
| 105 | +
|
| 106 | + For :ref:`ironic-node-discovery` to work, you need to manually switch the port to the provisioning network: |
| 107 | +
|
| 108 | + **NOTE**: You only need to do this if Ironic isn't aware of the node. |
| 109 | + |
| 110 | +Configuration with kayobe |
| 111 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 112 | + |
| 113 | +Kayobe can be used to apply the :ref:`static-switch-config`. |
| 114 | + |
| 115 | +- Upstream documentation can be found `here <https://docs.openstack.org/kayobe/latest/configuration/reference/physical-network.html>`__. |
| 116 | +- Kayobe does all the switch configuration that isn't :ref:`dynamically updated using Ironic <dynamic-switch-configuration>`. |
| 117 | +- Optionally switches the node onto the provisioning network (when using ``--enable-discovery``) |
| 118 | + |
| 119 | + + NOTE: This is a dangerous operation as it can wipe out the dynamic VLAN configuration applied by neutron/ironic. |
| 120 | + You should only run this when initially enrolling a node, and should always use the ``interface-description-limit`` option. For example: |
| 121 | + |
| 122 | + .. code-block:: |
| 123 | +
|
| 124 | + kayobe physical network configure --interface-description-limit <description> --group switches --display --enable-discovery |
| 125 | +
|
| 126 | + In this example, ``--display`` is used to preview the switch configuration without applying it. |
| 127 | + |
| 128 | +.. TODO: Fill in information about how switches are configured in kayobe-config, with links |
| 129 | +
|
| 130 | +- Configuration is done using a combination of ``group_vars`` and ``host_vars`` |
| 131 | + |
| 132 | +.. _dynamic-switch-configuration: |
| 133 | + |
| 134 | +Dynamic switch configuration |
| 135 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 136 | + |
| 137 | +Ironic dynamically configures the switches using the Neutron `Networking Generic Switch <https://docs.openstack.org/networking-generic-switch/latest/>`_ ML2 driver. |
| 138 | + |
| 139 | +- Used to toggle the baremetal nodes onto different networks |
| 140 | + |
| 141 | + + Can use any VLAN network defined in OpenStack, providing that the VLAN has been trunked to the controllers |
| 142 | + as this is required for DHCP to function. |
| 143 | + + See :ref:`ironic-node-lifecycle`. This attempts to illustrate when any switch reconfigurations happen. |
| 144 | + |
| 145 | +- Only configures VLAN membership of the switch interfaces or port groups. To prevent conflicts with the static switch configuration, |
| 146 | + the convention used is: after the node is in service in Ironic, VLAN membership should not be manually adjusted and |
| 147 | + should be left to be controlled by Ironic, i.e. *don't* use ``--enable-discovery`` without an interface limit when configuring the |
| 148 | + switches with kayobe. |
| 149 | +- Ironic is configured to use the neutron networking driver. |
| 150 | + |
| 151 | +.. _ngs-commands: |
| 152 | + |
| 153 | +Commands that NGS will execute |
| 154 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 155 | + |
| 156 | +Networking Generic Switch is mainly concerned with toggling the ports onto different VLANs. It |
| 157 | +cannot fully configure the switch. |
| 158 | + |
| 159 | +.. TODO: Fill in the switch configuration |
| 160 | +
|
| 161 | +- Switching the port onto the provisioning network |
| 162 | + |
| 163 | + .. code-block:: shell |
| 164 | +
|
| 165 | +- Switching the port onto the tenant network. |
| 166 | + |
| 167 | + .. code-block:: shell |
| 168 | +
|
| 169 | +- When deleting the instance, the VLANs are removed from the port. Using: |
| 170 | + |
| 171 | + .. code-block:: shell |
| 172 | +
|
| 173 | +NGS will save the configuration after each reconfiguration (by default). |
| 174 | + |
| 175 | +Ports managed by NGS |
| 176 | +^^^^^^^^^^^^^^^^^^^^ |
| 177 | + |
| 178 | +The command below lists the port UUID, node UUID and switch port information for each baremetal port. |
| 179 | + |
| 180 | +.. code-block:: bash |
| 181 | +
|
| 182 | + openstack baremetal port list --field uuid --field node_uuid --field local_link_connection --format value |
| 183 | +
|
| 184 | +NGS will manage VLAN membership for ports when the ``local_link_connection`` fields match one of the switches in ``ml2_conf.ini``. |
| 185 | +The rest of the switch configuration is static. |
| 186 | +The switch configuration that NGS will apply to these ports is detailed in :ref:`dynamic-switch-configuration`. |
| 187 | + |
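To check which ports would be picked up by NGS, the matching described above can be approximated offline. This is a minimal sketch under stated assumptions: it assumes switches appear in ``ml2_conf.ini`` as ``[genericswitch:<name>]`` sections, that a port matches when its ``switch_info`` equals the section name, and that it otherwise falls back to comparing ``switch_id`` with an ``ngs_mac_address`` option. The switch name, MAC address and helper names are hypothetical; verify the matching rules against the NGS documentation for your release.

```python
import configparser

# Hypothetical ml2_conf.ini fragment. Real sections also carry connection
# settings (device type, address, credentials), omitted here.
ML2_CONF_EXAMPLE = """
[genericswitch:tor-switch-1]
ngs_mac_address = aa:bb:cc:dd:ee:01
"""


def load_switches(ml2_conf_text):
    """Map each genericswitch section name to its (lowercased) MAC, if any."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ml2_conf_text)
    prefix = "genericswitch:"
    switches = {}
    for section in parser.sections():
        if section.startswith(prefix):
            name = section[len(prefix):]
            switches[name] = parser[section].get("ngs_mac_address", "").lower()
    return switches


def is_managed(local_link_connection, switches):
    """Return True if the port's local_link_connection matches a switch.

    Assumption: match on switch_info first, then fall back to switch_id.
    """
    if local_link_connection.get("switch_info") in switches:
        return True
    switch_id = (local_link_connection.get("switch_id") or "").lower()
    return bool(switch_id) and switch_id in switches.values()


if __name__ == "__main__":
    switches = load_switches(ML2_CONF_EXAMPLE)
    port = {"switch_info": "tor-switch-1", "port_id": "Ethernet1"}
    print(is_managed(port, switches))
```

The ``local_link_connection`` dictionaries would come from the ``openstack baremetal port list`` output shown above; parsing that output is left out because its exact shape depends on the chosen ``--format``.
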
| 188 | +.. _ironic-node-discovery: |
| 189 | + |
| 190 | +Ironic node discovery |
| 191 | +--------------------- |
| 192 | + |
| 193 | +Discovery is a process used to automatically enrol new nodes in Ironic. |
| 194 | +It works by PXE booting the nodes into the Ironic Python Agent (IPA) ramdisk. |
| 195 | +This ramdisk will collect hardware and networking configuration from the node in a process known as introspection. |
| 196 | +This data is used to populate the baremetal node object in Ironic. |
| 197 | +The series of steps you need to take to enrol a new node is as follows: |
| 198 | + |
| 199 | +- Configure credentials on the BMC. These are needed for Ironic to be able to perform power control actions. |
| 200 | + |
| 201 | +- Controllers should have network connectivity with the target BMC. |
| 202 | + |
| 203 | +- (If kayobe manages physical network) Add any additional switch configuration to kayobe config. |
| 204 | + The minimal switch configuration that kayobe needs to know about is described in :ref:`tor-switch-configuration`. |
| 205 | + |
| 206 | +- Apply any :ref:`static switch configuration <static-switch-config>`. This performs the initial |
| 207 | + setup of the switchports that is needed before Ironic can take over. The static configuration |
| 208 | + will not be modified by Ironic, so it should be safe to reapply at any point. See :ref:`ngs-commands` |
| 209 | + for details about the switch configuration that Networking Generic Switch will apply. |
| 210 | + |
| 211 | +- (If kayobe manages physical network) Put the node onto the provisioning network by using the |
| 212 | + ``--enable-discovery`` flag and either ``--interface-description-limit`` or ``--interface-limit`` |
| 213 | + (do not run this command without one of these limits). See :ref:`static-switch-config`. |
| 214 | + |
| 215 | + * This is only necessary to initially discover the node. Once the node is registered in Ironic, |
| 216 | + Ironic will take over control of the VLAN membership. See :ref:`dynamic-switch-configuration`. |
| 217 | + |
| 218 | + * This provides ethernet connectivity with the controllers over the `workload provisioning` network |
| 219 | + |
| 220 | +- (If kayobe doesn't manage physical network) Put the node onto the provisioning network. |
| 221 | + |
| 222 | +.. TODO: link to the relevant file in kayobe config |
| 223 | +
|
| 224 | +- Add node to the kayobe inventory. |
| 225 | + |
| 226 | +.. TODO: Fill in details about necessary BIOS & RAID config |
| 227 | +
|
| 228 | +- Apply any necessary BIOS & RAID configuration. |
| 229 | + |
| 230 | +.. TODO: Fill in details about how to trigger a PXE boot |
| 231 | +
|
| 232 | +- PXE boot the node. |
| 233 | + |
| 234 | +- If the discovery process is successful, the node will appear in Ironic and will get populated with the necessary information from the hardware inspection process. |
| 235 | + |
| 236 | +.. TODO: Link to the Kayobe inventory in the repo |
| 237 | +
|
| 238 | +- Add node to the Kayobe inventory in the ``baremetal-compute`` group. |
| 239 | + |
| 240 | +- The node will begin in the ``enroll`` state, and must be moved first to ``manageable``, then ``available`` before it can be used. |
| 241 | + |
| 242 | + If Ironic automated cleaning is enabled, the node must complete a cleaning process before it can reach the available state. |
| 243 | + |
| 244 | + * Use Kayobe to attempt to move the node to the ``available`` state. |
| 245 | + |
| 246 | + .. code-block:: console |
| 247 | +
|
| 248 | + source etc/kolla/public-openrc.sh |
| 249 | + kayobe baremetal compute provide --limit <node> |
| 250 | +
|
| 251 | +- Once the node is in the ``available`` state, Nova will make the node available for scheduling. This happens periodically, and typically takes around a minute. |
| 252 | + |
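The path from ``enroll`` to ``available`` described in the steps above can be sketched as a small lookup. This is an illustration only: it lists just the stable states named in this section together with the Ironic verbs ``manage`` and ``provide``, and leaves out intermediate states such as verification and cleaning. The function name is hypothetical.

```python
# (current_state, verb) -> next stable state. Simplified: intermediate
# states (for example cleaning, when automated cleaning is enabled) are
# deliberately omitted from this sketch.
TRANSITIONS = {
    ("enroll", "manage"): "manageable",
    ("manageable", "provide"): "available",
}


def verbs_to_available(state):
    """Return the verbs needed to take a node from `state` to available."""
    verbs = []
    while state != "available":
        for (source, verb), target in TRANSITIONS.items():
            if source == state:
                verbs.append(verb)
                state = target
                break
        else:
            # No known transition out of this state in the simplified table.
            raise ValueError(f"no path to 'available' from state {state!r}")
    return verbs


if __name__ == "__main__":
    print(verbs_to_available("enroll"))
```

A freshly discovered node therefore needs both verbs, while a node that is already ``manageable`` only needs ``provide``.
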
| 253 | +.. _tor-switch-configuration: |
| 254 | + |
| 255 | +Top of Rack (ToR) switch configuration |
| 256 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 257 | + |
| 258 | +Networking Generic Switch must be aware of the Top-of-Rack switch connected to the new node. |
| 259 | +Switches managed by NGS are configured in ``ml2_conf.ini``. |
| 260 | + |
| 261 | +.. TODO: Fill in details about how switches are added to NGS config in kayobe-config |
| 262 | +
|
| 263 | +After adding switches to the NGS configuration, Neutron must be redeployed. |
| 264 | + |
| 265 | +Considerations when booting baremetal compared to VMs |
| 266 | +------------------------------------------------------ |
| 267 | + |
| 268 | +- You can only use networks of type: vlan |
| 269 | +- Without using trunk ports, it is only possible to directly attach one network to each port or port group of an instance. |
| 270 | + |
| 271 | + * To access other networks you can use routers |
| 272 | + * You can still attach floating IPs |
| 273 | + |
| 274 | +- Instances take much longer to provision (expect at least 15 mins) |
| 275 | +- When booting an instance use one of the flavors that maps to a baremetal node via the RESOURCE_CLASS configured on the flavor. |