Commit 14fdc2a

Update baremental management doc
1 parent 3318ba6 commit 14fdc2a

2 files changed

+276
-0
lines changed

source/baremetal_management.rst

Lines changed: 275 additions & 0 deletions
@@ -0,0 +1,275 @@
1+
======================================
2+
Bare Metal Compute Hardware Management
3+
======================================
4+
5+
Bare metal compute nodes are managed by the Ironic services.
6+
This section describes elements of the configuration of this service.
7+
8+
.. _ironic-node-lifecycle:
9+
10+
Ironic node life cycle
11+
----------------------
12+
13+
The deployment process is documented in the `Ironic User Guide <https://docs.openstack.org/ironic/latest/user/index.html>`__.
14+
OpenStack deployment uses the
15+
`direct deploy method <https://docs.openstack.org/ironic/latest/user/index.html#example-1-pxe-boot-and-direct-deploy-process>`__.
16+
17+
The Ironic state machine can be found `here <https://docs.openstack.org/ironic/latest/user/states.html>`__. The rest of
18+
this documentation refers to these states and assumes that you are familiar with them.
19+
20+
High level overview of state transitions
21+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
22+
23+
This section describes, at a high level, the state transitions for various Ironic operations.
24+
It focuses on the steps where dynamic switch reconfiguration is triggered.
25+
For a more detailed overview, refer to the :ref:`ironic-node-lifecycle` section.
26+
27+
Provisioning
28+
~~~~~~~~~~~~
29+
30+
Provisioning starts when an instance is created in Nova using a bare metal flavor.
31+
32+
- Node starts in the available state (available)
33+
- User provisions an instance (deploying)
34+
- Ironic will switch the node onto the provisioning network (deploying)
35+
- Ironic will power on the node and will await a callback (wait-callback)
36+
- Ironic will image the node with an operating system using the image provided at creation (deploying)
37+
- Ironic switches the node onto the tenant network(s) via neutron (deploying)
38+
- Transition node to active state (active)
39+
40+
.. _baremetal-management-deprovisioning:
41+
42+
Deprovisioning
43+
~~~~~~~~~~~~~~
44+
45+
Deprovisioning starts when an instance created in Nova using a bare metal flavor is destroyed.
46+
47+
If automated cleaning is enabled, it occurs when nodes are deprovisioned.
48+
49+
- Node starts in active state (active)
50+
- User deletes instance (deleting)
51+
- Ironic will remove the node from any tenant network(s) (deleting)
52+
- Ironic will switch the node onto the cleaning network (deleting)
53+
- Ironic will power on the node and will await a callback (clean-wait)
54+
- Node boots into Ironic Python Agent and issues callback, Ironic starts cleaning (cleaning)
55+
- Ironic removes node from cleaning network (cleaning)
56+
- Node transitions to available (available)
57+
58+
If automated cleaning is disabled:
59+
60+
- Node starts in active state (active)
61+
- User deletes instance (deleting)
62+
- Ironic will remove the node from any tenant network(s) (deleting)
63+
- Node transitions to available (available)
64+
65+
Cleaning
66+
~~~~~~~~
67+
68+
Manual cleaning is not part of the regular state transitions when using Nova; however, nodes can be manually cleaned by administrators.
69+
70+
- Node starts in the manageable state (manageable)
71+
- User triggers cleaning with API (cleaning)
72+
- Ironic will switch the node onto the cleaning network (cleaning)
73+
- Ironic will power on the node and will await a callback (clean-wait)
74+
- Node boots into Ironic Python Agent and issues callback, Ironic starts cleaning (cleaning)
75+
- Ironic removes node from cleaning network (cleaning)
76+
- Node transitions back to the manageable state (manageable)
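The manual cleaning flow above could be triggered along the following lines. This is a minimal sketch, not the deployment's documented procedure: the ``manual_clean`` helper is hypothetical, and the ``erase_devices_metadata`` clean step is only an illustrative assumption. Choose clean steps appropriate for your hardware.

```shell
# Sketch only. manual_clean is a hypothetical helper, and the clean step
# below is an illustrative assumption.
manual_clean() {
    node="$1"
    # Manual cleaning is only accepted when the node is in the manageable state.
    state=$(openstack baremetal node show "$node" -f value -c provision_state)
    if [ "$state" != "manageable" ]; then
        echo "node $node is in state '$state', expected 'manageable'" >&2
        return 1
    fi
    # Ask Ironic to run the given clean steps; the node returns to
    # manageable when cleaning completes.
    openstack baremetal node clean "$node" \
        --clean-steps '[{"interface": "deploy", "step": "erase_devices_metadata"}]'
}
```

Call it as ``manual_clean <node>`` after sourcing admin credentials. The state check is there to fail early with a readable message rather than relying on the API error.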
77+
78+
Rescuing
79+
~~~~~~~~
80+
81+
Feature not used. The required rescue network is not currently configured.
82+
83+
Baremetal networking
84+
--------------------
85+
86+
Baremetal networking with the Neutron Networking Generic Switch ML2 driver requires a combination of static and dynamic switch configuration.
87+
88+
.. _static-switch-config:
89+
90+
Static switch configuration
91+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
92+
93+
Static physical network configuration is managed via Kayobe.
94+
95+
.. TODO: Fill in the switch configuration
96+
97+
- Some initial switch configuration is required before networking generic switch can take over the management of an interface.
98+
First, LACP must be configured on the switch ports attached to the baremetal node, e.g.:
99+
100+
.. code-block:: shell
101+
102+
The interface is then partially configured:
103+
104+
.. code-block:: shell
105+
106+
For :ref:`ironic-node-discovery` to work, you need to manually switch the port to the provisioning network:
107+
108+
**NOTE**: You only need to do this if Ironic isn't aware of the node.
109+
110+
Configuration with kayobe
111+
^^^^^^^^^^^^^^^^^^^^^^^^^
112+
113+
Kayobe can be used to apply the :ref:`static-switch-config`.
114+
115+
- Upstream documentation can be found `here <https://docs.openstack.org/kayobe/latest/configuration/reference/physical-network.html>`__.
116+
- Kayobe does all the switch configuration that isn't :ref:`dynamically updated using Ironic <dynamic-switch-configuration>`.
117+
- Optionally switches the node onto the provisioning network (when using ``--enable-discovery``)
118+
119+
+ NOTE: This is a dangerous operation as it can wipe out the dynamic VLAN configuration applied by neutron/ironic.
120+
You should only run this when initially enrolling a node, and should always use the ``interface-description-limit`` option. For example:
121+
122+
.. code-block::
123+
124+
kayobe physical network configure --interface-description-limit <description> --group switches --display --enable-discovery
125+
126+
In this example, ``--display`` is used to preview the switch configuration without applying it.
127+
128+
.. TODO: Fill in information about how switches are configured in kayobe-config, with links
129+
130+
- Configuration is done using a combination of ``group_vars`` and ``host_vars``
131+
132+
.. _dynamic-switch-configuration:
133+
134+
Dynamic switch configuration
135+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
136+
137+
Ironic dynamically configures the switches using the Neutron `Networking Generic Switch <https://docs.openstack.org/networking-generic-switch/latest/>`_ ML2 driver.
138+
139+
- Used to toggle the baremetal nodes onto different networks
140+
141+
+ Can use any VLAN network defined in OpenStack, providing that the VLAN has been trunked to the controllers
142+
as this is required for DHCP to function.
143+
+ See :ref:`ironic-node-lifecycle`, which illustrates when switch reconfigurations happen.
144+
145+
- Only configures VLAN membership of the switch interfaces or port groups. To prevent conflicts with the static switch configuration,
146+
the convention used is: after the node is in service in Ironic, VLAN membership should not be manually adjusted and
147+
should be left to be controlled by Ironic, i.e. *don't* use ``--enable-discovery`` without an interface limit when configuring the
148+
switches with kayobe.
149+
- Ironic is configured to use the neutron networking driver.
150+
151+
.. _ngs-commands:
152+
153+
Commands that NGS will execute
154+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
155+
156+
Networking Generic Switch is mainly concerned with toggling the ports onto different VLANs. It
157+
cannot fully configure the switch.
158+
159+
.. TODO: Fill in the switch configuration
160+
161+
- Switching the port onto the provisioning network
162+
163+
.. code-block:: shell
164+
165+
- Switching the port onto the tenant network.
166+
167+
.. code-block:: shell
168+
169+
- When the instance is deleted, the VLANs are removed from the port using:
170+
171+
.. code-block:: shell
172+
173+
NGS will save the configuration after each reconfiguration (by default).
174+
175+
Ports managed by NGS
176+
^^^^^^^^^^^^^^^^^^^^
177+
178+
The command below lists the port UUID, node UUID and switch port information for each port.
179+
180+
.. code-block:: bash
181+
182+
openstack baremetal port list --field uuid --field node_uuid --field local_link_connection --format value
183+
184+
NGS will manage VLAN membership for ports when the ``local_link_connection`` fields match one of the switches in ``ml2_conf.ini``.
185+
The rest of the switch configuration is static.
186+
The switch configuration that NGS will apply to these ports is detailed in :ref:`dynamic-switch-configuration`.
187+
188+
.. _ironic-node-discovery:
189+
190+
Ironic node discovery
191+
---------------------
192+
193+
Discovery is a process used to automatically enrol new nodes in Ironic.
194+
It works by PXE booting the nodes into the Ironic Python Agent (IPA) ramdisk.
195+
This ramdisk will collect hardware and networking configuration from the node in a process known as introspection.
196+
This data is used to populate the baremetal node object in Ironic.
197+
The series of steps you need to take to enrol a new node is as follows:
198+
199+
- Configure credentials on the BMC. These are needed for Ironic to be able to perform power control actions.
200+
201+
- Controllers should have network connectivity with the target BMC.
202+
203+
- (If kayobe manages physical network) Add any additional switch configuration to kayobe config.
204+
The minimal switch configuration that kayobe needs to know about is described in :ref:`tor-switch-configuration`.
205+
206+
- Apply any :ref:`static switch configuration <static-switch-config>`. This performs the initial
207+
setup of the switchports that is needed before Ironic can take over. The static configuration
208+
will not be modified by Ironic, so it should be safe to reapply at any point. See :ref:`ngs-commands`
209+
for details about the switch configuration that Networking Generic Switch will apply.
210+
211+
- (If kayobe manages physical network) Put the node onto the provisioning network by using the
212+
``--enable-discovery`` flag and either ``--interface-description-limit`` or ``--interface-limit``
213+
(do not run this command without one of these limits). See :ref:`static-switch-config`.
214+
215+
* This is only necessary to initially discover the node. Once the node is registered in Ironic,
216+
it will take over control of the VLAN membership. See :ref:`dynamic-switch-configuration`.
217+
218+
* This provides Ethernet connectivity with the controllers over the `workload provisioning` network.
219+
220+
- (If kayobe doesn't manage physical network) Put the node onto the provisioning network.
221+
222+
.. TODO: link to the relevant file in kayobe config
223+
224+
- Add node to the kayobe inventory.
225+
226+
.. TODO: Fill in details about necessary BIOS & RAID config
227+
228+
- Apply any necessary BIOS & RAID configuration.
229+
230+
.. TODO: Fill in details about how to trigger a PXE boot
231+
232+
- PXE boot the node.
233+
234+
- If the discovery process is successful, the node will appear in Ironic and will get populated with the necessary information from the hardware inspection process.
235+
236+
.. TODO: Link to the Kayobe inventory in the repo
237+
238+
- Add node to the Kayobe inventory in the ``baremetal-compute`` group.
239+
240+
- The node will begin in the ``enroll`` state, and must be moved first to ``manageable``, then ``available`` before it can be used.
241+
242+
If Ironic automated cleaning is enabled, the node must complete a cleaning process before it can reach the available state.
243+
244+
* Use Kayobe to attempt to move the node to the ``available`` state.
245+
246+
.. code-block:: console
247+
248+
source etc/kolla/public-openrc.sh
249+
kayobe baremetal compute provide --limit <node>
250+
251+
- Once the node is in the ``available`` state, Nova will make the node available for scheduling. This happens periodically, and typically takes around a minute.
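Because a node may need to finish cleaning before it becomes ``available``, it can be useful to poll its provision state after running the ``provide`` step. The following is a minimal sketch under stated assumptions: ``wait_for_available`` is a hypothetical helper, and the default attempt count and poll interval are arbitrary.

```shell
# Sketch only. wait_for_available is a hypothetical helper; the default
# attempt count (60) and poll interval (10 seconds) are arbitrary.
wait_for_available() {
    node="$1"
    attempts="${2:-60}"
    interval="${3:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$(openstack baremetal node show "$node" -f value -c provision_state)
        case "$state" in
            available) return 0 ;;
            # Stop early if cleaning failed; the node needs operator attention.
            "clean failed") echo "cleaning failed on $node" >&2; return 1 ;;
        esac
        sleep "$interval"
        i=$((i + 1))
    done
    echo "timed out waiting for $node (last state: $state)" >&2
    return 1
}
```

For example, ``wait_for_available <node>`` returns success once Ironic reports the node as ``available``. Nova may still take a short while after that to pick the node up for scheduling.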
252+
253+
.. _tor-switch-configuration:
254+
255+
Top of Rack (ToR) switch configuration
256+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
257+
258+
Networking Generic Switch must be aware of the Top-of-Rack switch connected to the new node.
259+
Switches managed by NGS are configured in ``ml2_conf.ini``.
260+
261+
.. TODO: Fill in details about how switches are added to NGS config in kayobe-config
262+
263+
After adding switches to the NGS configuration, Neutron must be redeployed.
264+
265+
Considerations when booting baremetal compared to VMs
266+
------------------------------------------------------
267+
268+
- You can only use networks of type: vlan
269+
- Without using trunk ports, it is only possible to directly attach one network to each port or port group of an instance.
270+
271+
* To access other networks you can use routers
272+
* You can still attach floating IPs
273+
274+
- Instances take much longer to provision (expect at least 15 mins)
275+
- When booting an instance use one of the flavors that maps to a baremetal node via the RESOURCE_CLASS configured on the flavor.

source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Contents
2121
overview_of_system
2222
working_with_kayobe
2323
access_to_services
24+
baremetal_management
2425
physical_network
2526

2627
Indices and search
