| 1 | +========================== |
| 2 | +Driver BDM Data Structures |
| 3 | +========================== |
| 4 | + |
| 5 | +In addition to the :doc:`API BDM data format </user/block-device-mapping>` |
| 6 | +there are also several internal data structures within Nova that map out how |
| 7 | +block devices are attached to instances. This document aims to outline the two |
| 8 | +general data structures and two additional specific data structures used by the |
| 9 | +libvirt virt driver. |
| 10 | + |
| 11 | +.. note:: |
| 12 | + |
| 13 | + This document is based on the email below, sent to the openstack-dev |
| 14 | + mailing list by Matthew Booth as a primer for developers working on |
| 15 | + virt drivers and interacting with these data structures. |
| 16 | + |
| 17 | + http://lists.openstack.org/pipermail/openstack-dev/2016-June/097529.html |
| 18 | + |
| 19 | +.. note:: |
| 20 | + |
| 21 | + References to local disks in the following document refer to any |
| 22 | + disk directly managed by nova compute. If nova is configured to use RBD or |
| 23 | + NFS for instance disks then these disks won't actually be local, but they |
| 24 | + are still managed locally and referred to as local disks, as opposed to RBD |
| 25 | + volumes provided by Cinder, which are not considered local. |
| 26 | + |
| 27 | +Generic BDM data structures |
| 28 | +=========================== |
| 29 | + |
| 30 | +``BlockDeviceMapping`` |
| 31 | +---------------------- |
| 32 | + |
| 33 | +The 'top level' data structure is the ``BlockDeviceMapping`` (BDM) object. It |
| 34 | +is a ``NovaObject``, persisted in the DB. Current code creates a BDM object for |
| 35 | +every disk associated with an instance, whether it is a volume or not. |
| 36 | + |
| 37 | +The BDM object describes properties of each disk as specified by the user. It |
| 38 | +is initially created from a user request. For more details on the format of |
| 39 | +these requests, see the :doc:`Block Device Mapping in Nova |
| 40 | +<../user/block-device-mapping>` document. |
| 41 | + |
| 42 | +The Compute API transforms and consolidates all BDMs to ensure that all disks, |
| 43 | +explicit or implicit, have a BDM, and then persists them. Look in |
| 44 | +``nova.objects.block_device`` for all BDM fields, but in essence they contain |
| 45 | +information like (source_type='image', destination_type='local', |
| 46 | +image_id='<image uuid>'), or equivalents describing ephemeral disks, swap disks |
| 47 | +or volumes, and some associated data. |
| 48 | + |
| 49 | +.. note:: |
| 50 | + |
| 51 | + BDM objects are typically stored in variables called ``bdm`` with lists |
| 52 | + in ``bdms``, although this is obviously not guaranteed (and unfortunately |
| 53 | + not always true: ``bdm`` in ``libvirt.block_device`` is usually a |
| 54 | + ``DriverBlockDevice`` object). This is a useful reading aid (except when |
| 55 | + it's proactively confounding), as there is also something else typically |
| 56 | + called ``block_device_mapping`` which is not a ``BlockDeviceMapping`` |
| 57 | + object. |
| 58 | + |
| 59 | +``block_device_info`` |
| 60 | +--------------------- |
| 61 | + |
| 62 | +Drivers do not directly use BDM objects. Instead, they are transformed into a |
| 63 | +different driver-specific representation. This representation is normally |
| 64 | +called ``block_device_info``, and is generated by |
| 65 | +``virt.driver.get_block_device_info()``. Its output is based on data in BDMs. |
| 66 | +``block_device_info`` is a dict containing: |
| 67 | + |
| 68 | +``root_device_name`` |
| 69 | + Hypervisor's notion of the root device's name |
| 70 | +``ephemerals`` |
| 71 | + A list of all ephemeral disks |
| 72 | +``block_device_mapping`` |
| 73 | + A list of all cinder volumes |
| 74 | +``swap`` |
| 75 | + A swap disk, or None if there is no swap disk |
| 76 | + |
| 77 | +The disks are represented in one of two ways, depending on the specific |
| 78 | +driver currently in use. There's the 'new' representation, used by the libvirt |
| 79 | +and vmwareAPI drivers, and the 'legacy' representation used by all other |
| 80 | +drivers. The legacy representation is a plain dict. It does not contain the |
| 81 | +same information as the new representation. |
| 82 | + |
| 83 | +The new representation involves subclasses of |
| 84 | +``nova.virt.block_device.DriverBlockDevice``. As well as containing different |
| 85 | +fields, the new representation, significantly, also retains a reference to the |
| 86 | +underlying BDM object. This means that by manipulating the |
| 87 | +``DriverBlockDevice`` object, the driver is able to persist data to the BDM |
| 88 | +object in the DB. |
| 89 | + |
| 90 | +.. note:: |
| 91 | + |
| 92 | + Common usage is to pull ``block_device_mapping`` out of this |
| 93 | + dict into a variable called ``block_device_mapping``. This is not a |
| 94 | + ``BlockDeviceMapping`` object, or list of them. |
| 95 | + |
| 96 | +.. note:: |
| 97 | + |
| 98 | + If ``block_device_info`` was passed to the driver by compute manager, it |
| 99 | + was probably generated by ``_get_instance_block_device_info()``. |
| 100 | + By default, this function filters out all cinder volumes from |
| 101 | + ``block_device_mapping`` which don't currently have ``connection_info``. |
| 102 | + In other contexts this filtering will not have happened, and |
| 103 | + ``block_device_mapping`` will contain all volumes. |
| 104 | + |
| 105 | +.. note:: |
| 106 | + |
| 107 | + Unlike BDMs, ``block_device_info`` does not currently represent all |
| 108 | + disks that an instance might have. Significantly, it will not contain any |
| 109 | + representation of an image-backed local disk, i.e. the root disk of a |
| 110 | + typical instance which isn't boot-from-volume. Other representations used |
| 111 | + by the libvirt driver explicitly reconstruct this missing disk. |
| 112 | + |
| 113 | +libvirt driver specific BDM data structures |
| 114 | +=========================================== |
| 115 | + |
| 116 | +``instance_disk_info`` |
| 117 | +---------------------- |
| 118 | + |
| 119 | +The virt driver API defines a method ``get_instance_disk_info``, which returns |
| 120 | +a JSON blob. The compute manager calls this and passes the data over RPC |
| 121 | +between calls without ever looking at it. This is driver-specific opaque data. |
| 122 | +It is also only used by the libvirt driver, despite being part of the API for |
| 123 | +all drivers. Other drivers do not return any data. The most interesting aspect |
| 124 | +of ``instance_disk_info`` is that it is generated from the libvirt XML, not |
| 125 | +from nova's state. |
| 126 | + |
| 127 | +.. note:: |
| 128 | + |
| 129 | + ``instance_disk_info`` is often named ``disk_info`` in code, which |
| 130 | + is unfortunate as this clashes with the normal naming of the next |
| 131 | + structure. Occasionally the two are used in the same block of code. |
| 132 | + |
| 133 | +.. note:: |
| 134 | + |
| 135 | + RBD disks (including non-volume disks) and cinder volumes |
| 136 | + are not included in ``instance_disk_info``. |
| 137 | + |
| 138 | +``instance_disk_info`` is a list of dicts for some of an instance's disks. Each |
| 139 | +dict contains the following: |
| 140 | + |
| 141 | +``type`` |
| 142 | + libvirt's notion of the disk's type |
| 143 | +``path`` |
| 144 | + libvirt's notion of the disk's path |
| 145 | +``virt_disk_size`` |
| 146 | + The disk's virtual size in bytes (the size the guest OS sees) |
| 147 | +``backing_file`` |
| 148 | + libvirt's notion of the backing file path |
| 149 | +``disk_size`` |
| 150 | + The file size of ``path``, in bytes. |
| 151 | +``over_committed_disk_size`` |
| 152 | + As-yet-unallocated disk size, in bytes. |
| 153 | + |
| 154 | +``disk_info`` |
| 155 | +------------- |
| 156 | + |
| 157 | +.. note:: |
| 158 | + |
| 159 | + As opposed to ``instance_disk_info``, which is frequently called |
| 160 | + ``disk_info``. |
| 161 | + |
| 162 | +This data structure is actually described pretty well in the comment block at |
| 163 | +the top of ``nova.virt.libvirt.blockinfo``. It is internal to the libvirt |
| 164 | +driver. It contains: |
| 165 | + |
| 166 | +``disk_bus`` |
| 167 | + The default bus used by disks |
| 168 | +``cdrom_bus`` |
| 169 | + The default bus used by cdrom drives |
| 170 | +``mapping`` |
| 171 | + Defined below |
| 172 | + |
| 173 | +``mapping`` is a dict which maps disk names to a dict describing how that disk |
| 174 | +should be passed to libvirt. This mapping contains every disk connected to the |
| 175 | +instance, both local and volumes. |
| 176 | + |
| 177 | +First, a note on disk naming. Local disk names used by the libvirt driver are |
| 178 | +well defined. They are: |
| 179 | + |
| 180 | +``disk`` |
| 181 | + The root disk |
| 182 | +``disk.local`` |
| 183 | + The flavor-defined ephemeral disk |
| 184 | +``disk.ephX`` |
| 185 | + Where X is a zero-based index for BDM defined ephemeral disks |
| 186 | +``disk.swap`` |
| 187 | + The swap disk |
| 188 | +``disk.config`` |
| 189 | + The config disk |
| 190 | + |
| 191 | +These names are hardcoded, reliable, and used in lots of places. |
| 192 | + |
| 193 | +In ``disk_info``, volumes are keyed by device name, e.g. 'vda', 'vdb'. Different |
| 194 | +buses will be named differently, approximately according to legacy Linux |
| 195 | +device naming. |
| 196 | + |
| 197 | +Additionally, ``disk_info`` will contain a mapping for 'root', which is the |
| 198 | +root disk. This will duplicate one of the other entries, either 'disk' or a |
| 199 | +volume mapping. |
| 200 | + |
| 201 | +Each dict within the ``mapping`` dict contains three required fields, ``bus``, |
| 202 | +``dev`` and ``type``, and two optional fields, ``format`` and ``boot_index``: |
| 203 | + |
| 204 | +``bus`` |
| 205 | + The guest bus type ('ide', 'virtio', 'scsi', etc.) |
| 206 | +``dev`` |
| 207 | + The device name, e.g. 'vda', 'hdc', 'sdf', 'xvde' |
| 208 | +``type`` |
| 209 | + Type of device, e.g. 'disk', 'cdrom', 'floppy' |
| 209 | + Type of device eg 'disk', 'cdrom', 'floppy' |
| 210 | +``format`` |
| 211 | + Which format to apply to the device if applicable |
| 212 | +``boot_index`` |
| 213 | + Number designating the boot order of the device |
| 214 | + |
| 215 | +.. note:: |
| 216 | + |
| 217 | + ``BlockDeviceMapping`` and ``DriverBlockDevice`` store boot index |
| 218 | + zero-based. However, libvirt's boot index is 1-based, so the value stored |
| 219 | + here is 1-based. |
| 220 | + |
| 221 | +.. todo:: |
| 222 | + |
| 223 | + Add a section for the per disk ``disk.info`` file within instance |
| 224 | + directory when using the libvirt driver. |