|
| 1 | + |
| 2 | +Immutable biovecs and biovec iterators: |
| 3 | +======================================= |
| 4 | + |
| 5 | +Kent Overstreet < [email protected]> |
| 6 | + |
| 7 | +As of 3.13, biovecs should never be modified after a bio has been submitted. |
| 8 | +Instead, we have a new struct bvec_iter which represents a range of a biovec - |
| 9 | +the iterator will be modified as the bio is completed, not the biovec. |
| 10 | + |
| 11 | +More specifically, old code that needed to partially complete a bio would |
| 12 | +update bi_sector and bi_size, and advance bi_idx to the next biovec. If it |
| 13 | +ended up partway through a biovec, it would increment bv_offset and decrement |
| 14 | +bv_len by the number of bytes completed in that biovec. |
| 15 | + |
| 16 | +In the new scheme of things, everything that must be mutated in order to |
| 17 | +partially complete a bio is segregated into struct bvec_iter: bi_sector, |
| 18 | +bi_size and bi_idx have been moved there; and instead of modifying bv_offset |
| 19 | +and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of |
| 20 | +bytes completed in the current bvec. |
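A minimal userspace sketch of the new layout follows. It assumes simplified stand-ins for the kernel types: there is no `struct page`, and `sector_t` is taken as `unsigned long long`. `model_bvec_iter_advance()` is a hypothetical helper modeled on the kernel's `bvec_iter_advance()`; it is not the kernel source. The sketch shows that partially completing a bio only moves `bi_idx` and `bi_bvec_done`. The `const` biovec array is only ever read.

```c
#include <assert.h>

/* Simplified model, NOT the kernel definitions: no struct page, and
 * sector_t is assumed to be unsigned long long. */
struct bio_vec {
	unsigned int bv_len;    /* length of this segment in bytes */
	unsigned int bv_offset; /* offset of the data within its page */
};

struct bvec_iter {
	unsigned long long bi_sector; /* device address in 512-byte sectors */
	unsigned int bi_size;         /* bytes left to process */
	unsigned int bi_idx;          /* current index into the bvec array */
	unsigned int bi_bvec_done;    /* bytes completed in the current bvec */
};

/* Hypothetical model of bvec_iter_advance(): consume 'bytes' from the
 * iterator. The biovec array is const, so it is only ever read. */
static void model_bvec_iter_advance(const struct bio_vec *bv,
				    struct bvec_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);

	while (bytes) {
		unsigned int left = bv[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int len = bytes < left ? bytes : left;

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;

		/* Finished this bvec: move to the start of the next one. */
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}
```

Take two 4096-byte segments and a fresh iterator over 8192 bytes. Advancing by 5000 bytes leaves `bi_idx == 1`, `bi_bvec_done == 904` and `bi_size == 3192`, and both `bv_len` fields still read 4096.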
| 21 | + |
| 22 | +There are a bunch of new helper macros for hiding the gory details - in |
| 23 | +particular, presenting the illusion of partially completed biovecs so that |
| 24 | +normal code doesn't have to deal with bi_bvec_done. |
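The "illusion" amounts to building a `struct bio_vec` by value from the raw biovec and the iterator. Here is a minimal sketch. It assumes hypothetical stand-ins: `struct model_bio` replaces `struct bio`, `model_bio_iter_iovec()` replaces the kernel's helper, and a plain byte pointer replaces `struct page`. The returned segment starts `bi_bvec_done` bytes into the raw one and is clamped to `bi_size`, so callers see neither field.

```c
#include <assert.h>

/* Simplified model, NOT the kernel definitions. */
struct bio_vec {
	unsigned char *bv_page; /* stand-in for struct page * */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

struct model_bio {
	struct bio_vec *bi_io_vec; /* raw biovec, possibly shared */
	struct bvec_iter bi_iter;
};

/* Hypothetical model of bio_iter_iovec(): build the current segment by
 * value, hiding bi_bvec_done and bi_size from the caller. */
static struct bio_vec model_bio_iter_iovec(const struct model_bio *bio,
					   struct bvec_iter iter)
{
	const struct bio_vec *raw = &bio->bi_io_vec[iter.bi_idx];
	unsigned int left = raw->bv_len - iter.bi_bvec_done;
	struct bio_vec bv;

	assert(iter.bi_size > 0);
	bv.bv_page = raw->bv_page;
	bv.bv_offset = raw->bv_offset + iter.bi_bvec_done;
	bv.bv_len = iter.bi_size < left ? iter.bi_size : left;
	return bv;
}
```

Because the result is a copy, a driver can consume or trim it without touching the biovec that other bios may share.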
| 25 | + |
| 26 | + * Driver code should no longer refer to biovecs directly; we now have |
| 27 | + bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs, |
| 28 | + constructed from the raw biovecs but taking into account bi_bvec_done and |
| 29 | + bi_size. |
| 30 | + |
| 31 | + bio_for_each_segment() has been updated to take a bvec_iter argument |
| 32 | + instead of an integer (that corresponded to bi_idx); for a lot of code the |
| 33 | + conversion just required changing the types of the arguments to |
| 34 | + bio_for_each_segment(). |
| 35 | + |
| 36 | + * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a |
| 37 | + wrapper around bio_advance_iter() that operates on bio->bi_iter, and also |
| 38 | + advances the bio integrity's iter if present. |
| 39 | + |
| 40 | + There is a lower level advance function - bvec_iter_advance() - which takes |
| 41 | + a pointer to a biovec, not a bio; this is used by the bio integrity code. |
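The helpers in the list above fit together roughly as follows. This userspace sketch uses hypothetical `model_` names. The loop macro is loosely modeled on the kernel's `bio_for_each_segment()`: it iterates on a copy of `bi_iter` while `bi_size` is nonzero, advancing by each segment's length. The bio-level advance is assumed to add `bytes >> 9` to `bi_sector` before walking the biovec. Integrity iterators and the real `struct bio` are left out.

```c
#include <assert.h>

/* Simplified model, NOT the kernel definitions (no struct page). */
struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

struct model_bio {
	struct bio_vec *bi_io_vec;
	struct bvec_iter bi_iter;
};

/* Current segment, built by value (see bio_iter_iovec()). */
static struct bio_vec model_iter_iovec(const struct bio_vec *vec,
				       struct bvec_iter iter)
{
	const struct bio_vec *raw = &vec[iter.bi_idx];
	unsigned int left = raw->bv_len - iter.bi_bvec_done;
	struct bio_vec bv = {
		.bv_len = iter.bi_size < left ? iter.bi_size : left,
		.bv_offset = raw->bv_offset + iter.bi_bvec_done,
	};
	return bv;
}

/* Lower-level step: walk 'bytes' across bvec boundaries. */
static void model_bvec_iter_advance(const struct bio_vec *vec,
				    struct bvec_iter *iter, unsigned int bytes)
{
	while (bytes) {
		unsigned int left = vec[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int len = bytes < left ? bytes : left;

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == vec[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* bio-level wrapper: also moves the device address (512-byte sectors). */
static void model_bio_advance_iter(const struct model_bio *bio,
				   struct bvec_iter *iter, unsigned int bytes)
{
	iter->bi_sector += bytes >> 9;
	model_bvec_iter_advance(bio->bi_io_vec, iter, bytes);
}

/* Iterate over a private copy of bi_iter; the bio itself is untouched. */
#define model_bio_for_each_segment(bvl, bio, iter)			\
	for ((iter) = (bio)->bi_iter;					\
	     (iter).bi_size &&						\
		(((bvl) = model_iter_iovec((bio)->bi_io_vec, (iter))), 1); \
	     model_bio_advance_iter((bio), &(iter), (bvl).bv_len))

/* Example consumer: count segments and bytes the way a driver might. */
static unsigned int model_count_segments(const struct model_bio *bio,
					 unsigned int *bytes)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	unsigned int nsegs = 0;

	*bytes = 0;
	model_bio_for_each_segment(bv, bio, iter) {
		nsegs++;
		*bytes += bv.bv_len;
	}
	return nsegs;
}
```

Suppose the segments are 1024, 4096 and 512 bytes, and the iterator already has 512 bytes done in the first one. The loop visits three segments totalling 5120 bytes. The bio's own `bi_iter` is unchanged afterwards, because the macro works on a private copy.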
| 42 | + |
| 43 | +What does all this get us? |
| 44 | +========================== |
| 45 | + |
| 46 | +Having a real iterator, and making biovecs immutable, has a number of |
| 47 | +advantages: |
| 48 | + |
| 49 | + * Before, iterating over bios was very awkward when you weren't processing |
| 50 | + exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c, |
| 51 | + which copies the contents of one bio into another. Because the biovecs |
| 52 | + wouldn't necessarily be the same size, the old code was convoluted - |
| 53 | + it had to walk two different bios at the same time, keeping both bi_idx and |
| 54 | + an offset into the current biovec for each. |
| 55 | + |
| 56 | + The new code is much more straightforward - have a look. This sort of |
| 57 | + pattern comes up in a lot of places; a lot of drivers were essentially open |
| 58 | + coding bvec iterators before, and having a common implementation considerably |
| 59 | + simplifies a lot of code. |
| 60 | + |
| 61 | + * Before, any code that might need to use the biovec after the bio had been |
| 62 | + completed (perhaps to copy the data somewhere else, or perhaps to resubmit |
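| 63 | + it somewhere else if there was an error) had to save the entire bvec array |
| 64 | + - again, this was being done in a fair number of places. Now such code only needs to save a copy of the bvec_iter, since the biovec itself is never modified. |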
| 65 | + |
| 66 | + * Biovecs can be shared between multiple bios - a bvec iter can represent an |
| 67 | + arbitrary range of an existing biovec, both starting and ending midway |
| 68 | + through biovecs. This is what enables efficient splitting of arbitrary |
| 69 | + bios. Note that this means we _only_ use bi_size to determine when we've |
| 70 | + reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes |
| 71 | + bi_size into account when constructing biovecs. |
| 72 | + |
| 73 | + * Splitting bios is now much simpler. The old bio_split() didn't even work on |
| 74 | + bios with more than a single bvec! Now, we can efficiently split arbitrary |
| 75 | + size bios - because the new bio can share the old bio's biovec. |
| 76 | + |
| 77 | + Care must be taken to ensure the biovec isn't freed while the split bio is |
| 78 | + still using it, in case the original bio completes first, though. Using |
| 79 | + bio_chain() when splitting bios helps with this. |
| 80 | + |
| 81 | + * Submitting partially completed bios is now perfectly fine - this comes up |
| 82 | + occasionally in stacking block drivers, and various code (e.g. md and |
| 83 | + bcache) had some ugly workarounds for it. |
| 84 | + |
| 85 | + It used to be the case that submitting a partially completed bio would work |
| 86 | + fine on _most_ devices, but since accessing the raw bvec array was the |
| 87 | + norm, not all drivers would respect bi_idx and those would break. Now, |
| 88 | + since all drivers _must_ go through the bvec iterator - and have been |
| 89 | + audited to make sure they do - submitting partially completed bios is |
| 90 | + perfectly fine. |
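The two-bio walk described for `bio_copy_data()` reduces to two iterators advanced in lockstep. Below is a minimal userspace sketch. It assumes already-mapped byte buffers in place of pages, so there is no kmap step, and it handles single bios only. `model_bio_copy_data()` and the other `model_` names are hypothetical; they are not the kernel function.

```c
#include <assert.h>
#include <string.h>

/* Simplified model, NOT the kernel definitions. */
struct bio_vec {
	unsigned char *bv_page; /* stand-in for a mapped struct page */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

struct model_bio {
	struct bio_vec *bi_io_vec;
	struct bvec_iter bi_iter;
};

static struct bio_vec model_iter_iovec(const struct bio_vec *vec,
				       struct bvec_iter iter)
{
	const struct bio_vec *raw = &vec[iter.bi_idx];
	unsigned int left = raw->bv_len - iter.bi_bvec_done;
	struct bio_vec bv = {
		.bv_page = raw->bv_page,
		.bv_len = iter.bi_size < left ? iter.bi_size : left,
		.bv_offset = raw->bv_offset + iter.bi_bvec_done,
	};
	return bv;
}

static void model_bvec_iter_advance(const struct bio_vec *vec,
				    struct bvec_iter *iter, unsigned int bytes)
{
	while (bytes) {
		unsigned int left = vec[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int len = bytes < left ? bytes : left;

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == vec[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* Copy src into dst when their segment sizes differ: each step copies the
 * largest chunk that fits in both current segments, then advances both
 * private iterators by that amount. */
static void model_bio_copy_data(struct model_bio *dst,
				const struct model_bio *src)
{
	struct bvec_iter src_iter = src->bi_iter;
	struct bvec_iter dst_iter = dst->bi_iter;

	while (src_iter.bi_size && dst_iter.bi_size) {
		struct bio_vec s = model_iter_iovec(src->bi_io_vec, src_iter);
		struct bio_vec d = model_iter_iovec(dst->bi_io_vec, dst_iter);
		unsigned int bytes = s.bv_len < d.bv_len ? s.bv_len : d.bv_len;

		assert(bytes > 0); /* both current segments are non-empty */
		memcpy(d.bv_page + d.bv_offset, s.bv_page + s.bv_offset, bytes);
		model_bvec_iter_advance(src->bi_io_vec, &src_iter, bytes);
		model_bvec_iter_advance(dst->bi_io_vec, &dst_iter, bytes);
	}
}
```

Take a source split as 3 + 5 bytes and a destination split as 4 + 4. The loop copies 3 bytes, then 1, then 4, and neither biovec array is modified.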
| 91 | + |
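The splitting point in the list above can be sketched the same way. A split copies the bio header and shares the biovec; only the two iterators change. `model_bio_split()` is a hypothetical helper in a simplified userspace model. It does not match the signature of the kernel's `bio_split()`, and it does no reference counting. In real code the shared biovec must stay alive until both halves complete, which is what `bio_chain()` helps with.

```c
#include <assert.h>

/* Simplified model, NOT the kernel definitions (no struct page). */
struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

struct model_bio {
	struct bio_vec *bi_io_vec; /* may be shared with other bios */
	struct bvec_iter bi_iter;
};

static void model_bvec_iter_advance(const struct bio_vec *vec,
				    struct bvec_iter *iter, unsigned int bytes)
{
	while (bytes) {
		unsigned int left = vec[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int len = bytes < left ? bytes : left;

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == vec[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* Hypothetical split helper: carve the first 'sectors' off 'bio' into
 * 'front'. Only the iterator is adjusted; both bios point at the same
 * biovec array afterwards. */
static void model_bio_split(struct model_bio *bio, unsigned int sectors,
			    struct model_bio *front)
{
	unsigned int bytes = sectors << 9;

	assert(bytes > 0 && bytes < bio->bi_iter.bi_size);

	*front = *bio;                  /* same bi_io_vec, same starting iter */
	front->bi_iter.bi_size = bytes; /* front ends after 'bytes' */

	/* The remainder starts where front ends, possibly mid-bvec. */
	bio->bi_iter.bi_sector += sectors;
	model_bvec_iter_advance(bio->bi_io_vec, &bio->bi_iter, bytes);
}
```

Consider an 8192-byte bio starting at sector 100, split after 10 sectors. The front half covers sectors 100 to 109 (5120 bytes). The remainder starts at sector 110 with `bi_idx == 1` and `bi_bvec_done == 1024`, midway through the second biovec.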
| 92 | +Other implications: |
| 93 | +=================== |
| 94 | + |
| 95 | + * Almost all usage of bi_idx is now incorrect and has been removed; instead, |
| 96 | + where previously you would have used bi_idx you'd now use a bvec_iter, |
| 97 | + probably passing it to one of the helper macros. |
| 98 | + |
| 99 | + I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you |
| 100 | + now use bio_iter_iovec(), which takes a bvec_iter and returns a |
| 101 | + literal struct bio_vec - constructed on the fly from the raw biovec but |
| 102 | + taking into account bi_bvec_done (and bi_size). |
| 103 | + |
| 104 | + * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that |
| 105 | + doesn't actually own the bio. The reason is twofold: firstly, it's not |
| 106 | + actually needed for iterating over the bio anymore - we only use bi_size. |
| 107 | + Secondly, when cloning a bio and reusing (a portion of) the original bio's |
| 108 | + biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate |
| 109 | + over all the biovecs in the new bio - which is silly as it's not needed. |
| 110 | + |
| 111 | + So, don't use bi_vcnt anymore. |
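To illustrate why `bi_vcnt` is unreliable, consider a clone that reuses part of a three-entry biovec. The sketch below is a simplified userspace model. `bytes_by_vcnt()` and `bytes_by_iter()` are hypothetical helpers, and the clone is assumed to carry over the original's `bi_vcnt` of 3. Counting by `bi_vcnt` covers the whole array. Walking the iterator until `bi_size` reaches zero covers only the clone's range.

```c
#include <assert.h>

/* Simplified model, NOT the kernel definitions (no struct page). */
struct bio_vec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

struct model_bio {
	struct bio_vec *bi_io_vec;
	unsigned short bi_vcnt;  /* only meaningful to the bio's owner */
	struct bvec_iter bi_iter;
};

/* WRONG for clones: assumes the bio covers bi_io_vec[0 .. bi_vcnt). */
static unsigned int bytes_by_vcnt(const struct model_bio *bio)
{
	unsigned int i, total = 0;

	for (i = 0; i < bio->bi_vcnt; i++)
		total += bio->bi_io_vec[i].bv_len;
	return total;
}

/* Correct: walk a copy of the iterator until bi_size is exhausted. */
static unsigned int bytes_by_iter(const struct model_bio *bio,
				  unsigned int *nsegs)
{
	struct bvec_iter iter = bio->bi_iter;
	unsigned int total = 0;

	*nsegs = 0;
	while (iter.bi_size) {
		const struct bio_vec *raw = &bio->bi_io_vec[iter.bi_idx];
		unsigned int left = raw->bv_len - iter.bi_bvec_done;
		unsigned int len = iter.bi_size < left ? iter.bi_size : left;

		total += len;
		(*nsegs)++;

		/* Step past this segment, moving to the next bvec if done. */
		iter.bi_size -= len;
		iter.bi_bvec_done += len;
		if (iter.bi_bvec_done == raw->bv_len) {
			iter.bi_bvec_done = 0;
			iter.bi_idx++;
		}
	}
	return total;
}
```

Take a clone whose iterator starts 2048 bytes into the first 4096-byte entry, with `bi_size == 6144`. The iterator walk reports 6144 bytes in 2 segments, while the `bi_vcnt` loop reports 12288.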