
Commit 3b99107

Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main block updates for 5.3. Nothing earth shattering or
  major in here, just fixes, additions, and improvements all over the
  map. This contains:

  - Series of documentation fixes (Bart)
  - Optimization of the blk-mq ctx get/put (Bart)
  - null_blk removal race condition fix (Bob)
  - req/bio_op() cleanups (Chaitanya)
  - Series cleaning up the segment accounting, and request/bio mapping (Christoph)
  - Series cleaning up the page getting/putting for bios (Christoph)
  - block cgroup cleanups and moving it to where it is used (Christoph)
  - block cgroup fixes (Tejun)
  - Series of fixes and improvements to bcache, most notably a write deadlock fix (Coly)
  - blk-iolatency STS_AGAIN and accounting fixes (Dennis)
  - Series of improvements and fixes to BFQ (Douglas, Paolo)
  - debugfs_create() return value check removal for drbd (Greg)
  - Use struct_size(), where appropriate (Gustavo)
  - Two lightnvm fixes (Heiner, Geert)
  - MD fixes, including a read balance and corruption fix (Guoqing, Marcos, Xiao, Yufen)
  - block opal shadow mbr additions (Jonas, Revanth)
  - sbitmap compare-and-exchange improvements (Pavel)
  - Fix for potential bio->bi_size overflow (Ming)
  - NVMe pull requests:
      - improved PCIe suspend support (Keith Busch)
      - error injection support for the admin queue (Akinobu Mita)
      - Fibre Channel discovery improvements (James Smart)
      - tracing improvements including nvmet tracing support (Minwoo Im)
      - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya Kulkarni)
  - Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
  blk-iolatency: fix STS_AGAIN handling
  block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
  blk-mq: simplify blk_mq_make_request()
  blk-mq: remove blk_mq_put_ctx()
  sbitmap: Replace cmpxchg with xchg
  block: fix .bi_size overflow
  block: sed-opal: check size of shadow mbr
  block: sed-opal: ioctl for writing to shadow mbr
  block: sed-opal: add ioctl for done-mark of shadow mbr
  block: never take page references for ITER_BVEC
  direct-io: use bio_release_pages in dio_bio_complete
  block_dev: use bio_release_pages in bio_unmap_user
  block_dev: use bio_release_pages in blkdev_bio_end_io
  iomap: use bio_release_pages in iomap_dio_bio_end_io
  block: use bio_release_pages in bio_map_user_iov
  block: use bio_release_pages in bio_unmap_user
  block: optionally mark pages dirty in bio_release_pages
  block: move the BIO_NO_PAGE_REF check into bio_release_pages
  block: skd_main.c: Remove call to memset after dma_alloc_coherent
  block: mtip32xx: Remove call to memset after dma_alloc_coherent
  ...
2 parents 0415052 + c9b3007 commit 3b99107


104 files changed, 3368 insertions(+), 1554 deletions(-)

Documentation/block/bfq-iosched.txt

Lines changed: 6 additions & 6 deletions

@@ -38,13 +38,13 @@ stack). To give an idea of the limits with BFQ, on slow or average
 CPUs, here are, first, the limits of BFQ for three different CPUs, on,
 respectively, an average laptop, an old desktop, and a cheap embedded
 system, in case full hierarchical support is enabled (i.e.,
-CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_DEBUG_BLK_CGROUP is not
+CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
 set (Section 4-2):
 - Intel i7-4850HQ: 400 KIOPS
 - AMD A8-3850: 250 KIOPS
 - ARM CortexTM-A53 Octa-core: 80 KIOPS

-If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
+If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
 support is enabled), then the sustainable throughput with BFQ
 decreases, because all blkio.bfq* statistics are created and updated
 (Section 4-2). For BFQ, this leads to the following maximum

@@ -537,19 +537,19 @@ or io.bfq.weight.

 As for cgroups-v1 (blkio controller), the exact set of stat files
 created, and kept up-to-date by bfq, depends on whether
-CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
+CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
 the stat files documented in
 Documentation/cgroup-v1/blkio-controller.rst. If, instead,
-CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
+CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files
 blkio.bfq.io_service_bytes
 blkio.bfq.io_service_bytes_recursive
 blkio.bfq.io_serviced
 blkio.bfq.io_serviced_recursive

-The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
+The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
 throughput sustainable with bfq, because updating the blkio.bfq.*
 stats is rather costly, especially for some of the stats enabled by
-CONFIG_DEBUG_BLK_CGROUP.
+CONFIG_BFQ_CGROUP_DEBUG.

 Parameters to set
 -----------------

Documentation/block/biodoc.txt

Lines changed: 0 additions & 1 deletion

@@ -436,7 +436,6 @@ struct bio {
        struct bvec_iter bi_iter;       /* current index into bio_vec array */

        unsigned int bi_size;     /* total size in bytes */
-       unsigned short bi_phys_segments; /* segments after physaddr coalesce*/
        unsigned short bi_hw_segments; /* segments after DMA remapping */
        unsigned int bi_max;      /* max bio_vecs we can hold
                                     used as index into pool */

Documentation/block/queue-sysfs.txt

Lines changed: 43 additions & 21 deletions

@@ -14,6 +14,15 @@ add_random (RW)
 This file allows to turn off the disk entropy contribution. Default
 value of this file is '1'(on).

+chunk_sectors (RO)
+------------------
+This has different meaning depending on the type of the block device.
+For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
+of the RAID volume stripe segment. For a zoned block device, either host-aware
+or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
+of the device, with the eventual exception of the last zone of the device which
+may be smaller.
+
 dax (RO)
 --------
 This file indicates whether the device supports Direct Access (DAX),

@@ -43,6 +52,16 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.

+discard_zeroes_data (RO)
+------------------------
+Obsolete. Always zero.
+
+fua (RO)
+--------
+Whether or not the block driver supports the FUA flag for write requests.
+FUA stands for Force Unit Access. If the FUA flag is set that means that
+write requests must bypass the volatile cache of the storage device.
+
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.

@@ -83,14 +102,19 @@ logical_block_size (RO)
 -----------------------
 This is the logical block size of the device, in bytes.

+max_discard_segments (RO)
+-------------------------
+The maximum number of DMA scatter/gather entries in a discard request.
+
 max_hw_sectors_kb (RO)
 ----------------------
 This is the maximum number of kilobytes supported in a single data transfer.

 max_integrity_segments (RO)
 ---------------------------
-When read, this file shows the max limit of integrity segments as
-set by block layer which a hardware controller can handle.
+Maximum number of elements in a DMA scatter/gather list with integrity
+data that will be submitted by the block layer core to the associated
+block driver.

 max_sectors_kb (RW)
 -------------------

@@ -100,11 +124,12 @@ size allowed by the hardware.

 max_segments (RO)
 -----------------
-Maximum number of segments of the device.
+Maximum number of elements in a DMA scatter/gather list that is submitted
+to the associated block driver.

 max_segment_size (RO)
 ---------------------
-Maximum segment size of the device.
+Maximum size in bytes of a single element in a DMA scatter/gather list.

 minimum_io_size (RO)
 --------------------

@@ -132,6 +157,12 @@ per-block-cgroup request pool. IOW, if there are N block cgroups,
 each request queue may have up to N request pools, each independently
 regulated by nr_requests.

+nr_zones (RO)
+-------------
+For zoned block devices (zoned attribute indicating "host-managed" or
+"host-aware"), this indicates the total number of zones of the device.
+This is always 0 for regular block devices.
+
 optimal_io_size (RO)
 --------------------
 This is the optimal IO size reported by the device.

@@ -185,8 +216,8 @@ This is the number of bytes the device can write in a single write-same
 command. A value of '0' means write-same is not supported by this
 device.

-wb_lat_usec (RW)
-----------------
+wbt_lat_usec (RW)
+-----------------
 If the device is registered for writeback throttling, then this file shows
 the target minimum read latency. If this latency is exceeded in a given
 window of time (see wb_window_usec), then the writeback throttling will start

@@ -201,6 +232,12 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups
 have more smooth throughput, but higher CPU overhead. This exists only when
 CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

+write_zeroes_max_bytes (RO)
+---------------------------
+For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
+bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
+is not supported.
+
 zoned (RO)
 ----------
 This indicates if the device is a zoned block device and the zone model of the

@@ -213,19 +250,4 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".

-nr_zones (RO)
--------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), this indicates the total number of zones of the device.
-This is always 0 for regular block devices.
-
-chunk_sectors (RO)
-------------------
-This has different meaning depending on the type of the block device.
-For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
-of the RAID volume stripe segment. For a zoned block device, either host-aware
-or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
-of the device, with the eventual exception of the last zone of the device which
-may be smaller.
-
 Jens Axboe <[email protected]>, February 2009

Documentation/cgroup-v1/blkio-controller.rst

Lines changed: 6 additions & 6 deletions

@@ -82,7 +82,7 @@ Various user visible config options
 CONFIG_BLK_CGROUP
 	- Block IO controller.

-CONFIG_DEBUG_BLK_CGROUP
+CONFIG_BFQ_CGROUP_DEBUG
 	- Debug help. Right now some additional stats file show up in cgroup
 	  if this option is enabled.

@@ -202,13 +202,13 @@ Proportional weight policy files
 	  write, sync or async.

 - blkio.avg_queue_size
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  The average queue size for this cgroup over the entire time of this
 	  cgroup's existence. Queue size samples are taken each time one of the
 	  queues of this cgroup gets a timeslice.

 - blkio.group_wait_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  This is the amount of time the cgroup had to wait since it became busy
 	  (i.e., went from 0 to 1 request queued) to get a timeslice for one of
 	  its queues. This is different from the io_wait_time which is the

@@ -219,7 +219,7 @@ Proportional weight policy files
 	  got a timeslice and will not include the current delta.

 - blkio.empty_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  This is the amount of time a cgroup spends without any pending
 	  requests when not being served, i.e., it does not include any time
 	  spent idling for one of the queues of the cgroup. This is in

@@ -228,7 +228,7 @@ Proportional weight policy files
 	  time it had a pending request and will not include the current delta.

 - blkio.idle_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  This is the amount of time spent by the IO scheduler idling for a
 	  given cgroup in anticipation of a better request than the existing ones
 	  from other queues/cgroups. This is in nanoseconds. If this is read

@@ -237,7 +237,7 @@ Proportional weight policy files
 	  the current delta.

 - blkio.dequeue
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
 	  gives the statistics about how many a times a group was dequeued
 	  from service tree of the device. First two fields specify the major
 	  and minor number of the device and third field specifies the number

Documentation/fault-injection/nvme-fault-injection.txt

Lines changed: 56 additions & 0 deletions

@@ -114,3 +114,59 @@ R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000
   cpu_startup_entry+0x6f/0x80
   start_secondary+0x187/0x1e0
   secondary_startup_64+0xa5/0xb0
+
+Example 3: Inject an error into the 10th admin command
+------------------------------------------------------
+
+echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability
+echo 10 > /sys/kernel/debug/nvme0/fault_inject/space
+echo 1 > /sys/kernel/debug/nvme0/fault_inject/times
+nvme reset /dev/nvme0
+
+Expected Result:
+
+After NVMe controller reset, the reinitialization may or may not succeed.
+It depends on which admin command is actually forced to fail.
+
+Message from dmesg:
+
+nvme nvme0: resetting controller
+FAULT_INJECTION: forcing a failure.
+name fault_inject, interval 1, probability 100, space 1, times 1
+CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2
+Hardware name: MSI MS-7A45/B150M MORTAR ARCTIC (MS-7A45), BIOS 1.50 04/25/2017
+Call Trace:
+<IRQ>
+dump_stack+0x63/0x85
+should_fail+0x14a/0x170
+nvme_should_fail+0x38/0x80 [nvme_core]
+nvme_irq+0x129/0x280 [nvme]
+? blk_mq_end_request+0xb3/0x120
+__handle_irq_event_percpu+0x84/0x1a0
+handle_irq_event_percpu+0x32/0x80
+handle_irq_event+0x3b/0x60
+handle_edge_irq+0x7f/0x1a0
+handle_irq+0x20/0x30
+do_IRQ+0x4e/0xe0
+common_interrupt+0xf/0xf
+</IRQ>
+RIP: 0010:cpuidle_enter_state+0xc5/0x460
+Code: ff e8 8f 5f 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 69 03 00 00 31 ff e8 62 aa 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba cf f7 53
+RSP: 0018:ffffffff88c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
+RAX: ffff9dac25a2ac80 RBX: ffffffff88d53760 RCX: 000000000000001f
+RDX: 0000000000000000 RSI: 000000002d958403 RDI: 0000000000000000
+RBP: ffffffff88c03e18 R08: fffffff75e35ffb7 R09: 00000a49a56c0b48
+R10: ffffffff88c03da0 R11: 0000000000001b0c R12: ffff9dac25a34d00
+R13: 0000000000000006 R14: 0000000000000006 R15: ffffffff88d53760
+cpuidle_enter+0x2e/0x40
+call_cpuidle+0x23/0x40
+do_idle+0x201/0x280
+cpu_startup_entry+0x1d/0x20
+rest_init+0xaa/0xb0
+arch_call_rest_init+0xe/0x1b
+start_kernel+0x51c/0x53b
+x86_64_start_reservations+0x24/0x26
+x86_64_start_kernel+0x74/0x77
+secondary_startup_64+0xa4/0xb0
+nvme nvme0: Could not set queue count (16385)
+nvme nvme0: IO queues not created

block/Kconfig.iosched

Lines changed: 7 additions & 0 deletions

@@ -36,6 +36,13 @@ config BFQ_GROUP_IOSCHED
 	  Enable hierarchical scheduling in BFQ, using the blkio
 	  (cgroups-v1) or io (cgroups-v2) controller.

+config BFQ_CGROUP_DEBUG
+	bool "BFQ IO controller debugging"
+	depends on BFQ_GROUP_IOSCHED
+	---help---
+	  Enable some debugging help. Currently it exports additional stat
+	  files in a cgroup which can be useful for debugging.
+
 endmenu

 endif
