@@ -16,14 +16,16 @@ throughput. So, when needed for achieving a lower latency, BFQ builds
schedules that may lead to a lower throughput. If your main or only
goal, for a given device, is to achieve the maximum-possible
throughput at all times, then do switch off all low-latency heuristics
- for that device, by setting low_latency to 0. Full details in Section 3.
+ for that device, by setting low_latency to 0. See Section 3 for
+ details on how to configure BFQ for the desired tradeoff between
+ latency and throughput, or on how to maximize throughput.

On average CPUs, the current version of BFQ can handle devices
performing at most ~30 KIOPS; at most ~50 KIOPS on faster CPUs. As a
reference, 30-50 KIOPS correspond to very high bandwidths with
sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB in size), and
- to 120-200 MB/s with 4KB random I/O. BFQ has not yet been tested on
- multi-queue devices.
+ to 120-200 MB/s with 4 KB random I/O. BFQ is currently being tested on
+ multi-queue devices too.

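As a quick check of the figures above: bandwidth is IOPS multiplied by request size. The helper below is a sketch that assumes decimal units (1 MB = 1000 KB, 1 GB = 1000 MB), which is how the 120-200 MB/s figures work out. The function name is made up for this example.

```python
def bandwidth_mb_per_s(iops: int, request_kb: float) -> float:
    """Bandwidth in MB/s, assuming decimal units (1 MB = 1000 KB)."""
    return iops * request_kb / 1000


for iops in (30_000, 50_000):
    seq_gb = bandwidth_mb_per_s(iops, 256) / 1000  # 256 KB sequential requests
    rnd_mb = bandwidth_mb_per_s(iops, 4)           # 4 KB random requests
    print(f"{iops} IOPS: {seq_gb:.2f} GB/s sequential, {rnd_mb:.0f} MB/s random")
```

For 256 KB requests this gives 7.68-12.8 GB/s, which the text approximates as 8-12 GB/s. For 4 KB requests it gives exactly 120-200 MB/s.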
The table of contents follows. Impatient readers can jump to Section 3.

@@ -33,7 +35,7 @@ CONTENTS
1-1 Personal systems
1-2 Server systems
2. How does BFQ work?
- 3. What are BFQ's tunable ?
+ 3. What are BFQ's tunables and how to properly configure BFQ?
4. BFQ group scheduling
4-1 Service guarantees provided
4-2 Interface
@@ -145,19 +147,28 @@ plus a lot of code, are borrowed from CFQ.
contrast, BFQ may idle the device for a short time interval,
giving the process the chance to go on being served if it issues
a new request in time. Device idling typically boosts the
- throughput on rotational devices, if processes do synchronous
- and sequential I/O. In addition, under BFQ, device idling is
- also instrumental in guaranteeing the desired throughput
- fraction to processes issuing sync requests (see the description
- of the slice_idle tunable in this document, or [1, 2], for more
- details).
+ throughput on rotational devices and on non-queueing flash-based
+ devices, if processes do synchronous and sequential I/O. In
+ addition, under BFQ, device idling is also instrumental in
+ guaranteeing the desired throughput fraction to processes
+ issuing sync requests (see the description of the slice_idle
+ tunable in this document, or [1, 2], for more details).

- With respect to idling for service guarantees, if several
processes are competing for the device at the same time, but
- all processes (and groups, after the following commit) have
- the same weight, then BFQ guarantees the expected throughput
- distribution without ever idling the device. Throughput is
- thus as high as possible in this common scenario.
+ all processes and groups have the same weight, then BFQ
+ guarantees the expected throughput distribution without ever
+ idling the device. Throughput is thus as high as possible in
+ this common scenario.
+
+ - On flash-based storage with internal queueing of commands
+ (typically NCQ), device idling is always detrimental to
+ throughput. So, with these devices, BFQ performs idling
+ only when strictly needed for service guarantees, i.e., for
+ guaranteeing low latency or fairness. In these cases, overall
+ throughput may be sub-optimal. No solution currently exists to
+ provide both strong service guarantees and optimal throughput
+ on devices with internal queueing.

- If low-latency mode is enabled (default configuration), BFQ
executes some special heuristics to detect interactive and soft
@@ -191,10 +202,7 @@ plus a lot of code, are borrowed from CFQ.
- Queues are scheduled according to a variant of WF2Q+, named
B-WF2Q+, and implemented using an augmented rb-tree to preserve an
O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
- also ready for hierarchical scheduling. However, for a cleaner
- logical breakdown, the code that enables and completes
- hierarchical support is provided in the next commit, which focuses
- exactly on this feature.
+ also ready for hierarchical scheduling; see Section 4 for details.

- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
perfectly fair, and smooth service. In particular, B-WF2Q+
@@ -249,13 +257,24 @@ plus a lot of code, are borrowed from CFQ.
the Idle class, to prevent it from starving.


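The idling rules described in this section can be condensed into a small decision function. This is a hypothetical simplification for illustration only and not BFQ's actual decision logic. The `Device` type and `should_idle()` are invented names. "Internal queueing" stands for NCQ-style command queueing. Differing weights among the active queues stand in for "idling needed for service guarantees", following the equal-weight discussion above. Low-latency weight raising is ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Device:
    rotational: bool         # spinning disk
    internal_queueing: bool  # device queues commands internally (e.g. NCQ)


def should_idle(dev: Device, sync_sequential: bool,
                active_weights: Sequence[int]) -> bool:
    """Hypothetical, simplified restatement of the idling rules above."""
    # With differentiated weights, idling may be needed to enforce the
    # expected throughput distribution, whatever the device type.
    if len(set(active_weights)) > 1:
        return True
    # Flash with internal command queueing: idling only costs throughput,
    # so it is skipped when guarantees do not require it.
    if dev.internal_queueing and not dev.rotational:
        return False
    # Rotational or non-queueing flash devices: idling helps throughput
    # when the in-service process does synchronous, sequential I/O.
    return sync_sequential
```

In this model, a queueing SSD serving equally weighted queues never idles. As soon as weights differ, it idles and gives up some throughput for fairness, which is the tradeoff the NCQ item above describes.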
- 3. What are BFQ's tunable?
- ==========================
+ 3. What are BFQ's tunables and how to properly configure BFQ?
+ =============================================================
+
+ Most BFQ tunables affect service guarantees (basically latency and
+ fairness) and throughput. For full details on how to choose the
+ desired tradeoff between service guarantees and throughput, see the
+ parameters slice_idle, strict_guarantees and low_latency. For details
+ on how to maximize throughput, see slice_idle, timeout_sync and
+ max_budget. The other performance-related parameters have been
+ inherited from CFQ, and have been preserved mostly for compatibility
+ with it. So far, no performance improvement has been reported after
+ changing the latter parameters in BFQ.

- The tunables back_seek-max, back_seek_penalty, fifo_expire_async and
- fifo_expire_sync below are the same as in CFQ. Their description is
- just copied from that for CFQ. Some considerations in the description
- of slice_idle are copied from CFQ too.
+ In particular, the tunables back_seek_max, back_seek_penalty,
+ fifo_expire_async and fifo_expire_sync below are the same as in
+ CFQ. Their description is just copied from that for CFQ. Some
+ considerations in the description of slice_idle are copied from CFQ
+ too.

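The tunables in this section are exposed per device through sysfs. The sketch below shows how a set of them could be applied. It assumes BFQ is already the active scheduler for the device, so its tunables live under /sys/block/<dev>/queue/iosched/. The helper name and the two profiles are illustrative only and are not recommendations from BFQ's authors. The sysfs root is a parameter so the function can be exercised against a scratch directory.

```python
from pathlib import Path

# Illustrative profiles built from the tunables named above. slice_idle=0
# only pays off on fast storage (hardware RAID, flash with command queueing).
MAX_THROUGHPUT_FAST_STORAGE = {"low_latency": 0, "slice_idle": 0}
STRONG_GUARANTEES = {"low_latency": 1, "strict_guarantees": 1}


def apply_bfq_tunables(dev: str, tunables: dict[str, int],
                       sysfs_root: str = "/sys") -> None:
    """Write BFQ tunables for block device `dev` (e.g. "sda").

    Assumes BFQ is the active scheduler, so its tunables live under
    <sysfs_root>/block/<dev>/queue/iosched/. Writing to the real sysfs
    tree requires root privileges.
    """
    iosched = Path(sysfs_root) / "block" / dev / "queue" / "iosched"
    paths = {name: iosched / name for name in tunables}
    # Check every tunable first, so a typo does not leave the device
    # half-configured.
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError("missing BFQ tunables: " + ", ".join(missing))
    for name, path in paths.items():
        path.write_text(f"{tunables[name]}\n")
```

For example, `apply_bfq_tunables("sda", MAX_THROUGHPUT_FAST_STORAGE)`, run as root, switches off the low-latency heuristics and idling for sda.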
per-process ioprio and weight
-----------------------------
@@ -285,15 +304,17 @@ number of seeks and see improved throughput.

Setting slice_idle to 0 will remove all the idling on queues and one
should see an overall improved throughput on faster storage devices
- like multiple SATA/SAS disks in hardware RAID configuration.
+ like multiple SATA/SAS disks in hardware RAID configuration, as well
+ as flash-based storage with internal command queueing (and
+ parallelism).

So, depending on storage and workload, it might be useful to set
slice_idle=0. In general, for SATA/SAS disks and software RAID of
SATA/SAS disks, keeping slice_idle enabled should be useful. For any
configurations where there are multiple spindles behind a single LUN
- (Host based hardware RAID controller or for storage arrays), setting
- slice_idle=0 might end up in better throughput and acceptable
- latencies.
+ (host-based hardware RAID controller or storage arrays), or with
+ flash-based fast storage, setting slice_idle=0 might result in better
+ throughput and acceptable latencies.

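The rule of thumb above can be written down as a tiny helper. This is a hypothetical sketch that only encodes the device categories named in this description. It does not look at the workload, and it does not account for the cases where idling is still required for service guarantees.

```python
from typing import Optional


def suggested_slice_idle(rotational: bool, internal_queueing: bool,
                         multi_spindle_lun: bool) -> Optional[int]:
    """Hypothetical rule of thumb distilled from the slice_idle text.

    Returns 0 when disabling idling is likely to improve throughput,
    or None to keep BFQ's default slice_idle.
    """
    if multi_spindle_lun:
        # Host-based hardware RAID controller or storage array.
        return 0
    if not rotational and internal_queueing:
        # Flash-based storage with internal command queueing.
        return 0
    # Single SATA/SAS disks, or software RAID of such disks.
    return None
```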
Idling is however necessary to have service guarantees enforced in
case of differentiated weights or differentiated I/O-request lengths.
@@ -312,13 +333,14 @@ There is an important flipside for idling: apart from the above cases
where it is also beneficial for throughput, idling can severely impact
throughput. One important case is random workloads. Because of this
issue, BFQ tends to avoid idling as much as possible when it is not
- beneficial also for throughput. As a consequence of this behavior, and
- of further issues described for the strict_guarantees tunable,
- short-term service guarantees may be occasionally violated. And, in
- some cases, these guarantees may be more important than guaranteeing
- maximum throughput. For example, in video playing/streaming, a very
- low drop rate may be more important than maximum throughput. In these
- cases, consider setting the strict_guarantees parameter.
+ also beneficial for throughput (as detailed in Section 2). As a
+ consequence of this behavior, and of further issues described for the
+ strict_guarantees tunable, short-term service guarantees may
+ occasionally be violated. And, in some cases, these guarantees may be
+ more important than guaranteeing maximum throughput. For example, in
+ video playback/streaming, a very low drop rate may be more important
+ than maximum throughput. In these cases, consider setting the
+ strict_guarantees parameter.

strict_guarantees
-----------------
@@ -420,58 +442,20 @@ The default value is 0, which enables auto-tuning: BFQ sets max_budget
to the maximum number of sectors that can be served during
timeout_sync, according to the estimated peak rate.

+ For specific devices, some users have occasionally reported reaching
+ a higher throughput by setting max_budget explicitly, i.e., by
+ setting max_budget to a value higher than 0. In particular, they have
+ set max_budget to higher values than those BFQ would have chosen
+ with auto-tuning. An alternative way to achieve this goal is to
+ just increase the value of timeout_sync, leaving max_budget equal to 0.
+
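With max_budget set to 0, the auto-tuned budget is the number of sectors the device can serve during timeout_sync at its estimated peak rate. The sketch below restates that rule as plain arithmetic. It assumes 512-byte sectors and a peak rate in bytes per second, and it ignores the fixed-point details of the in-kernel computation. The 200 MB/s device and the timeout values are hypothetical.

```python
SECTOR_BYTES = 512  # assumption: budgets are counted in 512-byte sectors


def auto_max_budget(peak_rate_bytes_per_s: float, timeout_sync_ms: int) -> int:
    """Sectors servable within timeout_sync at the estimated peak rate."""
    bytes_in_timeout = peak_rate_bytes_per_s * timeout_sync_ms / 1000
    return int(bytes_in_timeout // SECTOR_BYTES)


# Hypothetical device with a 200 MB/s estimated peak rate.
for timeout_ms in (125, 250):
    print(timeout_ms, auto_max_budget(200e6, timeout_ms))
```

Doubling timeout_sync roughly doubles the auto-tuned budget. This is why increasing timeout_sync, while leaving max_budget at 0, is an alternative to setting max_budget explicitly.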
weights
-------

Read-only parameter, used to show the weights of the currently active
BFQ queues.


- wr_ tunables
- ------------
-
- BFQ exports a few parameters to control/tune the behavior of
- low-latency heuristics.
-
- wr_coeff
-
- Factor by which the weight of a weight-raised queue is multiplied. If
- the queue is deemed soft real-time, then the weight is further
- multiplied by an additional, constant factor.
-
- wr_max_time
-
- Maximum duration of a weight-raising period for an interactive task
- (ms). If set to zero (default value), then this value is computed
- automatically, as a function of the peak rate of the device. In any
- case, when the value of this parameter is read, it always reports the
- current duration, regardless of whether it has been set manually or
- computed automatically.
-
- wr_max_softrt_rate
-
- Maximum service rate below which a queue is deemed to be associated
- with a soft real-time application, and is then weight-raised
- accordingly (sectors/sec).
-
- wr_min_idle_time
-
- Minimum idle period after which interactive weight-raising may be
- reactivated for a queue (in ms).
-
- wr_rt_max_time
-
- Maximum weight-raising duration for soft real-time queues (in ms). The
- start time from which this duration is considered is automatically
- moved forward if the queue is detected to be still soft real-time
- before the current soft real-time weight-raising period finishes.
-
- wr_min_inter_arr_async
-
- Minimum period between I/O request arrivals after which weight-raising
- may be reactivated for an already busy async queue (in ms).
-
-
4. Group scheduling with BFQ
============================