@@ -17,12 +17,14 @@ MBA (Memory Bandwidth Allocation) - "mba"

To use the feature mount the file system:

- # mount -t resctrl resctrl [-o cdp[,cdpl2]] /sys/fs/resctrl
+ # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.
"cdpl2": Enable code/data prioritization in L2 cache allocations.
+ "mba_MBps": Enable the MBA Software Controller (mba_sc) to specify MBA
+ bandwidth in MBps
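+
+ For example, to enable the software controller at mount time (a minimal
+ sketch, assuming an otherwise default setup):
+
+ # mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl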

L2 and L3 CDP are controlled separately.
@@ -270,10 +272,11 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

- Memory bandwidth(b/w) percentage
- --------------------------------
- For Memory b/w resource, user controls the resource by indicating the
- percentage of total memory b/w.
+ Memory bandwidth Allocation and monitoring
+ ------------------------------------------
+
+ For the memory bandwidth resource, by default the user controls the
+ resource by indicating the percentage of total memory bandwidth.
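+
+ For example, in this default mode a group could be limited to half the
+ bandwidth on domain 0 with a line such as "MB:0=50" (an illustrative
+ value; the exact format is described in the schemata sections below).
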
The minimum bandwidth percentage value for each cpu model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
@@ -285,7 +288,47 @@ to the next control step available on the hardware.

The bandwidth throttling is a core specific mechanism on some of Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
sharing a core will result in both threads being throttled to use the
- low bandwidth.
+ low bandwidth. The fact that Memory bandwidth allocation (MBA) is a
+ core-specific mechanism whereas memory bandwidth monitoring (MBM) is
+ done at the package level may lead to confusion when users try to apply
+ control via the MBA and then monitor the bandwidth to see if the
+ controls are effective. Below are such scenarios:
+
+ 1. User may *not* see an increase in actual bandwidth when percentage
+ values are increased:
+
+ This can occur when aggregate L2 external bandwidth is more than L3
+ external bandwidth. Consider an SKL SKU with 24 cores on a package and
+ where L2 external is 10GBps (hence aggregate L2 external bandwidth is
+ 240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20
+ threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3
+ bandwidth of 100GBps although the percentage value specified is only 50%
+ << 100%. Hence increasing the bandwidth percentage will not yield any
+ more bandwidth. This is because although the L2 external bandwidth still
+ has capacity, the L3 external bandwidth is fully used. Also note that
+ this would be dependent on the number of cores the benchmark is run on.
+
+ 2. The same bandwidth percentage may mean different actual bandwidth
+ depending on the number of threads:
+
+ For the same SKU in #1, a 'single thread, with 10% bandwidth' and '4
+ threads, with 10% bandwidth' can consume up to 10GBps and 40GBps
+ although they have the same percentage bandwidth of 10%. This is simply
+ because as threads start using more cores in an rdtgroup, the actual
+ bandwidth may increase or vary although the user-specified bandwidth
+ percentage is the same.
+
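+ For example, the actual bandwidth consumed by a group can be checked
+ via the MBM counters (a sketch, assuming monitoring support and an
+ existing group "p0"; the file holds a cumulative byte count, so reading
+ it twice and dividing the delta by the interval gives the bandwidth):
+
+ # cat /sys/fs/resctrl/p0/mon_data/mon_L3_00/mbm_total_bytes
+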
+ In order to mitigate this and make the interface more user friendly,
+ resctrl added support for specifying the bandwidth in MBps as well. The
+ kernel underneath would use a software feedback mechanism or a "Software
+ Controller (mba_sc)" which reads the actual bandwidth using MBM counters
+ and adjusts the memory bandwidth percentages to ensure:
+
+ "actual bandwidth < user specified bandwidth"
+
+ By default, the schemata would take the bandwidth percentage values,
+ whereas the user can switch to the "MBA software controller" mode using
+ the mount option 'mba_MBps'. The schemata format is specified in the
+ sections below.
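+
+ A sketch of switching modes (note that re-mounting is done at the top
+ level, so any existing group configuration would need to be recreated
+ afterwards):
+
+ # umount /sys/fs/resctrl
+ # mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl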

L3 schemata file details (code and data prioritization disabled)
----------------------------------------------------------------
@@ -308,13 +351,20 @@ schemata format is always:

L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

- Memory b/w Allocation details
- -----------------------------
+ Memory bandwidth Allocation (default mode)
+ ------------------------------------------

Memory b/w domain is L3 cache.

MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

+ Memory bandwidth Allocation specified in MBps
+ ---------------------------------------------
+
+ Memory bandwidth domain is L3 cache.
+
+ MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
+
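+ For example, "MB:0=2048;1=1024" would cap the group at 2048 MBps on
+ domain 0 and 1024 MBps on domain 1 (illustrative values).
+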

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
@@ -358,6 +408,15 @@ allocations can overlap or not. The allocations specifies the maximum
b/w that the group may be able to use and the system admin can configure
the b/w accordingly.

+ If the MBA is specified in MBps (megabytes per second) then the user can
+ enter the max b/w in MBps rather than the percentage values.
+
+ # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
+ # echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
+
+ In the above example the tasks in "p1" and "p0" on socket 0 would use a
+ max b/w of 1024 MBps whereas on socket 1 they would use 500 MBps.
+
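+ The limits in effect can be read back from the schemata file of each
+ group, e.g.:
+
+ # cat /sys/fs/resctrl/p0/schemata
+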

Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.