Commit d6c64a4

Vikas Shivappa authored and KAGA-KOKO committed
x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)
Add documentation about the feedback loop mechanism (MBA software
controller) which lets the user specify the memory bandwidth allocation
in MBps. This includes some changes to the "schemata" format, with
examples.

Signed-off-by: Vikas Shivappa <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
1 parent 73fcb1a commit d6c64a4

1 file changed: +67 -8 lines changed

Documentation/x86/intel_rdt_ui.txt

Lines changed: 67 additions & 8 deletions
@@ -17,12 +17,14 @@ MBA (Memory Bandwidth Allocation) - "mba"
1717

1818
To use the feature mount the file system:
1919

20-
# mount -t resctrl resctrl [-o cdp[,cdpl2]] /sys/fs/resctrl
20+
# mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
2121

2222
mount options are:
2323

2424
"cdp": Enable code/data prioritization in L3 cache allocations.
2525
"cdpl2": Enable code/data prioritization in L2 cache allocations.
26+
"mba_MBps": Enable the MBA Software Controller(mba_sc) to specify MBA
27+
bandwidth in MBps
2628

2729
L2 and L3 CDP are controlled separately.
2830

@@ -270,10 +272,11 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5%
270272
of the capacity of the cache. You could partition the cache into four
271273
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
272274

273-
Memory bandwidth(b/w) percentage
274-
--------------------------------
275-
For Memory b/w resource, user controls the resource by indicating the
276-
percentage of total memory b/w.
275+
Memory bandwidth Allocation and monitoring
276+
------------------------------------------
277+
278+
For Memory bandwidth resource, by default the user controls the resource
279+
by indicating the percentage of total memory bandwidth.
277280

278281
The minimum bandwidth percentage value for each cpu model is predefined
279282
and can be looked up through "info/MB/min_bandwidth". The bandwidth
@@ -285,7 +288,47 @@ to the next control step available on the hardware.
285288
The bandwidth throttling is a core specific mechanism on some of Intel
286289
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
287290
sharing a core will result in both threads being throttled to use the
288-
low bandwidth.
291+
low bandwidth. The fact that Memory bandwidth allocation(MBA) is a core
292+
specific mechanism whereas memory bandwidth monitoring(MBM) is done at
293+
the package level may lead to confusion when users try to apply control
294+
via the MBA and then monitor the bandwidth to see if the controls are
295+
effective. Below are such scenarios:
296+
297+
1. User may *not* see increase in actual bandwidth when percentage
298+
values are increased:
299+
300+
This can occur when aggregate L2 external bandwidth is more than L3
301+
external bandwidth. Consider an SKL SKU with 24 cores on a package and
302+
where L2 external is 10GBps (hence aggregate L2 external bandwidth is
303+
240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20
304+
threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3
305+
bandwidth of 100GBps although the percentage value specified is only 50%
306+
<< 100%. Hence increasing the bandwidth percentage will not yield any
307+
more bandwidth. This is because although the L2 external bandwidth still
308+
has capacity, the L3 external bandwidth is fully used. Also note that
309+
this would be dependent on number of cores the benchmark is run on.
310+
311+
2. Same bandwidth percentage may mean different actual bandwidth
312+
depending on # of threads:
313+
314+
For the same SKU in #1, a 'single thread, with 10% bandwidth' and '4
315+
threads, with 10% bandwidth' can consume up to 10GBps and 40GBps although
316+
they have same percentage bandwidth of 10%. This is simply because as
317+
threads start using more cores in an rdtgroup, the actual bandwidth may
318+
increase or vary although user specified bandwidth percentage is same.
319+
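The arithmetic behind scenarios 1 and 2 can be checked with a short Python sketch. It is not part of the patch. The figures are the ones quoted in the text; the `delivered_gbps` helper and its "sum of per-thread bandwidth, capped by the L3 external bandwidth" model are hypothetical simplifications for illustration only.

```python
# Figures quoted in scenarios 1 and 2 above (SKL SKU with 24 cores).
L2_EXTERNAL_GBPS = 10   # per-core L2 external bandwidth
L3_EXTERNAL_GBPS = 100  # package-wide L3 external bandwidth
CORES = 24


def delivered_gbps(threads, per_thread_gbps, l3_cap_gbps=L3_EXTERNAL_GBPS):
    """Hypothetical model: total bandwidth is the sum of the per-thread
    bandwidth, capped by the package-wide L3 external bandwidth."""
    return min(threads * per_thread_gbps, l3_cap_gbps)


# Aggregate L2 external bandwidth across all cores.
aggregate_l2 = CORES * L2_EXTERNAL_GBPS

# Scenario 1: 20 threads at 5 GBps each already saturate L3, so letting
# each thread reach the 10 GBps L2 limit delivers nothing extra.
scenario1_at_50 = delivered_gbps(20, 5)
scenario1_at_100 = delivered_gbps(20, 10)

# Scenario 2: the same setting yields different totals for 1 and 4
# threads, using the per-thread ceiling of 10 GBps quoted in the text.
scenario2_one = delivered_gbps(1, 10)
scenario2_four = delivered_gbps(4, 10)

print(aggregate_l2, scenario1_at_50, scenario1_at_100,
      scenario2_one, scenario2_four)
```

In this model the aggregate L2 figure (240) exceeds the L3 cap (100), which is why the percentage setting stops translating into delivered bandwidth once enough cores are active.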
320+
In order to mitigate this and make the interface more user friendly,
321+
resctrl added support for specifying the bandwidth in MBps as well. The
322+
kernel underneath would use a software feedback mechanism or a "Software
323+
Controller(mba_sc)" which reads the actual bandwidth using MBM counters
324+
and adjusts the memory bandwidth percentages to ensure
325+
326+
"actual bandwidth < user specified bandwidth".
327+
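A feedback loop of this kind can be sketched in Python as below. This is an illustrative step controller, not the kernel's mba_sc code. The function names, the step size, the 10% floor, the headroom margin and the linear "MBps per percent" toy workload are all assumptions made for the example.

```python
def next_percent(percent, measured_mbps, target_mbps, step=10,
                 min_percent=10, max_percent=100, headroom_mbps=0):
    """One iteration of a hypothetical step controller.

    Lower the throttle percentage while measured bandwidth exceeds the
    target; raise it only when measured bandwidth plus a headroom margin
    is still below the target, which avoids oscillating around it.
    """
    if measured_mbps > target_mbps and percent > min_percent:
        return max(min_percent, percent - step)
    if measured_mbps + headroom_mbps < target_mbps and percent < max_percent:
        return min(max_percent, percent + step)
    return percent


def simulate(target_mbps, mbps_per_percent=20, iterations=20):
    """Run the controller against a toy workload whose bandwidth grows
    linearly with the percentage (an assumption, not real hardware)."""
    percent = 100
    for _ in range(iterations):
        measured = percent * mbps_per_percent  # stand-in for an MBM read
        percent = next_percent(percent, measured, target_mbps,
                               headroom_mbps=10 * mbps_per_percent)
    return percent, percent * mbps_per_percent


print(simulate(1024))
print(simulate(500))
```

With the toy workload, a target of 1024 MBps settles at 50% (1000 MBps) and a target of 500 MBps settles at 20% (400 MBps), both below the user-specified value.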
328+
By default, the schemata would take the bandwidth percentage values
329+
whereas the user can switch to the "MBA software controller" mode using
330+
a mount option 'mba_MBps'. The schemata format is specified in the below
331+
sections.
289332

290333
L3 schemata file details (code and data prioritization disabled)
291334
----------------------------------------------------------------
@@ -308,13 +351,20 @@ schemata format is always:
308351

309352
L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
310353

311-
Memory b/w Allocation details
312-
-----------------------------
354+
Memory bandwidth Allocation (default mode)
355+
------------------------------------------
313356

314357
Memory b/w domain is L3 cache.
315358

316359
MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
317360

361+
Memory bandwidth Allocation specified in MBps
362+
---------------------------------------------
363+
364+
Memory bandwidth domain is L3 cache.
365+
366+
MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
367+
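As an illustration of the "MB:" line format, the following Python sketch parses and rebuilds such a line. It is not part of the patch; `parse_mb_line` and `format_mb_line` are hypothetical helper names, and the sketch assumes every value is a plain decimal integer (a percentage in default mode, MBps when mounted with mba_MBps).

```python
def parse_mb_line(line):
    """Parse 'MB:<cache_id0>=<bw0>;<cache_id1>=<bw1>;...' into a dict
    mapping cache id to bandwidth value."""
    resource, _, body = line.strip().partition(":")
    if resource != "MB":
        raise ValueError(f"not an MB schemata line: {line!r}")
    domains = {}
    for entry in body.split(";"):
        cache_id, _, value = entry.partition("=")
        domains[int(cache_id)] = int(value)
    return domains


def format_mb_line(domains):
    """Rebuild the schemata line from a {cache_id: bandwidth} dict."""
    return "MB:" + ";".join(
        f"{cache_id}={bw}" for cache_id, bw in sorted(domains.items())
    )


limits = parse_mb_line("MB:0=1024;1=500")
print(limits)
print(format_mb_line(limits))
```

The same parser works for both modes because only the meaning of the numbers changes, not the syntax of the line.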
318368
Reading/writing the schemata file
319369
---------------------------------
320370
Reading the schemata file will show the state of all resources
@@ -358,6 +408,15 @@ allocations can overlap or not. The allocations specifies the maximum
358408
b/w that the group may be able to use and the system admin can configure
359409
the b/w accordingly.
360410

411+
If the MBA is specified in MBps (megabytes per second) then the user can enter the max b/w in MBps
412+
rather than the percentage values.
413+
414+
# echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
415+
# echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
416+
417+
In the above example the tasks in "p1" and "p0" on socket 0 would use a max b/w
418+
of 1024MBps whereas on socket 1 they would use 500MBps.
419+
361420
Example 2
362421
---------
363422
Again two sockets, but this time with a more realistic 20-bit mask.
