Commit 8cfd814

cgroup: implement cgroup v2 thread support

This patch implements cgroup v2 thread support.  The goal of the
thread mode is supporting hierarchical accounting and control at
thread granularity while staying inside the resource domain model
which allows coordination across different resource controllers and
handling of anonymous resource consumptions.

A cgroup is always created as a domain and can be made threaded by
writing to the "cgroup.type" file.  When a cgroup becomes threaded, it
becomes a member of a threaded subtree which is anchored at the
closest ancestor which isn't threaded.

The threads of the processes which are in a threaded subtree can be
placed anywhere without being restricted by process granularity or
no-internal-process constraint.  Note that the threads aren't allowed
to escape to a different threaded subtree.  To be used inside a
threaded subtree, a controller should explicitly support threaded mode
and be able to handle internal competition in the way which is
appropriate for the resource.

The root of a threaded subtree, the nearest ancestor which isn't
threaded, is called the threaded domain and serves as the resource
domain for the whole subtree.  This is the last cgroup where domain
controllers are operational and where all the domain-level resource
consumptions in the subtree are accounted.  This allows threaded
controllers to operate at thread granularity when requested while
staying inside the scope of system-level resource distribution.

As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a threaded domain and a parent to normal cgroups,
so, unlike non-root cgroups, the root cgroup can have both domain and
threaded children.

Internally, in a threaded subtree, each css_set has its ->dom_cset
pointing to a matching css_set which belongs to the threaded domain.
This ensures that thread root level cgroup_subsys_state for all
threaded controllers are readily accessible for domain-level
operations.

This patch enables threaded mode for the pids and perf_events
controllers.  Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.

For more details on the interface and behavior of the thread mode,
please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.

v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create().
      Spotted by Waiman.
    - Documentation updated as suggested by Waiman.
    - cgroup.type content slightly reformatted.
    - Mark the debug controller threaded.

v4: - Updated to the general idea of marking specific cgroups
      domain/threaded as suggested by PeterZ.

v3: - Dropped "join" and always make mixed children join the parent's
      threaded subtree.

v2: - After discussions with Waiman, support for mixed thread mode is
      added.  This should address the issue that Peter pointed out
      where any nesting should be avoided for thread subtrees while
      coexisting with other domain cgroups.
    - Enabling / disabling thread mode now piggy backs on the existing
      control mask update mechanism.
    - Bug fixes and cleanup.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Peter Zijlstra <[email protected]>

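
As a reading aid for the description above, here is a minimal userspace sketch in C of how the type reported for a cgroup could follow from the threaded marking. It is not kernel code: `struct toy_cgroup` and every `toy_`/`TOY_` name are hypothetical. The model performs no validity checks. It ignores controller state (the "cgroup.subtree_control" trigger for a threaded domain) and it does not attempt to classify domain siblings under a non-root threaded domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root cgroup */
	bool threaded;			/* "threaded" was written to cgroup.type */
	int nr_threaded_children;	/* direct children that are threaded */
};

enum toy_type {
	TOY_DOMAIN,
	TOY_DOMAIN_THREADED,
	TOY_DOMAIN_INVALID,
	TOY_THREADED,
};

/* Mirrors "echo threaded > cgroup.type"; the operation is one-way. */
static void toy_make_threaded(struct toy_cgroup *cg)
{
	assert(cg->parent != NULL);	/* the root has no cgroup.type file */
	if (!cg->threaded) {
		cg->threaded = true;
		cg->parent->nr_threaded_children++;
	}
}

/* Derive the type from the cgroup's own flag and its direct neighbours. */
static enum toy_type toy_cgroup_type(const struct toy_cgroup *cg)
{
	if (cg->threaded)
		return TOY_THREADED;
	/* A domain below a threaded parent cannot host child domains. */
	if (cg->parent && cg->parent->threaded)
		return TOY_DOMAIN_INVALID;
	/* A domain with a threaded child anchors a threaded subtree. */
	if (cg->nr_threaded_children > 0)
		return TOY_DOMAIN_THREADED;
	return TOY_DOMAIN;
}

/* Strings follow the values listed for "cgroup.type" in the documentation. */
static const char *toy_type_name(enum toy_type t)
{
	switch (t) {
	case TOY_DOMAIN:		return "domain";
	case TOY_DOMAIN_THREADED:	return "domain threaded";
	case TOY_DOMAIN_INVALID:	return "domain invalid";
	case TOY_THREADED:		return "threaded";
	}
	return "unknown";
}
```

With a chain root / A / B / C, calling `toy_make_threaded()` on B makes A a threaded domain and leaves C as an invalid domain until C is made threaded too, which is the topology the documentation in this patch walks through.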
1 parent 450ee0c commit 8cfd814

8 files changed: +522 -40 lines changed

Documentation/cgroup-v2.txt

Lines changed: 170 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -18,7 +18,9 @@ v1 is available under Documentation/cgroup-v1/.
1818
1-2. What is cgroup?
1919
2. Basic Operations
2020
2-1. Mounting
21-
2-2. Organizing Processes
21+
2-2. Organizing Processes and Threads
22+
2-2-1. Processes
23+
2-2-2. Threads
2224
2-3. [Un]populated Notification
2325
2-4. Controlling Controllers
2426
2-4-1. Enabling and Disabling
@@ -167,8 +169,11 @@ cgroup v2 currently supports the following mount options.
167169
Delegation section for details.
168170

169171

170-
Organizing Processes
171-
--------------------
172+
Organizing Processes and Threads
173+
--------------------------------
174+
175+
Processes
176+
~~~~~~~~~
172177

173178
Initially, only the root cgroup exists to which all processes belong.
174179
A child cgroup can be created by creating a sub-directory::
@@ -219,6 +224,104 @@ is removed subsequently, " (deleted)" is appended to the path::
219224
0::/test-cgroup/test-cgroup-nested (deleted)
220225

221226

227+
Threads
228+
~~~~~~~
229+
230+
cgroup v2 supports thread granularity for a subset of controllers to
231+
support use cases requiring hierarchical resource distribution across
232+
the threads of a group of processes. By default, all threads of a
233+
process belong to the same cgroup, which also serves as the resource
234+
domain to host resource consumptions which are not specific to a
235+
process or thread. The thread mode allows threads to be spread across
236+
a subtree while still maintaining the common resource domain for them.
237+
238+
Controllers which support thread mode are called threaded controllers.
239+
The ones which don't are called domain controllers.
240+
241+
Marking a cgroup threaded makes it join the resource domain of its
242+
parent as a threaded cgroup. The parent may be another threaded
243+
cgroup whose resource domain is further up in the hierarchy. The root
244+
of a threaded subtree, that is, the nearest ancestor which is not
245+
threaded, is called threaded domain or thread root interchangeably and
246+
serves as the resource domain for the entire subtree.
247+
248+
Inside a threaded subtree, threads of a process can be put in
249+
different cgroups and are not subject to the no internal process
250+
constraint - threaded controllers can be enabled on non-leaf cgroups
251+
whether they have threads in them or not.
252+
253+
As the threaded domain cgroup hosts all the domain resource
254+
consumptions of the subtree, it is considered to have internal
255+
resource consumptions whether there are processes in it or not and
256+
can't have populated child cgroups which aren't threaded. Because the
257+
root cgroup is not subject to no internal process constraint, it can
258+
serve both as a threaded domain and a parent to domain cgroups.
259+
260+
The current operation mode or type of the cgroup is shown in the
261+
"cgroup.type" file which indicates whether the cgroup is a normal
262+
domain, a domain which is serving as the domain of a threaded subtree,
263+
or a threaded cgroup.
264+
265+
On creation, a cgroup is always a domain cgroup and can be made
266+
threaded by writing "threaded" to the "cgroup.type" file. The
267+
operation is single direction::
268+
269+
# echo threaded > cgroup.type
270+
271+
Once threaded, the cgroup can't be made a domain again. To enable the
272+
thread mode, the following conditions must be met.
273+
274+
- As the cgroup will join the parent's resource domain, the parent
275+
must either be a valid (threaded) domain or a threaded cgroup.
276+
277+
- The cgroup must be empty. No enabled controllers, child cgroups or
278+
processes.
279+
280+
Topology-wise, a cgroup can be in an invalid state. Please consider
281+
the following topology::
282+
283+
A (threaded domain) - B (threaded) - C (domain, just created)
284+
285+
C is created as a domain but isn't connected to a parent which can
286+
host child domains. C can't be used until it is turned into a
287+
threaded cgroup. "cgroup.type" file will report "domain invalid" in
288+
these cases. Operations which fail due to invalid topology use
289+
EOPNOTSUPP as the errno.
290+
291+
A domain cgroup is turned into a threaded domain when one of its child
292+
cgroups becomes threaded or threaded controllers are enabled in the
293+
"cgroup.subtree_control" file while there are processes in the cgroup.
294+
A threaded domain reverts to a normal domain when the conditions
295+
clear.
296+
297+
When read, "cgroup.threads" contains the list of the thread IDs of all
298+
threads in the cgroup. Except that the operations are per-thread
299+
instead of per-process, "cgroup.threads" has the same format and
300+
behaves the same way as "cgroup.procs". While "cgroup.threads" can be
301+
written to in any cgroup, as it can only move threads inside the same
302+
threaded domain, its operations are confined inside each threaded
303+
subtree.
304+
305+
The threaded domain cgroup serves as the resource domain for the whole
306+
subtree, and, while the threads can be scattered across the subtree,
307+
all the processes are considered to be in the threaded domain cgroup.
308+
"cgroup.procs" in a threaded domain cgroup contains the PIDs of all
309+
processes in the subtree and is not readable in the subtree proper.
310+
However, "cgroup.procs" can be written to from anywhere in the subtree
311+
to migrate all threads of the matching process to the cgroup.
312+
313+
Only threaded controllers can be enabled in a threaded subtree. When
314+
a threaded controller is enabled inside a threaded subtree, it only
315+
accounts for and controls resource consumptions associated with the
316+
threads in the cgroup and its descendants. All consumptions which
317+
aren't tied to a specific thread belong to the threaded domain cgroup.
318+
319+
Because a threaded subtree is exempt from no internal process
320+
constraint, a threaded controller must be able to handle competition
321+
between threads in a non-leaf cgroup and its child cgroups. Each
322+
threaded controller defines how such competitions are handled.
323+
324+
222325
[Un]populated Notification
223326
--------------------------
224327

@@ -302,15 +405,15 @@ disabled if one or more children have it enabled.
302405
No Internal Process Constraint
303406
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
304407

305-
Non-root cgroups can only distribute resources to their children when
306-
they don't have any processes of their own. In other words, only
307-
cgroups which don't contain any processes can have controllers enabled
308-
in their "cgroup.subtree_control" files.
408+
Non-root cgroups can distribute domain resources to their children
409+
only when they don't have any processes of their own. In other words,
410+
only domain cgroups which don't contain any processes can have domain
411+
controllers enabled in their "cgroup.subtree_control" files.
309412

310-
This guarantees that, when a controller is looking at the part of the
311-
hierarchy which has it enabled, processes are always only on the
312-
leaves. This rules out situations where child cgroups compete against
313-
internal processes of the parent.
413+
This guarantees that, when a domain controller is looking at the part
414+
of the hierarchy which has it enabled, processes are always only on
415+
the leaves. This rules out situations where child cgroups compete
416+
against internal processes of the parent.
314417

315418
The root cgroup is exempt from this restriction. Root contains
316419
processes and anonymous resource consumption which can't be associated
@@ -334,10 +437,10 @@ Model of Delegation
334437
~~~~~~~~~~~~~~~~~~~
335438

336439
A cgroup can be delegated in two ways. First, to a less privileged
337-
user by granting write access of the directory and its "cgroup.procs"
338-
and "cgroup.subtree_control" files to the user. Second, if the
339-
"nsdelegate" mount option is set, automatically to a cgroup namespace
340-
on namespace creation.
440+
user by granting write access of the directory and its "cgroup.procs",
441+
"cgroup.threads" and "cgroup.subtree_control" files to the user.
442+
Second, if the "nsdelegate" mount option is set, automatically to a
443+
cgroup namespace on namespace creation.
341444

342445
Because the resource control interface files in a given directory
343446
control the distribution of the parent's resources, the delegatee
@@ -644,6 +747,29 @@ Core Interface Files
644747

645748
All cgroup core files are prefixed with "cgroup."
646749

750+
cgroup.type
751+
752+
A read-write single value file which exists on non-root
753+
cgroups.
754+
755+
When read, it indicates the current type of the cgroup, which
756+
can be one of the following values.
757+
758+
- "domain" : A normal valid domain cgroup.
759+
760+
- "domain threaded" : A threaded domain cgroup which is
761+
serving as the root of a threaded subtree.
762+
763+
- "domain invalid" : A cgroup which is in an invalid state.
764+
It can't be populated or have controllers enabled. It may
765+
be allowed to become a threaded cgroup.
766+
767+
- "threaded" : A threaded cgroup which is a member of a
768+
threaded subtree.
769+
770+
A cgroup can be turned into a threaded cgroup by writing
771+
"threaded" to this file.
772+
647773
cgroup.procs
648774
A read-write new-line separated values file which exists on
649775
all cgroups.
@@ -666,6 +792,35 @@ All cgroup core files are prefixed with "cgroup."
666792
When delegating a sub-hierarchy, write access to this file
667793
should be granted along with the containing directory.
668794

795+
In a threaded cgroup, reading this file fails with EOPNOTSUPP
796+
as all the processes belong to the thread root. Writing is
797+
supported and moves every thread of the process to the cgroup.
798+
799+
cgroup.threads
800+
A read-write new-line separated values file which exists on
801+
all cgroups.
802+
803+
When read, it lists the TIDs of all threads which belong to
804+
the cgroup one-per-line. The TIDs are not ordered and the
805+
same TID may show up more than once if the thread got moved to
806+
another cgroup and then back or the TID got recycled while
807+
reading.
808+
809+
A TID can be written to migrate the thread associated with the
810+
TID to the cgroup. The writer should match all of the
811+
following conditions.
812+
813+
- It must have write access to the "cgroup.threads" file.
814+
815+
- The cgroup that the thread is currently in must be in the
816+
same resource domain as the destination cgroup.
817+
818+
- It must have write access to the "cgroup.procs" file of the
819+
common ancestor of the source and destination cgroups.
820+
821+
When delegating a sub-hierarchy, write access to this file
822+
should be granted along with the containing directory.
823+
669824
cgroup.controllers
670825
A read-only space separated values file which exists on all
671826
cgroups.
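
The "same resource domain" condition listed above for "cgroup.threads" writes can be pictured with a small userspace sketch in C. This is an illustration under stated assumptions, not the kernel's implementation: `struct toy_cgroup`, `toy_dom_cgrp()` and `toy_can_move_thread()` are hypothetical names, and the write-permission conditions from the same list are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_cgroup {
	const struct toy_cgroup *parent;	/* NULL for the root cgroup */
	bool threaded;				/* member of a threaded subtree */
};

/*
 * Walk up to the nearest ancestor which isn't threaded.  For a domain
 * cgroup this is the cgroup itself; for a threaded cgroup it is the
 * threaded domain anchoring its subtree.
 */
static const struct toy_cgroup *toy_dom_cgrp(const struct toy_cgroup *cg)
{
	while (cg->threaded) {
		assert(cg->parent != NULL);	/* the root is never threaded */
		cg = cg->parent;
	}
	return cg;
}

/* A thread may only move between cgroups sharing one resource domain. */
static bool toy_can_move_thread(const struct toy_cgroup *src,
				const struct toy_cgroup *dst)
{
	return toy_dom_cgrp(src) == toy_dom_cgrp(dst);
}
```

In this model a thread can move freely between a threaded domain and any threaded cgroup below it, while a move into a different threaded subtree is refused, matching the note that threads cannot escape to another threaded subtree.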

include/linux/cgroup-defs.h

Lines changed: 12 additions & 0 deletions
@@ -521,6 +521,18 @@ struct cgroup_subsys {
521521
*/
522522
bool implicit_on_dfl:1;
523523

524+
/*
525+
* If %true, the controller supports threaded mode on the default
526+
* hierarchy. In a threaded subtree, both process granularity and
527+
* no-internal-process constraint are ignored and a threaded
528+
* controller should be able to handle that.
529+
*
530+
* Note that as an implicit controller is automatically enabled on
531+
* all cgroups on the default hierarchy, it should also be
532+
* threaded. implicit && !threaded is not supported.
533+
*/
534+
bool threaded:1;
535+
524536
/*
525537
* If %false, this subsystem is properly hierarchical -
526538
* configuration, resource accounting and restriction on a parent
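
The constraint stated in the new comment, that "implicit && !threaded is not supported", can be expressed as a one-line predicate. The following is a hedged userspace sketch: `struct toy_subsys` is a hypothetical stand-in that only borrows the two flag names from the hunk above, and `toy_subsys_flags_ok()` is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two flags discussed above; not struct cgroup_subsys. */
struct toy_subsys {
	const char *name;
	bool implicit_on_dfl:1;	/* automatically enabled on all cgroups */
	bool threaded:1;	/* supports threaded mode */
};

/*
 * An implicit controller is present in every cgroup, including threaded
 * ones, so it has to be threaded as well.  Every other combination of
 * the two flags is acceptable.
 */
static bool toy_subsys_flags_ok(const struct toy_subsys *ss)
{
	assert(ss != NULL);
	return !(ss->implicit_on_dfl && !ss->threaded);
}
```

A check of this shape would naturally run once per controller at registration time; where the kernel actually performs it is not visible in the hunks shown here.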

kernel/cgroup/cgroup-internal.h

Lines changed: 1 addition & 1 deletion
@@ -170,7 +170,7 @@ struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
170170
struct cgroup_root *root, unsigned long magic,
171171
struct cgroup_namespace *ns);
172172

173-
bool cgroup_may_migrate_to(struct cgroup *dst_cgrp);
173+
int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp);
174174
void cgroup_migrate_finish(struct cgroup_mgctx *mgctx);
175175
void cgroup_migrate_add_src(struct css_set *src_cset, struct cgroup *dst_cgrp,
176176
struct cgroup_mgctx *mgctx);

kernel/cgroup/cgroup-v1.c

Lines changed: 3 additions & 2 deletions
@@ -99,8 +99,9 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
9999
if (cgroup_on_dfl(to))
100100
return -EINVAL;
101101

102-
if (!cgroup_may_migrate_to(to))
103-
return -EBUSY;
102+
ret = cgroup_migrate_vet_dst(to);
103+
if (ret)
104+
return ret;
104105

105106
mutex_lock(&cgroup_mutex);
106107

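
The hunk above swaps a boolean check for one that returns an errno, which lets the caller report why a destination was rejected instead of always returning -EBUSY. The sketch below illustrates that calling pattern in userspace C. It is an assumption-laden model, not the body of cgroup_migrate_vet_dst(): `struct toy_dst` and both `toy_` functions are hypothetical. The mapping of conditions to error codes is inferred from the documentation text in this patch (EOPNOTSUPP for invalid topology) and from the removed -EBUSY line.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of a migration destination; not struct cgroup. */
struct toy_dst {
	bool invalid_topology;		/* e.g. a "domain invalid" cgroup */
	bool violates_no_internal;	/* would break no-internal-process */
};

/* Return 0 if @dst may receive tasks, a negative errno otherwise. */
static int toy_migrate_vet_dst(const struct toy_dst *dst)
{
	assert(dst != NULL);
	if (dst->invalid_topology)
		return -EOPNOTSUPP;
	if (dst->violates_no_internal)
		return -EBUSY;
	return 0;
}

/* Caller shape after the change: propagate the specific error code. */
static int toy_transfer_tasks(const struct toy_dst *to)
{
	int ret;

	ret = toy_migrate_vet_dst(to);
	if (ret)
		return ret;

	/* ... the actual migration would happen here ... */
	return 0;
}
```

With a boolean helper both failure cases collapse into one error at the call site; returning the errno from the vetting function keeps the distinction without adding logic to each caller.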