Commit 34a9304

Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - cgroup v2 interface is now official. It's no longer hidden behind a
   devel flag and can be mounted using the new cgroup2 fs type.

   Unfortunately, cpu v2 interface hasn't made it yet due to the
   discussion around in-process hierarchical resource distribution and
   only memory and io controllers can be used on the v2 interface at the
   moment.

 - The existing documentation which has always been a bit of mess is
   relocated under Documentation/cgroup-v1/. Documentation/cgroup-v2.txt
   is added as the authoritative documentation for the v2 interface.

 - Some features are added through for-4.5-ancestor-test branch to
   enable netfilter xt_cgroup match to use cgroup v2 paths. The actual
   netfilter changes will be merged through the net tree which pulled in
   the said branch.

 - Various cleanups

* 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rename cgroup documentations
  cgroup: fix a typo.
  cgroup: Remove resource_counter.txt in Documentation/cgroup-legacy/00-INDEX.
  cgroup: demote subsystem init messages to KERN_DEBUG
  cgroup: Fix uninitialized variable warning
  cgroup: put controller Kconfig options in meaningful order
  cgroup: clean up the kernel configuration menu nomenclature
  cgroup_pids: fix a typo.
  Subject: cgroup: Fix incomplete dd command in blkio documentation
  cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends
  cpuset: Replace all instances of time_t with time64_t
  cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
  cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
  cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
2 parents aee3bfa + 6255c46 commit 34a9304

27 files changed, 1467 insertions(+), 961 deletions(-)

Documentation/cgroups/00-INDEX renamed to Documentation/cgroup-v1/00-INDEX

Lines changed: 0 additions & 2 deletions

@@ -24,7 +24,5 @@ net_prio.txt
 	- Network priority cgroups details and usages.
 pids.txt
 	- Process number cgroups details and usages.
-resource_counter.txt
-	- Resource Counter API.
 unified-hierarchy.txt
 	- Description the new/next cgroup interface.

Documentation/cgroups/blkio-controller.txt renamed to Documentation/cgroup-v1/blkio-controller.txt

Lines changed: 1 addition & 81 deletions

@@ -84,8 +84,7 @@ Throttling/Upper Limit policy
 
 - Run dd to read a file and see if rate is throttled to 1MB/s or not.
 
-# dd if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
-# iflag=direct
+# dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
 1024+0 records in
 1024+0 records out
 4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
@@ -374,82 +373,3 @@ One can experience an overall throughput drop if you have created multiple
 groups and put applications in that group which are not driving enough
 IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
 on individual groups and throughput should improve.
-
-Writeback
-=========
-
-Page cache is dirtied through buffered writes and shared mmaps and
-written asynchronously to the backing filesystem by the writeback
-mechanism. Writeback sits between the memory and IO domains and
-regulates the proportion of dirty memory by balancing dirtying and
-write IOs.
-
-On traditional cgroup hierarchies, relationships between different
-controllers cannot be established making it impossible for writeback
-to operate accounting for cgroup resource restrictions and all
-writeback IOs are attributed to the root cgroup.
-
-If both the blkio and memory controllers are used on the v2 hierarchy
-and the filesystem supports cgroup writeback, writeback operations
-correctly follow the resource restrictions imposed by both memory and
-blkio controllers.
-
-Writeback examines both system-wide and per-cgroup dirty memory status
-and enforces the more restrictive of the two. Also, writeback control
-parameters which are absolute values - vm.dirty_bytes and
-vm.dirty_background_bytes - are distributed across cgroups according
-to their current writeback bandwidth.
-
-There's a peculiarity stemming from the discrepancy in ownership
-granularity between memory controller and writeback. While memory
-controller tracks ownership per page, writeback operates on inode
-basis. cgroup writeback bridges the gap by tracking ownership by
-inode but migrating ownership if too many foreign pages, pages which
-don't match the current inode ownership, have been encountered while
-writing back the inode.
-
-This is a conscious design choice as writeback operations are
-inherently tied to inodes making strictly following page ownership
-complicated and inefficient. The only use case which suffers from
-this compromise is multiple cgroups concurrently dirtying disjoint
-regions of the same inode, which is an unlikely use case and decided
-to be unsupported. Note that as memory controller assigns page
-ownership on the first use and doesn't update it until the page is
-released, even if cgroup writeback strictly follows page ownership,
-multiple cgroups dirtying overlapping areas wouldn't work as expected.
-In general, write-sharing an inode across multiple cgroups is not well
-supported.
-
-Filesystem support for cgroup writeback
----------------------------------------
-
-A filesystem can make writeback IOs cgroup-aware by updating
-address_space_operations->writepage[s]() to annotate bio's using the
-following two functions.
-
-* wbc_init_bio(@wbc, @bio)
-
-Should be called for each bio carrying writeback data and associates
-the bio with the inode's owner cgroup. Can be called anytime
-between bio allocation and submission.
-
-* wbc_account_io(@wbc, @page, @bytes)
-
-Should be called for each data segment being written out. While
-this function doesn't care exactly when it's called during the
-writeback session, it's the easiest and most natural to call it as
-data segments are added to a bio.
-
-With writeback bio's annotated, cgroup support can be enabled per
-super_block by setting MS_CGROUPWB in ->s_flags. This allows for
-selective disabling of cgroup writeback support which is helpful when
-certain filesystem features, e.g. journaled data mode, are
-incompatible.
-
-wbc_init_bio() binds the specified bio to its cgroup. Depending on
-the configuration, the bio may be executed at a lower priority and if
-the writeback session is holding shared resources, e.g. a journal
-entry, may lead to priority inversion. There is no one easy solution
-for the problem. Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
-directly.
File renamed without changes.
File renamed without changes.
