Commit 05d4532

joshuahahn authored and akpm00 committed
memcg/hugetlb: add hugeTLB counters to memcg
This patch introduces a new counter to memory.stat that tracks hugeTLB usage, only if hugeTLB accounting is done to memory.current. This feature is enabled the same way hugeTLB accounting is enabled, via the memory_hugetlb_accounting mount flag for cgroupsv2.

1. Why is this patch necessary?

Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds hugeTLB usage to memory.current. However, the metric is not reported in memory.stat. Given that users often interpret memory.stat as a breakdown of the value reported in memory.current, the disparity between the two reports can be confusing. This patch solves this problem by including the metric in memory.stat as well, but only if it is also reported in memory.current (it would also be confusing if the value was reported in memory.stat, but not in memory.current).

Aside from the consistency between the two files, we also see benefits in observability. Userspace might be interested in the hugeTLB footprint of cgroups for many reasons. For instance, system admins might want to verify that hugeTLB usage is distributed as expected across tasks: i.e. memory-intensive tasks are using more hugeTLB pages than tasks that don't consume a lot of memory, or are seen to fault frequently. Note that this is separate from wanting to inspect the distribution for limiting purposes (in which case, the hugeTLB controller makes more sense).

2. We already have a hugeTLB controller. Why not use that?

It is true that the hugeTLB controller tracks the exact value that we want. In fact, by enabling the hugeTLB controller, we get all of the observability benefits that I mentioned above, and users can check the total hugeTLB usage, verify if it is distributed as expected, etc.

With this said, there are 2 problems:

(a) The values are still not reported in memory.stat, which means the disparity between the memcg reports is still there.
(b) We cannot reasonably expect users to enable the hugeTLB controller just for the sake of hugeTLB usage reporting, especially since they don't have any use for hugeTLB usage enforcing [2].

3. Implementation Details:

In the alloc / free hugetlb functions, we call lruvec_stat_mod_folio regardless of whether memcg accounts hugetlb. mem_cgroup_commit_charge, which is called from alloc_hugetlb_folio, will set memcg for the folio only if the CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING cgroup mount option is used, so lruvec_stat_mod_folio accounts per-memcg hugetlb counters only if the feature is enabled. Regardless of whether memcg accounts for hugetlb, the newly added global counter is updated and shown in /proc/vmstat.

The global counter is added because vmstats is the preferred framework for cgroup stats. It makes stat items consistent between global and cgroups. It also provides a per-node breakdown, which is useful. Because it does not use cgroup-specific hooks, we also keep generic MM code separate from memcg code.

[1] https://lore.kernel.org/all/[email protected]/
[2] Of course, we can't make a new patch for every feature that can be duplicated. However, since the existing solution of enabling the hugeTLB controller is an imperfect solution that still leaves a discrepancy between memory.stat and memory.current, I think that it is reasonable to isolate the feature in this case.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joshua Hahn <[email protected]>
Suggested-by: Nhat Pham <[email protected]>
Suggested-by: Shakeel Butt <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Chris Down <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Zefan Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 2ea80b0 commit 05d4532

5 files changed, +24 -0 lines changed

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 5 additions & 0 deletions
@@ -1655,6 +1655,11 @@ The following nested keys are defined.
 	  pgdemote_khugepaged
 		Number of pages demoted by khugepaged.
 
+	  hugetlb
+		Amount of memory used by hugetlb pages. This metric only shows
+		up if hugetlb usage is accounted for in memory.current (i.e.
+		cgroup is mounted with the memory_hugetlb_accounting option).
+
   memory.numa_stat
 	A read-only nested-keyed file which exists on non-root cgroups.
 
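Because the documentation above says the hugetlb key only shows up when memory_hugetlb_accounting is in effect, a userspace reader has to treat the key as optional. The following is a minimal userspace sketch of such a lookup, not part of the patch: `stat_lookup` is a hypothetical helper name, and it assumes the usual one "name value" pair per line layout of memory.stat.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up `key` in a memory.stat-style buffer ("name value\n" per line).
 * Returns 1 and stores the value in *out if the key is present,
 * 0 otherwise. Hypothetical helper for illustration only.
 */
static int stat_lookup(const char *buf, const char *key,
		       unsigned long long *out)
{
	size_t klen;
	const char *line;

	assert(buf && key && out);
	klen = strlen(key);

	for (line = buf; *line; ) {
		const char *eol = strchr(line, '\n');

		/* Match the whole key, followed by exactly one space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%llu", out) == 1)
			return 1;
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```

A caller would read a cgroup's memory.stat into a buffer and call `stat_lookup(buf, "hugetlb", &bytes)`. A return value of 0 then means the key is not reported, for example because the accounting mount option is not set.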

include/linux/mmzone.h

Lines changed: 3 additions & 0 deletions
@@ -220,6 +220,9 @@ enum node_stat_item {
 	PGDEMOTE_KSWAPD,
 	PGDEMOTE_DIRECT,
 	PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_HUGETLB_PAGE
+	NR_HUGETLB,
+#endif
 	NR_VM_NODE_STAT_ITEMS
 };
 

mm/hugetlb.c

Lines changed: 2 additions & 0 deletions
@@ -1925,6 +1925,7 @@ void free_huge_folio(struct folio *folio)
 				     pages_per_huge_page(h), folio);
 	hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 					  pages_per_huge_page(h), folio);
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, -pages_per_huge_page(h));
 	mem_cgroup_uncharge(folio);
 	if (restore_reserve)
 		h->resv_huge_pages++;
@@ -3093,6 +3094,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	if (!memcg_charge_ret)
 		mem_cgroup_commit_charge(folio, memcg);
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
 	mem_cgroup_put(memcg);
 
 	return folio;
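The two hunks above pair an increment at allocation with a decrement at free, and the commit message explains that the per-memcg counter only moves when the folio actually carries a memcg. The sketch below is a simplified userspace model of that behaviour, not kernel code: every `model_*` name is hypothetical, and the structures are reduced to the fields needed for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures are far richer. */
struct model_memcg { long nr_hugetlb; };
struct model_folio { struct model_memcg *memcg; long nr_pages; };

static long node_nr_hugetlb;	/* models the global (node) counter */

/* Models lruvec_stat_mod_folio(folio, NR_HUGETLB, delta). */
static void model_stat_mod_folio(struct model_folio *folio, long delta)
{
	assert(folio);
	node_nr_hugetlb += delta;	/* always updated */
	if (folio->memcg)		/* only if the folio was charged */
		folio->memcg->nr_hugetlb += delta;
}

static void model_alloc(struct model_folio *folio, struct model_memcg *memcg,
			int accounting_enabled, long pages_per_huge_page)
{
	folio->nr_pages = pages_per_huge_page;
	/* Stands in for mem_cgroup_commit_charge() setting the memcg. */
	folio->memcg = accounting_enabled ? memcg : NULL;
	model_stat_mod_folio(folio, folio->nr_pages);
}

static void model_free(struct model_folio *folio)
{
	/* Decrement first, then drop the memcg link (the uncharge step). */
	model_stat_mod_folio(folio, -folio->nr_pages);
	folio->memcg = NULL;
}
```

In this model, allocating one folio with accounting enabled and one without raises the global counter twice but the cgroup counter once. The ordering in `model_free` mirrors the hunk in free_huge_folio, where the stat update is placed before mem_cgroup_uncharge.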

mm/memcontrol.c

Lines changed: 11 additions & 0 deletions
@@ -315,6 +315,9 @@ static const unsigned int memcg_node_stat_items[] = {
 	PGDEMOTE_KSWAPD,
 	PGDEMOTE_DIRECT,
 	PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_HUGETLB_PAGE
+	NR_HUGETLB,
+#endif
 };
 
 static const unsigned int memcg_stat_items[] = {
@@ -1366,6 +1369,9 @@ static const struct memory_stat memory_stats[] = {
 	{ "unevictable", NR_UNEVICTABLE },
 	{ "slab_reclaimable", NR_SLAB_RECLAIMABLE_B },
 	{ "slab_unreclaimable", NR_SLAB_UNRECLAIMABLE_B },
+#ifdef CONFIG_HUGETLB_PAGE
+	{ "hugetlb", NR_HUGETLB },
+#endif
 
 	/* The memory events */
 	{ "workingset_refault_anon", WORKINGSET_REFAULT_ANON },
@@ -1461,6 +1467,11 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		u64 size;
 
+#ifdef CONFIG_HUGETLB_PAGE
+		if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
+		    !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+			continue;
+#endif
 		size = memcg_page_state_output(memcg, memory_stats[i].idx);
 		seq_buf_printf(s, "%s %llu\n", memory_stats[i].name, size);
 
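The last hunk above keeps the hugetlb entry in the table unconditionally and filters it at format time when the mount flag is not set. A small userspace model of that table-driven skip might look like the following. The `model_*` names and the plain `int` flag are assumptions made for illustration, and `snprintf` stands in for the kernel's seq_buf_printf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum model_idx { MODEL_NR_UNEVICTABLE, MODEL_NR_HUGETLB };

struct model_stat {
	const char *name;
	enum model_idx idx;
	unsigned long long bytes;
};

/*
 * Format each entry as "name value\n" into out, skipping the hugetlb
 * entry unless the (modelled) accounting flag is set.
 * Returns the number of bytes written, excluding the terminator.
 */
static size_t model_stat_format(const struct model_stat *stats, size_t n,
				int hugetlb_accounting, char *out, size_t cap)
{
	size_t used = 0, i;

	assert(stats && out && cap > 0);
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		int w;

		if (stats[i].idx == MODEL_NR_HUGETLB && !hugetlb_accounting)
			continue;
		w = snprintf(out + used, cap - used, "%s %llu\n",
			     stats[i].name, stats[i].bytes);
		if (w < 0 || (size_t)w >= cap - used) {
			out[used] = '\0';	/* drop the partial line */
			break;
		}
		used += (size_t)w;
	}
	return used;
}
```

Filtering at output time rather than at table-definition time fits a flag that is only known at run time, which appears to be why the patch checks cgrp_dfl_root.flags inside the loop.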

mm/vmstat.c

Lines changed: 3 additions & 0 deletions
@@ -1273,6 +1273,9 @@ const char * const vmstat_text[] = {
 	"pgdemote_kswapd",
 	"pgdemote_direct",
 	"pgdemote_khugepaged",
+#ifdef CONFIG_HUGETLB_PAGE
+	"nr_hugetlb",
+#endif
 	/* system-wide enum vm_stat_item counters */
 	"nr_dirty_threshold",
 	"nr_dirty_background_threshold",

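With the vmstat_text entry above, the global counter becomes visible as nr_hugetlb in /proc/vmstat. The sketch below shows one way userspace might read it and convert it to bytes. It rests on two assumptions that should be checked against the running kernel: that nr_hugetlb is expressed in base pages (the patch adds pages_per_huge_page(h) per folio), and that memory.stat reports the same quantity in bytes. The helper names `vmstat_read` and `pages_to_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan a vmstat-style stream ("name value" per line) for `name`.
 * Returns 1 and stores the value on success, 0 if the name is absent.
 */
static int vmstat_read(FILE *f, const char *name, unsigned long long *out)
{
	char key[64];
	unsigned long long val;

	assert(f && name && out);
	while (fscanf(f, "%63s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 1;
		}
	}
	return 0;
}

/* Convert a base-page count to bytes for a given page size. */
static unsigned long long pages_to_bytes(unsigned long long pages,
					 unsigned long long page_size)
{
	return pages * page_size;
}
```

On a real system the stream would come from `fopen("/proc/vmstat", "r")` and the page size from `sysconf(_SC_PAGESIZE)`. On kernels built without CONFIG_HUGETLB_PAGE the entry is compiled out, so a return value of 0 is an expected outcome there.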