Commit e03a512

npiggin authored and torvalds committed
mm/large system hash: clear hashdist when only one node with memory is booted
CONFIG_NUMA on 64-bit CPUs currently enables hashdist unconditionally even when booting on single node machines. This causes the large system hashes to be allocated with vmalloc, and mapped with small pages.

This change clears hashdist if only one node has come up with memory.

This results in the important large inode and dentry hashes using memblock allocations. All others are within 4MB size up to about 128GB of RAM, which allows them to be allocated from the linear map on most non-NUMA images.

Other big hashes like futex and TCP should eventually be moved over to the same style of allocation as those vfs caches that use HASH_EARLY if !hashdist, so they don't exceed MAX_ORDER on very large non-NUMA images.

This brings dTLB misses for linux kernel tree `git diff` from ~45,000 to ~8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off (performance is in the noise, under 1% difference, page tables are likely to be well cached for this workload).

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
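To give a feel for why the mapping granularity matters for the 8MB dentry hash mentioned above, the following is a small userspace sketch of the arithmetic. The page sizes are assumptions, not values taken from the commit: 4 KiB for the small pages behind a vmalloc mapping and 2 MiB for a large-page linear map, as is common on x86-64. The helper name `mappings_needed` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of page mappings needed to cover `bytes` when each mapping
 * spans `page_size` bytes. This is also the number of distinct
 * translations required to touch every part of the table.
 */
size_t mappings_needed(size_t bytes, size_t page_size)
{
	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;	/* round up */
}
```

Under these assumed sizes, an 8 MiB table needs 2048 mappings with 4 KiB pages but only 4 with 2 MiB pages, which is consistent in direction with the drop in dTLB misses reported in the commit message.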
1 parent ec11408 commit e03a512

1 file changed: +18 -13 lines changed

mm/page_alloc.c

Lines changed: 18 additions & 13 deletions
@@ -7534,10 +7534,28 @@ static int page_alloc_cpu_dead(unsigned int cpu)
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
+int hashdist = HASHDIST_DEFAULT;
+
+static int __init set_hashdist(char *str)
+{
+	if (!str)
+		return 0;
+	hashdist = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("hashdist=", set_hashdist);
+#endif
+
 void __init page_alloc_init(void)
 {
 	int ret;
 
+#ifdef CONFIG_NUMA
+	if (num_node_state(N_MEMORY) == 1)
+		hashdist = 0;
+#endif
+
 	ret = cpuhp_setup_state_nocalls(CPUHP_PAGE_ALLOC_DEAD,
 					"mm/page_alloc:dead", NULL,
 					page_alloc_cpu_dead);
@@ -7922,19 +7940,6 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 	return ret;
 }
 
-#ifdef CONFIG_NUMA
-int hashdist = HASHDIST_DEFAULT;
-
-static int __init set_hashdist(char *str)
-{
-	if (!str)
-		return 0;
-	hashdist = simple_strtoul(str, &str, 0);
-	return 1;
-}
-__setup("hashdist=", set_hashdist);
-#endif
-
 #ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
 /*
  * Returns the number of pages that arch has reserved but
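The two pieces of logic this diff touches can be modelled in userspace as a rough sketch. This is not kernel code: `model_set_hashdist` and `model_effective_hashdist` are hypothetical names, standard `strtoul` stands in for the kernel's `simple_strtoul`, and the node count is passed in as a plain integer instead of being read via `num_node_state(N_MEMORY)`. The order in which the kernel runs the two steps during boot is not modelled here.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Stand-in for set_hashdist(): parse the value of a "hashdist=<n>"
 * boot parameter. Returns 1 if a value was consumed, 0 if no string
 * was supplied (in which case *hashdist is left unchanged).
 */
int model_set_hashdist(const char *str, int *hashdist)
{
	assert(hashdist != NULL);
	if (!str)
		return 0;
	/* Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal. */
	*hashdist = (int)strtoul(str, NULL, 0);
	return 1;
}

/*
 * Stand-in for the check added to page_alloc_init(): when exactly one
 * node has memory, hashdist is cleared; otherwise it is left alone.
 */
int model_effective_hashdist(int hashdist, unsigned int nodes_with_memory)
{
	if (nodes_with_memory == 1)
		return 0;
	return hashdist;
}
```

For example, `model_effective_hashdist(1, 1)` yields 0 while `model_effective_hashdist(1, 2)` yields 1, mirroring the intent that only single-node boots lose the distributed (vmalloc-backed) hash allocation.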
