
Commit 8404c9f

Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "The remainder of the main mm/ queue.

  143 patches.

  Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence"

* emailed patches from Andrew Morton <[email protected]>: (143 commits)
  kfence: use power-efficient work queue to run delayed work
  kfence: maximize allocation wait timeout duration
  kfence: await for allocation using wait_event
  kfence: zero guard page after out-of-bounds access
  mm/process_vm_access.c: remove duplicate include
  mm/mempool: minor coding style tweaks
  mm/highmem.c: fix coding style issue
  btrfs: use memzero_page() instead of open coded kmap pattern
  iov_iter: lift memzero_page() to highmem.h
  mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
  mm/zswap.c: switch from strlcpy to strscpy
  arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
  acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
  mm,memory_hotplug: allocate memmap from the added memory range
  mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
  mm,memory_hotplug: relax fully spanned sections check
  drivers/base/memory: introduce memory_block_{online,offline}
  mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
  ...
2 parents a79cdfb + 36f0b35 commit 8404c9f


125 files changed (+3386 / -1458 lines changed)

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+What: /sys/kernel/mm/cma/
+Date: Feb 2021
+Contact: Minchan Kim <[email protected]>
+Description:
+/sys/kernel/mm/cma/ contains a subdirectory for each CMA
+heap name (also sometimes called CMA areas).
+
+Each CMA heap subdirectory (that is, each
+/sys/kernel/mm/cma/<cma-heap-name> directory) contains the
+following items:
+
+alloc_pages_success
+alloc_pages_fail
+
+What: /sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_success
+Date: Feb 2021
+Contact: Minchan Kim <[email protected]>
+Description:
+the number of pages CMA API succeeded to allocate
+
+What: /sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_fail
+Date: Feb 2021
+Contact: Minchan Kim <[email protected]>
+Description:
+the number of pages CMA API failed to allocate
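
The ABI entries above expose two per-heap counters as sysfs files. A minimal userspace sketch for reading them might look like the following. It assumes each file holds a single unsigned decimal integer, optionally followed by a newline (the ABI text does not spell out the exact format), and the helper names `cma_stat_path`, `parse_counter` and `read_cma_counter` are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Build /sys/kernel/mm/cma/<heap>/<item>; returns 0 on success, -1 if truncated. */
static int cma_stat_path(char *buf, size_t len, const char *heap, const char *item)
{
	int n = snprintf(buf, len, "/sys/kernel/mm/cma/%s/%s", heap, item);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Parse one unsigned decimal counter with an optional trailing newline
 * (assumed file format). Returns 0 on success, -1 on malformed input.
 */
static int parse_counter(const char *text, unsigned long *out)
{
	char *end;
	unsigned long v;

	/* Require a leading digit so strtoul() cannot accept "-3" or spaces. */
	if (*text < '0' || *text > '9')
		return -1;
	errno = 0;
	v = strtoul(text, &end, 10);
	if (errno != 0)
		return -1;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;
	*out = v;
	return 0;
}

/* Read alloc_pages_success or alloc_pages_fail for one CMA heap. */
static int read_cma_counter(const char *heap, const char *item, unsigned long *out)
{
	char path[256], text[64];
	FILE *f;

	if (cma_stat_path(path, sizeof(path), heap, item) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;	/* heap missing or kernel without this ABI */
	if (!fgets(text, sizeof(text), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_counter(text, out);
}
```

A monitoring tool could, for example, read both counters for a heap and report `fail / (success + fail)` as a rough allocation failure rate.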

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 17 additions & 0 deletions
@@ -2804,6 +2804,23 @@
 seconds. Use this parameter to check at some
 other rate. 0 disables periodic checking.
 
+memory_hotplug.memmap_on_memory
+[KNL,X86,ARM] Boolean flag to enable this feature.
+Format: {on | off (default)}
+When enabled, runtime hotplugged memory will
+allocate its internal metadata (struct pages)
+from the hotadded memory which will allow to
+hotadd a lot of memory without requiring
+additional memory to do so.
+This feature is disabled by default because it
+has some implication on large (e.g. GB)
+allocations in some configurations (e.g. small
+memory blocks).
+The state of the flag can be read in
+/sys/module/memory_hotplug/parameters/memmap_on_memory.
+Note that even when enabled, there are a few cases where
+the feature is not effective.
+
 memtest= [KNL,X86,ARM,PPC] Enable memtest
 Format: <integer>
 default : 0 <disable>
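
The new parameter is enabled on the boot command line with `memory_hotplug.memmap_on_memory=on`, and its state can be read back from the sysfs file named in the text. A small sketch of that read-back, assuming the file uses the `Y`/`N` form the kernel prints for boolean module parameters; `parse_param_bool` and `memmap_on_memory_enabled` are hypothetical helper names. It only reports the flag: as the text notes, the feature may still not be effective in some cases.

```c
#include <stdio.h>

#define MEMMAP_ON_MEMORY_PARAM \
	"/sys/module/memory_hotplug/parameters/memmap_on_memory"

/* Map the sysfs text of a bool parameter to 1 (on), 0 (off) or -1 (unknown). */
static int parse_param_bool(const char *text)
{
	switch (text[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}

/* Returns 1 or 0 for the flag state, or -1 if the file cannot be read. */
static int memmap_on_memory_enabled(void)
{
	char text[8];
	FILE *f = fopen(MEMMAP_ON_MEMORY_PARAM, "r");

	if (!f)
		return -1;	/* e.g. kernel without memory hotplug support */
	if (!fgets(text, sizeof(text), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_param_bool(text);
}
```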

Documentation/admin-guide/mm/memory-hotplug.rst

Lines changed: 9 additions & 0 deletions
@@ -357,6 +357,15 @@ creates ZONE_MOVABLE as following.
 Unfortunately, there is no information to show which memory block belongs
 to ZONE_MOVABLE. This is TBD.
 
+.. note::
+Techniques that rely on long-term pinnings of memory (especially, RDMA and
+vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
+hot remove. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that
+memory can still get hot removed - be aware that pinning can fail even if
+there is plenty of free memory in ZONE_MOVABLE. In addition, using
+ZONE_MOVABLE might make page pinning more expensive, because pages have to be
+migrated off that zone first.
+
 .. _memory_hotplug_how_to_offline_memory:
 
 How to offline memory

Documentation/admin-guide/mm/userfaultfd.rst

Lines changed: 66 additions & 41 deletions
@@ -63,68 +63,93 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+other than page faults are supported. These events are described in more
+detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+registrations for hugetlbfs and shared memory (covering all shmem APIs,
+i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
 memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
+resolved by either providing a new page (``UFFDIO_COPY``), or mapping
+the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
+the zero page for a missing fault. With userfaultfd, userspace can
+decide what content to provide before the faulting thread continues.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
+the page cache). Userspace has the option of modifying the page's
+contents before resolving the fault. Once the contents are correct
+(modified or not), userspace asks the kernel to map the page and let the
+faulting thread continue with ``UFFDIO_CONTINUE``.
 
 Notes:
 
-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-you must provide some kind of page in your thread after reading from
-the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-The normal behavior of the OS automatically providing a zero page on
-an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+``pagefault.flags`` within the ``uffd_msg``, checking for the
+``UFFD_PAGEFAULT_FLAG_*`` flags.
 
 - None of the page-delivering ioctls default to the range that you
 registered with. You must fill in all fields for the appropriate
 ioctl struct including the range.
 
 - You get the address of the access that triggered the missing page
 event out of a struct uffd_msg that you read in the thread from the
-uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
-``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
-the first of any of those IOCTLs wakes up the faulting thread.
+uffd. You can supply as many pages as you want with these IOCTLs.
+Keep in mind that unless you used DONTWAKE then the first of any of
+those IOCTLs wakes up the faulting thread.
 
 - Be sure to test for all errors including
 (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
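
The feature negotiation and the fault-type dispatch described in the updated userfaultfd text can be sketched in C as below. This is a minimal illustration, not code from the patch series: it assumes `<linux/userfaultfd.h>` is available, supplies fallback values for flags that older headers may lack (assumed to match newer uapi headers), and the enum and helper names are hypothetical. `open_uffd` performs the `UFFDIO_API` handshake and needs a kernel with userfaultfd enabled plus suitable privileges, so it simply returns -1 otherwise.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values assumed to match newer uapi headers. */
#ifndef UFFD_FEATURE_MINOR_HUGETLBFS
#define UFFD_FEATURE_MINOR_HUGETLBFS (1 << 9)
#endif
#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

/* Hypothetical: which resolving ioctl a fault handler should issue. */
enum uffd_resolution {
	RESOLVE_COPY,		/* UFFDIO_COPY: provide new page contents */
	RESOLVE_ZEROPAGE,	/* UFFDIO_ZEROPAGE: map a zeroed page */
	RESOLVE_CONTINUE,	/* UFFDIO_CONTINUE: map the existing page-cache page */
};

/* Requested features that the kernel did not report as supported. */
static uint64_t uffd_missing_features(uint64_t wanted, uint64_t reported)
{
	return wanted & ~reported;
}

/*
 * Choose a resolution from uffd_msg.arg.pagefault.flags: minor faults
 * already have a page, so they are continued; missing faults get either
 * caller-provided contents or the zero page.
 */
static enum uffd_resolution uffd_pick_resolution(uint64_t pf_flags,
						 int have_contents)
{
	if (pf_flags & UFFD_PAGEFAULT_FLAG_MINOR)
		return RESOLVE_CONTINUE;
	return have_contents ? RESOLVE_COPY : RESOLVE_ZEROPAGE;
}

/* Open a userfaultfd and request the 'wanted' features; -1 on any failure. */
static int open_uffd(uint64_t wanted)
{
	struct uffdio_api api = { .api = UFFD_API, .features = wanted };
	int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) < 0 ||
	    uffd_missing_features(wanted, api.features) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A handler thread would then read a `struct uffd_msg` from the descriptor, pass `msg.arg.pagefault.flags` to `uffd_pick_resolution`, and issue the matching ioctl with every field of its struct (including the range) filled in, as the notes in the text require.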

arch/arc/Kconfig

Lines changed: 2 additions & 7 deletions
@@ -6,6 +6,7 @@
 config ARC
 def_bool y
 select ARC_TIMERS
+select ARCH_HAS_CACHE_LINE_SIZE
 select ARCH_HAS_DEBUG_VM_PGTABLE
 select ARCH_HAS_DMA_PREP_COHERENT
 select ARCH_HAS_PTE_SPECIAL
@@ -28,6 +29,7 @@ config ARC
 select GENERIC_SMP_IDLE_THREAD
 select HAVE_ARCH_KGDB
 select HAVE_ARCH_TRACEHOOK
+select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARC_MMU_V4
 select HAVE_DEBUG_STACKOVERFLOW
 select HAVE_DEBUG_KMEMLEAK
 select HAVE_FUTEX_CMPXCHG if FUTEX
@@ -48,9 +50,6 @@ config ARC
 select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
 select SET_FS
 
-config ARCH_HAS_CACHE_LINE_SIZE
-def_bool y
-
 config TRACE_IRQFLAGS_SUPPORT
 def_bool y
 
@@ -86,10 +85,6 @@ config STACKTRACE_SUPPORT
 def_bool y
 select STACKTRACE
 
-config HAVE_ARCH_TRANSPARENT_HUGEPAGE
-def_bool y
-depends on ARC_MMU_V4
-
 menu "ARC Architecture Configuration"
 
 menu "ARC Platform/SoC/Board"

arch/arm/Kconfig

Lines changed: 2 additions & 8 deletions
@@ -31,6 +31,7 @@ config ARM
 select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
 select ARCH_SUPPORTS_ATOMIC_RMW
+select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
 select ARCH_USE_BUILTIN_BSWAP
 select ARCH_USE_CMPXCHG_LOCKREF
 select ARCH_USE_MEMTEST
@@ -77,6 +78,7 @@ config ARM
 select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
 select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 select HAVE_ARCH_TRACEHOOK
+select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE
 select HAVE_ARM_SMCCC if CPU_V7
 select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
 select HAVE_CONTEXT_TRACKING
@@ -1511,14 +1513,6 @@ config HW_PERF_EVENTS
 def_bool y
 depends on ARM_PMU
 
-config SYS_SUPPORTS_HUGETLBFS
-def_bool y
-depends on ARM_LPAE
-
-config HAVE_ARCH_TRANSPARENT_HUGEPAGE
-def_bool y
-depends on ARM_LPAE
-
 config ARCH_WANT_GENERAL_HUGETLB
 def_bool y

arch/arm64/Kconfig

Lines changed: 9 additions & 21 deletions
@@ -11,6 +11,12 @@ config ARM64
 select ACPI_PPTT if ACPI
 select ARCH_HAS_DEBUG_WX
 select ARCH_BINFMT_ELF_STATE
+select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
+select ARCH_ENABLE_MEMORY_HOTPLUG
+select ARCH_ENABLE_MEMORY_HOTREMOVE
+select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
+select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
+select ARCH_HAS_CACHE_LINE_SIZE
 select ARCH_HAS_DEBUG_VIRTUAL
 select ARCH_HAS_DEBUG_VM_PGTABLE
 select ARCH_HAS_DMA_PREP_COHERENT
@@ -72,6 +78,7 @@ config ARM64
 select ARCH_USE_QUEUED_SPINLOCKS
 select ARCH_USE_SYM_ANNOTATIONS
 select ARCH_SUPPORTS_DEBUG_PAGEALLOC
+select ARCH_SUPPORTS_HUGETLBFS
 select ARCH_SUPPORTS_MEMORY_FAILURE
 select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
 select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
@@ -213,6 +220,7 @@ config ARM64
 select SWIOTLB
 select SYSCTL_EXCEPTION_TRACE
 select THREAD_INFO_IN_TASK
+select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
 help
 ARM 64-bit (AArch64) Linux support.
 
@@ -308,10 +316,7 @@ config ZONE_DMA32
 bool "Support DMA32 zone" if EXPERT
 default y
 
-config ARCH_ENABLE_MEMORY_HOTPLUG
-def_bool y
-
-config ARCH_ENABLE_MEMORY_HOTREMOVE
+config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
 def_bool y
 
 config SMP
@@ -1070,18 +1075,9 @@ config HW_PERF_EVENTS
 def_bool y
 depends on ARM_PMU
 
-config SYS_SUPPORTS_HUGETLBFS
-def_bool y
-
-config ARCH_HAS_CACHE_LINE_SIZE
-def_bool y
-
 config ARCH_HAS_FILTER_PGPROT
 def_bool y
 
-config ARCH_ENABLE_SPLIT_PMD_PTLOCK
-def_bool y if PGTABLE_LEVELS > 2
-
 # Supported by clang >= 7.0
 config CC_HAVE_SHADOW_CALL_STACK
 def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
@@ -1923,14 +1919,6 @@ config SYSVIPC_COMPAT
 def_bool y
 depends on COMPAT && SYSVIPC
 
-config ARCH_ENABLE_HUGEPAGE_MIGRATION
-def_bool y
-depends on HUGETLB_PAGE && MIGRATION
-
-config ARCH_ENABLE_THP_MIGRATION
-def_bool y
-depends on TRANSPARENT_HUGEPAGE
-
 menu "Power management options"
 
 source "kernel/power/Kconfig"

arch/arm64/mm/hugetlbpage.c

Lines changed: 3 additions & 4 deletions
@@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 set_pte(ptep, pte);
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 unsigned long addr, unsigned long sz)
 {
 pgd_t *pgdp;
@@ -284,9 +284,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 */
 ptep = pte_alloc_map(mm, pmdp, addr);
 } else if (sz == PMD_SIZE) {
-if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
-pud_none(READ_ONCE(*pudp)))
-ptep = huge_pmd_share(mm, addr, pudp);
+if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
+ptep = huge_pmd_share(mm, vma, addr, pudp);
 else
 ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
 } else if (sz == (CONT_PMD_SIZE)) {
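
The second hunk replaces the compile-time `IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE)` test with `want_pmd_share(vma, addr)`, so whether a PMD page table may be shared now depends on the VMA passed into `huge_pte_alloc()`. The userspace model below illustrates the kind of per-VMA check this makes possible. It is a sketch under stated assumptions, not the kernel's code: it assumes sharing requires a shared mapping whose whole PUD-aligned range (1 GiB, as with 4 KiB pages on arm64) lies inside the VMA, and the struct, flag and function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed geometry: 4 KiB pages, so one PUD entry spans 1 GiB. */
#define MODEL_PUD_SIZE (1UL << 30)
#define MODEL_PUD_MASK (~(MODEL_PUD_SIZE - 1))

/* Hypothetical stand-in for the kernel's "may be shared" VMA flag. */
#define MODEL_VM_MAYSHARE 0x1UL

struct model_vma {
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
	unsigned long vm_flags;
};

/*
 * Model of a per-VMA sharing check: the mapping must be shareable and
 * the whole PUD-aligned range containing addr must lie inside the VMA.
 */
static bool model_want_pmd_share(const struct model_vma *vma,
				 unsigned long addr)
{
	unsigned long base = addr & MODEL_PUD_MASK;
	unsigned long end = base + MODEL_PUD_SIZE;

	if (!(vma->vm_flags & MODEL_VM_MAYSHARE))
		return false;
	return base >= vma->vm_start && end <= vma->vm_end;
}
```

The real predicate may check further conditions; the authoritative definition lives in the kernel's hugetlb code.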
