
Commit e6b324f

Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"

* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kcov: don't lose track of remote references during softirqs
  mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
  mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
  mm: fix possible OOB in numa_rebuild_large_mapping()
  mm/migrate: fix kernel BUG at mm/compaction.c:2761!
  selftests: mm: make map_fixed_noreplace test names stable
  mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
  mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
  gcov: add support for GCC 14
  zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
  mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
  lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
  MAINTAINERS: remove Lorenzo as vmalloc reviewer
  Revert "mm: init_mlocked_on_free_v3"
  mm/page_table_check: fix crash on ZONE_DEVICE
  gcc: disable '-Warray-bounds' for gcc-9
  ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
  ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
2 parents 5cf81d7 + 01c8f98 commit e6b324f

29 files changed, +345 -222 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 0 additions & 6 deletions
@@ -2192,12 +2192,6 @@
 			Format: 0 | 1
 			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
 
-	init_mlocked_on_free=	[MM] Fill freed userspace memory with zeroes if
-			it was mlock'ed and not explicitly munlock'ed
-			afterwards.
-			Format: 0 | 1
-			Default set by CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON
-
 	init_pkru=	[X86] Specify the default memory protection keys rights
 			register contents for all processes. 0x55555554 by
 			default (disallow access to all but pkey 0). Can

Documentation/userspace-api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ Security-related interfaces
    seccomp_filter
    landlock
    lsm
+   mfd_noexec
    spec_ctrl
    tee
 

Documentation/userspace-api/mfd_noexec.rst

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Introduction of non-executable mfd
+==================================
+:Author:
+    Daniel Verkamp <[email protected]>
+
+
+:Contributor:
+    Aleksa Sarai <[email protected]>
+
+Since Linux introduced the memfd feature, memfds have always had their
+execute bit set, and the memfd_create() syscall doesn't allow setting
+it differently.
+
+However, in a secure-by-default system, such as ChromeOS, (where all
+executables should come from the rootfs, which is protected by verified
+boot), this executable nature of memfd opens a door for NoExec bypass
+and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
+process created a memfd to share the content with an external process,
+however the memfd is overwritten and used for executing arbitrary code
+and root escalation. [2] lists more VRP of this kind.
+
+On the other hand, executable memfd has its legit use: runc uses memfd’s
+seal and executable feature to copy the contents of the binary then
+execute them. For such a system, we need a solution to differentiate runc's
+use of executable memfds and an attacker's [3].
+
+To address those above:
+ - Let memfd_create() set X bit at creation time.
+ - Let memfd be sealed for modifying X bit when NX is set.
+ - Add a new pid namespace sysctl: vm.memfd_noexec to help applications in
+   migrating and enforcing non-executable MFD.
+
+User API
+========
+``int memfd_create(const char *name, unsigned int flags)``
+
+``MFD_NOEXEC_SEAL``
+    When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
+    with NX. F_SEAL_EXEC is set and the memfd can't be modified to
+    add X later. MFD_ALLOW_SEALING is also implied.
+    This is the most common case for the application to use memfd.
+
+``MFD_EXEC``
+    When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
+
+Note:
+    ``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
+    an app doesn't want sealing, it can add F_SEAL_SEAL after creation.
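
Reader's note (not part of the commit): the User API text above can be exercised with a small C sketch. The helper name `noexec_memfd_is_sealed` is hypothetical, and the fallback constants are assumptions that mirror the kernel's uapi headers for builds against older userspace headers. On kernels that predate `MFD_NOEXEC_SEAL`, `memfd_create()` is expected to fail, which the helper reports as -1 rather than treating as a sealing failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() and F_GET_SEALS */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed to match the uapi values). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_GET_SEALS
#define F_GET_SEALS 1034
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Create a memfd with MFD_NOEXEC_SEAL and report whether F_SEAL_EXEC is set.
 * Returns 1 if the seal is present, 0 if the memfd exists without it,
 * and -1 if the memfd could not be created or queried.
 */
int noexec_memfd_is_sealed(const char *name)
{
    assert(name != NULL);

    int fd = memfd_create(name, MFD_NOEXEC_SEAL);
    if (fd < 0)
        return -1;              /* e.g. kernel without MFD_NOEXEC_SEAL */

    int seals = fcntl(fd, F_GET_SEALS);
    close(fd);
    if (seals < 0)
        return -1;

    return (seals & F_SEAL_EXEC) ? 1 : 0;
}
```

Calling `noexec_memfd_is_sealed("demo")` should never return 0 if the documented behaviour holds: either the kernel rejects the flag, or the resulting memfd carries `F_SEAL_EXEC`.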
+
+
+Sysctl:
+========
+``pid namespaced sysctl vm.memfd_noexec``
+
+The new pid namespaced sysctl vm.memfd_noexec has 3 values:
+
+ - 0: MEMFD_NOEXEC_SCOPE_EXEC
+	memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+	MFD_EXEC was set.
+
+ - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
+	memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+	MFD_NOEXEC_SEAL was set.
+
+ - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
+	memfd_create() without MFD_NOEXEC_SEAL will be rejected.
+
+The sysctl allows finer control of memfd_create for old software that
+doesn't set the executable bit; for example, a container with
+vm.memfd_noexec=1 means the old software will create non-executable memfd
+by default while new software can create executable memfd by setting
+MFD_EXEC.
+
+The value of vm.memfd_noexec is passed to child namespace at creation
+time. In addition, the setting is hierarchical, i.e. during memfd_create,
+we will search from current ns to root ns and use the most restrictive
+setting.
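
Reader's note (not part of the commit): the three sysctl levels and the hierarchical lookup described above can be modelled in plain C. This is a sketch of the documented rules only, not the kernel's code; `effective_scope` and `resolve_memfd_flags` are hypothetical names. For level 2 it follows the wording above literally and rejects any request lacking `MFD_NOEXEC_SEAL`; the kernel's real handling of flag-less calls at that level, and of both flags passed together, is not modelled and should be checked against mm/memfd.c.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values assumed to match the kernel's uapi header. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

enum memfd_noexec_scope {
    SCOPE_EXEC = 0,             /* MEMFD_NOEXEC_SCOPE_EXEC */
    SCOPE_NOEXEC_SEAL = 1,      /* MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL */
    SCOPE_NOEXEC_ENFORCED = 2,  /* MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED */
};

/*
 * Walk from the current pid namespace (chain[0]) to the root namespace
 * (chain[depth - 1]) and return the most restrictive, i.e. largest, value.
 */
int effective_scope(const int *chain, size_t depth)
{
    assert(chain != NULL && depth > 0);

    int scope = chain[0];
    for (size_t i = 1; i < depth; i++) {
        if (chain[i] > scope)
            scope = chain[i];
    }
    return scope;
}

/*
 * Apply the documented rules to the caller's flags.
 * Returns the flags the memfd would be created with, or -1 if rejected.
 */
int resolve_memfd_flags(unsigned int flags, int scope)
{
    /* Level 2: anything without MFD_NOEXEC_SEAL is rejected (literal reading). */
    if (scope >= SCOPE_NOEXEC_ENFORCED && !(flags & MFD_NOEXEC_SEAL))
        return -1;

    /* Levels 0 and 1 only pick a default when the caller chose neither flag. */
    if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)))
        flags |= (scope >= SCOPE_NOEXEC_SEAL) ? MFD_NOEXEC_SEAL : MFD_EXEC;

    return (int)flags;
}
```

With a namespace chain of `{0, 1, 0}` the effective level is 1, so an old caller passing no flags gets a sealed, non-executable memfd while a caller passing `MFD_EXEC` still gets an executable one.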
+
+[1] https://crbug.com/1305267
+
+[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
+
+[3] https://lwn.net/Articles/781013/

MAINTAINERS

Lines changed: 0 additions & 1 deletion
@@ -23974,7 +23974,6 @@ VMALLOC
 M: Andrew Morton <[email protected]>
 R: Uladzislau Rezki <[email protected]>
 R: Christoph Hellwig <[email protected]>
-R: Lorenzo Stoakes <[email protected]>
 
 S: Maintained
 W: http://www.linux-mm.org

arch/Kconfig

Lines changed: 12 additions & 0 deletions
@@ -1046,10 +1046,21 @@ config ARCH_MMAP_RND_BITS_MAX
 config ARCH_MMAP_RND_BITS_DEFAULT
 	int
 
+config FORCE_MAX_MMAP_RND_BITS
+	bool "Force maximum number of bits to use for ASLR of mmap base address"
+	default y if !64BIT
+	help
+	  ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS represent the number
+	  of bits to use for ASLR and if no custom value is assigned (EXPERT)
+	  then the architecture's lower bound (minimum) value is assumed.
+	  This toggle changes that default assumption to assume the arch upper
+	  bound (maximum) value instead.
+
 config ARCH_MMAP_RND_BITS
 	int "Number of bits to use for ASLR of mmap base address" if EXPERT
 	range ARCH_MMAP_RND_BITS_MIN ARCH_MMAP_RND_BITS_MAX
 	default ARCH_MMAP_RND_BITS_DEFAULT if ARCH_MMAP_RND_BITS_DEFAULT
+	default ARCH_MMAP_RND_BITS_MAX if FORCE_MAX_MMAP_RND_BITS
 	default ARCH_MMAP_RND_BITS_MIN
 	depends on HAVE_ARCH_MMAP_RND_BITS
 	help
@@ -1084,6 +1095,7 @@ config ARCH_MMAP_RND_COMPAT_BITS
 	int "Number of bits to use for ASLR of mmap base address for compatible applications" if EXPERT
 	range ARCH_MMAP_RND_COMPAT_BITS_MIN ARCH_MMAP_RND_COMPAT_BITS_MAX
 	default ARCH_MMAP_RND_COMPAT_BITS_DEFAULT if ARCH_MMAP_RND_COMPAT_BITS_DEFAULT
+	default ARCH_MMAP_RND_COMPAT_BITS_MAX if FORCE_MAX_MMAP_RND_BITS
 	default ARCH_MMAP_RND_COMPAT_BITS_MIN
 	depends on HAVE_ARCH_MMAP_RND_COMPAT_BITS
 	help
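
Reader's note (not part of the commit): the ordering of the `default` lines above matters because Kconfig takes the first default whose condition holds. The C sketch below models that ordering for the mmap randomization bits. All names are hypothetical, and whether the architecture-default condition is true is passed in by the caller rather than derived, since how Kconfig evaluates an `int` symbol inside an `if` is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* "default y if !64BIT": the toggle defaults on for 32-bit builds only. */
bool force_max_mmap_rnd_bits_default(bool is_64bit)
{
    return !is_64bit;
}

/*
 * First matching default wins, in the order the Kconfig entry lists them:
 *   1. the architecture-provided default, if its condition holds
 *   2. the architecture maximum, if FORCE_MAX_MMAP_RND_BITS is set
 *   3. the architecture minimum otherwise
 */
int mmap_rnd_bits_default(bool arch_default_applies, int arch_default,
                          bool force_max, int bits_min, int bits_max)
{
    assert(bits_min <= bits_max);

    if (arch_default_applies)
        return arch_default;
    if (force_max)
        return bits_max;
    return bits_min;
}
```

Under this reading the new line changes the result only when no architecture default applies: the fallback moves from the minimum to the maximum once the toggle is set. The same ordering applies to the `ARCH_MMAP_RND_COMPAT_BITS` entry.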
