
Commit 3527799

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull spectre/meltdown updates from Thomas Gleixner:
 "The next round of spectre/meltdown related updates:

   - The initial set of Spectre v1 mitigations:

       - Array index speculation blocker and its usage for syscall,
         fdtable and the nl80211 driver.

       - Speculation barrier and its usage in user access functions

   - Make indirect calls in KVM speculation safe

   - Blacklisting of known-to-be-broken microcode revisions so IBPB/IBRS
     are not touched

   - The initial IBPB support and its usage in context switch

   - The exposure of the new speculation MSRs to KVM guests

   - A fix for a regression in x86/32 related to the cpu entry area

   - Proper whitelisting of CPUs known to be safe, excluding them from
     the mitigations

   - objtool fixes to deal properly with retpolines and alternatives

   - Exclusion of __init functions from retpolines, which speeds up the
     boot process

   - Removal of the syscall64 fast path and related cleanups and
     simplifications

   - Removal of the unpatched paravirt mode, which is yet another source
     of unprotected indirect calls

   - A new and undisputed version of the module mismatch warning

   - A couple of cleanup and correctness fixes all over the place

  Yet another step towards full mitigation. There are a few things still
  missing, like the RSB underflow mitigation for Skylake and other small
  details, but that's being worked on.

  That said, I'm taking a belated Christmas vacation for a week and hope
  that everything is magically solved when I'm back on Feb 12th"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM/x86: Add IBPB support
  KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
  x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
  x86/pti: Mark constant arrays as __initconst
  x86/spectre: Simplify spectre_v2 command line parsing
  x86/retpoline: Avoid retpolines for built-in __init functions
  x86/kvm: Update spectre-v1 mitigation
  KVM: VMX: make MSR bitmaps per-VCPU
  x86/paravirt: Remove 'noreplace-paravirt' cmdline option
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
  x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
  x86/spectre: Report get_user mitigation for spectre_v1
  nl80211: Sanitize array index in parse_txq_params
  vfs, fdtable: Prevent bounds-check bypass via speculative execution
  x86/syscall: Sanitize syscall table de-references under speculation
  x86/get_user: Use pointer masking to limit speculation
  ...
2 parents 0a646e9 + b2ac58f commit 3527799

38 files changed (+977, -554 lines)

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 0 additions & 2 deletions
@@ -2758,8 +2758,6 @@
 	norandmaps	Don't use address space randomization. Equivalent to
 			echo 0 > /proc/sys/kernel/randomize_va_space
 
-	noreplace-paravirt	[X86,IA-64,PV_OPS] Don't patch paravirt_ops
-
 	noreplace-smp	[X86-32,SMP] Don't replace SMP instructions
 			with UP alternatives
 

Documentation/speculation.txt

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+	int load_array(int *array, unsigned int index)
+	{
+		if (index >= MAX_ARRAY_ELEMS)
+			return 0;
+		else
+			return array[index];
+	}
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+	CMP	<index>, #MAX_ARRAY_ELEMS
+	B.LT	less
+	MOV	<returnval>, #0
+	RET
+  less:
+	LDR	<returnval>, [<array>, <index>]
+	RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
+value will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may
+result in sensitive information being leaked. Consider the following
+code, building on the prior example:
+
+	int load_dependent_arrays(int *arr1, int *arr2, int index)
+	{
+		int val1, val2;
+
+		val1 = load_array(arr1, index);
+		val2 = load_array(arr2, val1);
+
+		return val2;
+	}
+
+Under speculation, the first call to load_array() may return the value
+of an out-of-bounds address, while the second call will influence
+microarchitectural state dependent on this value. This may provide an
+arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are
+respected even under speculation. Architectures which are affected by
+speculation-based side-channels are expected to implement these
+primitives.
+
+The array_index_nospec() helper in <linux/nospec.h> can be used to
+prevent information from being leaked via side-channels.
+
+A call to array_index_nospec(index, size) returns a sanitized index
+value that is bounded to [0, size) even under cpu speculation
+conditions.
+
+This can be used to protect the earlier load_array() example:
+
+	int load_array(int *array, unsigned int index)
+	{
+		if (index >= MAX_ARRAY_ELEMS)
+			return 0;
+		else {
+			index = array_index_nospec(index, MAX_ARRAY_ELEMS);
+			return array[index];
+		}
+	}
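
As a stand-alone illustration of the clamping described above, the sketch below shows how a mask-based helper in the style of array_index_nospec() can work in plain C. The names index_mask()/index_nospec() and the test harness are hypothetical, not the kernel's <linux/nospec.h> implementation (which is written as macros, warns on indices above LONG_MAX, and lets architectures such as x86 override the mask generation, as in the asm/barrier.h hunk further down):

	/*
	 * Sketch of the branchless masking idea behind array_index_nospec().
	 * Illustrative only -- not the kernel's <linux/nospec.h> code.
	 * Assumes index and size fit in a signed long and that right-shifting
	 * a negative long is an arithmetic shift (true for common compilers).
	 */
	#include <stdio.h>

	#define MAX_ARRAY_ELEMS 8

	/* ~0UL when index < size, 0 otherwise, computed without a branch. */
	static unsigned long index_mask(unsigned long index, unsigned long size)
	{
		return ~(long)(index | (size - 1 - index)) >> (sizeof(long) * 8 - 1);
	}

	/* Clamp index so even a mis-speculated load stays inside [0, size). */
	static unsigned long index_nospec(unsigned long index, unsigned long size)
	{
		return index & index_mask(index, size);
	}

	static int array[MAX_ARRAY_ELEMS] = { 10, 11, 12, 13, 14, 15, 16, 17 };

	int load_array(unsigned int index)
	{
		if (index >= MAX_ARRAY_ELEMS)
			return 0;
		/* Under mis-speculation of the branch above, the load below
		 * still only ever sees a clamped, in-bounds index. */
		index = index_nospec(index, MAX_ARRAY_ELEMS);
		return array[index];
	}

	int main(void)
	{
		printf("%d\n", load_array(3));   /* 13 */
		printf("%d\n", load_array(100)); /* 0  */
		return 0;
	}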

arch/x86/entry/common.c

Lines changed: 6 additions & 3 deletions
@@ -21,6 +21,7 @@
 #include <linux/export.h>
 #include <linux/context_tracking.h>
 #include <linux/user-return-notifier.h>
+#include <linux/nospec.h>
 #include <linux/uprobes.h>
 #include <linux/livepatch.h>
 #include <linux/syscalls.h>
@@ -206,7 +207,7 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	 * special case only applies after poking regs and before the
 	 * very next return to user mode.
 	 */
-	current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
+	ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
 #endif
 
 	user_enter_irqoff();
@@ -282,7 +283,8 @@ __visible void do_syscall_64(struct pt_regs *regs)
 	 * regs->orig_ax, which changes the behavior of some syscalls.
 	 */
 	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
-		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
+		nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
+		regs->ax = sys_call_table[nr](
 			regs->di, regs->si, regs->dx,
 			regs->r10, regs->r8, regs->r9);
 	}
@@ -304,7 +306,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 	unsigned int nr = (unsigned int)regs->orig_ax;
 
 #ifdef CONFIG_IA32_EMULATION
-	current->thread.status |= TS_COMPAT;
+	ti->status |= TS_COMPAT;
 #endif
 
 	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
@@ -318,6 +320,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 	}
 
 	if (likely(nr < IA32_NR_syscalls)) {
+		nr = array_index_nospec(nr, IA32_NR_syscalls);
 		/*
 		 * It's possible that a 32-bit syscall implementation
 		 * takes a 64-bit parameter but nonetheless assumes that

arch/x86/entry/entry_64.S

Lines changed: 7 additions & 120 deletions
@@ -236,91 +236,20 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	pushq	%r9				/* pt_regs->r9 */
 	pushq	%r10				/* pt_regs->r10 */
 	pushq	%r11				/* pt_regs->r11 */
-	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
-	UNWIND_HINT_REGS extra=0
-
-	TRACE_IRQS_OFF
-
-	/*
-	 * If we need to do entry work or if we guess we'll need to do
-	 * exit work, go straight to the slow path.
-	 */
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	entry_SYSCALL64_slow_path
-
-entry_SYSCALL_64_fastpath:
-	/*
-	 * Easy case: enable interrupts and issue the syscall. If the syscall
-	 * needs pt_regs, we'll call a stub that disables interrupts again
-	 * and jumps to the slow path.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx
-
-	/*
-	 * This call instruction is handled specially in stub_ptregs_64.
-	 * It might end up jumping to the slow path. If it jumps, RAX
-	 * and all argument registers are clobbered.
-	 */
-#ifdef CONFIG_RETPOLINE
-	movq	sys_call_table(, %rax, 8), %rax
-	call	__x86_indirect_thunk_rax
-#else
-	call	*sys_call_table(, %rax, 8)
-#endif
-.Lentry_SYSCALL_64_after_fastpath_call:
-
-	movq	%rax, RAX(%rsp)
-1:
+	pushq	%rbx				/* pt_regs->rbx */
+	pushq	%rbp				/* pt_regs->rbp */
+	pushq	%r12				/* pt_regs->r12 */
+	pushq	%r13				/* pt_regs->r13 */
+	pushq	%r14				/* pt_regs->r14 */
+	pushq	%r15				/* pt_regs->r15 */
+	UNWIND_HINT_REGS
 
-	/*
-	 * If we get here, then we know that pt_regs is clean for SYSRET64.
-	 * If we see that no exit work is required (which we are required
-	 * to check with IRQs off), then we can go straight to SYSRET64.
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	1f
-
-	LOCKDEP_SYS_EXIT
-	TRACE_IRQS_ON			/* user mode is traced as IRQs on */
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
-	addq	$6*8, %rsp		/* skip extra regs -- they were preserved */
-	UNWIND_HINT_EMPTY
-	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
 
-1:
-	/*
-	 * The fast path looked good when we started, but something changed
-	 * along the way and we need to switch to the slow path. Calling
-	 * raise(3) will trigger this, for example. IRQs are off.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY)
-	SAVE_EXTRA_REGS
-	movq	%rsp, %rdi
-	call	syscall_return_slowpath	/* returns with IRQs disabled */
-	jmp	return_from_SYSCALL_64
-
-entry_SYSCALL64_slow_path:
 	/* IRQs are off. */
-	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
-return_from_SYSCALL_64:
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
 
 	/*
@@ -393,7 +322,6 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
@@ -424,47 +352,6 @@ syscall_return_via_sysret:
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-ENTRY(stub_ptregs_64)
-	/*
-	 * Syscalls marked as needing ptregs land here.
-	 * If we are on the fast path, we need to save the extra regs,
-	 * which we achieve by trying again on the slow path. If we are on
-	 * the slow path, the extra regs are already saved.
-	 *
-	 * RAX stores a pointer to the C function implementing the syscall.
-	 * IRQs are on.
-	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
-	jne	1f
-
-	/*
-	 * Called from fast path -- disable IRQs again, pop return address
-	 * and jump to slow path
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	popq	%rax
-	UNWIND_HINT_REGS extra=0
-	jmp	entry_SYSCALL64_slow_path
-
-1:
-	JMP_NOSPEC %rax			/* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
-	UNWIND_HINT_FUNC
-	leaq	\func(%rip), %rax
-	jmp	stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
-#include <asm/syscalls_64.h>
-
 /*
  * %rdi: prev task
  * %rsi: next task

arch/x86/entry/syscall_64.c

Lines changed: 2 additions & 5 deletions
@@ -7,14 +7,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
-
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
arch/x86/include/asm/barrier.h

Lines changed: 28 additions & 0 deletions
@@ -24,6 +24,34 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
+/**
+ * array_index_mask_nospec() - generate a mask that is ~0UL when the
+ * 	bounds check succeeds and 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ *
+ * Returns:
+ *     0 - (index < size)
+ */
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+		unsigned long size)
+{
+	unsigned long mask;
+
+	asm ("cmp %1,%2; sbb %0,%0;"
+			:"=r" (mask)
+			:"r"(size),"r" (index)
+			:"cc");
+	return mask;
+}
+
+/* Override the default implementation from linux/nospec.h. */
+#define array_index_mask_nospec array_index_mask_nospec
+
+/* Prevent speculative execution past this barrier. */
+#define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+					   "lfence", X86_FEATURE_LFENCE_RDTSC)
+
 #ifdef CONFIG_X86_PPRO_FENCE
 #define dma_rmb()	rmb()
 #else
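
For readers unfamiliar with the cmp/sbb idiom in the hunk above: CMP sets the carry flag exactly when index < size (unsigned), and "sbb %0,%0" then turns that carry into 0UL or ~0UL without a branch. A small user-space sketch of the same idiom follows; the mask_nospec_asm() name and the test harness are illustrative, not kernel code, and assume x86-64 with GCC or Clang:

	#include <assert.h>
	#include <stdio.h>

	static inline unsigned long mask_nospec_asm(unsigned long index,
						    unsigned long size)
	{
		unsigned long mask;

		/* CF = (index < size); "sbb %0,%0" yields 0UL - CF. */
		asm ("cmp %1,%2; sbb %0,%0;"
				: "=r" (mask)
				: "r" (size), "r" (index)
				: "cc");
		return mask;
	}

	int main(void)
	{
		assert(mask_nospec_asm(3, 8) == ~0UL);  /* in bounds  -> all ones */
		assert(mask_nospec_asm(8, 8) == 0UL);   /* off by one -> zero */
		assert(mask_nospec_asm(100, 8) == 0UL); /* way out    -> zero */
		printf("mask semantics check passed\n");
		return 0;
	}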

arch/x86/include/asm/fixmap.h

Lines changed: 4 additions & 2 deletions
@@ -137,8 +137,10 @@ enum fixed_addresses {
 
 extern void reserve_top_address(unsigned long reserve);
 
-#define FIXADDR_SIZE	(__end_of_permanent_fixed_addresses << PAGE_SHIFT)
-#define FIXADDR_START	(FIXADDR_TOP - FIXADDR_SIZE)
+#define FIXADDR_SIZE		(__end_of_permanent_fixed_addresses << PAGE_SHIFT)
+#define FIXADDR_START		(FIXADDR_TOP - FIXADDR_SIZE)
+#define FIXADDR_TOT_SIZE	(__end_of_fixed_addresses << PAGE_SHIFT)
+#define FIXADDR_TOT_START	(FIXADDR_TOP - FIXADDR_TOT_SIZE)
 
 extern int fixmaps_set;
 
arch/x86/include/asm/msr.h

Lines changed: 1 addition & 2 deletions
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
 	 * that some other imaginary CPU is updating continuously with a
 	 * time stamp.
	 */
-	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
-		      "lfence", X86_FEATURE_LFENCE_RDTSC);
+	barrier_nospec();
 	return rdtsc();
 }
 
arch/x86/include/asm/nospec-branch.h

Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ extern char __indirect_thunk_end[];
  * On VMEXIT we must ensure that no RSB predictions learned in the guest
  * can be followed in the host, by overwriting the RSB completely. Both
  * retpoline and IBRS mitigations for Spectre v2 need this; only on future
- * CPUs with IBRS_ATT *might* it be avoided.
+ * CPUs with IBRS_ALL *might* it be avoided.
  */
 static inline void vmexit_fill_RSB(void)
 {

arch/x86/include/asm/pgtable_32_types.h

Lines changed: 3 additions & 2 deletions
@@ -44,8 +44,9 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
  */
 #define CPU_ENTRY_AREA_PAGES	(NR_CPUS * 40)
 
-#define CPU_ENTRY_AREA_BASE				\
-	((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK)
+#define CPU_ENTRY_AREA_BASE						\
+	((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1))	\
+	 & PMD_MASK)
 
 #define PKMAP_BASE		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
