
Commit 2bb69f5

Merge tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mitigations from Thomas Gleixner:
 "Mitigations for the native BHI hardware vulnerability:

  Branch History Injection (BHI) attacks may allow a malicious
  application to influence indirect branch prediction in kernel by
  poisoning the branch history. eIBRS isolates indirect branch targets
  in ring0. The BHB can still influence the choice of indirect branch
  predictor entry, and although branch predictor entries are isolated
  between modes when eIBRS is enabled, the BHB itself is not isolated
  between modes.

  Add mitigations against it either with the help of microcode or with
  software sequences for the affected CPUs"

[ This also ends up enabling the full mitigation by default despite the
  system call hardening, because apparently there are other indirect
  calls that are still sufficiently reachable, and the 'auto' case just
  isn't hardened enough.

  We'll have some more inevitable tweaking in the future  - Linus ]

* tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: x86: Add BHI_NO
  x86/bhi: Mitigate KVM by default
  x86/bhi: Add BHI mitigation knob
  x86/bhi: Enumerate Branch History Injection (BHI) bug
  x86/bhi: Define SPEC_CTRL_BHI_DIS_S
  x86/bhi: Add support for clearing branch history at syscall entry
  x86/syscall: Don't force use of indirect calls for system calls
  x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
2 parents 20cb38a + ed2e8d4 commit 2bb69f5

File tree

19 files changed: +371 −49 lines changed

Documentation/admin-guide/hw-vuln/spectre.rst

Lines changed: 42 additions & 6 deletions
@@ -138,11 +138,10 @@ associated with the source address of the indirect branch. Specifically,
 the BHB might be shared across privilege levels even in the presence of
 Enhanced IBRS.
 
-Currently the only known real-world BHB attack vector is via
-unprivileged eBPF. Therefore, it's highly recommended to not enable
-unprivileged eBPF, especially when eIBRS is used (without retpolines).
-For a full mitigation against BHB attacks, it's recommended to use
-retpolines (or eIBRS combined with retpolines).
+Previously the only known real-world BHB attack vector was via unprivileged
+eBPF. Further research has found attacks that don't require unprivileged eBPF.
+For a full mitigation against BHB attacks it is recommended to set BHI_DIS_S or
+use the BHB clearing sequence.
 
 Attack scenarios
 ----------------
@@ -430,6 +429,23 @@ The possible values in this file are:
  'PBRSB-eIBRS: Not affected'  CPU is not affected by PBRSB
  =========================== =======================================================
 
+- Branch History Injection (BHI) protection status:
+
+.. list-table::
+
+ * - BHI: Not affected
+   - System is not affected
+ * - BHI: Retpoline
+   - System is protected by retpoline
+ * - BHI: BHI_DIS_S
+   - System is protected by BHI_DIS_S
+ * - BHI: SW loop; KVM SW loop
+   - System is protected by software clearing sequence
+ * - BHI: Syscall hardening
+   - Syscalls are hardened against BHI
+ * - BHI: Syscall hardening; KVM: SW loop
+   - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
+
 Full mitigation might require a microcode update from the CPU
 vendor. When the necessary microcode is not available, the kernel will
 report vulnerability.
@@ -484,7 +500,11 @@ Spectre variant 2
 
 Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
 boot, by setting the IBRS bit, and they're automatically protected against
-Spectre v2 variant attacks.
+some Spectre v2 variant attacks. The BHB can still influence the choice of
+indirect branch predictor entry, and although branch predictor entries are
+isolated between modes when eIBRS is enabled, the BHB itself is not isolated
+between modes. Systems which support BHI_DIS_S will set it to protect against
+BHI attacks.
 
 On Intel's enhanced IBRS systems, this includes cross-thread branch target
 injections on SMT systems (STIBP). In other words, Intel eIBRS enables
@@ -638,6 +658,22 @@ kernel command line.
            spectre_v2=off. Spectre variant 1 mitigations
            cannot be disabled.
 
+   spectre_bhi=
+
+            [X86] Control mitigation of Branch History Injection
+            (BHI) vulnerability. Syscalls are hardened against BHI
+            regardless of this setting. This setting affects the deployment
+            of the HW BHI control and the SW BHB clearing sequence.
+
+            on
+                unconditionally enable.
+            off
+                unconditionally disable.
+            auto
+                enable if hardware mitigation
+                control(BHI_DIS_S) is available, otherwise
+                enable alternate mitigation in KVM.
+
 For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
 
 Mitigation selection guide
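
The sysfs strings documented above can be inspected from user space. A minimal sketch in C, assuming the standard vulnerabilities path documented in this file; the string matching is purely illustrative:

/* Print the spectre_v2 mitigation line and flag whether a "BHI:" field
 * (added by this series) is present. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[512];
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/spectre_v2", "r");

        if (!f || !fgets(line, sizeof(line), f)) {
                perror("spectre_v2");
                return 1;
        }
        fclose(f);

        printf("%s", line);
        if (strstr(line, "BHI:"))
                printf("BHI status is reported on this kernel\n");
        else
                printf("no BHI field (older kernel, or field not emitted)\n");
        return 0;
}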

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 12 additions & 0 deletions
@@ -6063,6 +6063,18 @@
 	sonypi.*=	[HW] Sony Programmable I/O Control Device driver
 			See Documentation/admin-guide/laptops/sonypi.rst
 
+	spectre_bhi=	[X86] Control mitigation of Branch History Injection
+			(BHI) vulnerability. Syscalls are hardened against BHI
+			regardless of this setting. This setting affects the
+			deployment of the HW BHI control and the SW BHB
+			clearing sequence.
+
+			on   - unconditionally enable.
+			off  - unconditionally disable.
+			auto - (default) enable hardware mitigation
+			       (BHI_DIS_S) if available, otherwise enable
+			       alternate mitigation in KVM.
+
 	spectre_v2=	[X86,EARLY] Control mitigation of Spectre variant 2
 			(indirect branch speculation) vulnerability.
 			The default operation protects the kernel from
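
For context, a simplified sketch of how such a boot parameter is typically wired up with early_param(). The real spectre_bhi= handling lives in arch/x86/kernel/cpu/bugs.c, which is part of this merge but not shown in this excerpt, so the names below are illustrative rather than the actual code:

/* Illustrative only -- not the bugs.c code from this merge. */
enum bhi_mitigations {
        BHI_MITIGATION_OFF,
        BHI_MITIGATION_ON,
        BHI_MITIGATION_AUTO,
};

static enum bhi_mitigations bhi_mitigation __ro_after_init = BHI_MITIGATION_AUTO;

static int __init spectre_bhi_parse_cmdline(char *str)
{
        if (!str)
                return -EINVAL;

        if (!strcmp(str, "off"))
                bhi_mitigation = BHI_MITIGATION_OFF;
        else if (!strcmp(str, "on"))
                bhi_mitigation = BHI_MITIGATION_ON;
        else if (!strcmp(str, "auto"))
                bhi_mitigation = BHI_MITIGATION_AUTO;
        else
                pr_err("Ignoring unknown spectre_bhi option (%s)", str);

        return 0;
}
early_param("spectre_bhi", spectre_bhi_parse_cmdline);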

arch/x86/Kconfig

Lines changed: 26 additions & 0 deletions
@@ -2633,6 +2633,32 @@ config MITIGATION_RFDS
 	  stored in floating point, vector and integer registers.
 	  See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>
 
+choice
+	prompt "Clear branch history"
+	depends on CPU_SUP_INTEL
+	default SPECTRE_BHI_ON
+	help
+	  Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
+	  where the branch history buffer is poisoned to speculatively steer
+	  indirect branches.
+	  See <file:Documentation/admin-guide/hw-vuln/spectre.rst>
+
+config SPECTRE_BHI_ON
+	bool "on"
+	help
+	  Equivalent to setting spectre_bhi=on command line parameter.
+config SPECTRE_BHI_OFF
+	bool "off"
+	help
+	  Equivalent to setting spectre_bhi=off command line parameter.
+config SPECTRE_BHI_AUTO
+	bool "auto"
+	depends on BROKEN
+	help
+	  Equivalent to setting spectre_bhi=auto command line parameter.
+
+endchoice
+
 endif
 
 config ARCH_HAS_ADD_PAGES

arch/x86/entry/common.c

Lines changed: 5 additions & 5 deletions
@@ -49,7 +49,7 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
 
 	if (likely(unr < NR_syscalls)) {
 		unr = array_index_nospec(unr, NR_syscalls);
-		regs->ax = sys_call_table[unr](regs);
+		regs->ax = x64_sys_call(regs, unr);
 		return true;
 	}
 	return false;
@@ -66,7 +66,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
 
 	if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
 		xnr = array_index_nospec(xnr, X32_NR_syscalls);
-		regs->ax = x32_sys_call_table[xnr](regs);
+		regs->ax = x32_sys_call(regs, xnr);
 		return true;
 	}
 	return false;
@@ -162,7 +162,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
 
 	if (likely(unr < IA32_NR_syscalls)) {
 		unr = array_index_nospec(unr, IA32_NR_syscalls);
-		regs->ax = ia32_sys_call_table[unr](regs);
+		regs->ax = ia32_sys_call(regs, unr);
 	} else if (nr != -1) {
 		regs->ax = __ia32_sys_ni_syscall(regs);
 	}
@@ -189,7 +189,7 @@ static __always_inline bool int80_is_external(void)
 }
 
 /**
- * int80_emulation - 32-bit legacy syscall entry
+ * do_int80_emulation - 32-bit legacy syscall C entry from asm
  *
  * This entry point can be used by 32-bit and 64-bit programs to perform
  * 32-bit system calls. Instances of INT $0x80 can be found inline in
@@ -207,7 +207,7 @@ static __always_inline bool int80_is_external(void)
 * eax: system call number
 * ebx, ecx, edx, esi, edi, ebp: arg1 - arg 6
 */
-DEFINE_IDTENTRY_RAW(int80_emulation)
+__visible noinstr void do_int80_emulation(struct pt_regs *regs)
 {
 	int nr;
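
The motivation for the x64_sys_call()/ia32_sys_call()/x32_sys_call() change above is that an indirect call through a function-pointer table is exactly the kind of branch BHI can steer, whereas a switch statement typically gives the compiler direct call sites. A standalone toy contrast (not kernel code; the handlers are stand-ins):

/* Toy illustration of the two dispatch styles. */
#include <stdio.h>

static long sys_a(long arg) { return arg + 1; }
static long sys_b(long arg) { return arg * 2; }

typedef long (*sys_ptr_t)(long);
static const sys_ptr_t table[] = { sys_a, sys_b };

/* Old style: an indirect call through a table -- the predicted target can be
 * influenced by poisoned branch history. */
static long dispatch_indirect(unsigned int nr, long arg)
{
        return nr < 2 ? table[nr](arg) : -1;
}

/* New style: a switch gives direct call sites per syscall number. */
static long dispatch_switch(unsigned int nr, long arg)
{
        switch (nr) {
        case 0: return sys_a(arg);
        case 1: return sys_b(arg);
        default: return -1;
        }
}

int main(void)
{
        printf("%ld %ld\n", dispatch_indirect(0, 41), dispatch_switch(1, 21));
        return 0;
}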

arch/x86/entry/entry_64.S

Lines changed: 61 additions & 0 deletions
@@ -116,6 +116,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 	/* clobbers %rax, make sure it is after saving the syscall nr */
 	IBRS_ENTER
 	UNTRAIN_RET
+	CLEAR_BRANCH_HISTORY
 
 	call	do_syscall_64		/* returns with IRQs disabled */
 
@@ -1491,3 +1492,63 @@ SYM_CODE_START_NOALIGN(rewind_stack_and_make_dead)
 	call	make_task_dead
 SYM_CODE_END(rewind_stack_and_make_dead)
 .popsection
+
+/*
+ * This sequence executes branches in order to remove user branch information
+ * from the branch history tracker in the Branch Predictor, therefore removing
+ * user influence on subsequent BTB lookups.
+ *
+ * It should be used on parts prior to Alder Lake. Newer parts should use the
+ * BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being
+ * virtualized on newer hardware the VMM should protect against BHI attacks by
+ * setting BHI_DIS_S for the guests.
+ *
+ * CALLs/RETs are necessary to prevent Loop Stream Detector(LSD) from engaging
+ * and not clearing the branch history. The call tree looks like:
+ *
+ * call 1
+ *   call 2
+ *     call 2
+ *       call 2
+ *         call 2
+ *           call 2
+ *           ret
+ *         ret
+ *       ret
+ *     ret
+ *   ret
+ * ret
+ *
+ * This means that the stack is non-constant and ORC can't unwind it with %rsp
+ * alone. Therefore we unconditionally set up the frame pointer, which allows
+ * ORC to unwind properly.
+ *
+ * The alignment is for performance and not for safety, and may be safely
+ * refactored in the future if needed.
+ */
+SYM_FUNC_START(clear_bhb_loop)
+	push	%rbp
+	mov	%rsp, %rbp
+	movl	$5, %ecx
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call	1f
+	jmp	5f
+	.align 64, 0xcc
+	ANNOTATE_INTRA_FUNCTION_CALL
+1:	call	2f
+	RET
+	.align 64, 0xcc
+2:	movl	$5, %eax
+3:	jmp	4f
+	nop
+4:	sub	$1, %eax
+	jnz	3b
+	sub	$1, %ecx
+	jnz	1b
+	RET
+5:	lfence
+	pop	%rbp
+	RET
+SYM_FUNC_END(clear_bhb_loop)
+EXPORT_SYMBOL_GPL(clear_bhb_loop)
+STACK_FRAME_NON_STANDARD(clear_bhb_loop)
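
As the comment explains, the sequence is essentially two nested loops whose CALL/RET pairs keep the Loop Stream Detector from short-circuiting the branch history updates. A rough C model of the branch pattern, purely illustrative: compiled C gives no control over the exact branches executed, which is why the real mitigation must be the assembly above.

/* Rough model of clear_bhb_loop's shape: 5 outer iterations, each making a
 * call that runs a short inner loop of taken branches before returning. */
static void __attribute__((noinline)) bhb_inner(void)
{
        for (volatile int i = 5; i; i--)
                ;               /* taken backward branches, like the 3:/4: loop */
}

static void bhb_clear_model(void)
{
        for (int i = 5; i; i--)
                bhb_inner();    /* CALL/RET pairs, like the 1:/2: calls */
}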

arch/x86/entry/entry_64_compat.S

Lines changed: 16 additions & 0 deletions
@@ -92,6 +92,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 
 	IBRS_ENTER
 	UNTRAIN_RET
+	CLEAR_BRANCH_HISTORY
 
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
@@ -206,6 +207,7 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 
 	IBRS_ENTER
 	UNTRAIN_RET
+	CLEAR_BRANCH_HISTORY
 
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
@@ -276,3 +278,17 @@ SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 	int3
 SYM_CODE_END(entry_SYSCALL_compat)
+
+/*
+ * int 0x80 is used by 32 bit mode as a system call entry. Normally idt entries
+ * point to C routines, however since this is a system call interface the branch
+ * history needs to be scrubbed to protect against BHI attacks, and that
+ * scrubbing needs to take place in assembly code prior to entering any C
+ * routines.
+ */
+SYM_CODE_START(int80_emulation)
+	ANNOTATE_NOENDBR
+	UNWIND_HINT_FUNC
+	CLEAR_BRANCH_HISTORY
+	jmp	do_int80_emulation
+SYM_CODE_END(int80_emulation)

arch/x86/entry/syscall_32.c

Lines changed: 19 additions & 2 deletions
@@ -18,8 +18,25 @@
 #include <asm/syscalls_32.h>
 #undef __SYSCALL
 
+/*
+ * The sys_call_table[] is no longer used for system calls, but
+ * kernel/trace/trace_syscalls.c still wants to know the system
+ * call address.
+ */
+#ifdef CONFIG_X86_32
 #define __SYSCALL(nr, sym) __ia32_##sym,
-
-__visible const sys_call_ptr_t ia32_sys_call_table[] = {
+const sys_call_ptr_t sys_call_table[] = {
 #include <asm/syscalls_32.h>
 };
+#undef __SYSCALL
+#endif
+
+#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
+
+long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
+{
+	switch (nr) {
+	#include <asm/syscalls_32.h>
+	default: return __ia32_sys_ni_syscall(regs);
+	}
+};

arch/x86/entry/syscall_64.c

Lines changed: 17 additions & 2 deletions
@@ -11,8 +11,23 @@
 #include <asm/syscalls_64.h>
 #undef __SYSCALL
 
+/*
+ * The sys_call_table[] is no longer used for system calls, but
+ * kernel/trace/trace_syscalls.c still wants to know the system
+ * call address.
+ */
 #define __SYSCALL(nr, sym) __x64_##sym,
-
-asmlinkage const sys_call_ptr_t sys_call_table[] = {
+const sys_call_ptr_t sys_call_table[] = {
 #include <asm/syscalls_64.h>
 };
+#undef __SYSCALL
+
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+
+long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
+{
+	switch (nr) {
+	#include <asm/syscalls_64.h>
+	default: return __x64_sys_ni_syscall(regs);
+	}
+};
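
The trick here is that asm/syscalls_64.h is a generated list of __SYSCALL(nr, sym) invocations, so redefining __SYSCALL before each include chooses what the list expands to. With the case-returning definition, the switch body expands to roughly the following (an approximate expansion; the numbers come from the generated header, and 0, 1 and 2 really are read, write and open on x86-64):

/* Approximate expansion of the switch body in x64_sys_call() above. */
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
        switch (nr) {
        case 0: return __x64_sys_read(regs);
        case 1: return __x64_sys_write(regs);
        case 2: return __x64_sys_open(regs);
        /* ... one direct call per syscall number ... */
        default: return __x64_sys_ni_syscall(regs);
        }
}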

arch/x86/entry/syscall_x32.c

Lines changed: 7 additions & 3 deletions
@@ -11,8 +11,12 @@
 #include <asm/syscalls_x32.h>
 #undef __SYSCALL
 
-#define __SYSCALL(nr, sym) __x64_##sym,
+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
 
-asmlinkage const sys_call_ptr_t x32_sys_call_table[] = {
-#include <asm/syscalls_x32.h>
+long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
+{
+	switch (nr) {
+	#include <asm/syscalls_x32.h>
+	default: return __x64_sys_ni_syscall(regs);
+	}
 };

arch/x86/include/asm/cpufeatures.h

Lines changed: 6 additions & 1 deletion
@@ -461,11 +461,15 @@
 
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
- * CPUID levels like 0x80000022, etc.
+ * CPUID levels like 0x80000022, etc and Linux defined features.
 *
 * Reuse free bits when adding new feature flags!
 */
 #define X86_FEATURE_AMD_LBR_PMC_FREEZE	(21*32+ 0) /* AMD LBR and PMC Freeze */
+#define X86_FEATURE_CLEAR_BHB_LOOP	(21*32+ 1) /* "" Clear branch history at syscall entry using SW loop */
+#define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
+#define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
+#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
 
 /*
 * BUG word(s)
@@ -515,4 +519,5 @@
 #define X86_BUG_SRSO			X86_BUG(1*32 + 0) /* AMD SRSO bug */
 #define X86_BUG_DIV0			X86_BUG(1*32 + 1) /* AMD DIV0 speculation bug */
 #define X86_BUG_RFDS			X86_BUG(1*32 + 2) /* CPU is vulnerable to Register File Data Sampling */
+#define X86_BUG_BHI			X86_BUG(1*32 + 3) /* CPU is affected by Branch History Injection */
 #endif /* _ASM_X86_CPUFEATURES_H */

arch/x86/include/asm/msr-index.h

Lines changed: 8 additions & 1 deletion
@@ -61,10 +61,13 @@
 #define SPEC_CTRL_SSBD			BIT(SPEC_CTRL_SSBD_SHIFT)	/* Speculative Store Bypass Disable */
 #define SPEC_CTRL_RRSBA_DIS_S_SHIFT	6	/* Disable RRSBA behavior */
 #define SPEC_CTRL_RRSBA_DIS_S		BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)
+#define SPEC_CTRL_BHI_DIS_S_SHIFT	10	/* Disable Branch History Injection behavior */
+#define SPEC_CTRL_BHI_DIS_S		BIT(SPEC_CTRL_BHI_DIS_S_SHIFT)
 
 /* A mask for bits which the kernel toggles when controlling mitigations */
 #define SPEC_CTRL_MITIGATIONS_MASK	(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD \
-					| SPEC_CTRL_RRSBA_DIS_S)
+					| SPEC_CTRL_RRSBA_DIS_S \
+					| SPEC_CTRL_BHI_DIS_S)
 
 #define MSR_IA32_PRED_CMD		0x00000049 /* Prediction Command */
 #define PRED_CMD_IBPB			BIT(0)	/* Indirect Branch Prediction Barrier */
@@ -163,6 +166,10 @@
 					 * are restricted to targets in
 					 * kernel.
 					 */
+#define ARCH_CAP_BHI_NO		BIT(20)	/*
+					 * CPU is not affected by Branch
+					 * History Injection.
+					 */
 #define ARCH_CAP_PBRSB_NO	BIT(24)	/*
 					 * Not susceptible to Post-Barrier
 					 * Return Stack Buffer Predictions.
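
Setting SPEC_CTRL_BHI_DIS_S in MSR_IA32_SPEC_CTRL is how the hardware control is turned on. A minimal sketch, assuming the feature flags defined above; the actual enablement in this merge lives in arch/x86/kernel/cpu/bugs.c (not shown in this excerpt) and additionally folds the bit into x86_spec_ctrl_base so it survives SMT and VM transitions:

/* Illustrative sketch only -- not the bugs.c code from this merge. */
static void __init bhi_enable_hw_mitigation(void)
{
        /* BHI_CTRL enumerates that the BHI_DIS_S control bit exists. */
        if (!boot_cpu_has(X86_FEATURE_BHI_CTRL))
                return;

        /* Turn the control on and remember that HW clearing is active. */
        msr_set_bit(MSR_IA32_SPEC_CTRL, SPEC_CTRL_BHI_DIS_S_SHIFT);
        setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_HW);
}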
