
Commit e5455ef

ashok-raj authored, Brian Maly committed

KVM/x86: Add IBPB support
The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing later
ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated together in CPUID,
IBPB is very different.

IBPB helps mitigate three potential attacks:

* Guests being attacked by other guests.
  - This is addressed by issuing an IBPB when we do a guest switch.

* Attacks from guest/ring3 to host/ring3. These would require an IBPB
  during context switch in the host, or after VMEXIT. The host process
  has two ways to mitigate:
  - It can be compiled with retpoline.
  - If it goes through a context switch and has set !dumpable, there is
    an IBPB in that path.
    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where you return to Qemu after a VMEXIT might make Qemu
    attackable from the guest when Qemu isn't compiled with retpoline.
    Issuing an IBPB on every VMEXIT has been reported to cause TSC
    calibration problems in the guest.

* Attacks from guest/ring0 to host/ring0. When the host kernel is using
  retpoline, it is safe against these attacks. If the host kernel isn't
  using retpoline, we might need to do an IBPB flush on every VMEXIT.
  Even when using retpoline for indirect calls, in certain conditions
  'ret' can use the BTB on Skylake-era CPUs. Other mitigations, such as
  RSB stuffing/clearing, are available for that case.

* IBPB is issued only for SVM during svm_free_vcpu(). VMX has a VMCLEAR
  and SVM doesn't. See the discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control, and for documentation about the mitigations:
https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: [email protected]
Cc: Asit Mallick <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Tim Chen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
(cherry picked from commit 15d4507)
Orabug: 28069548
Signed-off-by: Mihai Carabas <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Brian Maly <[email protected]>

Conflicts:
	arch/x86/kvm/cpuid.c
	arch/x86/kvm/svm.c
	arch/x86/kvm/vmx.c

All the conflicts were contextual.
There are major differences in this code between UEK4 and upstream
(also, in UEK4 we only have the IBRS feature, not SPEC_CTRL). We had to
introduce a guest_cpuid_has_* function in cpuid.h for each feature, and
moved the defines needed by both cpuid.h and cpuid.c into cpuid.h.

Signed-off-by: Brian Maly <[email protected]>
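For orientation (editor's sketch, not part of the commit): the barrier
itself is a single write to the write-only command MSR. FEATURE_SET_IBPB
is the UEK4 spelling of the IBPB command bit (bit 0), called
PRED_CMD_IBPB upstream; issue_ibpb() is a hypothetical wrapper name.

    #include <asm/msr.h>

    /*
     * MSR_IA32_PRED_CMD is write-only: writing bit 0 flushes the
     * indirect branch predictor; there is no state to read back.
     */
    static inline void issue_ibpb(void)
    {
    	wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
    }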
Parent: fe09396

File tree: 4 files changed (+106, -19 lines)

arch/x86/kvm/cpuid.c  (2 additions, 16 deletions)

@@ -56,20 +56,6 @@ u64 kvm_supported_xcr0(void)
 	return xcr0;
 }
 
-#define F(x) bit(X86_FEATURE_##x)
-
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_IBRS 26
-#define KVM_CPUID_BIT_STIBP 27
-#define KVM_CPUID_BIT_IA32_ARCH_CAPS 29
-#define KVM_CPUID_BIT_SSBD 31
-
-
-/* CPUID[eax=0x80000008].ebx */
-#define KVM_CPUID_BIT_IBPB_SUPPORT 12
-#define KVM_CPUID_BIT_VIRT_SSBD 25
-
-#define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
@@ -372,7 +358,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 0x80000008.ebx */
 	const u32 kvm_cpuid_80000008_ebx_x86_features =
-		KF(IBPB_SUPPORT) | KF(VIRT_SSBD);
+		KF(IBPB) | KF(VIRT_SSBD);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
@@ -609,7 +595,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->ebx &= kvm_cpuid_80000008_ebx_x86_features;
 
 		if ( !boot_cpu_has(X86_FEATURE_IBPB) )
-			entry->ebx &= ~(1u << KVM_CPUID_BIT_IBPB_SUPPORT);
+			entry->ebx &= ~(1u << KVM_CPUID_BIT_IBPB);
 
 		if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
 			entry->ebx |= KF(VIRT_SSBD);
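For orientation (editor's sketch, not part of the commit): KF(IBPB)
expands to bit(KVM_CPUID_BIT_IBPB), i.e. 1u << 12, so the code above
first masks EBX down to the bits KVM can virtualize and then clears
IBPB when the host itself lacks it. A worked example with a
hypothetical raw EBX value:

    /* Hypothetical host CPUID 0x80000008.EBX: IBPB (bit 12),
     * VIRT_SSBD (bit 25), plus an unrelated bit 3 that KVM
     * does not virtualize. */
    u32 ebx = (1u << 12) | (1u << 25) | (1u << 3);

    ebx &= KF(IBPB) | KF(VIRT_SSBD);		/* bit 3 is dropped */

    if (!boot_cpu_has(X86_FEATURE_IBPB))	/* host has no IBPB */
    	ebx &= ~(1u << KVM_CPUID_BIT_IBPB);	/* hide it from the guest */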

arch/x86/kvm/cpuid.h  (27 additions, 0 deletions)

@@ -125,4 +125,31 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu *vcpu)
 	best = kvm_find_cpuid_entry(vcpu, 7, 0);
 	return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+#define F(x) bit(X86_FEATURE_##x)
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_IBPB 12
+#define KVM_CPUID_BIT_VIRT_SSBD 25
+#define KVM_CPUID_BIT_IBRS 26
+#define KVM_CPUID_BIT_STIBP 27
+#define KVM_CPUID_BIT_IA32_ARCH_CAPS 29
+#define KVM_CPUID_BIT_SSBD 31
+
+static inline bool guest_cpuid_has_ibpb(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
+	return best && (best->ebx & KF(IBPB));
+}
+
+static inline bool guest_cpuid_has_ibrs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->edx & KF(IBRS));
+}
 #endif
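Both helpers follow one template: find the relevant CPUID leaf, then
test the feature bit in the right register. A hypothetical third helper
for STIBP (not part of this commit, shown only to illustrate the
pattern; STIBP is enumerated in CPUID 7.0 EDX, bit 27, matching the
KVM_CPUID_BIT_STIBP define above) would look like:

    static inline bool guest_cpuid_has_stibp(struct kvm_vcpu *vcpu)
    {
    	struct kvm_cpuid_entry2 *best;

    	/* STIBP sits next to IBRS in CPUID 7.0 EDX. */
    	best = kvm_find_cpuid_entry(vcpu, 7, 0);
    	return best && (best->edx & KF(STIBP));
    }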

arch/x86/kvm/svm.c  (9 additions, 1 deletion)

@@ -194,7 +194,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_LASTINTFROMIP,	.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,	.always = false },
 	{ .index = MSR_IA32_SPEC_CTRL,		.always = true },
-	{ .index = MSR_IA32_PRED_CMD,		.always = true },
+	{ .index = MSR_IA32_PRED_CMD,		.always = false },
 	{ .index = MSR_INVALID,			.always = false },
 };
 
@@ -3304,6 +3304,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->spec_ctrl = data;
 		break;
 	case MSR_IA32_PRED_CMD:
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has_ibpb(vcpu))
+			return 1;
+
 		if (data & ~FEATURE_SET_IBPB)
 			return 1;
 
@@ -3312,6 +3316,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 		if (ibpb_inuse)
 			wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+
+		if (is_guest_mode(vcpu))
+			break;
+		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
 		break;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
 		if (data & ~SPEC_CTRL_SSBD)
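The net effect is a lazy pass-through: PRED_CMD now starts out
intercepted (.always = false), and only the first valid guest write
opens the direct path. Assuming the flags in set_msr_interception()
mean (read, write) with 1 = allow direct access, as in upstream KVM's
SVM code, the final call reads as:

    /* Reads stay intercepted (PRED_CMD is write-only anyway);
     * writes now go straight to hardware, so subsequent guest
     * IBPBs no longer cause a VMEXIT for this MSR. */
    set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD,
    		     0 /* read */, 1 /* write */);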

arch/x86/kvm/vmx.c  (68 additions, 2 deletions)

@@ -986,6 +986,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static int alloc_identity_pagetable(struct kvm *kvm);
 
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+							  u32 msr, int type);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -1724,6 +1726,29 @@ static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg)
 	return *p;
 }
 
+/*
+ * Check if MSR is intercepted for L01 MSR bitmap.
+ */
+static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+{
+	unsigned long *msr_bitmap;
+	int f = sizeof(unsigned long);
+
+	if (!cpu_has_vmx_msr_bitmap())
+		return true;
+
+	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+
+	if (msr <= 0x1fff) {
+		return !!test_bit(msr, msr_bitmap + 0x800 / f);
+	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+		msr &= 0x1fff;
+		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+	}
+
+	return true;
+}
+
 static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
 	u32 eb;
@@ -2934,6 +2959,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		to_vmx(vcpu)->spec_ctrl = data;
 		break;
 	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has_ibpb(vcpu) &&
+		    !guest_cpuid_has_ibrs(vcpu))
+			return 1;
+
 		if (data & ~FEATURE_SET_IBPB)
 			return 1;
 
@@ -2942,6 +2972,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		if (ibpb_inuse)
 			wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+
+		/*
+		 * For non-nested:
+		 * When it's written (to non-zero) for the first time, pass
+		 * it through.
+		 *
+		 * For nested:
+		 * The handling of the MSR bitmap for L2 guests is done in
+		 * nested_vmx_merge_msr_bitmap. We should not touch the
+		 * vmcs02.msr_bitmap here since it gets completely overwritten
+		 * in the merging.
+		 */
+		vmx_disable_intercept_for_msr(to_vmx(vcpu)->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+					      MSR_TYPE_W);
 		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		vmx->arch_capabilities = data;
@@ -9073,8 +9117,23 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 	unsigned long *msr_bitmap_l1;
 	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
 
-	/* This shortcut is ok because we support only x2APIC MSRs so far. */
-	if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+	/*
+	 * pred_cmd is trying to verify two things:
+	 *
+	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+	 *    ensures that we do not accidentally generate an L02 MSR bitmap
+	 *    from the L12 MSR bitmap that is too permissive.
+	 * 2. That L1 or L2s have actually used the MSR. This avoids
+	 *    unnecessarily merging of the bitmap if the MSR is unused. This
+	 *    works properly because we only update the L01 MSR bitmap lazily.
+	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
+	 *    updated to reflect this when L1 (or its L2s) actually write to
+	 *    the MSR.
+	 */
+	bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+
+	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+	    !pred_cmd)
 		return false;
 
 	page = nested_get_page(vcpu, vmcs12->msr_bitmap);
@@ -9114,6 +9173,13 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
 					MSR_TYPE_W);
 		}
 	}
+
+	if (pred_cmd)
+		nested_vmx_disable_intercept_for_msr(
+				msr_bitmap_l1, msr_bitmap_l0,
+				MSR_IA32_PRED_CMD,
+				MSR_TYPE_W);
+
 	kunmap(page);
 	nested_release_page_clean(page);
 
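The offsets in msr_write_intercepted_l01() follow the VMX MSR-bitmap
layout from the Intel SDM: one 4 KiB page split into four 1 KiB
regions, read-low at 0x000 and read-high at 0x400, write-low at 0x800
and write-high at 0xc00, covering MSRs 0x00000000-0x00001fff ("low")
and 0xc0000000-0xc0001fff ("high"). A purely illustrative read-side
twin of that function (not in this commit), under the same layout
assumptions:

    /* Illustrative only: same walk as msr_write_intercepted_l01(),
     * but against the read bitmaps at 0x000 (low) and 0x400 (high). */
    static bool msr_read_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
    {
    	unsigned long *msr_bitmap;
    	int f = sizeof(unsigned long);

    	if (!cpu_has_vmx_msr_bitmap())
    		return true;	/* no bitmap: every MSR is intercepted */

    	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;

    	if (msr <= 0x1fff)				/* low MSR range */
    		return !!test_bit(msr, msr_bitmap + 0x000 / f);
    	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
    		return !!test_bit(msr & 0x1fff,		/* high MSR range */
    				  msr_bitmap + 0x400 / f);

    	return true;
    }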
