Commit 7bcf724

Merge branch 'kvm-tdx-finish-initial' into HEAD
This patch ties the remaining loose ends and finally enables TDX guests to run inside KVM. It implements handling of EPT violation/misconfig and of several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); it also adds a bunch of wrappers in vmx/main.c to ignore operations not supported by TDX guests (*).

Finally, it introduces documentation for the new APIs that have been added along the way.

(*) access to CPU state, VMX preemption timer, accesses to TSC offset or multiplier, LMCE enable/disable, hypercall patching.
2 parents 9913212 + 52f52ea commit 7bcf724

File tree

21 files changed: +1204 -107 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 31 additions & 4 deletions
@@ -1407,6 +1407,9 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).
 
+For TDX guest, deleting/moving memory region loses guest memory contents.
+Read only region isn't supported. Only as-id 0 is supported.
+
 Note: On arm64, a write generated by the page-table walker (to update
 the Access and Dirty flags, for example) never results in a
 KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
@@ -4764,17 +4767,19 @@ H_GET_CPU_CHARACTERISTICS hypercall.
 
 :Capability: basic
 :Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
 :Parameters: an opaque platform specific structure (in/out)
 :Returns: 0 on success; -1 on error
 
 If the platform supports creating encrypted VMs then this ioctl can be used
 for issuing platform-specific memory encryption commands to manage those
 encrypted VMs.
 
-Currently, this ioctl is used for issuing Secure Encrypted Virtualization
-(SEV) commands on AMD Processors. The SEV commands are defined in
-Documentation/virt/kvm/x86/amd-memory-encryption.rst.
+Currently, this ioctl is used for issuing both Secure Encrypted Virtualization
+(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
+on Intel Processors. The detailed commands are defined in
+Documentation/virt/kvm/x86/amd-memory-encryption.rst and
+Documentation/virt/kvm/x86/intel-tdx.rst.
 
 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
@@ -8160,6 +8165,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT      By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT. The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB). It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID

Documentation/virt/kvm/x86/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ KVM for x86 systems
    cpuid
    errata
    hypercalls
+   intel-tdx
    mmu
    msr
    nested-vmx

Documentation/virt/kvm/x86/intel-tdx.rst

Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the
+host and physical attacks. A CPU-attested software module called 'the TDX
+module' runs inside a new CPU isolated range to provide the functionalities to
+manage and run protected VMs, a.k.a, TDX guests or TDs.
+
+Please refer to [1] for the whitepaper, specifications and other resources.
+
+This documentation describes TDX-specific KVM ABIs. The TDX module needs to be
+initialized before it can be used by KVM to run any TDX guests. The host
+core-kernel provides the support of initializing the TDX module, which is
+described in the Documentation/arch/x86/tdx.rst.
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic
+ioctl with TDX specific sub-ioctl() commands.
+
+::
+
+  /* Trust Domain Extensions sub-ioctl() commands. */
+  enum kvm_tdx_cmd_id {
+          KVM_TDX_CAPABILITIES = 0,
+          KVM_TDX_INIT_VM,
+          KVM_TDX_INIT_VCPU,
+          KVM_TDX_INIT_MEM_REGION,
+          KVM_TDX_FINALIZE_VM,
+          KVM_TDX_GET_CPUID,
+
+          KVM_TDX_CMD_NR_MAX,
+  };
+
+  struct kvm_tdx_cmd {
+          /* enum kvm_tdx_cmd_id */
+          __u32 id;
+          /* flags for sub-command. If sub-command doesn't use this, set zero. */
+          __u32 flags;
+          /*
+           * data for each sub-command. An immediate or a pointer to the actual
+           * data in process virtual address. If sub-command doesn't use it,
+           * set zero.
+           */
+          __u64 data;
+          /*
+           * Auxiliary error code. The sub-command may return TDX SEAMCALL
+           * status code in addition to -Exxx.
+           */
+          __u64 hw_error;
+  };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Return the TDX capabilities that current KVM supports with the specific TDX
+module loaded in the system. It reports what features/capabilities are allowed
+to be configured to the TDX guest.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- hw_error: must be 0
+
+::
+
+  struct kvm_tdx_capabilities {
+          __u64 supported_attrs;
+          __u64 supported_xfam;
+          __u64 reserved[254];
+
+          /* Configurable CPUID bits for userspace */
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VM initialization. This needs to be called after
+KVM_CREATE_VM and before creating any VCPUs.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- hw_error: must be 0
+
+::
+
+  struct kvm_tdx_init_vm {
+          __u64 attributes;
+          __u64 xfam;
+          __u64 mrconfigid[6]; /* sha384 digest */
+          __u64 mrowner[6]; /* sha384 digest */
+          __u64 mrownerconfig[6]; /* sha384 digest */
+
+          /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+          __u64 reserved[12];
+
+          /*
+           * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+           * KVM_SET_CPUID2.
+           * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+           * TDX module directly virtualizes those CPUIDs without VMM. The user
+           * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+           * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+           * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+           * module doesn't virtualize.
+           */
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VCPU initialization.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- hw_error: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
+provided data from @source_addr.
+
+Note, before calling this sub command, memory attribute of the range
+[gpa, gpa + nr_pages] needs to be private. Userspace can use
+KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
+
+If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends measurement.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- hw_error: must be 0
+
+::
+
+  #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)
+
+  struct kvm_tdx_init_mem_region {
+          __u64 source_addr;
+          __u64 gpa;
+          __u64 nr_pages;
+  };
+
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Complete measurement of the initial TD contents and mark it ready to run.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
+
+KVM_TDX_GET_CPUID
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Get the CPUID values that the TDX module virtualizes for the TD guest.
+When it returns -E2BIG, the user space should allocate a larger buffer and
+retry. The minimum buffer size is updated in the nent field of the
+struct kvm_cpuid2.
+
+- id: KVM_TDX_GET_CPUID
+- flags: must be 0
+- data: pointer to struct kvm_cpuid2 (in/out)
+- hw_error: must be 0 (out)
+
+::
+
+  struct kvm_cpuid2 {
+          __u32 nent;
+          __u32 padding;
+          struct kvm_cpuid_entry2 entries[0];
+  };
+
+  struct kvm_cpuid_entry2 {
+          __u32 function;
+          __u32 index;
+          __u32 flags;
+          __u32 eax;
+          __u32 ebx;
+          __u32 ecx;
+          __u32 edx;
+          __u32 padding[3];
+  };
+
+KVM TDX creation flow
+=====================
+In addition to the standard KVM flow, new TDX ioctls need to be called. The
+control flow is as follows:
+
+#. Check system wide capability
+
+   * KVM_CAP_VM_TYPES: Check if VM type is supported and if KVM_X86_TDX_VM
+     is supported.
+
+#. Create VM
+
+   * KVM_CREATE_VM
+   * KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests.
+   * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can
+     support at VM level (TDX has its own limitation on this).
+   * KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency
+     than host is desired. This is Optional.
+   * KVM_TDX_INIT_VM: Pass TDX specific VM parameters.
+
+#. Create VCPU
+
+   * KVM_CREATE_VCPU
+   * KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters.
+   * KVM_SET_CPUID2: Configure TD's CPUIDs.
+   * KVM_SET_MSRS: Configure TD's MSRs.
+
+#. Initialize initial guest memory
+
+   * Prepare content of initial guest memory.
+   * KVM_TDX_INIT_MEM_REGION: Add initial guest memory.
+   * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.
+
+#. Run VCPU
+
+References
+==========
+
+.. [1] https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html
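
Note on KVM_TDX_GET_CPUID: the documentation above says that on -E2BIG userspace should retry with a larger buffer, using the entry count written back to nent. The following is a minimal sketch of that retry loop, not code from this commit. The struct names with a `_sketch` suffix are local mirrors of the documented layouts, and `get_cpuid_fn` and `fake_get_cpuid` are hypothetical stand-ins for the real vCPU ioctl so the sketch can run without TDX hardware.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the layouts quoted in intel-tdx.rst (illustrative names). */
struct cpuid_entry2_sketch {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct cpuid2_sketch {
	uint32_t nent;
	uint32_t padding;
	struct cpuid_entry2_sketch entries[];
};

/*
 * Hypothetical callback standing in for the KVM_TDX_GET_CPUID sub-command.
 * Returns 0 on success, or -E2BIG after storing the needed count in buf->nent.
 */
typedef int (*get_cpuid_fn)(struct cpuid2_sketch *buf, void *opaque);

/* Allocate, call, and grow the buffer while the backend reports -E2BIG. */
static struct cpuid2_sketch *get_cpuid_with_retry(get_cpuid_fn fn, void *opaque,
						   uint32_t initial_nent)
{
	uint32_t nent = initial_nent;

	for (int attempt = 0; attempt < 4; attempt++) {
		size_t size = sizeof(struct cpuid2_sketch) +
			      (size_t)nent * sizeof(struct cpuid_entry2_sketch);
		struct cpuid2_sketch *buf = calloc(1, size);
		int ret;

		if (!buf)
			return NULL;
		buf->nent = nent;
		ret = fn(buf, opaque);
		if (ret == 0)
			return buf;	/* caller frees */
		/* Give up on other errors, or if the hint does not grow. */
		if (ret != -E2BIG || buf->nent <= nent) {
			free(buf);
			return NULL;
		}
		nent = buf->nent;
		free(buf);
	}
	return NULL;
}

/* Fake backend for demonstration: pretends the TD has *opaque CPUID entries. */
static int fake_get_cpuid(struct cpuid2_sketch *buf, void *opaque)
{
	uint32_t needed = *(uint32_t *)opaque;

	if (buf->nent < needed) {
		buf->nent = needed;
		return -E2BIG;
	}
	buf->nent = needed;
	for (uint32_t i = 0; i < needed; i++)
		buf->entries[i].function = i;
	return 0;
}
```

In a real VMM the callback would presumably wrap a KVM_MEMORY_ENCRYPT_OP call on the vCPU fd with id set to KVM_TDX_GET_CPUID and data pointing at the buffer; that wiring is left out here because it depends on headers this sketch does not assume.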

arch/x86/include/asm/kvm_host.h

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2420,7 +2420,12 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
 	 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
 	 KVM_X86_QUIRK_SLOT_ZAP_ALL | \
-	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS)
+	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+
+#define KVM_X86_CONDITIONAL_QUIRKS \
+	(KVM_X86_QUIRK_CD_NW_CLEARED | \
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
 
 /*
  * KVM previously used a u32 field in kvm_run to indicate the hypercall was

arch/x86/include/asm/shared/tdx.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,7 @@
6767
#define TD_CTLS_LOCK BIT_ULL(TD_CTLS_LOCK_BIT)
6868

6969
/* TDX hypercall Leaf IDs */
70+
#define TDVMCALL_GET_TD_VM_CALL_INFO 0x10000
7071
#define TDVMCALL_MAP_GPA 0x10001
7172
#define TDVMCALL_GET_QUOTE 0x10002
7273
#define TDVMCALL_REPORT_FATAL_ERROR 0x10003

arch/x86/include/asm/vmx.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -585,12 +585,14 @@ enum vm_entry_failure_code {
585585
#define EPT_VIOLATION_ACC_WRITE_BIT 1
586586
#define EPT_VIOLATION_ACC_INSTR_BIT 2
587587
#define EPT_VIOLATION_RWX_SHIFT 3
588+
#define EPT_VIOLATION_EXEC_R3_LIN_BIT 6
588589
#define EPT_VIOLATION_GVA_IS_VALID_BIT 7
589590
#define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
590591
#define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT)
591592
#define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT)
592593
#define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT)
593594
#define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT)
595+
#define EPT_VIOLATION_EXEC_FOR_RING3_LIN (1 << EPT_VIOLATION_EXEC_R3_LIN_BIT)
594596
#define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT)
595597
#define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
596598

arch/x86/include/uapi/asm/kvm.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -441,6 +441,7 @@ struct kvm_sync_regs {
441441
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
442442
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
443443
#define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
444+
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)
444445

445446
#define KVM_STATE_NESTED_FORMAT_VMX 0
446447
#define KVM_STATE_NESTED_FORMAT_SVM 1
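
Note on KVM_X86_QUIRK_IGNORE_GUEST_PAT: the api.rst text in this commit says userspace may disable the quirk when it knows no guest relies on it, and that the quirk is not available on some platforms. The following is a small sketch of how a VMM might compute the disable mask under those two conditions; it is an illustration only. The function name and its parameters are hypothetical, and the assumption is that `disableable` holds the mask of quirks the kernel reports as possible to disable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the uapi hunk above. */
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)

/*
 * Hypothetical helper: decide which quirks to disable.
 *
 * disableable:        assumed mask of quirks the kernel allows disabling
 * may_need_forced_wb: true if the VM may run guests that map video RAM as UC
 *                     (e.g. it exposes a bochs graphics device)
 */
static uint64_t quirks_to_disable(uint64_t disableable, bool may_need_forced_wb)
{
	uint64_t mask = 0;

	/* Honor guest PAT only when it is safe and the kernel offers the knob. */
	if (!may_need_forced_wb &&
	    (disableable & KVM_X86_QUIRK_IGNORE_GUEST_PAT))
		mask |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;

	return mask;
}
```

The resulting mask would then be handed to whatever mechanism the VMM uses to disable quirks; a zero result means the default behaviour is kept.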

arch/x86/kvm/mmu.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
232232
return -(u32)fault & errcode;
233233
}
234234

235-
bool kvm_mmu_may_ignore_guest_pat(void);
235+
bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
236236

237237
int kvm_mmu_post_init_vm(struct kvm *kvm);
238238
void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
