Commit fd02aa4

Merge branch 'kvm-tdx-initial' into HEAD
This large commit contains the initial support for TDX in KVM. All x86 parts enable the host-side hypercalls that KVM uses to talk to the TDX module, a software component that runs in a special CPU mode called SEAM (Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by different EPT roots, and the private half is managed by the TDX module. Using the support that was added to the generic MMU code in 6.14, add support for TDX's secure page tables to the Intel side of KVM. Generic KVM code takes care of maintaining a mirror of the secure page tables so that they can be queried efficiently, and ensuring that changes are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits" but are triggered through a different mechanism, similar to VMGEXIT for SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as handling VM-Exits that are caused by vectored events. Exclusive to TDX are machine-check SMIs, which the kernel already knows how to handle through the kernel machine check handler (commit 7911f14, "x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including EPT violation/misconfig and several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <[email protected]>
2 parents 7d76856 + 7bcf724 commit fd02aa4

55 files changed, +6822 -583 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 37 additions & 4 deletions
@@ -1411,6 +1411,9 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).
 
+For TDX guest, deleting/moving memory region loses guest memory contents.
+Read only region isn't supported. Only as-id 0 is supported.
+
 Note: On arm64, a write generated by the page-table walker (to update
 the Access and Dirty flags, for example) never results in a
 KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
@@ -4768,17 +4771,19 @@ H_GET_CPU_CHARACTERISTICS hypercall.
 
 :Capability: basic
 :Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
 :Parameters: an opaque platform specific structure (in/out)
 :Returns: 0 on success; -1 on error
 
 If the platform supports creating encrypted VMs then this ioctl can be used
 for issuing platform-specific memory encryption commands to manage those
 encrypted VMs.
 
-Currently, this ioctl is used for issuing Secure Encrypted Virtualization
-(SEV) commands on AMD Processors. The SEV commands are defined in
-Documentation/virt/kvm/x86/amd-memory-encryption.rst.
+Currently, this ioctl is used for issuing both Secure Encrypted Virtualization
+(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
+on Intel Processors. The detailed commands are defined in
+Documentation/virt/kvm/x86/amd-memory-encryption.rst and
+Documentation/virt/kvm/x86/intel-tdx.rst.
 
 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
@@ -6827,6 +6832,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
 #define KVM_SYSTEM_EVENT_WAKEUP 4
 #define KVM_SYSTEM_EVENT_SUSPEND 5
 #define KVM_SYSTEM_EVENT_SEV_TERM 6
+#define KVM_SYSTEM_EVENT_TDX_FATAL 7
 __u32 type;
 __u32 ndata;
 __u64 data[16];
@@ -6853,6 +6859,11 @@ Valid values for 'type' are:
    reset/shutdown of the VM.
  - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
    The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
+   KVM doesn't do any parsing or conversion, it just dumps 16 general-purpose
+   registers to userspace, in ascending order of the 4-bit indices for x86-64
+   general-purpose registers in instruction encoding, as defined in the Intel
+   SDM.
  - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
    KVM has recognized a wakeup event. Userspace may honor this event by
    marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
@@ -8194,6 +8205,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT      By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT. The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB). It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
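
The KVM_SYSTEM_EVENT_TDX_FATAL text added to api.rst above says the 16 general-purpose registers are reported in ascending order of their 4-bit instruction-encoding indices. As a rough userspace sketch of what that ordering implies, the following maps each `data[]` slot to a register name. The `ex_` names are hypothetical helpers, not part of KVM, and the sketch assumes `data` and `ndata` were copied from the system-event exit information.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* x86-64 GPR names in ascending order of their 4-bit encoding index. */
static const char *const ex_gpr_names[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
};

/* Return the register name for data[idx], or NULL if idx is out of range. */
const char *ex_tdx_fatal_reg_name(size_t idx)
{
    return idx < 16 ? ex_gpr_names[idx] : NULL;
}

/* Print the registers reported with a TDX fatal system event (hypothetical helper). */
void ex_dump_tdx_fatal(const uint64_t *data, uint32_t ndata)
{
    for (uint32_t i = 0; i < ndata && i < 16; i++)
        printf("%-3s = 0x%016llx\n", ex_gpr_names[i],
               (unsigned long long)data[i]);
}
```

Note that the order is not alphabetical: slot 1 is RCX and slot 3 is RBX, which is easy to get wrong when decoding the dump by hand.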

Documentation/virt/kvm/x86/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ KVM for x86 systems
    cpuid
    errata
    hypercalls
+   intel-tdx
    mmu
    msr
    nested-vmx

Documentation/virt/kvm/x86/intel-tdx.rst

Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the
+host and physical attacks. A CPU-attested software module called 'the TDX
+module' runs inside a new CPU isolated range to provide the functionalities to
+manage and run protected VMs, a.k.a, TDX guests or TDs.
+
+Please refer to [1] for the whitepaper, specifications and other resources.
+
+This documentation describes TDX-specific KVM ABIs. The TDX module needs to be
+initialized before it can be used by KVM to run any TDX guests. The host
+core-kernel provides the support of initializing the TDX module, which is
+described in the Documentation/arch/x86/tdx.rst.
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic
+ioctl with TDX specific sub-ioctl() commands.
+
+::
+
+  /* Trust Domain Extensions sub-ioctl() commands. */
+  enum kvm_tdx_cmd_id {
+          KVM_TDX_CAPABILITIES = 0,
+          KVM_TDX_INIT_VM,
+          KVM_TDX_INIT_VCPU,
+          KVM_TDX_INIT_MEM_REGION,
+          KVM_TDX_FINALIZE_VM,
+          KVM_TDX_GET_CPUID,
+
+          KVM_TDX_CMD_NR_MAX,
+  };
+
+  struct kvm_tdx_cmd {
+          /* enum kvm_tdx_cmd_id */
+          __u32 id;
+          /* flags for sub-command. If sub-command doesn't use this, set zero. */
+          __u32 flags;
+          /*
+           * data for each sub-command. An immediate or a pointer to the actual
+           * data in process virtual address. If sub-command doesn't use it,
+           * set zero.
+           */
+          __u64 data;
+          /*
+           * Auxiliary error code. The sub-command may return TDX SEAMCALL
+           * status code in addition to -Exxx.
+           */
+          __u64 hw_error;
+  };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Return the TDX capabilities that current KVM supports with the specific TDX
+module loaded in the system. It reports what features/capabilities are allowed
+to be configured to the TDX guest.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- hw_error: must be 0
+
+::
+
+  struct kvm_tdx_capabilities {
+          __u64 supported_attrs;
+          __u64 supported_xfam;
+          __u64 reserved[254];
+
+          /* Configurable CPUID bits for userspace */
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VM initialization. This needs to be called after
+KVM_CREATE_VM and before creating any VCPUs.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- hw_error: must be 0
+
+::
+
+  struct kvm_tdx_init_vm {
+          __u64 attributes;
+          __u64 xfam;
+          __u64 mrconfigid[6]; /* sha384 digest */
+          __u64 mrowner[6]; /* sha384 digest */
+          __u64 mrownerconfig[6]; /* sha384 digest */
+
+          /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+          __u64 reserved[12];
+
+          /*
+           * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+           * KVM_SET_CPUID2.
+           * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+           * TDX module directly virtualizes those CPUIDs without VMM. The user
+           * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+           * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+           * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+           * module doesn't virtualize.
+           */
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VCPU initialization.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- hw_error: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
+provided data from @source_addr.
+
+Note, before calling this sub command, memory attribute of the range
+[gpa, gpa + nr_pages] needs to be private. Userspace can use
+KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
+
+If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends measurement.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- hw_error: must be 0
+
+::
+
+  #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)
+
+  struct kvm_tdx_init_mem_region {
+          __u64 source_addr;
+          __u64 gpa;
+          __u64 nr_pages;
+  };
+
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Complete measurement of the initial TD contents and mark it ready to run.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
+
+KVM_TDX_GET_CPUID
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Get the CPUID values that the TDX module virtualizes for the TD guest.
+When it returns -E2BIG, the user space should allocate a larger buffer and
+retry. The minimum buffer size is updated in the nent field of the
+struct kvm_cpuid2.
+
+- id: KVM_TDX_GET_CPUID
+- flags: must be 0
+- data: pointer to struct kvm_cpuid2 (in/out)
+- hw_error: must be 0 (out)
+
+::
+
+  struct kvm_cpuid2 {
+          __u32 nent;
+          __u32 padding;
+          struct kvm_cpuid_entry2 entries[0];
+  };
+
+  struct kvm_cpuid_entry2 {
+          __u32 function;
+          __u32 index;
+          __u32 flags;
+          __u32 eax;
+          __u32 ebx;
+          __u32 ecx;
+          __u32 edx;
+          __u32 padding[3];
+  };
+
+KVM TDX creation flow
+=====================
+In addition to the standard KVM flow, new TDX ioctls need to be called. The
+control flow is as follows:
+
+#. Check system wide capability
+
+   * KVM_CAP_VM_TYPES: Check if VM type is supported and if KVM_X86_TDX_VM
+     is supported.
+
+#. Create VM
+
+   * KVM_CREATE_VM
+   * KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests.
+   * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can
+     support at VM level (TDX has its own limitation on this).
+   * KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency
+     than host is desired. This is Optional.
+   * KVM_TDX_INIT_VM: Pass TDX specific VM parameters.
+
+#. Create VCPU
+
+   * KVM_CREATE_VCPU
+   * KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters.
+   * KVM_SET_CPUID2: Configure TD's CPUIDs.
+   * KVM_SET_MSRS: Configure TD's MSRs.
+
+#. Initialize initial guest memory
+
+   * Prepare content of initial guest memory.
+   * KVM_TDX_INIT_MEM_REGION: Add initial guest memory.
+   * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.
+
+#. Run VCPU
+
+References
+==========
+
+.. [1] https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html
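
The new intel-tdx.rst above describes every TDX operation as a sub-command wrapped in struct kvm_tdx_cmd and issued through KVM_MEMORY_ENCRYPT_OP, and it asks userspace to retry KVM_TDX_GET_CPUID with a larger buffer on -E2BIG. The following is a minimal userspace sketch of both ideas, not code from this series. All `ex_`/`EX_` names are hypothetical local mirrors of the documented layouts, and the ioctl number is an assumption that should be taken from `<linux/kvm.h>` in real code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Local mirrors of the documented layouts; real code would use <linux/kvm.h>. */
struct ex_tdx_cmd {
    uint32_t id;
    uint32_t flags;
    uint64_t data;
    uint64_t hw_error;
};

struct ex_cpuid_entry2 {
    uint32_t function, index, flags;
    uint32_t eax, ebx, ecx, edx;
    uint32_t padding[3];
};

struct ex_cpuid2 {
    uint32_t nent;
    uint32_t padding;
    struct ex_cpuid_entry2 entries[];
};

/* Values follow enum kvm_tdx_cmd_id as listed in the documentation. */
enum {
    EX_TDX_CAPABILITIES = 0,
    EX_TDX_INIT_VM,
    EX_TDX_INIT_VCPU,
    EX_TDX_INIT_MEM_REGION,
    EX_TDX_FINALIZE_VM,
    EX_TDX_GET_CPUID,
};

/* Assumption: same encoding as KVM_MEMORY_ENCRYPT_OP in <linux/kvm.h>. */
#define EX_KVM_MEMORY_ENCRYPT_OP _IOWR(0xAE, 0xba, unsigned long)

/* Issue one sub-command on a VM or vCPU fd; returns 0 or -errno. */
int ex_tdx_ioctl(int fd, uint32_t id, uint32_t flags, uint64_t data,
                 uint64_t *hw_error)
{
    /* hw_error is zero-initialized, as the documentation requires on input. */
    struct ex_tdx_cmd cmd = { .id = id, .flags = flags, .data = data };
    int ret = ioctl(fd, EX_KVM_MEMORY_ENCRYPT_OP, &cmd);
    int err = ret < 0 ? -errno : 0;

    if (hw_error)
        *hw_error = cmd.hw_error;
    return err;
}

/* Fetch the virtualized CPUID table, growing the buffer on -E2BIG. */
struct ex_cpuid2 *ex_tdx_get_cpuid(int vcpu_fd, uint32_t nent, int *err)
{
    for (;;) {
        struct ex_cpuid2 *c =
            calloc(1, sizeof(*c) + nent * sizeof(c->entries[0]));
        uint32_t wanted;

        if (!c) {
            *err = -ENOMEM;
            return NULL;
        }
        c->nent = nent;
        *err = ex_tdx_ioctl(vcpu_fd, EX_TDX_GET_CPUID, 0,
                            (uint64_t)(uintptr_t)c, NULL);
        if (*err == 0)
            return c;   /* caller frees */

        wanted = c->nent;   /* updated by KVM on -E2BIG, per the doc */
        free(c);
        /* Retry only if a strictly larger buffer was requested. */
        if (*err != -E2BIG || wanted <= nent)
            return NULL;
        nent = wanted;
    }
}
```

A caller would pass a VM fd for the sub-commands documented as "vm ioctl" (for example KVM_TDX_INIT_VM and KVM_TDX_FINALIZE_VM) and a vCPU fd for the "vcpu ioctl" ones. Requiring the requested size to grow on each retry keeps the loop from spinning if the reported `nent` ever fails to increase.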

arch/x86/include/asm/kvm-x86-ops.h

Lines changed: 4 additions & 1 deletion
@@ -21,6 +21,7 @@ KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
 KVM_X86_OP(vm_init)
 KVM_X86_OP_OPTIONAL(vm_destroy)
+KVM_X86_OP_OPTIONAL(vm_pre_destroy)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
 KVM_X86_OP(vcpu_create)
 KVM_X86_OP(vcpu_free)
@@ -115,6 +116,7 @@ KVM_X86_OP_OPTIONAL(pi_start_assignment)
 KVM_X86_OP_OPTIONAL(apicv_pre_state_restore)
 KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
 KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(set_hv_timer)
 KVM_X86_OP_OPTIONAL(cancel_hv_timer)
 KVM_X86_OP(setup_mce)
@@ -125,7 +127,8 @@ KVM_X86_OP(leave_smm)
 KVM_X86_OP(enable_smi_window)
 #endif
 KVM_X86_OP_OPTIONAL(dev_get_attr)
-KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
+KVM_X86_OP(mem_enc_ioctl)
+KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl)
 KVM_X86_OP_OPTIONAL(mem_enc_register_region)
 KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
 KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
