Commit 4bb3c7a

paulusmack authored and mpe committed
KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation.

The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts.

The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
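The guest entry rule described above (fake suspend if the guest is suspended, rollback if it is transactional) can be sketched in C as follows. This is only an illustration and not the entry code touched by this patch: the enum, the function name choose_entry_action() and the local MSR_TS_* definitions are assumptions made for the example, with bit positions mirroring the usual powerpc MSR[TS] layout.

```c
#include <stdint.h>

/* Assumed MSR[TS] encodings for the example (suspended / transactional). */
#define MSR_TS_S	(1ULL << 33)
#define MSR_TS_T	(1ULL << 34)
#define MSR_TS_MASK	(MSR_TS_S | MSR_TS_T)

/* Hypothetical names for the three outcomes on POWER9 DD2.2. */
enum entry_action {
	ENTER_NORMALLY,		/* no checkpoint needed */
	ENTER_FAKE_SUSPEND,	/* guest was suspended: no real checkpoint */
	ROLL_BACK_TRANSACTION,	/* guest was transactional: undo the transaction */
};

/* Pick what replaces the trechkpt on guest entry, given the guest MSR. */
static inline enum entry_action choose_entry_action(uint64_t guest_msr)
{
	switch (guest_msr & MSR_TS_MASK) {
	case MSR_TS_S:
		return ENTER_FAKE_SUSPEND;
	case MSR_TS_T:
		return ROLL_BACK_TRANSACTION;
	default:
		/* Non-transactional; the reserved 0b11 value is not modelled. */
		return ENTER_NORMALLY;
	}
}
```

In every case the CPU ends up in a state that does not need a hardware checkpoint, which is the point of the workaround.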
1 parent 7672691 commit 4bb3c7a

16 files changed: +557 -10 lines changed

arch/powerpc/include/asm/kvm_asm.h

Lines changed: 2 additions & 0 deletions
@@ -108,6 +108,8 @@
 
 /* book3s_hv */
 
+#define BOOK3S_INTERRUPT_HV_SOFTPATCH 0x1500
+
 /*
  * Special trap used to indicate to host that this is a
  * passthrough interrupt that could not be handled

arch/powerpc/include/asm/kvm_book3s.h

Lines changed: 4 additions & 0 deletions
@@ -241,6 +241,10 @@ extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
 			unsigned long mask);
 extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
 
+extern int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu);
+extern int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu);
+extern void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu);
+
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
 extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);

arch/powerpc/include/asm/kvm_book3s_64.h

Lines changed: 43 additions & 0 deletions
@@ -472,6 +472,49 @@ static inline void set_dirty_bits_atomic(unsigned long *map, unsigned long i,
 	set_bit_le(i, map);
 }
 
+static inline u64 sanitize_msr(u64 msr)
+{
+	msr &= ~MSR_HV;
+	msr |= MSR_ME;
+	return msr;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.cr = vcpu->arch.cr_tm;
+	vcpu->arch.xer = vcpu->arch.xer_tm;
+	vcpu->arch.lr = vcpu->arch.lr_tm;
+	vcpu->arch.ctr = vcpu->arch.ctr_tm;
+	vcpu->arch.amr = vcpu->arch.amr_tm;
+	vcpu->arch.ppr = vcpu->arch.ppr_tm;
+	vcpu->arch.dscr = vcpu->arch.dscr_tm;
+	vcpu->arch.tar = vcpu->arch.tar_tm;
+	memcpy(vcpu->arch.gpr, vcpu->arch.gpr_tm,
+	       sizeof(vcpu->arch.gpr));
+	vcpu->arch.fp = vcpu->arch.fp_tm;
+	vcpu->arch.vr = vcpu->arch.vr_tm;
+	vcpu->arch.vrsave = vcpu->arch.vrsave_tm;
+}
+
+static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.cr_tm = vcpu->arch.cr;
+	vcpu->arch.xer_tm = vcpu->arch.xer;
+	vcpu->arch.lr_tm = vcpu->arch.lr;
+	vcpu->arch.ctr_tm = vcpu->arch.ctr;
+	vcpu->arch.amr_tm = vcpu->arch.amr;
+	vcpu->arch.ppr_tm = vcpu->arch.ppr;
+	vcpu->arch.dscr_tm = vcpu->arch.dscr;
+	vcpu->arch.tar_tm = vcpu->arch.tar;
+	memcpy(vcpu->arch.gpr_tm, vcpu->arch.gpr,
+	       sizeof(vcpu->arch.gpr));
+	vcpu->arch.fp_tm = vcpu->arch.fp;
+	vcpu->arch.vr_tm = vcpu->arch.vr;
+	vcpu->arch.vrsave_tm = vcpu->arch.vrsave;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
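
For readers without a kernel tree at hand, the helpers added to kvm_book3s_64.h can be exercised in userspace with a reduced model. The sketch below is an approximation under stated assumptions: MSR_HV and MSR_ME are defined locally with the bit positions we believe the powerpc headers use, and struct mini_vcpu with its mini_copy_*() helpers is a hypothetical, heavily reduced stand-in for struct kvm_vcpu and the real copy_to_checkpoint()/copy_from_checkpoint().

```c
#include <stdint.h>
#include <string.h>

/* Assumed values for the example, following the usual powerpc MSR layout. */
#define MSR_HV	(1ULL << 60)	/* hypervisor state */
#define MSR_ME	(1ULL << 12)	/* machine check enable */

/* Same logic as the helper in the hunk: a guest MSR never has HV, always ME. */
static inline uint64_t sanitize_msr(uint64_t msr)
{
	msr &= ~MSR_HV;
	msr |= MSR_ME;
	return msr;
}

/* Hypothetical reduced register file: live state plus its "_tm" shadow. */
struct mini_vcpu {
	uint64_t lr, ctr;
	uint64_t gpr[32];
	uint64_t lr_tm, ctr_tm;
	uint64_t gpr_tm[32];
};

/* Save the live registers into the software checkpoint. */
static inline void mini_copy_to_checkpoint(struct mini_vcpu *v)
{
	v->lr_tm = v->lr;
	v->ctr_tm = v->ctr;
	memcpy(v->gpr_tm, v->gpr, sizeof(v->gpr));
}

/* Restore the live registers from the software checkpoint. */
static inline void mini_copy_from_checkpoint(struct mini_vcpu *v)
{
	v->lr = v->lr_tm;
	v->ctr = v->ctr_tm;
	memcpy(v->gpr, v->gpr_tm, sizeof(v->gpr));
}
```

The pairing matters for the emulation: a save followed by arbitrary changes to the live registers and then a restore brings the live state back to what it was at the save.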

arch/powerpc/include/asm/kvm_book3s_asm.h

Lines changed: 1 addition & 0 deletions
@@ -119,6 +119,7 @@ struct kvmppc_host_state {
 	u8 host_ipi;
 	u8 ptid; /* thread number within subcore when split */
 	u8 tid; /* thread number within whole core */
+	u8 fake_suspend;
 	struct kvm_vcpu *kvm_vcpu;
 	struct kvmppc_vcore *kvm_vcore;
 	void __iomem *xics_phys;

arch/powerpc/include/asm/kvm_host.h

Lines changed: 1 addition & 0 deletions
@@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
 	u64 tfhar;
 	u64 texasr;
 	u64 tfiar;
+	u64 orig_texasr;
 
 	u32 cr_tm;
 	u64 xer_tm;

arch/powerpc/include/asm/ppc-opcode.h

Lines changed: 4 additions & 0 deletions
@@ -232,15 +232,18 @@
 #define PPC_INST_MSGSYNC 0x7c0006ec
 #define PPC_INST_MSGSNDP 0x7c00011c
 #define PPC_INST_MSGCLRP 0x7c00015c
+#define PPC_INST_MTMSRD 0x7c000164
 #define PPC_INST_MTTMR 0x7c0003dc
 #define PPC_INST_NOP 0x60000000
 #define PPC_INST_PASTE 0x7c20070d
 #define PPC_INST_POPCNTB 0x7c0000f4
 #define PPC_INST_POPCNTB_MASK 0xfc0007fe
 #define PPC_INST_POPCNTD 0x7c0003f4
 #define PPC_INST_POPCNTW 0x7c0002f4
+#define PPC_INST_RFEBB 0x4c000124
 #define PPC_INST_RFCI 0x4c000066
 #define PPC_INST_RFDI 0x4c00004e
+#define PPC_INST_RFID 0x4c000024
 #define PPC_INST_RFMCI 0x4c00004c
 #define PPC_INST_MFSPR 0x7c0002a6
 #define PPC_INST_MFSPR_DSCR 0x7c1102a6
@@ -277,6 +280,7 @@
 #define PPC_INST_TRECHKPT 0x7c0007dd
 #define PPC_INST_TRECLAIM 0x7c00075d
 #define PPC_INST_TABORT 0x7c00071d
+#define PPC_INST_TSR 0x7c0005dd
 
 #define PPC_INST_NAP 0x4c000364
 #define PPC_INST_SLEEP 0x4c0003a4
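
The opcode constants added here encode a primary opcode in the top six bits and, for these X/XL-form instructions, a 10-bit extended opcode just above the lowest bit. The following sketch shows how an emulator might split an instruction word into those two fields. The helper names ppc_primary_opcode() and ppc_extended_opcode() are hypothetical, and the matching code actually added by this patch is not shown on this page, so it may classify instructions differently.

```c
#include <stdint.h>

/* Constants copied from the ppc-opcode.h hunk. */
#define PPC_INST_MTMSRD	0x7c000164
#define PPC_INST_RFEBB	0x4c000124
#define PPC_INST_RFID	0x4c000024
#define PPC_INST_TSR	0x7c0005dd

/* Bits 0:5 in IBM numbering, i.e. the six most significant bits. */
static inline unsigned int ppc_primary_opcode(uint32_t inst)
{
	return inst >> 26;
}

/* Bits 21:30 in IBM numbering: the 10-bit extended opcode field. */
static inline unsigned int ppc_extended_opcode(uint32_t inst)
{
	return (inst >> 1) & 0x3ff;
}
```

With these helpers, rfid and rfebb both decode to primary opcode 19, while mtmsrd and tsr decode to primary opcode 31, so a two-level switch on the two fields is enough to tell the trapping instructions apart.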

arch/powerpc/include/asm/reg.h

Lines changed: 7 additions & 0 deletions
@@ -156,6 +156,8 @@
 #define PSSCR_SD 0x00400000 /* Status Disable */
 #define PSSCR_PLS 0xf000000000000000 /* Power-saving Level Status */
 #define PSSCR_GUEST_VIS 0xf0000000000003ff /* Guest-visible PSSCR fields */
+#define PSSCR_FAKE_SUSPEND 0x00000400 /* Fake-suspend bit (P9 DD2.2) */
+#define PSSCR_FAKE_SUSPEND_LG 10 /* Fake-suspend bit position */
 
 /* Floating Point Status and Control Register (FPSCR) Fields */
 #define FPSCR_FX 0x80000000 /* FPU exception summary */
@@ -237,7 +239,12 @@
 #define SPRN_TFIAR 0x81 /* Transaction Failure Inst Addr */
 #define SPRN_TEXASR 0x82 /* Transaction EXception & Summary */
 #define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
+#define TEXASR_ABORT __MASK(63-31) /* terminated by tabort or treclaim */
+#define TEXASR_SUSP __MASK(63-32) /* tx failed in suspended state */
+#define TEXASR_HV __MASK(63-34) /* MSR[HV] when failure occurred */
+#define TEXASR_PR __MASK(63-35) /* MSR[PR] when failure occurred */
 #define TEXASR_FS __MASK(63-36) /* TEXASR Failure Summary */
+#define TEXASR_EXACT __MASK(63-37) /* TFIAR value is exact */
 #define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
 #define SPRN_TIDR 144 /* Thread ID register */
 #define SPRN_CTRLF 0x088
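
The `63-n` arguments in the TEXASR definitions convert IBM bit numbering (bit 0 is the most significant bit of a 64-bit register) into an ordinary shift count. The sketch below works through that arithmetic in userspace. MASK64() is a local stand-in that we assume behaves like the kernel's __MASK() on a 64-bit build; it is not the kernel macro itself.

```c
#include <stdint.h>

/* Local stand-in for the kernel's __MASK(), assumed to be a 64-bit shift. */
#define MASK64(x)	(1ULL << (x))

/* Definitions mirroring the reg.h hunk, using the stand-in macro. */
#define PSSCR_FAKE_SUSPEND	0x00000400
#define PSSCR_FAKE_SUSPEND_LG	10
#define TEXASR_ABORT	MASK64(63 - 31)	/* IBM bit 31 */
#define TEXASR_SUSP	MASK64(63 - 32)	/* IBM bit 32 */
#define TEXASR_HV	MASK64(63 - 34)	/* IBM bit 34 */
#define TEXASR_PR	MASK64(63 - 35)	/* IBM bit 35 */
#define TEXASR_FS	MASK64(63 - 36)	/* IBM bit 36 */
#define TEXASR_EXACT	MASK64(63 - 37)	/* IBM bit 37 */

/* Convert an IBM-numbered bit of a 64-bit register into a mask. */
static inline uint64_t ibm_bit(unsigned int n)
{
	return MASK64(63 - n);
}
```

Under this reading TEXASR_ABORT is the lowest bit of the upper 32-bit word, and the PSSCR fake-suspend mask is simply bit position 10 in conventional (least-significant-bit-first) numbering.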

arch/powerpc/kernel/asm-offsets.c

Lines changed: 2 additions & 0 deletions
@@ -568,6 +568,7 @@ int main(void)
 	OFFSET(VCPU_TFHAR, kvm_vcpu, arch.tfhar);
 	OFFSET(VCPU_TFIAR, kvm_vcpu, arch.tfiar);
 	OFFSET(VCPU_TEXASR, kvm_vcpu, arch.texasr);
+	OFFSET(VCPU_ORIG_TEXASR, kvm_vcpu, arch.orig_texasr);
 	OFFSET(VCPU_GPR_TM, kvm_vcpu, arch.gpr_tm);
 	OFFSET(VCPU_FPRS_TM, kvm_vcpu, arch.fp_tm.fpr);
 	OFFSET(VCPU_VRS_TM, kvm_vcpu, arch.vr_tm.vr);
@@ -650,6 +651,7 @@ int main(void)
 	HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
 	HSTATE_FIELD(HSTATE_PTID, ptid);
 	HSTATE_FIELD(HSTATE_TID, tid);
+	HSTATE_FIELD(HSTATE_FAKE_SUSPEND, fake_suspend);
 	HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
 	HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
 	HSTATE_FIELD(HSTATE_MMCRA, host_mmcr[2]);

arch/powerpc/kernel/cputable.c

Lines changed: 0 additions & 1 deletion
@@ -569,7 +569,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.oprofile_type = PPC_OPROFILE_INVALID,
 		.cpu_setup = __setup_cpu_power9,
 		.cpu_restore = __restore_cpu_power9,
-		.flush_tlb = __flush_tlb_power9,
 		.machine_check_early = __machine_check_early_realmode_p9,
 		.platform = "power9",
 	},

arch/powerpc/kernel/exceptions-64s.S

Lines changed: 2 additions & 2 deletions
@@ -1273,7 +1273,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)
 	bne+ denorm_assist
 #endif
 
-	KVMTEST_PR(0x1500)
+	KVMTEST_HV(0x1500)
 	EXCEPTION_PROLOG_PSERIES_1(denorm_common, EXC_HV)
 EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100)
 
@@ -1285,7 +1285,7 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
 EXC_VIRT_NONE(0x5500, 0x100)
 #endif
 
-TRAMP_KVM_SKIP(PACA_EXGEN, 0x1500)
+TRAMP_KVM_HV(PACA_EXGEN, 0x1500)
 
 #ifdef CONFIG_PPC_DENORMALISATION
 TRAMP_REAL_BEGIN(denorm_assist)

arch/powerpc/kvm/Makefile

Lines changed: 7 additions & 0 deletions
@@ -74,16 +74,23 @@ kvm-hv-y += \
 	book3s_64_mmu_hv.o \
 	book3s_64_mmu_radix.o
 
+kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
+	book3s_hv_tm.o
+
 kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
 	book3s_hv_rm_xics.o book3s_hv_rm_xive.o
 
+kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
+	book3s_hv_tm_builtin.o
+
 ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
 	book3s_hv_hmi.o \
 	book3s_hv_rmhandlers.o \
 	book3s_hv_rm_mmu.o \
 	book3s_hv_ras.o \
 	book3s_hv_builtin.o \
+	$(kvm-book3s_64-builtin-tm-objs-y) \
 	$(kvm-book3s_64-builtin-xics-objs-y)
 endif
 

arch/powerpc/kvm/book3s_hv.c

Lines changed: 17 additions & 1 deletion
@@ -1206,6 +1206,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			r = RESUME_GUEST;
 		}
 		break;
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	case BOOK3S_INTERRUPT_HV_SOFTPATCH:
+		/*
+		 * This occurs for various TM-related instructions that
+		 * we need to emulate on POWER9 DD2.2. We have already
+		 * handled the cases where the guest was in real-suspend
+		 * mode and was transitioning to transactional state.
+		 */
+		r = kvmhv_p9_tm_emulation(vcpu);
+		break;
+#endif
+
 	case BOOK3S_INTERRUPT_HV_RM_HARD:
 		r = RESUME_PASSTHROUGH;
 		break;
@@ -1978,7 +1991,9 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	 * turn off the HFSCR bit, which causes those instructions to trap.
 	 */
 	vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
-	if (!cpu_has_feature(CPU_FTR_TM))
+	if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+		vcpu->arch.hfscr |= HFSCR_TM;
+	else if (!cpu_has_feature(CPU_FTR_TM_COMP))
 		vcpu->arch.hfscr &= ~HFSCR_TM;
 	if (cpu_has_feature(CPU_FTR_ARCH_300))
 		vcpu->arch.hfscr &= ~HFSCR_MSGP;
@@ -2242,6 +2257,7 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
 	tpaca = &paca[cpu];
 	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 	tpaca->kvm_hstate.ptid = cpu - vc->pcpu;
+	tpaca->kvm_hstate.fake_suspend = 0;
 	/* Order stores to hstate.kvm_vcpu etc. before store to kvm_vcore */
 	smp_wmb();
 	tpaca->kvm_hstate.kvm_vcore = vc;
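
The HFSCR change in kvmppc_core_vcpu_create_hv() amounts to a three-way decision, which the sketch below restates as a pure function so that it can be checked in isolation. The function name guest_hfscr() and its boolean parameters are hypothetical replacements for the cpu_has_feature() calls, and the HFSCR_TM value used here is only a placeholder bit for the example, not necessarily the value in reg.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for the example; the real HFSCR_TM lives in reg.h. */
#define HFSCR_TM	(1ULL << 5)

/*
 * Decide the guest's HFSCR from the host value:
 *  - with the DD2.2 hypervisor assist, TM is always enabled for the guest;
 *  - otherwise, without a complete TM implementation, TM is turned off so
 *    that TM instructions trap;
 *  - otherwise the host setting is kept.
 */
static inline uint64_t guest_hfscr(uint64_t host_hfscr,
				   bool p9_tm_hv_assist, bool tm_comp)
{
	if (p9_tm_hv_assist)
		return host_hfscr | HFSCR_TM;
	if (!tm_comp)
		return host_hfscr & ~HFSCR_TM;
	return host_hfscr;
}
```

The first branch is what lets the guest use TM even when the host HFSCR has the facility disabled, which matches the commit message's note that KVM_CAP_PPC_HTM is reported regardless of host userspace TM availability.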
