Message ID | 20121122092555.GB31117@bloggs.ozlabs.ibm.com
State | New, archived
On 22.11.2012, at 10:25, Paul Mackerras wrote: > Currently, if a machine check interrupt happens while we are in the > guest, we exit the guest and call the host's machine check handler, > which tends to cause the host to panic. Some machine checks can be > triggered by the guest; for example, if the guest creates two entries > in the SLB that map the same effective address, and then accesses that > effective address, the CPU will take a machine check interrupt. > > To handle this better, we firstly don't call the host's machine check > handler for machine checks that happen inside the guest. Instead we > call a new function, kvmppc_realmode_machine_check(), while still in > real mode before exiting the guest. On POWER7, it handles the cases > that the guest can trigger, either by flushing and reloading the SLB, > or by flushing the TLB, and then it delivers the machine check interrupt > to the guest. (On PPC970 we currently just exit the guest and leave it > up to kvmppc_handle_exit() to handle the condition, which it doesn't, > so the situation is no better -- but no worse -- on PPC970 now.) 
> > Signed-off-by: Paul Mackerras <paulus@samba.org> > --- > arch/powerpc/kvm/Makefile | 1 + > arch/powerpc/kvm/book3s_hv_ras.c | 115 +++++++++++++++++++++++++++++++ > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++++++++------- > 3 files changed, 153 insertions(+), 24 deletions(-) > create mode 100644 arch/powerpc/kvm/book3s_hv_ras.c > > diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile > index cd89658..1e473d4 100644 > --- a/arch/powerpc/kvm/Makefile > +++ b/arch/powerpc/kvm/Makefile > @@ -73,6 +73,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \ > book3s_hv_rmhandlers.o \ > book3s_hv_rm_mmu.o \ > book3s_64_vio_hv.o \ > + book3s_hv_ras.o \ > book3s_hv_builtin.o > > kvm-book3s_64-module-objs := \ > diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c > new file mode 100644 > index 0000000..2646d07 > --- /dev/null > +++ b/arch/powerpc/kvm/book3s_hv_ras.c > @@ -0,0 +1,115 @@ > +/* > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * Copyright 2012 Paul Mackerras, IBM Corp. 
<paulus@au1.ibm.com> > + */ > + > +#include <linux/types.h> > +#include <linux/string.h> > +#include <linux/kvm.h> > +#include <linux/kvm_host.h> > + > +/* SRR1 bits for machine check on POWER7 */ > +#define SRR1_MC_LDSTERR (1ul << (63-42)) > +#define SRR1_MC_IFETCH_SH (63-45) > +#define SRR1_MC_IFETCH_MASK 0x7 > +#define SRR1_MC_IFETCH_SLBPAR 2 /* SLB parity error */ > +#define SRR1_MC_IFETCH_SLBMULTI 3 /* SLB multi-hit */ > +#define SRR1_MC_IFETCH_SLBPARMULTI 4 /* SLB parity + multi-hit */ > +#define SRR1_MC_IFETCH_TLBMULTI 5 /* I-TLB multi-hit */ > + > +/* DSISR bits for machine check on POWER7 */ > +#define DSISR_MC_DERAT_MULTI 0x800 /* D-ERAT multi-hit */ > +#define DSISR_MC_TLB_MULTI 0x400 /* D-TLB multi-hit */ > +#define DSISR_MC_SLB_PARITY 0x100 /* SLB parity error */ > +#define DSISR_MC_SLB_MULTI 0x080 /* SLB multi-hit */ > +#define DSISR_MC_SLB_PARMULTI 0x040 /* SLB parity + multi-hit */ > + > +/* POWER7 SLB flush and reload */ > +static void reload_slb(struct kvm_vcpu *vcpu) > +{ > + struct slb_shadow *slb; > + unsigned long i, n; > + > + /* First clear out SLB */ > + asm volatile("slbmte %0,%0; slbia" : : "r" (0)); > + > + /* Do they have an SLB shadow buffer registered? */ > + slb = vcpu->arch.slb_shadow.pinned_addr; > + if (!slb) > + return; Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? > + > + /* Sanity check */ > + n = slb->persistent; > + if (n > SLB_MIN_SIZE) > + n = SLB_MIN_SIZE; Please use a max() macro here. 
> + if ((void *) &slb->save_area[n] > vcpu->arch.slb_shadow.pinned_end) > + return; > + > + /* Load up the SLB from that */ > + for (i = 0; i < n; ++i) { > + unsigned long rb = slb->save_area[i].esid; > + unsigned long rs = slb->save_area[i].vsid; > + > + rb = (rb & ~0xFFFul) | i; /* insert entry number */ > + asm volatile("slbmte %0,%1" : : "r" (rs), "r" (rb)); > + } > +} > + > +/* POWER7 TLB flush */ > +static void flush_tlb_power7(struct kvm_vcpu *vcpu) > +{ > + unsigned long i, rb; > + > + rb = 0x800; /* IS field = 0b10, flush congruence class */ > + for (i = 0; i < 128; ++i) { Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to fetch that number from an SPR? I don't really want to have a p7+ and a p8 function in here too ;). > + asm volatile("tlbiel %0" : : "r" (rb)); > + rb += 0x1000; I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the code readable without guessing :). So I take it that p7 does not implement tlbia? > + } > +} > + > +/* > + * On POWER7, see if we can handle a machine check that occurred inside > + * the guest in real mode, without switching to the host partition. 
> + * > + * Returns: 0 => exit guest, 1 => deliver machine check to guest > + */ > +static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu) > +{ > + unsigned long srr1 = vcpu->arch.shregs.msr; > + > + if (srr1 & SRR1_MC_LDSTERR) { > + /* error on load/store */ > + unsigned long dsisr = vcpu->arch.shregs.dsisr; > + > + if (dsisr & (DSISR_MC_SLB_PARMULTI | DSISR_MC_SLB_MULTI | > + DSISR_MC_SLB_PARITY | DSISR_MC_DERAT_MULTI)) > + /* flush and reload SLB; flushes D-ERAT too */ > + reload_slb(vcpu); > + if (dsisr & DSISR_MC_TLB_MULTI) > + flush_tlb_power7(vcpu); > + } > + > + switch ((srr1 >> SRR1_MC_IFETCH_SH) & SRR1_MC_IFETCH_MASK) { > + case SRR1_MC_IFETCH_SLBPAR: > + case SRR1_MC_IFETCH_SLBMULTI: > + case SRR1_MC_IFETCH_SLBPARMULTI: > + reload_slb(vcpu); > + break; > + case SRR1_MC_IFETCH_TLBMULTI: > + flush_tlb_power7(vcpu); > + break; > + } > + > + return 1; So we never return 0? How about ECC errors and the likes? Wouldn't those also be #MCs that the host needs to handle? > +} > + > +long kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu) > +{ > + if (cpu_has_feature(CPU_FTR_ARCH_206)) > + return kvmppc_realmode_mc_power7(vcpu); > + > + return 0; > +} > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index 690d112..b31fb4f 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -682,8 +682,7 @@ BEGIN_FTR_SECTION > 1: > END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) > > -nohpte_cont: > -hcall_real_cont: /* r9 = vcpu, r12 = trap, r13 = paca */ > +guest_exit_cont: /* r9 = vcpu, r12 = trap, r13 = paca */ > /* Save DEC */ > mfspr r5,SPRN_DEC > mftb r6 > @@ -704,6 +703,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) > std r6, VCPU_FAULT_DAR(r9) > stw r7, VCPU_FAULT_DSISR(r9) > > + /* See if it is a machine check */ > + cmpwi r12, BOOK3S_INTERRUPT_MACHINE_CHECK > + beq machine_check_realmode > +mc_cont: > + > /* Save guest CTRL register, set runlatch to 1 */ > 6: mfspr 
r6,SPRN_CTRLF > stw r6,VCPU_CTRL(r9) > @@ -1114,20 +1118,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) > isync > 23: > /* > - * For external and machine check interrupts, we need > + * For external interrupts, we need > * to call the Linux handler to process the interrupt. > * We do that by jumping to the interrupt vector address > * which we have in r12. The [h]rfid at the end of the > * handler will return to the book3s_hv_interrupts.S code. > * For other interrupts we do the rfid to get back > - * to the book3s_interrupts.S code here. > + * to the book3s_hv_interrupts.S code here. > */ > ld r8, HSTATE_VMHANDLER(r13) > ld r7, HSTATE_HOST_MSR(r13) > > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > +BEGIN_FTR_SECTION > beq 11f > - cmpwi r12, BOOK3S_INTERRUPT_MACHINE_CHECK > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) Mind to explain the logic that happens here? Do we get external interrupts on 970? If not, the cmpwi should also be inside the FTR section. Also, if we do a beq here, why do the beqctr below again? Alex > > /* RFI into the highmem handler, or branch to interrupt handler */ > 12: mfmsr r6 > @@ -1140,11 +1145,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) > beqctr > RFI > > -11: > -BEGIN_FTR_SECTION > - b 12b > -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) > - mtspr SPRN_HSRR0, r8 > + /* On POWER7, we have external interrupts set to use HSRR0/1 */ > +11: mtspr SPRN_HSRR0, r8 > mtspr SPRN_HSRR1, r7 > ba 0x500 > > @@ -1180,7 +1182,7 @@ kvmppc_hdsi: > cmpdi r3, 0 /* retry the instruction */ > beq 6f > cmpdi r3, -1 /* handle in kernel mode */ > - beq nohpte_cont > + beq guest_exit_cont > cmpdi r3, -2 /* MMIO emulation; need instr word */ > beq 2f > > @@ -1194,6 +1196,7 @@ kvmppc_hdsi: > li r10, BOOK3S_INTERRUPT_DATA_STORAGE > li r11, (MSR_ME << 1) | 1 /* synthesize MSR_SF | MSR_ME */ > rotldi r11, r11, 63 > +fast_interrupt_c_return: > 6: ld r7, VCPU_CTR(r9) > lwz r8, VCPU_XER(r9) > mtctr r7 > @@ -1226,7 +1229,7 @@ kvmppc_hdsi: > /* Unset guest mode. 
*/ > li r0, KVM_GUEST_MODE_NONE > stb r0, HSTATE_IN_GUEST(r13) > - b nohpte_cont > + b guest_exit_cont > > /* > * Similarly for an HISI, reflect it to the guest as an ISI unless > @@ -1252,9 +1255,9 @@ kvmppc_hisi: > ld r11, VCPU_MSR(r9) > li r12, BOOK3S_INTERRUPT_H_INST_STORAGE > cmpdi r3, 0 /* retry the instruction */ > - beq 6f > + beq fast_interrupt_c_return > cmpdi r3, -1 /* handle in kernel mode */ > - beq nohpte_cont > + beq guest_exit_cont > > /* Synthesize an ISI for the guest */ > mr r11, r3 > @@ -1263,12 +1266,7 @@ kvmppc_hisi: > li r10, BOOK3S_INTERRUPT_INST_STORAGE > li r11, (MSR_ME << 1) | 1 /* synthesize MSR_SF | MSR_ME */ > rotldi r11, r11, 63 > -6: ld r7, VCPU_CTR(r9) > - lwz r8, VCPU_XER(r9) > - mtctr r7 > - mtxer r8 > - mr r4, r9 > - b fast_guest_return > + b fast_interrupt_c_return > > 3: ld r6, VCPU_KVM(r9) /* not relocated, use VRMA */ > ld r5, KVM_VRMA_SLB_V(r6) > @@ -1284,14 +1282,14 @@ kvmppc_hisi: > hcall_try_real_mode: > ld r3,VCPU_GPR(R3)(r9) > andi. r0,r11,MSR_PR > - bne hcall_real_cont > + bne guest_exit_cont > clrrdi r3,r3,2 > cmpldi r3,hcall_real_table_end - hcall_real_table > - bge hcall_real_cont > + bge guest_exit_cont > LOAD_REG_ADDR(r4, hcall_real_table) > lwzx r3,r3,r4 > cmpwi r3,0 > - beq hcall_real_cont > + beq guest_exit_cont > add r3,r3,r4 > mtctr r3 > mr r3,r9 /* get vcpu pointer */ > @@ -1312,7 +1310,7 @@ hcall_real_fallback: > li r12,BOOK3S_INTERRUPT_SYSCALL > ld r9, HSTATE_KVM_VCPU(r13) > > - b hcall_real_cont > + b guest_exit_cont > > .globl hcall_real_table > hcall_real_table: > @@ -1571,6 +1569,21 @@ kvm_cede_exit: > li r3,H_TOO_HARD > blr > > + /* Try to handle a machine check in real mode */ > +machine_check_realmode: > + mr r3, r9 /* get vcpu pointer */ > + bl .kvmppc_realmode_machine_check > + nop > + cmpdi r3, 0 /* continue exiting from guest? */ > + ld r9, HSTATE_KVM_VCPU(r13) > + li r12, BOOK3S_INTERRUPT_MACHINE_CHECK > + beq mc_cont > + /* If not, deliver a machine check. 
SRR0/1 are already set */ > + li r10, BOOK3S_INTERRUPT_MACHINE_CHECK > + li r11, (MSR_ME << 1) | 1 /* synthesize MSR_SF | MSR_ME */ > + rotldi r11, r11, 63 > + b fast_interrupt_c_return > + > secondary_too_late: > ld r5,HSTATE_KVM_VCORE(r13) > HMT_LOW > -- > 1.7.10.rc3.219.g53414 > -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: > > On 22.11.2012, at 10:25, Paul Mackerras wrote: > > > + /* Do they have an SLB shadow buffer registered? */ > > + slb = vcpu->arch.slb_shadow.pinned_addr; > > + if (!slb) > > + return; > > Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? Yes, we leave the guest with an empty SLB, the access gets retried and this time the guest gets an SLB miss interrupt, which it can hopefully handle using an SLB miss handler that runs entirely in real mode. This could happen for instance while the guest is in SLOF or yaboot or some other code that runs basically in real mode but occasionally turns the MMU on for some accesses, and happens to have a bug where it creates a duplicate SLB entry. > > + /* Sanity check */ > > + n = slb->persistent; > > + if (n > SLB_MIN_SIZE) > > + n = SLB_MIN_SIZE; > > Please use a max() macro here. OK. > > + rb = 0x800; /* IS field = 0b10, flush congruence class */ > > + for (i = 0; i < 128; ++i) { > > Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to fetch that number from an SPR? I don't really want to have a p7+ and a p8 function in here too ;). > > > + asm volatile("tlbiel %0" : : "r" (rb)); > > + rb += 0x1000; > > I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the code readable without guessing :). The 0x800 and 0x1000 are taken from the architecture - it defines fields in the RB value for the flush type and TLB index. The 128 is POWER7-specific and isn't in any SPR that I know of. Eventually we'll probably have to put it (the number of TLB congruence classes) in the cputable, but for now I'll just do a define. > So I take it that p7 does not implement tlbia? Correct. > So we never return 0? How about ECC errors and the likes? Wouldn't those also be #MCs that the host needs to handle? Yes, true. 
In fact the OPAL firmware gets to see the machine checks before we do (see the opal_register_exception_handler() calls in arch/powerpc/platforms/powernv/opal.c), so it should have already handled recoverable things like L1 cache parity errors. I'll make the function return 0 if there's an error that it doesn't know about. > > ld r8, HSTATE_VMHANDLER(r13) > > ld r7, HSTATE_HOST_MSR(r13) > > > > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > > +BEGIN_FTR_SECTION > > beq 11f > > - cmpwi r12, BOOK3S_INTERRUPT_MACHINE_CHECK > > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) > > Mind to explain the logic that happens here? Do we get external interrupts on 970? If not, the cmpwi should also be inside the FTR section. Also, if we do a beq here, why do the beqctr below again? I was making it not call the host kernel machine check handler if it was a machine check that pulled us out of the guest. In fact we probably do still want to call the handler, but we don't want to jump to 0x200, since that has been patched by OPAL, and we don't want to make OPAL think we just got another machine check. Instead we would need to jump to machine_check_pSeries. The feature section is because POWER7 sets HSRR0/1 on external interrupts, whereas PPC970 sets SRR0/1. Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 23.11.2012, at 22:42, Paul Mackerras wrote: > On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: >> >> On 22.11.2012, at 10:25, Paul Mackerras wrote: >> >>> + /* Do they have an SLB shadow buffer registered? */ >>> + slb = vcpu->arch.slb_shadow.pinned_addr; >>> + if (!slb) >>> + return; >> >> Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? > > Yes, we leave the guest with an empty SLB, the access gets retried and > this time the guest gets an SLB miss interrupt, which it can hopefully > handle using an SLB miss handler that runs entirely in real mode. > This could happen for instance while the guest is in SLOF or yaboot or > some other code that runs basically in real mode but occasionally > turns the MMU on for some accesses, and happens to have a bug where it > creates a duplicate SLB entry. Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong? Alex > >>> + /* Sanity check */ >>> + n = slb->persistent; >>> + if (n > SLB_MIN_SIZE) >>> + n = SLB_MIN_SIZE; >> >> Please use a max() macro here. > > OK. > >>> + rb = 0x800; /* IS field = 0b10, flush congruence class */ >>> + for (i = 0; i < 128; ++i) { >> >> Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to fetch that number from an SPR? I don't really want to have a p7+ and a p8 function in here too ;). >> >>> + asm volatile("tlbiel %0" : : "r" (rb)); >>> + rb += 0x1000; >> >> I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the code readable without guessing :). > > The 0x800 and 0x1000 are taken from the architecture - it defines > fields in the RB value for the flush type and TLB index. The 128 is > POWER7-specific and isn't in any SPR that I know of. 
Eventually we'll > probably have to put it (the number of TLB congruence classes) in the > cputable, but for now I'll just do a define. > >> So I take it that p7 does not implement tlbia? > > Correct. > >> So we never return 0? How about ECC errors and the likes? Wouldn't those also be #MCs that the host needs to handle? > > Yes, true. In fact the OPAL firmware gets to see the machine checks > before we do (see the opal_register_exception_handler() calls in > arch/powerpc/platforms/powernv/opal.c), so it should have already > handled recoverable things like L1 cache parity errors. > > I'll make the function return 0 if there's an error that it doesn't > know about. > >>> ld r8, HSTATE_VMHANDLER(r13) >>> ld r7, HSTATE_HOST_MSR(r13) >>> >>> cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL >>> +BEGIN_FTR_SECTION >>> beq 11f >>> - cmpwi r12, BOOK3S_INTERRUPT_MACHINE_CHECK >>> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) >> >> Mind to explain the logic that happens here? Do we get external interrupts on 970? If not, the cmpwi should also be inside the FTR section. Also, if we do a beq here, why do the beqctr below again? > > I was making it not call the host kernel machine check handler if it > was a machine check that pulled us out of the guest. In fact we > probably do still want to call the handler, but we don't want to jump > to 0x200, since that has been patched by OPAL, and we don't want to > make OPAL think we just got another machine check. Instead we would > need to jump to machine_check_pSeries. > > The feature section is because POWER7 sets HSRR0/1 on external > interrupts, whereas PPC970 sets SRR0/1. > > Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 26, 2012 at 02:15:16PM +0100, Alexander Graf wrote: > > On 23.11.2012, at 22:42, Paul Mackerras wrote: > > > On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: > >> > >> On 22.11.2012, at 10:25, Paul Mackerras wrote: > >> > >>> + /* Do they have an SLB shadow buffer registered? */ > >>> + slb = vcpu->arch.slb_shadow.pinned_addr; > >>> + if (!slb) > >>> + return; > >> > >> Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? > > > > Yes, we leave the guest with an empty SLB, the access gets retried and > > this time the guest gets an SLB miss interrupt, which it can hopefully > > handle using an SLB miss handler that runs entirely in real mode. > > This could happen for instance while the guest is in SLOF or yaboot or > > some other code that runs basically in real mode but occasionally > > turns the MMU on for some accesses, and happens to have a bug where it > > creates a duplicate SLB entry. > > Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong? Yes, yes and we do. Anytime we get a machine check while in the guest we give the guest a machine check interrupt. Ultimately we want to implement the "FWNMI" (Firmware-assisted NMI) thing defined in PAPR which makes the handling of system reset and machine check slightly nicer for the guest, but that's for later. It will build on top of the stuff in this patch. Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26.11.2012, at 22:33, Paul Mackerras wrote: > On Mon, Nov 26, 2012 at 02:15:16PM +0100, Alexander Graf wrote: >> >> On 23.11.2012, at 22:42, Paul Mackerras wrote: >> >>> On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: >>>> >>>> On 22.11.2012, at 10:25, Paul Mackerras wrote: >>>> >>>>> + /* Do they have an SLB shadow buffer registered? */ >>>>> + slb = vcpu->arch.slb_shadow.pinned_addr; >>>>> + if (!slb) >>>>> + return; >>>> >>>> Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? >>> >>> Yes, we leave the guest with an empty SLB, the access gets retried and >>> this time the guest gets an SLB miss interrupt, which it can hopefully >>> handle using an SLB miss handler that runs entirely in real mode. >>> This could happen for instance while the guest is in SLOF or yaboot or >>> some other code that runs basically in real mode but occasionally >>> turns the MMU on for some accesses, and happens to have a bug where it >>> creates a duplicate SLB entry. >> >> Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong? > > Yes, yes and we do. Anytime we get a machine check while in the guest > we give the guest a machine check interrupt. > > Ultimately we want to implement the "FWNMI" (Firmware-assisted NMI) > thing defined in PAPR which makes the handling of system reset and > machine check slightly nicer for the guest, but that's for later. It > will build on top of the stuff in this patch. So why would the function return 1 then which means "MC is handled, forget about it" rather than 0, which means "inject MC into the guest"? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26.11.2012, at 22:55, Alexander Graf wrote: > > On 26.11.2012, at 22:33, Paul Mackerras wrote: > >> On Mon, Nov 26, 2012 at 02:15:16PM +0100, Alexander Graf wrote: >>> >>> On 23.11.2012, at 22:42, Paul Mackerras wrote: >>> >>>> On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: >>>>> >>>>> On 22.11.2012, at 10:25, Paul Mackerras wrote: >>>>> >>>>>> + /* Do they have an SLB shadow buffer registered? */ >>>>>> + slb = vcpu->arch.slb_shadow.pinned_addr; >>>>>> + if (!slb) >>>>>> + return; >>>>> >>>>> Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? >>>> >>>> Yes, we leave the guest with an empty SLB, the access gets retried and >>>> this time the guest gets an SLB miss interrupt, which it can hopefully >>>> handle using an SLB miss handler that runs entirely in real mode. >>>> This could happen for instance while the guest is in SLOF or yaboot or >>>> some other code that runs basically in real mode but occasionally >>>> turns the MMU on for some accesses, and happens to have a bug where it >>>> creates a duplicate SLB entry. >>> >>> Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong? >> >> Yes, yes and we do. Anytime we get a machine check while in the guest >> we give the guest a machine check interrupt. >> >> Ultimately we want to implement the "FWNMI" (Firmware-assisted NMI) >> thing defined in PAPR which makes the handling of system reset and >> machine check slightly nicer for the guest, but that's for later. It >> will build on top of the stuff in this patch. > > So why would the function return 1 then which means "MC is handled, forget about it" rather than 0, which means "inject MC into the guest"? Oh wait - 1 means "have the host handle it". Let me check up the code again. 
Alex
On Mon, Nov 26, 2012 at 11:03:48PM +0100, Alexander Graf wrote: > > On 26.11.2012, at 22:55, Alexander Graf wrote: > > > > > On 26.11.2012, at 22:33, Paul Mackerras wrote: > > > >> On Mon, Nov 26, 2012 at 02:15:16PM +0100, Alexander Graf wrote: > >>> > >>> On 23.11.2012, at 22:42, Paul Mackerras wrote: > >>> > >>>> On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: > >>>>> > >>>>> On 22.11.2012, at 10:25, Paul Mackerras wrote: > >>>>> > >>>>>> + /* Do they have an SLB shadow buffer registered? */ > >>>>>> + slb = vcpu->arch.slb_shadow.pinned_addr; > >>>>>> + if (!slb) > >>>>>> + return; > >>>>> > >>>>> Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest? > >>>> > >>>> Yes, we leave the guest with an empty SLB, the access gets retried and > >>>> this time the guest gets an SLB miss interrupt, which it can hopefully > >>>> handle using an SLB miss handler that runs entirely in real mode. > >>>> This could happen for instance while the guest is in SLOF or yaboot or > >>>> some other code that runs basically in real mode but occasionally > >>>> turns the MMU on for some accesses, and happens to have a bug where it > >>>> creates a duplicate SLB entry. > >>> > >>> Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong? > >> > >> Yes, yes and we do. Anytime we get a machine check while in the guest > >> we give the guest a machine check interrupt. > >> > >> Ultimately we want to implement the "FWNMI" (Firmware-assisted NMI) > >> thing defined in PAPR which makes the handling of system reset and > >> machine check slightly nicer for the guest, but that's for later. It > >> will build on top of the stuff in this patch. > > > > So why would the function return 1 then which means "MC is handled, forget about it" rather than 0, which means "inject MC into the guest"? 
> > Oh wait - 1 means "have the host handle it". Let me check up the code again.

1 means "the problem is fixed, now give the guest a machine check interrupt". 0 means "exit the guest, have the host's MC handler look at it, then give the guest a machine check". In this case the delivery of the MC to the guest happens in kvmppc_handle_exit().

Paul.
SRR0/1 are already set */ + li r10, BOOK3S_INTERRUPT_MACHINE_CHECK + li r11, (MSR_ME << 1) | 1 /* synthesize MSR_SF | MSR_ME */ + rotldi r11, r11, 63 + b fast_interrupt_c_return + secondary_too_late: ld r5,HSTATE_KVM_VCORE(r13) HMT_LOW