Message ID | 150607286967.26027.12529646475118424696.stgit@bahia.lan |
---|---|
State | Superseded |
Series | KVM: PPC: Book3S PR: only call slbmte for valid SLB entries |
On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
> some of which are valid (ie, SLB_ESID_V is set) and the rest are
> likely all-zeroes (with QEMU at least).
>
> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
> assumes to find the SLB index in the 3 lower bits of its rb argument.
> When passed zeroed arguments, it happily overwrites the 0th SLB entry
> with zeroes. This is exactly what happens while doing live migration
> with QEMU when the destination pushes the incoming SLB descriptors to
> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
> clears its SLB array and only restore valid ones, but the 0th one is
> now gone and we cannot access the corresponding memory anymore:
>
> (qemu) x/x $pc
> c0000000000b742c: Cannot access memory
>
> To avoid this, let's filter out non-valid SLB entries, like we
> already do for Book3S HV.
>
> Signed-off-by: Greg Kurz <groug@kaod.org>

This seems like a good idea, but to make it fully correct, don't we
also need to fully flush the SLB before inserting the new entries.

> ---
>  arch/powerpc/kvm/book3s_pr.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 3beb4ff469d1..cb6894e55f97 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -1328,8 +1328,10 @@ static int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
>  	vcpu3s->sdr1 = sregs->u.s.sdr1;
>  	if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
>  		for (i = 0; i < 64; i++) {
> -			vcpu->arch.mmu.slbmte(vcpu, sregs->u.s.ppc64.slb[i].slbv,
> -						    sregs->u.s.ppc64.slb[i].slbe);
> +			u64 rb = sregs->u.s.ppc64.slb[i].slbe;
> +			u64 rs = sregs->u.s.ppc64.slb[i].slbv;
> +			if (rb & SLB_ESID_V)
> +				vcpu->arch.mmu.slbmte(vcpu, rs, rb);
>  		}
>  	} else {
>  		for (i = 0; i < 16; i++) {
>
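
The flush question raised in this reply can be illustrated with a small user-space model. This is only a sketch, not kernel code: every `model_*` name and `MODEL_*` constant is hypothetical, the index mask is an assumption of the model, and the flush is modelled as a plain `memset`. It shows that filtering alone leaves a stale valid entry in place, while flushing first does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SLB_ENTRIES 64
#define MODEL_ESID_V      (1ULL << 27) /* stands in for SLB_ESID_V */
#define MODEL_INDEX_MASK  0xfffULL     /* assumed location of the index in rb */

struct model_slb_desc  { uint64_t slbe; uint64_t slbv; };            /* from userland */
struct model_slb_entry { uint64_t slbe; uint64_t slbv; int valid; }; /* vcpu state */
struct model_vcpu      { struct model_slb_entry slb[MODEL_SLB_ENTRIES]; };

/* Install one entry at the index encoded in rb. */
static void model_slbmte(struct model_vcpu *vcpu, uint64_t rs, uint64_t rb)
{
	uint64_t idx = rb & MODEL_INDEX_MASK;

	if (idx >= MODEL_SLB_ENTRIES)
		return;
	vcpu->slb[idx].slbe = rb;
	vcpu->slb[idx].slbv = rs;
	vcpu->slb[idx].valid = (rb & MODEL_ESID_V) != 0;
}

/* Hypothetical full flush: mark every entry invalid and zeroed. */
static void model_slb_flush_all(struct model_vcpu *vcpu)
{
	memset(vcpu->slb, 0, sizeof(vcpu->slb));
}

/* Install only valid descriptors, optionally flushing first. */
static void model_set_sregs(struct model_vcpu *vcpu,
			    const struct model_slb_desc *d, int flush_first)
{
	if (flush_first)
		model_slb_flush_all(vcpu);
	for (int i = 0; i < MODEL_SLB_ENTRIES; i++) {
		if (d[i].slbe & MODEL_ESID_V)
			model_slbmte(vcpu, d[i].slbv, d[i].slbe);
	}
}

/* Count the entries currently marked valid. */
static int model_count_valid(const struct model_vcpu *vcpu)
{
	int n = 0;

	for (int i = 0; i < MODEL_SLB_ENTRIES; i++)
		n += vcpu->slb[i].valid;
	return n;
}
```

In this model, a vcpu that already holds a valid entry at index 5 keeps it after a filtered restore of an array whose only valid descriptor is index 0, unless `flush_first` is set.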
David Gibson <david@gibson.dropbear.id.au> writes:

> On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
>> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
>> some of which are valid (ie, SLB_ESID_V is set) and the rest are
>> likely all-zeroes (with QEMU at least).
>>
>> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
>> assumes to find the SLB index in the 3 lower bits of its rb argument.
>> When passed zeroed arguments, it happily overwrites the 0th SLB entry
>> with zeroes. This is exactly what happens while doing live migration
>> with QEMU when the destination pushes the incoming SLB descriptors to
>> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
>> clears its SLB array and only restore valid ones, but the 0th one is
>> now gone and we cannot access the corresponding memory anymore:
>>
>> (qemu) x/x $pc
>> c0000000000b742c: Cannot access memory
>>
>> To avoid this, let's filter out non-valid SLB entries, like we
>> already do for Book3S HV.
>>
>> Signed-off-by: Greg Kurz <groug@kaod.org>
>
> This seems like a good idea, but to make it fully correct, don't we
> also need to fully flush the SLB before inserting the new entries.

We would need to do that yeah.

But I don't think I like this patch, it would mean userspace has no way
of programming an invalid SLB entry. It's true that in general that
isn't something we care about doing, but the API should allow it.

For example the kernel could leave invalid entries in place and flip the
valid bit when it wanted to make them valid, and this patch would
prevent that state being successfully migrated IIUIC.

cheers
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 26, 2017 at 03:24:05PM +1000, Michael Ellerman wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
> > On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
> >> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
> >> some of which are valid (ie, SLB_ESID_V is set) and the rest are
> >> likely all-zeroes (with QEMU at least).
> >>
> >> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
> >> assumes to find the SLB index in the 3 lower bits of its rb argument.
> >> When passed zeroed arguments, it happily overwrites the 0th SLB entry
> >> with zeroes. This is exactly what happens while doing live migration
> >> with QEMU when the destination pushes the incoming SLB descriptors to
> >> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
> >> clears its SLB array and only restore valid ones, but the 0th one is
> >> now gone and we cannot access the corresponding memory anymore:
> >>
> >> (qemu) x/x $pc
> >> c0000000000b742c: Cannot access memory
> >>
> >> To avoid this, let's filter out non-valid SLB entries, like we
> >> already do for Book3S HV.
> >>
> >> Signed-off-by: Greg Kurz <groug@kaod.org>
> >
> > This seems like a good idea, but to make it fully correct, don't we
> > also need to fully flush the SLB before inserting the new entries.
>
> We would need to do that yeah.
>
> But I don't think I like this patch, it would mean userspace has no way
> of programming an invalid SLB entry. It's true that in general that
> isn't something we care about doing, but the API should allow it.
>
> For example the kernel could leave invalid entries in place and flip the
> valid bit when it wanted to make them valid, and this patch would
> prevent that state being successfully migrated IIUIC.

If I remember correctly, the architecture says that slbmfee/slbmfev
return all zeroes for an invalid entry, so there would be no way for
the guest kernel to do what you suggest.

Paul.
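
The readback behaviour Paul recalls can be modelled in a few lines. This is a sketch that takes his recollection as its premise, not a statement of how any kernel or hardware implements the instructions. The `model_*` names and the `MODEL_ESID_V` value are hypothetical stand-ins. Under that premise, whatever bits sit in an invalid entry are not observable through the read instructions, so a save path built on them captures zeroes.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_ESID_V (1ULL << 27) /* stands in for SLB_ESID_V */

struct model_slb_desc  { uint64_t slbe; uint64_t slbv; };
struct model_slb_entry { uint64_t slbe; uint64_t slbv; int valid; };

/* Model of slbmfee: an invalid entry reads back as zero (premise from the thread). */
static uint64_t model_slbmfee(const struct model_slb_entry *e)
{
	return e->valid ? e->slbe : 0;
}

/* Model of slbmfev, with the same premise. */
static uint64_t model_slbmfev(const struct model_slb_entry *e)
{
	return e->valid ? e->slbv : 0;
}

/* What a reader relying on the two instructions above would capture. */
static struct model_slb_desc model_save_entry(const struct model_slb_entry *e)
{
	struct model_slb_desc d = { model_slbmfee(e), model_slbmfev(e) };

	return d;
}
```

In this model, an entry holding leftover ESID/VSID bits with `valid == 0` is saved as an all-zero descriptor, which is indistinguishable from an entry that was never written.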
Paul Mackerras <paulus@ozlabs.org> writes:

> On Tue, Sep 26, 2017 at 03:24:05PM +1000, Michael Ellerman wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>>
>> > On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
>> >> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
>> >> some of which are valid (ie, SLB_ESID_V is set) and the rest are
>> >> likely all-zeroes (with QEMU at least).
>> >>
>> >> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
>> >> assumes to find the SLB index in the 3 lower bits of its rb argument.
>> >> When passed zeroed arguments, it happily overwrites the 0th SLB entry
>> >> with zeroes. This is exactly what happens while doing live migration
>> >> with QEMU when the destination pushes the incoming SLB descriptors to
>> >> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
>> >> clears its SLB array and only restore valid ones, but the 0th one is
>> >> now gone and we cannot access the corresponding memory anymore:
>> >>
>> >> (qemu) x/x $pc
>> >> c0000000000b742c: Cannot access memory
>> >>
>> >> To avoid this, let's filter out non-valid SLB entries, like we
>> >> already do for Book3S HV.
>> >>
>> >> Signed-off-by: Greg Kurz <groug@kaod.org>
>> >
>> > This seems like a good idea, but to make it fully correct, don't we
>> > also need to fully flush the SLB before inserting the new entries.
>>
>> We would need to do that yeah.
>>
>> But I don't think I like this patch, it would mean userspace has no way
>> of programming an invalid SLB entry. It's true that in general that
>> isn't something we care about doing, but the API should allow it.
>>
>> For example the kernel could leave invalid entries in place and flip the
>> valid bit when it wanted to make them valid, and this patch would
>> prevent that state being successfully migrated IIUIC.
>
> If I remember correctly, the architecture says that slbmfee/slbmfev
> return all zeroes for an invalid entry, so there would be no way for
> the guest kernel to do what you suggest.

You're right it does.

We have code in xmon that reads entries and then checks for SLB_ESID_V,
but I guess that's just overly pessimistic.

cheers
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 3beb4ff469d1..cb6894e55f97 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1328,8 +1328,10 @@ static int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
 	vcpu3s->sdr1 = sregs->u.s.sdr1;
 	if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
 		for (i = 0; i < 64; i++) {
-			vcpu->arch.mmu.slbmte(vcpu, sregs->u.s.ppc64.slb[i].slbv,
-						    sregs->u.s.ppc64.slb[i].slbe);
+			u64 rb = sregs->u.s.ppc64.slb[i].slbe;
+			u64 rs = sregs->u.s.ppc64.slb[i].slbv;
+			if (rb & SLB_ESID_V)
+				vcpu->arch.mmu.slbmte(vcpu, rs, rb);
 		}
 	} else {
 		for (i = 0; i < 16; i++) {
Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
some of which are valid (ie, SLB_ESID_V is set) and the rest are
likely all-zeroes (with QEMU at least).

Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
assumes to find the SLB index in the 3 lower bits of its rb argument.
When passed zeroed arguments, it happily overwrites the 0th SLB entry
with zeroes. This is exactly what happens while doing live migration
with QEMU when the destination pushes the incoming SLB descriptors to
KVM PR. When reloading the SLBs at the next synchronization, QEMU first
clears its SLB array and only restore valid ones, but the 0th one is
now gone and we cannot access the corresponding memory anymore:

(qemu) x/x $pc
c0000000000b742c: Cannot access memory

To avoid this, let's filter out non-valid SLB entries, like we
already do for Book3S HV.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_pr.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
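
The failure described in the commit message can be reproduced in a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: all `model_*` names and `MODEL_*` constants are hypothetical, and the index mask in particular is an assumption of the model. It contrasts installing every descriptor with installing only those that have the valid bit set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SLB_ENTRIES 64
#define MODEL_ESID_V      (1ULL << 27) /* stands in for SLB_ESID_V */
#define MODEL_INDEX_MASK  0xfffULL     /* assumed location of the index in rb */

struct model_slb_desc  { uint64_t slbe; uint64_t slbv; };            /* from userland */
struct model_slb_entry { uint64_t slbe; uint64_t slbv; int valid; }; /* vcpu state */
struct model_vcpu      { struct model_slb_entry slb[MODEL_SLB_ENTRIES]; };

/* Start from an empty SLB. */
static void model_vcpu_reset(struct model_vcpu *vcpu)
{
	memset(vcpu, 0, sizeof(*vcpu));
}

/* Install one entry at the index encoded in rb, whether valid or not. */
static void model_slbmte(struct model_vcpu *vcpu, uint64_t rs, uint64_t rb)
{
	uint64_t idx = rb & MODEL_INDEX_MASK;

	if (idx >= MODEL_SLB_ENTRIES)
		return;
	vcpu->slb[idx].slbe = rb;
	vcpu->slb[idx].slbv = rs;
	vcpu->slb[idx].valid = (rb & MODEL_ESID_V) != 0;
}

/* Pre-patch loop: an all-zero descriptor decodes to index 0 and clobbers it. */
static void model_set_sregs_unfiltered(struct model_vcpu *vcpu,
				       const struct model_slb_desc *d)
{
	for (int i = 0; i < MODEL_SLB_ENTRIES; i++)
		model_slbmte(vcpu, d[i].slbv, d[i].slbe);
}

/* Patched loop: descriptors without the valid bit are skipped. */
static void model_set_sregs_filtered(struct model_vcpu *vcpu,
				     const struct model_slb_desc *d)
{
	for (int i = 0; i < MODEL_SLB_ENTRIES; i++) {
		uint64_t rb = d[i].slbe;
		uint64_t rs = d[i].slbv;

		if (rb & MODEL_ESID_V)
			model_slbmte(vcpu, rs, rb);
	}
}
```

With an incoming array whose only valid descriptor targets index 0 and whose other 63 descriptors are zero, the unfiltered loop ends with entry 0 invalid, while the filtered loop leaves it valid.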