Message ID | 20210323010305.1045293-29-npiggin@gmail.com |
---|---|
State | New |
Headers | show |
Series | KVM: PPC: Book3S: C-ify the P9 entry/exit code | expand |
On 23/03/2021 12:02, Nicholas Piggin wrote: > irq_work's use of the DEC SPR is racy with guest<->host switch and guest > entry which flips the DEC interrupt to guest, which could lose a host > work interrupt. > > This patch closes one race, and attempts to comment another class of > races. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > arch/powerpc/kvm/book3s_hv.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 1f38a0abc611..989a1ff5ad11 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, > if (!(vcpu->arch.ctrl & 1)) > mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1); > > + /* > + * When setting DEC, we must always deal with irq_work_raise via NMI vs > + * setting DEC. The problem occurs right as we switch into guest mode > + * if a NMI hits and sets pending work and sets DEC, then that will > + * apply to the guest and not bring us back to the host. > + * > + * irq_work_raise could check a flag (or possibly LPCR[HDICE] for > + * example) and set HDEC to 1? That wouldn't solve the nested hv > + * case which needs to abort the hcall or zero the time limit. > + * > + * XXX: Another day's problem. > + */ > mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb); > > if (kvmhv_on_pseries()) { > @@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, > vc->entry_exit_map = 0x101; > vc->in_guest = 0; > > - mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb); > + set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb); set_dec_or_work() will write local_paca->kvm_hstate.dec_expires - tb - 1 to SPRN_DEC which is not exactly the same, is this still alright? I asked in v3 but it is probably lost :) > + > mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso); > > kvmhv_load_host_pmu(); >
Excerpts from Alexey Kardashevskiy's message of March 23, 2021 8:13 pm: > > > On 23/03/2021 12:02, Nicholas Piggin wrote: >> irq_work's use of the DEC SPR is racy with guest<->host switch and guest >> entry which flips the DEC interrupt to guest, which could lose a host >> work interrupt. >> >> This patch closes one race, and attempts to comment another class of >> races. >> >> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >> --- >> arch/powerpc/kvm/book3s_hv.c | 15 ++++++++++++++- >> 1 file changed, 14 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 1f38a0abc611..989a1ff5ad11 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, >> if (!(vcpu->arch.ctrl & 1)) >> mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1); >> >> + /* >> + * When setting DEC, we must always deal with irq_work_raise via NMI vs >> + * setting DEC. The problem occurs right as we switch into guest mode >> + * if a NMI hits and sets pending work and sets DEC, then that will >> + * apply to the guest and not bring us back to the host. >> + * >> + * irq_work_raise could check a flag (or possibly LPCR[HDICE] for >> + * example) and set HDEC to 1? That wouldn't solve the nested hv >> + * case which needs to abort the hcall or zero the time limit. >> + * >> + * XXX: Another day's problem. >> + */ >> mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb); >> >> if (kvmhv_on_pseries()) { >> @@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, >> vc->entry_exit_map = 0x101; >> vc->in_guest = 0; >> >> - mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb); >> + set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb); > > > set_dec_or_work() will write local_paca->kvm_hstate.dec_expires - tb - 1 > to SPRN_DEC which is not exactly the same, is this still alright? > > I asked in v3 but it is probably lost :) Oh I did see that then forgot. It will write dec_expires - tb, then it will write 1 if it found irq_work was pending. The change is intentional to fixes one of the lost irq_work races. Thanks, Nick
Excerpts from Nicholas Piggin's message of March 23, 2021 8:36 pm: > Excerpts from Alexey Kardashevskiy's message of March 23, 2021 8:13 pm: >> >> >> On 23/03/2021 12:02, Nicholas Piggin wrote: >>> irq_work's use of the DEC SPR is racy with guest<->host switch and guest >>> entry which flips the DEC interrupt to guest, which could lose a host >>> work interrupt. >>> >>> This patch closes one race, and attempts to comment another class of >>> races. >>> >>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >>> --- >>> arch/powerpc/kvm/book3s_hv.c | 15 ++++++++++++++- >>> 1 file changed, 14 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >>> index 1f38a0abc611..989a1ff5ad11 100644 >>> --- a/arch/powerpc/kvm/book3s_hv.c >>> +++ b/arch/powerpc/kvm/book3s_hv.c >>> @@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, >>> if (!(vcpu->arch.ctrl & 1)) >>> mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1); >>> >>> + /* >>> + * When setting DEC, we must always deal with irq_work_raise via NMI vs >>> + * setting DEC. The problem occurs right as we switch into guest mode >>> + * if a NMI hits and sets pending work and sets DEC, then that will >>> + * apply to the guest and not bring us back to the host. >>> + * >>> + * irq_work_raise could check a flag (or possibly LPCR[HDICE] for >>> + * example) and set HDEC to 1? That wouldn't solve the nested hv >>> + * case which needs to abort the hcall or zero the time limit. >>> + * >>> + * XXX: Another day's problem. >>> + */ >>> mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb); >>> >>> if (kvmhv_on_pseries()) { >>> @@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, >>> vc->entry_exit_map = 0x101; >>> vc->in_guest = 0; >>> >>> - mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb); >>> + set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb); >> >> >> set_dec_or_work() will write local_paca->kvm_hstate.dec_expires - tb - 1 >> to SPRN_DEC which is not exactly the same, is this still alright? >> >> I asked in v3 but it is probably lost :) > > Oh I did see that then forgot. > > It will write dec_expires - tb, then it will write 1 if it found irq_work > was pending. Ah you were actually asking about set_dec writing val - 1. I totally missed that. Yes that was an unintentional change. This is the way timer.c code works with respect to the decrementers_next_tb value, so it's probably better to make them so it seems like it should be okay (and better to bring the KVM code up to match timer code rather than be different or the other way around). The difference should be noted in the changelog though. Thanks, Nick
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 1f38a0abc611..989a1ff5ad11 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, if (!(vcpu->arch.ctrl & 1)) mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1); + /* + * When setting DEC, we must always deal with irq_work_raise via NMI vs + * setting DEC. The problem occurs right as we switch into guest mode + * if a NMI hits and sets pending work and sets DEC, then that will + * apply to the guest and not bring us back to the host. + * + * irq_work_raise could check a flag (or possibly LPCR[HDICE] for + * example) and set HDEC to 1? That wouldn't solve the nested hv + * case which needs to abort the hcall or zero the time limit. + * + * XXX: Another day's problem. + */ mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb); if (kvmhv_on_pseries()) { @@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, vc->entry_exit_map = 0x101; vc->in_guest = 0; - mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb); + set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb); + mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso); kvmhv_load_host_pmu();
irq_work's use of the DEC SPR is racy with guest<->host switch and guest entry which flips the DEC interrupt to guest, which could lose a host work interrupt. This patch closes one race, and attempts to comment another class of races. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- arch/powerpc/kvm/book3s_hv.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)