Message ID | 20240313072625.76804-1-vaibhav@linux.ibm.com |
---|---|
State | New |
Series | KVM: PPC: Book3S HV nestedv2: Cancel pending HDEC exception |
On Wed Mar 13, 2024 at 5:26 PM AEST, Vaibhav Jain wrote:
> This reverts commit 180c6b072bf360b686e53d893d8dcf7dbbaec6bb ("KVM: PPC:
> Book3S HV nestedv2: Do not cancel pending decrementer exception") which
> prevented cancelling a pending HDEC exception for nestedv2 KVM guests. It
> was done to avoid overhead of a H_GUEST_GET_STATE hcall to read the 'HDEC
> expiry TB' register which was higher compared to handling extra decrementer
> exceptions.
>
> This overhead of reading 'HDEC expiry TB' register has been mitigated
> recently by the L0 hypervisor(PowerVM) by putting the value of this
> register in L2 guest-state output buffer on trap to L1. From there the
> value of this register is cached, made available in kvmhv_run_single_vcpu()
> to compare it against host(L1) timebase and cancel the pending hypervisor
> decrementer exception if needed.

Ah, I figured out the problem here. Guest entry never clears the
queued dec, because it's level triggered on the DEC MSB so it
doesn't go away when it's delivered. So upstream code is indeed
buggy and I think I take the blame for suggesting this nestedv2
workaround.

I actually don't think that is necessary though, we could treat it
like other interrupts. I think that would solve the problem without
having to test dec here.

I am wondering though, what workload slows down that this patch
was needed in the first place. We'd only get here after a cede
returns, then we'd dequeue the dec and stop having to GET_STATE
it here.

Thanks,
Nick

>
> Fixes: 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception")
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 0b921704da45..e47b954ce266 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4856,7 +4856,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>  	 * entering a nested guest in which case the decrementer is now owned
>  	 * by L2 and the L1 decrementer is provided in hdec_expires
>  	 */
> -	if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
> +	if (kvmppc_core_pending_dec(vcpu) &&
>  	    ((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
>  	     (trap == BOOK3S_INTERRUPT_SYSCALL &&
>  	      kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
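For readers following the "level triggered on the DEC MSB" point in the reply above, here is a rough, self-contained C sketch of the idea. It is not kernel code: `struct toy_vcpu`, `toy_deliver_dec()` and `toy_reload_dec()` are hypothetical names made up for illustration, and the decrementer is modelled as a signed 64-bit value purely as a simplifying assumption. The only point it tries to show is that the exception condition is a function of the current register value, so delivering the interrupt does not by itself make the condition go away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy state; this is NOT the kernel's struct kvm_vcpu. */
struct toy_vcpu {
	int64_t dec;       /* modelled decrementer value (width is an assumption) */
	bool dec_queued;   /* modelled "decrementer exception pending" flag */
};

/* Level-triggered: asserted for as long as the MSB (sign bit) is set. */
static bool toy_dec_asserted(int64_t dec)
{
	return dec < 0;
}

/*
 * Delivering the interrupt does not touch the register, so the level
 * is still asserted afterwards and the pending flag stays set.
 */
static void toy_deliver_dec(struct toy_vcpu *v)
{
	assert(v != NULL);
	v->dec_queued = toy_dec_asserted(v->dec);
}

/* Only writing a value with the MSB clear drops the level. */
static void toy_reload_dec(struct toy_vcpu *v, int64_t ticks)
{
	assert(v != NULL);
	v->dec = ticks;
	v->dec_queued = toy_dec_asserted(v->dec);
}
```

In this toy model, a vcpu whose `dec` has gone negative remains queued across `toy_deliver_dec()` and is only dequeued once `toy_reload_dec()` writes a non-negative value, which is the behaviour the reply describes for guest entry.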
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Was this regression ever resolved? Doesn't look like it, but maybe I
just missed something.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 20.03.24 14:43, Nicholas Piggin wrote:
> On Wed Mar 13, 2024 at 5:26 PM AEST, Vaibhav Jain wrote:
>> This reverts commit 180c6b072bf360b686e53d893d8dcf7dbbaec6bb ("KVM: PPC:
>> Book3S HV nestedv2: Do not cancel pending decrementer exception") which
>> prevented cancelling a pending HDEC exception for nestedv2 KVM guests. It
>> was done to avoid overhead of a H_GUEST_GET_STATE hcall to read the 'HDEC
>> expiry TB' register which was higher compared to handling extra decrementer
>> exceptions.
>>
>> This overhead of reading 'HDEC expiry TB' register has been mitigated
>> recently by the L0 hypervisor(PowerVM) by putting the value of this
>> register in L2 guest-state output buffer on trap to L1. From there the
>> value of this register is cached, made available in kvmhv_run_single_vcpu()
>> to compare it against host(L1) timebase and cancel the pending hypervisor
>> decrementer exception if needed.
>
> Ah, I figured out the problem here. Guest entry never clears the
> queued dec, because it's level triggered on the DEC MSB so it
> doesn't go away when it's delivered. So upstream code is indeed
> buggy and I think I take the blame for suggesting this nestedv2
> workaround.
>
> I actually don't think that is necessary though, we could treat it
> like other interrupts. I think that would solve the problem without
> having to test dec here.
>
> I am wondering though, what workload slows down that this patch
> was needed in the first place. We'd only get here after a cede
> returns, then we'd dequeue the dec and stop having to GET_STATE
> it here.
>
> Thanks,
> Nick
>
>>
>> Fixes: 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception")
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 0b921704da45..e47b954ce266 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -4856,7 +4856,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>  	 * entering a nested guest in which case the decrementer is now owned
>>  	 * by L2 and the L1 decrementer is provided in hdec_expires
>>  	 */
>> -	if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
>> +	if (kvmppc_core_pending_dec(vcpu) &&
>>  	    ((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
>>  	     (trap == BOOK3S_INTERRUPT_SYSCALL &&
>>  	      kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
>

"Linux regression tracking (Thorsten Leemhuis)"
<regressions@leemhuis.info> writes:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Was this regression ever resolved? Doesn't look like it, but maybe I
> just missed something.

I'm not sure how it ended up on the regression list. IMHO it's not
really a regression. It was an attempt at a performance optimisation,
which is no longer needed due to changes in (unreleased) firmware.

I haven't merged it because Nick's reply contained several questions
for Vaibhav, so I'm expecting either a reply to those or a new version
of the patch.

cheers

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 20.03.24 14:43, Nicholas Piggin wrote:
>> On Wed Mar 13, 2024 at 5:26 PM AEST, Vaibhav Jain wrote:
>>> This reverts commit 180c6b072bf360b686e53d893d8dcf7dbbaec6bb ("KVM: PPC:
>>> Book3S HV nestedv2: Do not cancel pending decrementer exception") which
>>> prevented cancelling a pending HDEC exception for nestedv2 KVM guests. It
>>> was done to avoid overhead of a H_GUEST_GET_STATE hcall to read the 'HDEC
>>> expiry TB' register which was higher compared to handling extra decrementer
>>> exceptions.
>>>
>>> This overhead of reading 'HDEC expiry TB' register has been mitigated
>>> recently by the L0 hypervisor(PowerVM) by putting the value of this
>>> register in L2 guest-state output buffer on trap to L1. From there the
>>> value of this register is cached, made available in kvmhv_run_single_vcpu()
>>> to compare it against host(L1) timebase and cancel the pending hypervisor
>>> decrementer exception if needed.
>>
>> Ah, I figured out the problem here. Guest entry never clears the
>> queued dec, because it's level triggered on the DEC MSB so it
>> doesn't go away when it's delivered. So upstream code is indeed
>> buggy and I think I take the blame for suggesting this nestedv2
>> workaround.
>>
>> I actually don't think that is necessary though, we could treat it
>> like other interrupts. I think that would solve the problem without
>> having to test dec here.
>>
>> I am wondering though, what workload slows down that this patch
>> was needed in the first place. We'd only get here after a cede
>> returns, then we'd dequeue the dec and stop having to GET_STATE
>> it here.
>>
>> Thanks,
>> Nick
>>
>>>
>>> Fixes: 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception")
>>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>>> ---
>>>  arch/powerpc/kvm/book3s_hv.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 0b921704da45..e47b954ce266 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -4856,7 +4856,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>  	 * entering a nested guest in which case the decrementer is now owned
>>>  	 * by L2 and the L1 decrementer is provided in hdec_expires
>>>  	 */
>>> -	if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
>>> +	if (kvmppc_core_pending_dec(vcpu) &&
>>>  	    ((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
>>>  	     (trap == BOOK3S_INTERRUPT_SYSCALL &&
>>>  	      kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
>>
On 05.04.24 05:20, Michael Ellerman wrote:
> "Linux regression tracking (Thorsten Leemhuis)"
> <regressions@leemhuis.info> writes:
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Was this regression ever resolved? Doesn't look like it, but maybe I
>> just missed something.
>
> I'm not sure how it ended up on the regression list.

That is easy to explain: I let lei search for mails containing words
like regress, bisect, and revert to become aware of regressions that
might need tracking. And...

> IMHO it's not really a regression.

...sometimes I misjudge or misinterpret something and add it to the
regression tracking. Looks like that happened here.

Sorry for that and the noise it caused!

#regzbot resolve: invalid: was not really a regression in the first place

Ciao, Thorsten
Thorsten Leemhuis <regressions@leemhuis.info> writes:
> On 05.04.24 05:20, Michael Ellerman wrote:
>> "Linux regression tracking (Thorsten Leemhuis)"
>> <regressions@leemhuis.info> writes:
>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>> for once, to make this easily accessible to everyone.
>>>
>>> Was this regression ever resolved? Doesn't look like it, but maybe I
>>> just missed something.
>>
>> I'm not sure how it ended up on the regression list.
>
> That is easy to explain: I let lei search for mails containing words
> like regress, bisect, and revert to become aware of regressions that
> might need tracking. And...
>
>> IMHO it's not really a regression.
>
> ...sometimes I misjudge or misinterpret something and add it to the
> regression tracking. Looks like that happened here.
>
> Sorry for that and the noise it caused!

No worries.

cheers
On Mon Apr 8, 2024 at 3:20 PM AEST, Michael Ellerman wrote:
> Thorsten Leemhuis <regressions@leemhuis.info> writes:
> > On 05.04.24 05:20, Michael Ellerman wrote:
> >> "Linux regression tracking (Thorsten Leemhuis)"
> >> <regressions@leemhuis.info> writes:
> >>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> >>> for once, to make this easily accessible to everyone.
> >>>
> >>> Was this regression ever resolved? Doesn't look like it, but maybe I
> >>> just missed something.
> >>
> >> I'm not sure how it ended up on the regression list.
> >
> > That is easy to explain: I let lei search for mails containing words
> > like regress, bisect, and revert to become aware of regressions that
> > might need tracking. And...
> >
> >> IMHO it's not really a regression.
> >
> > ...sometimes I misjudge or misinterpret something and add it to the
> > regression tracking. Looks like that happened here.
> >
> > Sorry for that and the noise it caused!
>
> No worries.

It is actually a regression. It only really affects performance, but
the logic is broken (my fault). We were going to revert it, and solve
the initial regression a better way.

Thanks,
Nick
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0b921704da45..e47b954ce266 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4856,7 +4856,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 	 * entering a nested guest in which case the decrementer is now owned
 	 * by L2 and the L1 decrementer is provided in hdec_expires
 	 */
-	if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
+	if (kvmppc_core_pending_dec(vcpu) &&
 	    ((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
 	     (trap == BOOK3S_INTERRUPT_SYSCALL &&
 	      kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
This reverts commit 180c6b072bf360b686e53d893d8dcf7dbbaec6bb ("KVM: PPC:
Book3S HV nestedv2: Do not cancel pending decrementer exception") which
prevented cancelling a pending HDEC exception for nestedv2 KVM guests. It
was done to avoid overhead of a H_GUEST_GET_STATE hcall to read the 'HDEC
expiry TB' register which was higher compared to handling extra decrementer
exceptions.

This overhead of reading 'HDEC expiry TB' register has been mitigated
recently by the L0 hypervisor(PowerVM) by putting the value of this
register in L2 guest-state output buffer on trap to L1. From there the
value of this register is cached, made available in kvmhv_run_single_vcpu()
to compare it against host(L1) timebase and cancel the pending hypervisor
decrementer exception if needed.

Fixes: 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
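
To make the effect of the one-line change easier to see outside the kernel tree, the condition before and after the patch can be modelled as two standalone predicates. This is only a sketch under stated assumptions: `struct model_vcpu` and both function names are hypothetical, and each field merely stands in for the result of the kernel helper named in its comment rather than calling it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the inputs of the check; not struct kvm_vcpu. */
struct model_vcpu {
	bool nestedv2;                 /* models kvmhv_is_nestedv2() */
	bool dec_pending;              /* models kvmppc_core_pending_dec(vcpu) */
	uint64_t dec_expires_host_tb;  /* models kvmppc_dec_expires_host_tb(vcpu) */
	bool enter_nested;             /* models trap == BOOK3S_INTERRUPT_SYSCALL &&
	                                * kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED */
};

/* Condition as it reads before the patch: nestedv2 never cancels. */
static bool cancel_dec_before(const struct model_vcpu *v, uint64_t tb)
{
	assert(v != NULL);
	return !v->nestedv2 && v->dec_pending &&
	       (tb < v->dec_expires_host_tb || v->enter_nested);
}

/* Condition as it reads after the patch: nestedv2 is treated like the rest. */
static bool cancel_dec_after(const struct model_vcpu *v, uint64_t tb)
{
	assert(v != NULL);
	return v->dec_pending &&
	       (tb < v->dec_expires_host_tb || v->enter_nested);
}
```

In this model the two predicates only disagree when `nestedv2` is set: with a pending decrementer whose expiry is still ahead of the current timebase, the old form leaves the exception queued and the new form cancels it.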