@@ -3435,7 +3435,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	unsigned long host_pidr = mfspr(SPRN_PID);
 
 	hdec = time_limit - mftb();
-	if (hdec < 0)
+	if (hdec < 2048)
 		return BOOK3S_INTERRUPT_HV_DECREMENTER;
 	mtspr(SPRN_HDEC, hdec);
 
@@ -3564,7 +3564,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	dec = mfspr(SPRN_DEC);
 	tb = mftb();
-	if (dec < 512)
+	if (dec < 2048)
 		return BOOK3S_INTERRUPT_HV_DECREMENTER;
 	local_paca->kvm_hstate.dec_expires = dec + tb;
 	if (local_paca->kvm_hstate.dec_expires < time_limit)
Before entering a guest, we need to set the HDEC to pull us out again when the guest's time is up. This needs some care, though, because the HDEC is edge triggered, which means that if it expires before we enter the guest, the interrupt will be lost, and we will stay in the guest indefinitely (in practice, until the hard lockup detector pulls us out with an NMI).

For the POWER9, independent threads mode specific path, we attempt to prevent that by testing whether the time has already expired before setting the HDEC in kvmhv_load_hv_regs_and_go(). However, that doesn't account for the case where the timer expires between that test and the actual guest entry. Preliminary instrumentation suggests that window can be as long as 1.5µs under certain load conditions, so simply checking that the HDEC value we're about to load is positive isn't enough to guarantee that leeway.

The test here is sometimes masked by a test in its caller, kvmhv_p9_guest_entry(), which checks that the remaining time is at least 1µs. However, as noted above, that doesn't appear to be sufficient in all circumstances even from the point where the HDEC is set, let alone from this earlier point.

Therefore, increase the threshold we check against in both locations to 4µs (2048 timebase ticks). This is a pretty crude approach, but it addresses a real problem where guest load can trigger a host hard lockup. We hope to refine this in future by gathering more data on exactly how long these paths can take, and possibly by moving the check closer to the actual guest entry point to reduce the variance. Getting the details for that might take some time, however.

NOTE: For reasons I haven't yet tracked down, I haven't actually managed to reproduce this on current upstream. I have reproduced it on RHEL kernels with no obvious differences in this area. I'm still trying to determine the cause of that difference, but I think it's worth applying this change as a precaution in the interim.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
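For reference, a minimal userspace sketch of the arithmetic behind the thresholds above, assuming the 512 MHz POWER9 timebase frequency (TB_FREQ_HZ and ticks_for_us() are illustrative names of mine, not kernel symbols):

	/*
	 * Sketch only: the POWER9 timebase ticks at 512 MHz, so a
	 * leeway in microseconds converts to ticks as us * 512.  The
	 * old DEC threshold of 512 ticks is 1us; the new threshold of
	 * 2048 ticks is 4us.
	 */
	#include <stdio.h>
	#include <stdint.h>

	#define TB_FREQ_HZ	512000000ULL	/* POWER9 timebase frequency */

	static uint64_t ticks_for_us(uint64_t us)
	{
		return us * TB_FREQ_HZ / 1000000ULL;
	}

	int main(void)
	{
		printf("1us = %llu ticks (old DEC threshold)\n",
		       (unsigned long long)ticks_for_us(1));	/* 512 */
		printf("4us = %llu ticks (new threshold)\n",
		       (unsigned long long)ticks_for_us(4));	/* 2048 */
		return 0;
	}

The reason the margin has to be generous is that an expiry in the window between the check and the actual guest entry is silently lost with an edge-triggered HDEC, so the threshold must cover the worst observed length of that window (up to 1.5µs) with room to spare.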