Message ID | 20211009021236.4122790-40-seanjc@google.com
---|---
State | New
Series | KVM: Halt-polling and x86 APICv overhaul
On 09/10/21 04:12, Sean Christopherson wrote:
> +	/*
> +	 * The smp_wmb() in kvm_make_request() pairs with the smp_mb_*()
> +	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU
> +	 * is guaranteed to see the event request if triggering a posted
> +	 * interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> +	 */
> 	kvm_make_request(KVM_REQ_EVENT, vcpu);

This explanation doesn't make much sense to me.  This is just the usual
request/kick pattern explained in Documentation/virt/kvm/vcpu-requests.rst,
except that we don't bother with a "kick" out of guest mode because the
entry always goes through kvm_check_request (in the nVMX case) or
sync_pir_to_irr (if non-nested) and completes the delivery itself.

In other words, it is a similar idea as patch 43/43.

What this smp_wmb() pairs with is the smp_mb__after_atomic in
kvm_check_request(KVM_REQ_EVENT, vcpu).  Setting the interrupt in the PIR
orders before kvm_make_request in this thread, and orders after
kvm_make_request in the vCPU thread.

Here, instead:

> +	/*
> +	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
> +	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
> +	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
> +	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> +	 */
>  	if (vcpu != kvm_get_running_vcpu() &&
>  	    !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> -		kvm_vcpu_kick(vcpu);
> +		kvm_vcpu_wake_up(vcpu);

it pairs with the smp_mb__after_atomic in vmx_sync_pir_to_irr().  As
explained again in vcpu-requests.rst, the ON bit has the same function as
vcpu->requests in the previous case.

Paolo
On Mon, Oct 25, 2021, Paolo Bonzini wrote:
> On 09/10/21 04:12, Sean Christopherson wrote:
> > +	/*
> > +	 * The smp_wmb() in kvm_make_request() pairs with the smp_mb_*()
> > +	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU
> > +	 * is guaranteed to see the event request if triggering a posted
> > +	 * interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
>
> This explanation doesn't make much sense to me.  This is just the usual
> request/kick pattern explained in Documentation/virt/kvm/vcpu-requests.rst,
> except that we don't bother with a "kick" out of guest mode because the
> entry always goes through kvm_check_request (in the nVMX case) or
> sync_pir_to_irr (if non-nested) and completes the delivery itself.
>
> In other words, it is a similar idea as patch 43/43.
>
> What this smp_wmb() pairs with is the smp_mb__after_atomic in
> kvm_check_request(KVM_REQ_EVENT, vcpu).

I don't think that's correct.  There is no kvm_check_request() in the
relevant path.  kvm_vcpu_exit_request() uses kvm_request_pending(), which
is just a READ_ONCE() without a barrier.

The smp_mb__after_atomic ensures that any assets that were modified prior
to making the request are seen by the vCPU handling the request.  It does
not provide any guarantees for a different vCPU/task making a request and
checking vcpu->mode versus the target vCPU setting vcpu->mode and checking
for a pending request.

> Setting the interrupt in the PIR orders before kvm_make_request in this
> thread, and orders after kvm_make_request in the vCPU thread.
>
> Here, instead:
>
> > +	/*
> > +	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
> > +	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
> > +	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
> > +	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> > +	 */
> >  	if (vcpu != kvm_get_running_vcpu() &&
> >  	    !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> > -		kvm_vcpu_kick(vcpu);
> > +		kvm_vcpu_wake_up(vcpu);
>
> it pairs with the smp_mb__after_atomic in vmx_sync_pir_to_irr().  As
> explained again in vcpu-requests.rst, the ON bit has the same function as
> vcpu->requests in the previous case.

Same as above, I don't think that's correct.  The smp_mb__after_atomic()
ensures that there's no race between the IOMMU writing vIRR and setting ON,
and KVM clearing ON and processing the vIRR.  pi_test_on() is not an atomic
operation, and there's no memory barrier if ON=0.  It's the same behavior
as kvm_check_request(), but again the ordering with respect to vcpu->mode
isn't being handled by PID.ON/kvm_check_request().

AIUI, this is the barrier that's paired with the PI barriers.  This is even
called out in (2).

	vcpu->mode = IN_GUEST_MODE;

	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);

	/*
	 * 1) We should set ->mode before checking ->requests.  Please see
	 * the comment in kvm_vcpu_exiting_guest_mode().
	 *
	 * 2) For APICv, we should set ->mode before checking PID.ON. This
	 * pairs with the memory barrier implicit in pi_test_and_set_on
	 * (see vmx_deliver_posted_interrupt).
	 *
	 * 3) This also orders the write to mode from any reads to the page
	 * tables done while the VCPU is running.  Please see the comment
	 * in kvm_flush_remote_tlbs.
	 */
	smp_mb__after_srcu_read_unlock();
On 27/10/21 18:04, Sean Christopherson wrote:
>>> +	/*
>>> +	 * The smp_wmb() in kvm_make_request() pairs with the smp_mb_*()
>>> +	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU
>>> +	 * is guaranteed to see the event request if triggering a posted
>>> +	 * interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
>>
>> What this smp_wmb() pairs with is the smp_mb__after_atomic in
>> kvm_check_request(KVM_REQ_EVENT, vcpu).
>
> I don't think that's correct.  There is no kvm_check_request() in the
> relevant path.  kvm_vcpu_exit_request() uses kvm_request_pending(), which
> is just a READ_ONCE() without a barrier.

Ok, we are talking about two different sets of barriers.  This is mine:

- smp_wmb() in kvm_make_request() pairs with the smp_mb__after_atomic() in
  kvm_check_request(); it ensures that everything before the request (in
  this case, pi_pending = true) is seen by inject_pending_event.

- pi_test_and_set_on() orders the write to ON after the write to PIR,
  pairing with vmx_sync_pir_to_irr and ensuring that the bit in the PIR is
  seen.

And this is yours:

- pi_test_and_set_on() _also_ orders the write to ON before the read of
  vcpu->mode, pairing with vcpu_enter_guest()

- kvm_make_request() however does _not_ order the write to vcpu->requests
  before the read of vcpu->mode, even though it's needed.  Usually that's
  handled by kvm_vcpu_exiting_guest_mode(), but in this case vcpu->mode is
  read in kvm_vcpu_trigger_posted_interrupt.

So vmx_deliver_nested_posted_interrupt() is missing a
smp_mb__after_atomic().  It's documentation only for x86, but still easily
done in v3.

Paolo
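To visualize where the barrier Paolo proposes would go, a rough sketch of vmx_deliver_nested_posted_interrupt() with the addition is below. This is illustrative only, not the committed v3 change; the comment wording is mine.

```c
	vmx->nested.pi_pending = true;
	kvm_make_request(KVM_REQ_EVENT, vcpu);

	/*
	 * Order the above write to vcpu->requests before the read of
	 * vcpu->mode in kvm_vcpu_trigger_posted_interrupt().  Pairs with
	 * smp_mb__after_srcu_read_unlock() in vcpu_enter_guest().  On x86
	 * this is documentation only, as the locked instruction in
	 * kvm_make_request() already provides a full barrier.
	 */
	smp_mb__after_atomic();

	/* the PIR and ON have been set by L1. */
	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true))
		kvm_vcpu_wake_up(vcpu);
```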
On Thu, 2021-10-28 at 00:09 +0200, Paolo Bonzini wrote:
> On 27/10/21 18:04, Sean Christopherson wrote:
> > > > +	/*
> > > > +	 * The smp_wmb() in kvm_make_request() pairs with the smp_mb_*()
> > > > +	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU
> > > > +	 * is guaranteed to see the event request if triggering a posted
> > > > +	 * interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> > >
> > > What this smp_wmb() pairs with is the smp_mb__after_atomic in
> > > kvm_check_request(KVM_REQ_EVENT, vcpu).
> >
> > I don't think that's correct.  There is no kvm_check_request() in the
> > relevant path.  kvm_vcpu_exit_request() uses kvm_request_pending(),
> > which is just a READ_ONCE() without a barrier.
>
> Ok, we are talking about two different sets of barriers.  This is mine:
>
> - smp_wmb() in kvm_make_request() pairs with the smp_mb__after_atomic() in
>   kvm_check_request(); it ensures that everything before the request
>   (in this case, pi_pending = true) is seen by inject_pending_event.
>
> - pi_test_and_set_on() orders the write to ON after the write to PIR,
>   pairing with vmx_sync_pir_to_irr and ensuring that the bit in the PIR is
>   seen.
>
> And this is yours:
>
> - pi_test_and_set_on() _also_ orders the write to ON before the read of
>   vcpu->mode, pairing with vcpu_enter_guest()
>
> - kvm_make_request() however does _not_ order the write to
>   vcpu->requests before the read of vcpu->mode, even though it's needed.
>   Usually that's handled by kvm_vcpu_exiting_guest_mode(), but in this
>   case vcpu->mode is read in kvm_vcpu_trigger_posted_interrupt.

Yes indeed, kvm_make_request() writes vcpu->requests after the memory
barrier, and then there is no barrier until the read of vcpu->mode in
kvm_vcpu_trigger_posted_interrupt.

> So vmx_deliver_nested_posted_interrupt() is missing a
> smp_mb__after_atomic().  It's documentation only for x86, but still
> easily done in v3.
>
> Paolo

I used this patch as a justification to read Paolo's excellent LWN series
of articles on memory barriers, to refresh my knowledge of the topic and
understand the above analysis better:

https://lwn.net/Articles/844224/

I agree with the above, but this is something that is so easy to get wrong
that I can't be 100% sure.

Best regards,
	Maxim Levitsky
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 13e732a818f3..44d760dde0f9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3978,10 +3978,16 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 		 * we will accomplish it in the next vmentry.
 		 */
 		vmx->nested.pi_pending = true;
+		/*
+		 * The smp_wmb() in kvm_make_request() pairs with the smp_mb_*()
+		 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU
+		 * is guaranteed to see the event request if triggering a posted
+		 * interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
+		 */
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		/* the PIR and ON have been set by L1. */
 		if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true))
-			kvm_vcpu_kick(vcpu);
+			kvm_vcpu_wake_up(vcpu);
 		return 0;
 	}
 	return -1;
@@ -4012,9 +4018,15 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	if (pi_test_and_set_on(&vmx->pi_desc))
 		return 0;
 
+	/*
+	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
+	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
+	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
+	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
+	 */
 	if (vcpu != kvm_get_running_vcpu() &&
 	    !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
-		kvm_vcpu_kick(vcpu);
+		kvm_vcpu_wake_up(vcpu);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9643f23c28c7..274d295cabfb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9752,8 +9752,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	smp_mb__after_srcu_read_unlock();
 
 	/*
-	 * This handles the case where a posted interrupt was
-	 * notified with kvm_vcpu_kick.
+	 * Process pending posted interrupts to handle the case where the
+	 * notification IRQ arrived in the host, or was never sent (because the
+	 * target vCPU wasn't running).
 	 */
 	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
 		static_call(kvm_x86_sync_pir_to_irr)(vcpu);
Replace the full "kick" with just the "wake" in the fallback path when
triggering a virtual interrupt via a posted interrupt fails because the
guest is not IN_GUEST_MODE.  If the guest transitions into guest mode
between the check and the kick, then it's guaranteed to see the pending
interrupt as KVM syncs the PIR to IRR (and onto GUEST_RVI) after setting
IN_GUEST_MODE.  Kicking the guest in this case is nothing more than an
unnecessary VM-Exit (and host IRQ).

Opportunistically update comments to explain the various ordering rules
and barriers at play.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++--
 arch/x86/kvm/x86.c     |  5 +++--
 2 files changed, 17 insertions(+), 4 deletions(-)