diff mbox series

[v2,24/43] KVM: VMX: Drop pointless PI.NDST update when blocking

Message ID 20211009021236.4122790-25-seanjc@google.com
State New
Headers show
Series KVM: Halt-polling and x86 APICv overhaul | expand

Commit Message

Sean Christopherson Oct. 9, 2021, 2:12 a.m. UTC
Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
pre-block path, as NDST is guaranteed to be up-to-date.  The comment
about the vCPU being preempted during the update is simply wrong, as the
update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
until after the update completes).

The vCPU can get preempted _before_ the update starts, but not during.
And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
for updating NDST when the vCPU is scheduled back in.  In that case, the
check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
that would require the notification vector to have been set to the wakeup
vector _before_ blocking.

Opportunistically switch to using vcpu->cpu for the list/lock lookups,
which presumably used pre_pcpu only for some phantom preemption logic.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/posted_intr.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

Comments

Paolo Bonzini Oct. 25, 2021, 2:01 p.m. UTC | #1
On 09/10/21 04:12, Sean Christopherson wrote:
> Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
> pre-block path, as NDST is guaranteed to be up-to-date.  The comment
> about the vCPU being preempted during the update is simply wrong, as the
> update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
> until after the update completes).

Right, it didn't as of commit bf9f6ac8d74969690df1485b33b7c238ca9f2269 
(when VT-d posted interrupts were introduced).

The interrupt disable/enable pair was added in the same commit that 
motivated the introduction of the sanity checks:

     commit 8b306e2f3c41939ea528e6174c88cfbfff893ce1
     Author: Paolo Bonzini <pbonzini@redhat.com>
     Date:   Tue Jun 6 12:57:05 2017 +0200

     KVM: VMX: avoid double list add with VT-d posted interrupts

     In some cases, for example involving hot-unplug of assigned
     devices, pi_post_block can forget to remove the vCPU from the
     blocked_vcpu_list.  When this happens, the next call to
     pi_pre_block corrupts the list.

     Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
     and WARN instead of adding the element twice in the list.  Second,
     always do the list removal in pi_post_block if vcpu->pre_pcpu is
     set (not -1).

     The new code keeps interrupts disabled for the whole duration of
     pi_pre_block/pi_post_block.  This is not strictly necessary, but
     easier to follow.  For the same reason, PI.ON is checked only
     after the cmpxchg, and to handle it we just call the post-block
     code.  This removes duplication of the list removal code.

At the time, I didn't notice the now useless NDST update.

Paolo

> The vCPU can get preempted_before_  the update starts, but not during.
> And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
> for updating NDST when the vCPU is scheduled back in.  In that case, the
> check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
> that would require the notification vector to have been set to the wakeup
> vector_before_  blocking.
> 
> Opportunistically switch to using vcpu->cpu for the list/lock lookups,
> which presumably used pre_pcpu only for some phantom preemption logic.
Sean Christopherson Oct. 27, 2021, 2:26 p.m. UTC | #2
On Mon, Oct 25, 2021, Paolo Bonzini wrote:
> On 09/10/21 04:12, Sean Christopherson wrote:
> > Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
> > pre-block path, as NDST is guaranteed to be up-to-date.  The comment
> > about the vCPU being preempted during the update is simply wrong, as the
> > update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
> > until after the update completes).
> 
> Right, it didn't as of commit bf9f6ac8d74969690df1485b33b7c238ca9f2269 (when
> VT-d posted interrupts were introduced).
> 
> The interrupt disable/enable pair was added in the same commit that
> motivated the introduction of the sanity checks:

Ya, I found that commit when digging around for different commit in the series
and forgot to come back to this changelog.  I'll incorporate this info into the
next version.

>     commit 8b306e2f3c41939ea528e6174c88cfbfff893ce1
>     Author: Paolo Bonzini <pbonzini@redhat.com>
>     Date:   Tue Jun 6 12:57:05 2017 +0200
> 
>     KVM: VMX: avoid double list add with VT-d posted interrupts
> 
>     In some cases, for example involving hot-unplug of assigned
>     devices, pi_post_block can forget to remove the vCPU from the
>     blocked_vcpu_list.  When this happens, the next call to
>     pi_pre_block corrupts the list.
> 
>     Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
>     and WARN instead of adding the element twice in the list.  Second,
>     always do the list removal in pi_post_block if vcpu->pre_pcpu is
>     set (not -1).
> 
>     The new code keeps interrupts disabled for the whole duration of
>     pi_pre_block/pi_post_block.  This is not strictly necessary, but
>     easier to follow.  For the same reason, PI.ON is checked only
>     after the cmpxchg, and to handle it we just call the post-block
>     code.  This removes duplication of the list removal code.
> 
> At the time, I didn't notice the now useless NDST update.
> 
> Paolo
> 
> > The vCPU can get preempted_before_  the update starts, but not during.
> > And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
> > for updating NDST when the vCPU is scheduled back in.  In that case, the
> > check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
> > that would require the notification vector to have been set to the wakeup
> > vector_before_  blocking.
> > 
> > Opportunistically switch to using vcpu->cpu for the list/lock lookups,
> > which presumably used pre_pcpu only for some phantom preemption logic.
>
Maxim Levitsky Oct. 28, 2021, 10:53 a.m. UTC | #3
On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the
> pre-block path, as NDST is guaranteed to be up-to-date.  The comment
> about the vCPU being preempted during the update is simply wrong, as the
> update path runs with IRQs disabled (from before snapshotting vcpu->cpu,
> until after the update completes).
> 
> The vCPU can get preempted _before_ the update starts, but not during.
> And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible
> for updating NDST when the vCPU is scheduled back in.  In that case, the
> check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as
> that would require the notification vector to have been set to the wakeup
> vector _before_ blocking.
> 
> Opportunistically switch to using vcpu->cpu for the list/lock lookups,
> which presumably used pre_pcpu only for some phantom preemption logic.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/posted_intr.c | 23 +++--------------------
>  1 file changed, 3 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 1688f8dc535a..239e0e72a0dd 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -130,7 +130,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>   * - Store the vCPU to the wakeup list, so when interrupts happen
>   *   we can find the right vCPU to wake up.
>   * - Change the Posted-interrupt descriptor as below:
> - *      'NDST' <-- vcpu->pre_pcpu
>   *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
>   * - If 'ON' is set during this process, which means at least one
>   *   interrupt is posted for this vCPU, we cannot block it, in
> @@ -139,7 +138,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>   */
>  int pi_pre_block(struct kvm_vcpu *vcpu)
>  {
> -	unsigned int dest;
>  	struct pi_desc old, new;
>  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>  
> @@ -153,10 +151,10 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
>  	local_irq_disable();
>  
>  	vcpu->pre_pcpu = vcpu->cpu;
> -	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
>  	list_add_tail(&vcpu->blocked_vcpu_list,
> -		      &per_cpu(blocked_vcpu_on_cpu, vcpu->pre_pcpu));
> -	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> +		      &per_cpu(blocked_vcpu_on_cpu, vcpu->cpu));
> +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
>  
>  	WARN(pi_desc->sn == 1,
>  	     "Posted Interrupt Suppress Notification set before blocking");
> @@ -164,21 +162,6 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
>  	do {
>  		old.control = new.control = pi_desc->control;
>  
> -		/*
> -		 * Since vCPU can be preempted during this process,
> -		 * vcpu->cpu could be different with pre_pcpu, we
> -		 * need to set pre_pcpu as the destination of wakeup
> -		 * notification event, then we can find the right vCPU
> -		 * to wakeup in wakeup handler if interrupts happen
> -		 * when the vCPU is in blocked state.
> -		 */
> -		dest = cpu_physical_id(vcpu->pre_pcpu);
> -
> -		if (x2apic_mode)
> -			new.ndst = dest;
> -		else
> -			new.ndst = (dest << 8) & 0xFF00;
> -
>  		/* set 'NV' to 'wakeup vector' */
>  		new.nv = POSTED_INTR_WAKEUP_VECTOR;
>  	} while (cmpxchg64(&pi_desc->control, old.control,
Reviewed-by : Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 1688f8dc535a..239e0e72a0dd 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -130,7 +130,6 @@  static void __pi_post_block(struct kvm_vcpu *vcpu)
  * - Store the vCPU to the wakeup list, so when interrupts happen
  *   we can find the right vCPU to wake up.
  * - Change the Posted-interrupt descriptor as below:
- *      'NDST' <-- vcpu->pre_pcpu
  *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
  * - If 'ON' is set during this process, which means at least one
  *   interrupt is posted for this vCPU, we cannot block it, in
@@ -139,7 +138,6 @@  static void __pi_post_block(struct kvm_vcpu *vcpu)
  */
 int pi_pre_block(struct kvm_vcpu *vcpu)
 {
-	unsigned int dest;
 	struct pi_desc old, new;
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
@@ -153,10 +151,10 @@  int pi_pre_block(struct kvm_vcpu *vcpu)
 	local_irq_disable();
 
 	vcpu->pre_pcpu = vcpu->cpu;
-	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
 	list_add_tail(&vcpu->blocked_vcpu_list,
-		      &per_cpu(blocked_vcpu_on_cpu, vcpu->pre_pcpu));
-	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
+		      &per_cpu(blocked_vcpu_on_cpu, vcpu->cpu));
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->cpu));
 
 	WARN(pi_desc->sn == 1,
 	     "Posted Interrupt Suppress Notification set before blocking");
@@ -164,21 +162,6 @@  int pi_pre_block(struct kvm_vcpu *vcpu)
 	do {
 		old.control = new.control = pi_desc->control;
 
-		/*
-		 * Since vCPU can be preempted during this process,
-		 * vcpu->cpu could be different with pre_pcpu, we
-		 * need to set pre_pcpu as the destination of wakeup
-		 * notification event, then we can find the right vCPU
-		 * to wakeup in wakeup handler if interrupts happen
-		 * when the vCPU is in blocked state.
-		 */
-		dest = cpu_physical_id(vcpu->pre_pcpu);
-
-		if (x2apic_mode)
-			new.ndst = dest;
-		else
-			new.ndst = (dest << 8) & 0xFF00;
-
 		/* set 'NV' to 'wakeup vector' */
 		new.nv = POSTED_INTR_WAKEUP_VECTOR;
 	} while (cmpxchg64(&pi_desc->control, old.control,