diff mbox series

[v5,3/4] failover: fix unplug pending detection

Message ID 20211119090718.440793-4-lvivier@redhat.com
State New
Headers show
Series tests/qtest: add some tests for virtio-net failover | expand

Commit Message

Laurent Vivier Nov. 19, 2021, 9:07 a.m. UTC
Failover needs to detect the end of the PCI unplug to start migration
after the VFIO card has been unplugged.

To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
pcie_unplug_device().

But since
    17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
we have switched to ACPI unplug and these functions are not called anymore
and the flag not set. So failover migration is not able to detect if card
is really unplugged and acts as it's done as soon as it's started. So it
doesn't wait the end of the unplug to start the migration. We don't see any
problem when we test that because ACPI unplug is faster than PCIe native
hotplug and when the migration really starts the unplug operation is
already done.

See c000a9bd06ea ("pci: mark device having guest unplug request pending")
    a99c4da9fc2a ("pci: mark devices partially unplugged")

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
---
 hw/acpi/pcihp.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

Comments

Michael S. Tsirkin Dec. 8, 2021, 7:36 a.m. UTC | #1
On Fri, Nov 19, 2021 at 10:07:17AM +0100, Laurent Vivier wrote:
> Failover needs to detect the end of the PCI unplug to start migration
> after the VFIO card has been unplugged.
> 
> To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
> pcie_unplug_device().
> 
> But since
>     17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
> we have switched to ACPI unplug and these functions are not called anymore
> and the flag not set. So failover migration is not able to detect if card
> is really unplugged and acts as it's done as soon as it's started. So it
> doesn't wait the end of the unplug to start the migration. We don't see any
> problem when we test that because ACPI unplug is faster than PCIe native
> hotplug and when the migration really starts the unplug operation is
> already done.
> 
> See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>     a99c4da9fc2a ("pci: mark devices partially unplugged")
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: Ani Sinha <ani@anisinha.ca>

Hmm.  I think this one may be needed for this release actually.
Isolate from testing changes and repost?

> ---
>  hw/acpi/pcihp.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index f610a25d2ef9..30405b5113d7 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -222,9 +222,27 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>          PCIDevice *dev = PCI_DEVICE(qdev);
>          if (PCI_SLOT(dev->devfn) == slot) {
>              if (!acpi_pcihp_pc_no_hotplug(s, dev)) {
> -                hotplug_ctrl = qdev_get_hotplug_handler(qdev);
> -                hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
> -                object_unparent(OBJECT(qdev));
> +                /*
> +                 * partially_hotplugged is used by virtio-net failover:
> +                 * failover has asked the guest OS to unplug the device
> +                 * but we need to keep some references to the device
> +                 * to be able to plug it back in case of failure so
> +                 * we don't execute hotplug_handler_unplug().
> +                 */
> +                if (dev->partially_hotplugged) {
> +                    /*
> +                     * pending_deleted_event is set to true when
> +                     * virtio-net failover asks to unplug the device,
> +                     * and set to false here when the operation is done
> +                     * This is used by the migration loop to detect the
> +                     * end of the operation and really start the migration.
> +                     */
> +                    qdev->pending_deleted_event = false;
> +                } else {
> +                    hotplug_ctrl = qdev_get_hotplug_handler(qdev);
> +                    hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
> +                    object_unparent(OBJECT(qdev));
> +                }
>              }
>          }
>      }
> @@ -396,6 +414,12 @@ void acpi_pcihp_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>          return;
>      }
>  
> +    /*
> +     * pending_deleted_event is used by virtio-net failover to detect the
> +     * end of the unplug operation, the flag is set to false in
> +     * acpi_pcihp_eject_slot() when the operation is completed.
> +     */
> +    pdev->qdev.pending_deleted_event = true;
>      s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
>      acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
>  }
> -- 
> 2.33.1
Thomas Huth Dec. 8, 2021, 7:50 a.m. UTC | #2
On 08/12/2021 08.36, Michael S. Tsirkin wrote:
> On Fri, Nov 19, 2021 at 10:07:17AM +0100, Laurent Vivier wrote:
>> Failover needs to detect the end of the PCI unplug to start migration
>> after the VFIO card has been unplugged.
>>
>> To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
>> pcie_unplug_device().
>>
>> But since
>>      17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>> we have switched to ACPI unplug and these functions are not called anymore
>> and the flag not set. So failover migration is not able to detect if card
>> is really unplugged and acts as it's done as soon as it's started. So it
>> doesn't wait the end of the unplug to start the migration. We don't see any
>> problem when we test that because ACPI unplug is faster than PCIe native
>> hotplug and when the migration really starts the unplug operation is
>> already done.
>>
>> See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>>      a99c4da9fc2a ("pci: mark devices partially unplugged")
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> Reviewed-by: Ani Sinha <ani@anisinha.ca>
> 
> Hmm.  I think this one may be needed for this release actually.
> Isolate from testing changes and repost?

You merged it already here:

  https://gitlab.com/qemu-project/qemu/-/commit/9323f892b39d133eb6

so we should be fine :-)

  Thomas
Ani Sinha Dec. 8, 2021, 10:44 a.m. UTC | #3
On Wed, Dec 8, 2021 at 1:20 PM Thomas Huth <thuth@redhat.com> wrote:
>
> On 08/12/2021 08.36, Michael S. Tsirkin wrote:
> > On Fri, Nov 19, 2021 at 10:07:17AM +0100, Laurent Vivier wrote:
> >> Failover needs to detect the end of the PCI unplug to start migration
> >> after the VFIO card has been unplugged.
> >>
> >> To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
> >> pcie_unplug_device().
> >>
> >> But since
> >>      17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
> >> we have switched to ACPI unplug and these functions are not called anymore
> >> and the flag not set. So failover migration is not able to detect if card
> >> is really unplugged and acts as it's done as soon as it's started. So it
> >> doesn't wait the end of the unplug to start the migration. We don't see any
> >> problem when we test that because ACPI unplug is faster than PCIe native
> >> hotplug and when the migration really starts the unplug operation is
> >> already done.
> >>
> >> See c000a9bd06ea ("pci: mark device having guest unplug request pending")
> >>      a99c4da9fc2a ("pci: mark devices partially unplugged")
> >>
> >> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> >> Reviewed-by: Ani Sinha <ani@anisinha.ca>
> >
> > Hmm.  I think this one may be needed for this release actually.
> > Isolate from testing changes and repost?
>
> You merged it already here:
>
>   https://gitlab.com/qemu-project/qemu/-/commit/9323f892b39d133eb6

Yes I pointed this out in the other thread. Michael was not CC'd there
somehow ...

>
> so we should be fine :-)
>
>   Thomas
>
Michael S. Tsirkin Dec. 8, 2021, 11:11 a.m. UTC | #4
On Wed, Dec 08, 2021 at 08:50:34AM +0100, Thomas Huth wrote:
> On 08/12/2021 08.36, Michael S. Tsirkin wrote:
> > On Fri, Nov 19, 2021 at 10:07:17AM +0100, Laurent Vivier wrote:
> > > Failover needs to detect the end of the PCI unplug to start migration
> > > after the VFIO card has been unplugged.
> > > 
> > > To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
> > > pcie_unplug_device().
> > > 
> > > But since
> > >      17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
> > > we have switched to ACPI unplug and these functions are not called anymore
> > > and the flag not set. So failover migration is not able to detect if card
> > > is really unplugged and acts as it's done as soon as it's started. So it
> > > doesn't wait the end of the unplug to start the migration. We don't see any
> > > problem when we test that because ACPI unplug is faster than PCIe native
> > > hotplug and when the migration really starts the unplug operation is
> > > already done.
> > > 
> > > See c000a9bd06ea ("pci: mark device having guest unplug request pending")
> > >      a99c4da9fc2a ("pci: mark devices partially unplugged")
> > > 
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > > Reviewed-by: Ani Sinha <ani@anisinha.ca>
> > 
> > Hmm.  I think this one may be needed for this release actually.
> > Isolate from testing changes and repost?
> 
> You merged it already here:
> 
>  https://gitlab.com/qemu-project/qemu/-/commit/9323f892b39d133eb6

Oh, good. Forgot to drop the tag.

> so we should be fine :-)
> 
>  Thomas
diff mbox series

Patch

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f610a25d2ef9..30405b5113d7 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -222,9 +222,27 @@  static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
         PCIDevice *dev = PCI_DEVICE(qdev);
         if (PCI_SLOT(dev->devfn) == slot) {
             if (!acpi_pcihp_pc_no_hotplug(s, dev)) {
-                hotplug_ctrl = qdev_get_hotplug_handler(qdev);
-                hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
-                object_unparent(OBJECT(qdev));
+                /*
+                 * partially_hotplugged is used by virtio-net failover:
+                 * failover has asked the guest OS to unplug the device
+                 * but we need to keep some references to the device
+                 * to be able to plug it back in case of failure so
+                 * we don't execute hotplug_handler_unplug().
+                 */
+                if (dev->partially_hotplugged) {
+                    /*
+                     * pending_deleted_event is set to true when
+                     * virtio-net failover asks to unplug the device,
+                     * and set to false here when the operation is done
+                     * This is used by the migration loop to detect the
+                     * end of the operation and really start the migration.
+                     */
+                    qdev->pending_deleted_event = false;
+                } else {
+                    hotplug_ctrl = qdev_get_hotplug_handler(qdev);
+                    hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
+                    object_unparent(OBJECT(qdev));
+                }
             }
         }
     }
@@ -396,6 +414,12 @@  void acpi_pcihp_device_unplug_request_cb(HotplugHandler *hotplug_dev,
         return;
     }
 
+    /*
+     * pending_deleted_event is used by virtio-net failover to detect the
+     * end of the unplug operation, the flag is set to false in
+     * acpi_pcihp_eject_slot() when the operation is completed.
+     */
+    pdev->qdev.pending_deleted_event = true;
     s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
     acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
 }