Message ID | 20230727072410.135743-4-jing2.liu@intel.com |
---|---|
State | New |
Series | Support dynamic MSI-X allocation |
On Thu, 27 Jul 2023 03:24:10 -0400
Jing Liu <jing2.liu@intel.com> wrote:

> During migration restoring, vfio_enable_vectors() is called to restore
> enabling MSI-X interrupts for assigned devices. It sets the range from 0
> to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
> guest. During the MSI-X enabling, all the vectors within the range are
> allocated according to the ioctl().
>
> When dynamic MSI-X allocation is supported, we only want the guest
> unmasked vectors being allocated and enabled. Therefore, Qemu can first
> set vector 0 to enable MSI-X and after that, all the vectors can be
> allocated in need.
>
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> ---
>  hw/vfio/pci.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8c485636445c..43ffacd5b36a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>      int ret = 0, i, argsz;
>      int32_t *fds;
>
> +    /*
> +     * If dynamic MSI-X allocation is supported, the vectors to be allocated
> +     * and enabled can be scattered. Before kernel enabling MSI-X, setting
> +     * nr_vectors causes all these vectors being allocated on host.

s/being/to be/

> +     *
> +     * To keep allocation as needed, first setup vector 0 with an invalid
> +     * fd to make MSI-X enabled, then enable vectors by setting all so that
> +     * kernel allocates and enables interrupts only when enabled in guest.
> +     */
> +    if (msix && !(vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE)) {

!vdev->msix->noresize again seems cleaner.

> +        argsz = sizeof(*irq_set) + sizeof(*fds);
> +
> +        irq_set = g_malloc0(argsz);
> +        irq_set->argsz = argsz;
> +        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> +                         VFIO_IRQ_SET_ACTION_TRIGGER;
> +        irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> +                                VFIO_PCI_MSI_IRQ_INDEX;

Why are we testing msix again within a branch that requires msix?

> +        irq_set->start = 0;
> +        irq_set->count = 1;
> +        fds = (int32_t *)&irq_set->data;
> +        fds[0] = -1;
> +
> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +
> +        g_free(irq_set);
> +
> +        if (ret) {
> +            return ret;
> +        }
> +    }

So your goal here is simply to get the kernel to call vfio_msi_enable()
with nvec = 1 to get MSI-X enabled on the device, which then allows the
kernel to use the dynamic expansion when we call SET_IRQS again with a
potentially sparse set of eventfds to vector mappings. This seems very
similar to the nr_vectors == 0 branch of vfio_msix_enable() where it
uses a do_use and release call to accomplish getting MSI-X enabled. We
should consolidate, probably by pulling this out into a function since
it seems cleaner to use the fd = -1 trick than to setup userspace
triggering and immediately release. Thanks,

Alex

> +
>      argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>
>      irq_set = g_malloc0(argsz);
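Alex's suggestion to pull the "fd = -1 trick" into its own function could look roughly like the sketch below. It is a minimal standalone illustration assuming only the Linux UAPI header `<linux/vfio.h>`; the names `build_msix_enable_no_vec()` and `enable_msix_no_vec()` are hypothetical, not QEMU code (QEMU itself would use `g_malloc0()`/`g_free()` and its `VFIOPCIDevice` state rather than a bare device fd). It sends one `VFIO_DEVICE_SET_IRQS` request with `start = 0`, `count = 1` and `fds[0] = -1`, and it drops the redundant `msix ?` test because the request only ever targets the MSI-X index.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not QEMU code): build a VFIO_DEVICE_SET_IRQS
 * request that maps MSI-X vector 0 to fd = -1, i.e. "enable MSI-X on
 * the device but attach no eventfd". The caller frees the buffer.
 */
static struct vfio_irq_set *build_msix_enable_no_vec(void)
{
    size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
    struct vfio_irq_set *irq_set = calloc(1, argsz);
    int32_t fd = -1;

    if (!irq_set) {
        return NULL;
    }
    irq_set->argsz = (uint32_t)argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; /* MSI-X only: no msix ? test */
    irq_set->start = 0;
    irq_set->count = 1;
    memcpy(irq_set->data, &fd, sizeof(fd));   /* vector 0 -> no eventfd */
    return irq_set;
}

/*
 * Hypothetical wrapper: send the request above to an open VFIO device
 * fd. Returns 0 on success or -errno on failure.
 */
static int enable_msix_no_vec(int device_fd)
{
    struct vfio_irq_set *irq_set = build_msix_enable_no_vec();
    int ret;

    if (!irq_set) {
        return -ENOMEM;
    }
    ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
    ret = ret < 0 ? -errno : 0;  /* capture errno before free() */
    free(irq_set);
    return ret;
}
```

Returning `-errno` instead of the raw `ioctl()` result is a choice made for this sketch; the patch under review returns `ret` directly. Copying the fd with `memcpy()` avoids casting the `data[]` flexible array member to `int32_t *`.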
Hi Alex,

> On July 28, 2023 1:25 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
>
> On Thu, 27 Jul 2023 03:24:10 -0400
> Jing Liu <jing2.liu@intel.com> wrote:
>
> > During migration restoring, vfio_enable_vectors() is called to restore
> > enabling MSI-X interrupts for assigned devices. It sets the range from
> > 0 to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
> > guest. During the MSI-X enabling, all the vectors within the range are
> > allocated according to the ioctl().
> >
> > When dynamic MSI-X allocation is supported, we only want the guest
> > unmasked vectors being allocated and enabled. Therefore, Qemu can
> > first set vector 0 to enable MSI-X and after that, all the vectors can
> > be allocated in need.
> >
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > ---
> >  hw/vfio/pci.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 8c485636445c..43ffacd5b36a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> >      int ret = 0, i, argsz;
> >      int32_t *fds;
> >
> > +    /*
> > +     * If dynamic MSI-X allocation is supported, the vectors to be allocated
> > +     * and enabled can be scattered. Before kernel enabling MSI-X, setting
> > +     * nr_vectors causes all these vectors being allocated on host.
>
> s/being/to be/

Will change.

> > +     *
> > +     * To keep allocation as needed, first setup vector 0 with an invalid
> > +     * fd to make MSI-X enabled, then enable vectors by setting all so that
> > +     * kernel allocates and enables interrupts only when enabled in guest.
> > +     */
> > +    if (msix && !(vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE)) {
>
> !vdev->msix->noresize again seems cleaner.

Sure, will change.
>
> > +        argsz = sizeof(*irq_set) + sizeof(*fds);
> > +
> > +        irq_set = g_malloc0(argsz);
> > +        irq_set->argsz = argsz;
> > +        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > +                         VFIO_IRQ_SET_ACTION_TRIGGER;
> > +        irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> > +                                VFIO_PCI_MSI_IRQ_INDEX;
>
> Why are we testing msix again within a branch that requires msix?

Ah, yes. Will remove the test.

> > +        irq_set->start = 0;
> > +        irq_set->count = 1;
> > +        fds = (int32_t *)&irq_set->data;
> > +        fds[0] = -1;
> > +
> > +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > +
> > +        g_free(irq_set);
> > +
> > +        if (ret) {
> > +            return ret;
> > +        }
> > +    }
>
> So your goal here is simply to get the kernel to call vfio_msi_enable()
> with nvec = 1 to get MSI-X enabled on the device, which then allows the
> kernel to use the dynamic expansion when we call SET_IRQS again with a
> potentially sparse set of eventfds to vector mappings.

Yes, that's the approach I came up with to get MSI-X enabled first. The
one open question is that when the kernel calls vfio_msi_enable() with
nvec = 1, it allocates one interrupt along with enabling MSI-X, which
cannot be avoided. So if we set vector 0, for example, the IRQ for
vector 0 is allocated in the kernel. Later, if vector 0 is unmasked in
the guest, it is enabled as normal; but if vector 0 stays masked in the
guest, we leave an allocated (though not enabled) IRQ there until MSI-X
is disabled. I'm not sure whether this is okay, but I cannot think of a
cleaner way. I also wonder whether that case is possible at all, or
whether vector 0 is always enabled?

> This seems very similar to the nr_vectors == 0 branch of
> vfio_msix_enable() where it uses a do_use and release call to
> accomplish getting MSI-X enabled.

They are similar. Using do_use to set up userspace triggering also
leaves one allocated IRQ in the kernel. And my understanding is that the
following release function won't actually release it if it is a
userspace trigger:
static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
{
    /*
     * There are still old guests that mask and unmask vectors on every
     * interrupt. If we're using QEMU bypass with a KVM irqfd, leave all of
     * the KVM setup in place, simply switch VFIO to use the non-bypass
     * eventfd. We'll then fire the interrupt through QEMU and the MSI-X
     * core will mask the interrupt and set pending bits, allowing it to
     * be re-asserted on unmask. Nothing to do if already using QEMU mode.
     */
    ...
}

> We should consolidate, probably by pulling this out into a function
> since it seems cleaner to use the fd = -1 trick than to setup userspace
> triggering and immediately release. Thanks,

Oh, yes, I agree that the fd = -1 trick is cleaner, and we don't need to
depend on the maskable bit in QEMU. Following your suggestion, I will
create a function, e.g., vfio_enable_msix_no_vec(vdev), which only sets
vector 0 with fd = -1 in the kernel and returns the result.

Thanks,
Jing

>
> Alex
>
> > +
> >      argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
> >
> >      irq_set = g_malloc0(argsz);
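The second `SET_IRQS` call Alex mentions, the one carrying a potentially sparse set of eventfd-to-vector mappings, can be sketched in the same standalone style. This is an illustration under assumptions, not QEMU's loop: `build_sparse_msix_set()` is a hypothetical name, and the `in_use`/`eventfds` arrays stand in for QEMU's per-vector state. Vectors the guest is not using get fd = -1; assuming the kernel supports dynamic MSI-X allocation (the MSI-X index is reported without `VFIO_IRQ_INFO_NORESIZE`), only vectors with a real eventfd should then get host interrupts, apart from the vector 0 interrupt Jing notes is allocated when MSI-X is first enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not QEMU code): build a SET_IRQS request covering
 * vectors [0, nr_vectors). Vectors not in use get fd = -1 so that, with
 * dynamic MSI-X allocation, the kernel is not asked to trigger them.
 * The caller frees the returned buffer.
 */
static struct vfio_irq_set *build_sparse_msix_set(uint32_t nr_vectors,
                                                  const bool *in_use,
                                                  const int32_t *eventfds)
{
    size_t argsz = sizeof(struct vfio_irq_set) +
                   (size_t)nr_vectors * sizeof(int32_t);
    struct vfio_irq_set *irq_set = calloc(1, argsz);

    if (!irq_set) {
        return NULL;
    }
    irq_set->argsz = (uint32_t)argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    irq_set->start = 0;
    irq_set->count = nr_vectors;

    for (uint32_t i = 0; i < nr_vectors; i++) {
        /* Unused vector -> -1, used vector -> its eventfd. */
        int32_t fd = in_use[i] ? eventfds[i] : -1;
        memcpy(irq_set->data + i * sizeof(int32_t), &fd, sizeof(fd));
    }
    return irq_set;
}
```

For example, with vectors 1 and 3 in use out of 4, the data array becomes `{-1, fd1, -1, fd3}`, which is the "scattered" case the patch comment describes.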
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8c485636445c..43ffacd5b36a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
     int ret = 0, i, argsz;
     int32_t *fds;
 
+    /*
+     * If dynamic MSI-X allocation is supported, the vectors to be allocated
+     * and enabled can be scattered. Before kernel enabling MSI-X, setting
+     * nr_vectors causes all these vectors being allocated on host.
+     *
+     * To keep allocation as needed, first setup vector 0 with an invalid
+     * fd to make MSI-X enabled, then enable vectors by setting all so that
+     * kernel allocates and enables interrupts only when enabled in guest.
+     */
+    if (msix && !(vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE)) {
+        argsz = sizeof(*irq_set) + sizeof(*fds);
+
+        irq_set = g_malloc0(argsz);
+        irq_set->argsz = argsz;
+        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+                         VFIO_IRQ_SET_ACTION_TRIGGER;
+        irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
+                                VFIO_PCI_MSI_IRQ_INDEX;
+        irq_set->start = 0;
+        irq_set->count = 1;
+        fds = (int32_t *)&irq_set->data;
+        fds[0] = -1;
+
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+        g_free(irq_set);
+
+        if (ret) {
+            return ret;
+        }
+    }
+
     argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
 
     irq_set = g_malloc0(argsz);
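Alex's `!vdev->msix->noresize` remark assumes the `VFIO_IRQ_INFO_NORESIZE` flag has already been decoded into a boolean. A minimal sketch of that decoding, assuming only `<linux/vfio.h>`, follows; `msix_noresize()` and `query_msix_noresize()` are hypothetical helpers, not the QEMU field or code. `VFIO_DEVICE_GET_IRQ_INFO` reports the flags for the MSI-X index, and a clear `NORESIZE` bit is what the patch treats as dynamic allocation being available.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: true if the kernel cannot resize the IRQ index. */
static bool msix_noresize(uint32_t irq_info_flags)
{
    return (irq_info_flags & VFIO_IRQ_INFO_NORESIZE) != 0;
}

/*
 * Hypothetical helper: query the MSI-X index once and cache the result
 * as a bool, so call sites can test !noresize instead of masking flags.
 * Returns 0 on success or -errno on failure (*noresize is untouched).
 */
static int query_msix_noresize(int device_fd, bool *noresize)
{
    struct vfio_irq_info info = {
        .argsz = sizeof(info),
        .index = VFIO_PCI_MSIX_IRQ_INDEX,
    };

    if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, &info) < 0) {
        return -errno;
    }
    *noresize = msix_noresize(info.flags);
    return 0;
}
```

Caching the bool once at setup time keeps each call site as short as `!noresize`, which is the readability point Alex raises.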
During migration restoring, vfio_enable_vectors() is called to restore
enabling MSI-X interrupts for assigned devices. It sets the range from 0
to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
guest. During the MSI-X enabling, all the vectors within the range are
allocated according to the ioctl().

When dynamic MSI-X allocation is supported, we only want the guest
unmasked vectors being allocated and enabled. Therefore, Qemu can first
set vector 0 to enable MSI-X and after that, all the vectors can be
allocated in need.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
 hw/vfio/pci.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)