Message ID | 20240823-reuse-v15-4-eddcb960e289@daynix.com |
---|---|
State | New |
Headers | show |
Series | hw/pci: SR-IOV related fixes and improvements | expand |
On 8/23/24 07:00, Akihiko Odaki wrote: > The SR-IOV PFs set the multifunction bits during device realization so > check them after that. This forbids adding SR-IOV devices to s390x. > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> May be add : Fixes: 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler") ? Anyhow, Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Thanks, C. > --- > hw/s390x/s390-pci-bus.c | 14 ++++++-------- > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c > index 3e57d5faca18..00b2c1f6157b 100644 > --- a/hw/s390x/s390-pci-bus.c > +++ b/hw/s390x/s390-pci-bus.c > @@ -971,14 +971,7 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev, > "this device"); > } > > - if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > - PCIDevice *pdev = PCI_DEVICE(dev); > - > - if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { > - error_setg(errp, "multifunction not supported in s390"); > - return; > - } > - } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { > + if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { > S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev); > > if (!s390_pci_alloc_idx(s, pbdev)) { > @@ -1069,6 +1062,11 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev, > } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > pdev = PCI_DEVICE(dev); > > + if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { > + error_setg(errp, "multifunction not supported in s390"); > + return; > + } > + > if (!dev->id) { > /* In the case the PCI device does not define an id */ > /* we generate one based on the PCI address */ >
+Matthew +Eric Side note for the maintainers : Before this change, the igb device, which is multifunction, was working fine under Linux. Was there a fix in Linux since : 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler") ? s390 PCI devices do not have extended capabilities, so the igb device does not expose the SRIOV capability and only the PF is accessible but it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the default Linux config which is unexpected) Thanks, C. On 8/23/24 07:00, Akihiko Odaki wrote: > The SR-IOV PFs set the multifunction bits during device realization so > check them after that. This forbids adding SR-IOV devices to s390x. > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > --- > hw/s390x/s390-pci-bus.c | 14 ++++++-------- > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c > index 3e57d5faca18..00b2c1f6157b 100644 > --- a/hw/s390x/s390-pci-bus.c > +++ b/hw/s390x/s390-pci-bus.c > @@ -971,14 +971,7 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev, > "this device"); > } > > - if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > - PCIDevice *pdev = PCI_DEVICE(dev); > - > - if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { > - error_setg(errp, "multifunction not supported in s390"); > - return; > - } > - } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { > + if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { > S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev); > > if (!s390_pci_alloc_idx(s, pbdev)) { > @@ -1069,6 +1062,11 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev, > } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > pdev = PCI_DEVICE(dev); > > + if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { > + error_setg(errp, "multifunction not supported in s390"); > + return; > + } > + > if (!dev->id) { > /* In the case the PCI device does not define an id */ > /* we generate one based on the PCI address */ >
On 2024/09/11 18:38, Cédric Le Goater wrote: > +Matthew +Eric > > Side note for the maintainers : > > Before this change, the igb device, which is multifunction, was working > fine under Linux. > > Was there a fix in Linux since : > > 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") > 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug > handler") > > ? > > s390 PCI devices do not have extended capabilities, so the igb device > does not expose the SRIOV capability and only the PF is accessible but > it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the > default Linux config which is unexpected) Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c has a call pci_config_size() and pci_host_config_write_common(), which means it is exposing the whole PCI Express configuration space. Why can't s390x use extended capabilities then? The best option for fix would be to replace the SR-IOV implementation with stub if s390x cannot use the SR-IOV capability. However I still need to know at what level I should change the implementation (e.g., is it fine to remove the entire capability, or should I keep the capability while writes to its registers no-op?) Regards, Akihiko Odaki > > Thanks, > > C. > > > > On 8/23/24 07:00, Akihiko Odaki wrote: >> The SR-IOV PFs set the multifunction bits during device realization so >> check them after that. This forbids adding SR-IOV devices to s390x. >> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> >> --- >> hw/s390x/s390-pci-bus.c | 14 ++++++-------- >> 1 file changed, 6 insertions(+), 8 deletions(-) >> >> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c >> index 3e57d5faca18..00b2c1f6157b 100644 >> --- a/hw/s390x/s390-pci-bus.c >> +++ b/hw/s390x/s390-pci-bus.c >> @@ -971,14 +971,7 @@ static void s390_pcihost_pre_plug(HotplugHandler >> *hotplug_dev, DeviceState *dev, >> "this device"); >> } >> - if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { >> - PCIDevice *pdev = PCI_DEVICE(dev); >> - >> - if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { >> - error_setg(errp, "multifunction not supported in s390"); >> - return; >> - } >> - } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { >> + if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { >> S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev); >> if (!s390_pci_alloc_idx(s, pbdev)) { >> @@ -1069,6 +1062,11 @@ static void s390_pcihost_plug(HotplugHandler >> *hotplug_dev, DeviceState *dev, >> } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { >> pdev = PCI_DEVICE(dev); >> + if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { >> + error_setg(errp, "multifunction not supported in s390"); >> + return; >> + } >> + >> if (!dev->id) { >> /* In the case the PCI device does not define an id */ >> /* we generate one based on the PCI address */ >> >
On Wed, Sep 11, 2024 at 07:58:15PM +0900, Akihiko Odaki wrote: > On 2024/09/11 18:38, Cédric Le Goater wrote: > > +Matthew +Eric > > > > Side note for the maintainers : > > > > Before this change, the igb device, which is multifunction, was working > > fine under Linux. > > > > Was there a fix in Linux since : > > > > 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") > > 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug > > handler") > > > > ? > > > > s390 PCI devices do not have extended capabilities, so the igb device > > does not expose the SRIOV capability and only the PF is accessible but > > it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the > > default Linux config which is unexpected) > > Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c has > a call pci_config_size() and pci_host_config_write_common(), which means it > is exposing the whole PCI Express configuration space. Why can't s390x use > extended capabilities then? > > The best option for fix would be to replace the SR-IOV implementation with > stub if s390x cannot use the SR-IOV capability. However I still need to know > at what level I should change the implementation (e.g., is it fine to remove > the entire capability, or should I keep the capability while writes to its > registers no-op?) > > Regards, > Akihiko Odaki Note changing caps needs compat hacks for cross version migration to work. > > > > Thanks, > > > > C. > > > > > > > > On 8/23/24 07:00, Akihiko Odaki wrote: > > > The SR-IOV PFs set the multifunction bits during device realization so > > > check them after that. This forbids adding SR-IOV devices to s390x. > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > --- > > > hw/s390x/s390-pci-bus.c | 14 ++++++-------- > > > 1 file changed, 6 insertions(+), 8 deletions(-) > > > > > > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c > > > index 3e57d5faca18..00b2c1f6157b 100644 > > > --- a/hw/s390x/s390-pci-bus.c > > > +++ b/hw/s390x/s390-pci-bus.c > > > @@ -971,14 +971,7 @@ static void > > > s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev, > > > "this device"); > > > } > > > - if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > > > - PCIDevice *pdev = PCI_DEVICE(dev); > > > - > > > - if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { > > > - error_setg(errp, "multifunction not supported in s390"); > > > - return; > > > - } > > > - } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { > > > + if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { > > > S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev); > > > if (!s390_pci_alloc_idx(s, pbdev)) { > > > @@ -1069,6 +1062,11 @@ static void s390_pcihost_plug(HotplugHandler > > > *hotplug_dev, DeviceState *dev, > > > } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > > > pdev = PCI_DEVICE(dev); > > > + if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { > > > + error_setg(errp, "multifunction not supported in s390"); > > > + return; > > > + } > > > + > > > if (!dev->id) { > > > /* In the case the PCI device does not define an id */ > > > /* we generate one based on the PCI address */ > > > > >
On 9/11/24 6:58 AM, Akihiko Odaki wrote: > On 2024/09/11 18:38, Cédric Le Goater wrote: >> +Matthew +Eric >> >> Side note for the maintainers : >> >> Before this change, the igb device, which is multifunction, was working >> fine under Linux. >> >> Was there a fix in Linux since : >> >> 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") >> 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler") >> >> ? The timing of those particular commits predates the linux s390 kernel support of multifunction/SR-IOV. At that time it was simply not possible on s390. >> >> s390 PCI devices do not have extended capabilities, so the igb device >> does not expose the SRIOV capability and only the PF is accessible but >> it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the >> default Linux config which is unexpected) The linux config option makes sense because the s390 kernel now supports SR-IOV/multifunction. > > Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c has a call pci_config_size() and pci_host_config_write_common(), which means it is exposing the whole PCI Express configuration space. Why can't s390x use extended capabilities then? > So, rather than poking around in config space, s390 (and thus the s390 kernel) has an extra layer of 'capabilities' that it generally relies on to determine device functionality called 'CLP'. Basically, there are pieces of CLP that are not currently generated (or forwarded from the host in the case of passthrough) by QEMU that would be needed by the guest to recognize the SRIOV/multifunction capability of a device, despite what config space has in it. I suspect this is exactly why only the PF was available to your igb device then (missing CLP info made the device appear to not have multifunction capability as far as the s390 guest is concerned - fwiw adding CLP emulation to enable that is on our todo list). Sounds like the short-term solution here would be to continue allowing the PF without multifunction being visible to the guest (so as to not regress prior functionality) and then aim for proper support after with the necessary CLP pieces. Thanks, Matt
On 2024/09/11 22:53, Matthew Rosato wrote: > On 9/11/24 6:58 AM, Akihiko Odaki wrote: >> On 2024/09/11 18:38, Cédric Le Goater wrote: >>> +Matthew +Eric >>> >>> Side note for the maintainers : >>> >>> Before this change, the igb device, which is multifunction, was working >>> fine under Linux. >>> >>> Was there a fix in Linux since : >>> >>> 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") >>> 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler") >>> >>> ? > The timing of those particular commits predates the linux s390 kernel support of multifunction/SR-IOV. At that time it was simply not possible on s390. Is it OK to remove this check of multifunction now? This code is not working properly with SR-IOV and misleading. It is better to remove the code if it does no good. It would be nice if anyone confirms multifunction works for s390x with the code removed. > >>> >>> s390 PCI devices do not have extended capabilities, so the igb device >>> does not expose the SRIOV capability and only the PF is accessible but >>> it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the >>> default Linux config which is unexpected) > > The linux config option makes sense because the s390 kernel now supports SR-IOV/multifunction. > >> >> Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c has a call pci_config_size() and pci_host_config_write_common(), which means it is exposing the whole PCI Express configuration space. Why can't s390x use extended capabilities then? >> > > So, rather than poking around in config space, s390 (and thus the s390 kernel) has an extra layer of 'capabilities' that it generally relies on to determine device functionality called 'CLP'. Basically, there are pieces of CLP that are not currently generated (or forwarded from the host in the case of passthrough) by QEMU that would be needed by the guest to recognize the SRIOV/multifunction capability of a device, despite what config space has in it. I suspect this is exactly why only the PF was available to your igb device then (missing CLP info made the device appear to not have multifunction capability as far as the s390 guest is concerned - fwiw adding CLP emulation to enable that is on our todo list). What is expected to happen if you poke the configuration space anyway? I also wonder if there is some public documentation of CLP and relevant aspect of PCI support in s390x. > > Sounds like the short-term solution here would be to continue allowing the PF without multifunction being visible to the guest (so as to not regress prior functionality) and then aim for proper support after with the necessary CLP pieces. I agree; we can keep the PF working. Regards, Akihiko Odaki
On 9/11/24 11:15 AM, Akihiko Odaki wrote: > On 2024/09/11 22:53, Matthew Rosato wrote: >> On 9/11/24 6:58 AM, Akihiko Odaki wrote: >>> On 2024/09/11 18:38, Cédric Le Goater wrote: >>>> +Matthew +Eric >>>> >>>> Side note for the maintainers : >>>> >>>> Before this change, the igb device, which is multifunction, was working >>>> fine under Linux. >>>> >>>> Was there a fix in Linux since : >>>> >>>> 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") >>>> 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler") >>>> >>>> ? >> The timing of those particular commits predates the linux s390 kernel support of multifunction/SR-IOV. At that time it was simply not possible on s390. > > Is it OK to remove this check of multifunction now? Yes, I think removing the check (which AFAIU was broken since 6069bcdeacee) will get us back to the behavior Cedric was seeing, where a device that reports multifunction in the config space is still allowed through but only the PF will be usable in the guest. > > This code is not working properly with SR-IOV and misleading. It is better to remove the code if it does no good. > > It would be nice if anyone confirms multifunction works for s390x with the code removed. Even if you remove the check, multifunction itself won't work in the s390x guest without these missing CLP pieces too. When I have some time I'll hack something together to fabricate some CLP data and try it out, but it sounds like Cedric could use his setup to right now at least verify that the PF itself should remain usable in the guest (current behavior). > >> >>>> >>>> s390 PCI devices do not have extended capabilities, so the igb device >>>> does not expose the SRIOV capability and only the PF is accessible but >>>> it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the >>>> default Linux config which is unexpected) >> >> The linux config option makes sense because the s390 kernel now supports SR-IOV/multifunction. >> >>> >>> Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c has a call pci_config_size() and pci_host_config_write_common(), which means it is exposing the whole PCI Express configuration space. Why can't s390x use extended capabilities then? >>> >> >> So, rather than poking around in config space, s390 (and thus the s390 kernel) has an extra layer of 'capabilities' that it generally relies on to determine device functionality called 'CLP'. Basically, there are pieces of CLP that are not currently generated (or forwarded from the host in the case of passthrough) by QEMU that would be needed by the guest to recognize the SRIOV/multifunction capability of a device, despite what config space has in it. I suspect this is exactly why only the PF was available to your igb device then (missing CLP info made the device appear to not have multifunction capability as far as the s390 guest is concerned - fwiw adding CLP emulation to enable that is on our todo list). > > What is expected to happen if you poke the configuration space anyway? I also wonder if there is some public documentation of CLP and relevant aspect of PCI support in s390x. On s390, the contents of config space might be what is directly reported by the device or may have been modified by a firmware layer in between the device and the LPAR (logical partition that is the closest thing to bare-metal on s390 where you could run a linux host). It's not that config space can't be read on s390 or anything like that, rather that the base s390 PCI kernel layer makes its decisions about multifunction based specifically on what is reported via CLP. If CLP doesn't report that the device is multifunction capable (which it never does for a s390x QEMU guest today) then it is treated as not having multifunction support. Unfortunately, the CLP details are not publicly documented. > >> >> Sounds like the short-term solution here would be to continue allowing the PF without multifunction being visible to the guest (so as to not regress prior functionality) and then aim for proper support after with the necessary CLP pieces. > > I agree; we can keep the PF working. Thanks! Matt
On 2024/09/12 6:11, Matthew Rosato wrote: > On 9/11/24 11:15 AM, Akihiko Odaki wrote: >> On 2024/09/11 22:53, Matthew Rosato wrote: >>> On 9/11/24 6:58 AM, Akihiko Odaki wrote: >>>> On 2024/09/11 18:38, Cédric Le Goater wrote: >>>>> +Matthew +Eric >>>>> >>>>> Side note for the maintainers : >>>>> >>>>> Before this change, the igb device, which is multifunction, was working >>>>> fine under Linux. >>>>> >>>>> Was there a fix in Linux since : >>>>> >>>>> 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") >>>>> 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler") >>>>> >>>>> ? >>> The timing of those particular commits predates the linux s390 kernel support of multifunction/SR-IOV. At that time it was simply not possible on s390. >> >> Is it OK to remove this check of multifunction now? > > Yes, I think removing the check (which AFAIU was broken since 6069bcdeacee) will get us back to the behavior Cedric was seeing, where a device that reports multifunction in the config space is still allowed through but only the PF will be usable in the guest. Commit 6069bcdeacee predates the introduction of SR-IOV (commit 7c0fa8dff811 ["pcie: Add support for Single Root I/O Virtualization (SR/IOV)"]) so it did not break anything, strictly speaking. Ideally, we should have taken care of the check when introducing SR-IOV. > >> >> This code is not working properly with SR-IOV and misleading. It is better to remove the code if it does no good. >> >> It would be nice if anyone confirms multifunction works for s390x with the code removed. > > Even if you remove the check, multifunction itself won't work in the s390x guest without these missing CLP pieces too. When I have some time I'll hack something together to fabricate some CLP data and try it out, but it sounds like Cedric could use his setup to right now at least verify that the PF itself should remain usable in the guest (current behavior). Well, it seems better to keep the check for non-SR-IOV multifunction devices while not enforcing the restriction for SR-IOV. Multifunction devices without SR-IOV are created with explicit requests by specifying multifunction=on for PCI devices. Such requests cannot be fulfilled without multifunction CLP so we should reject them. The situation is different for SR-IOV. SR-IOV is a feature inherent to specific type of devices and gets automatically enabled for these devices. It may make more sense to ignore just the SR-IOV part of such devices and keep the other functionalities working. The current code implements such a behavior, but it is accidental and semantically wrong. I will correct the code to explicitly allow multifunction for SR-IOV. Regards, Akihiko Odaki
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c index 3e57d5faca18..00b2c1f6157b 100644 --- a/hw/s390x/s390-pci-bus.c +++ b/hw/s390x/s390-pci-bus.c @@ -971,14 +971,7 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev, "this device"); } - if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { - PCIDevice *pdev = PCI_DEVICE(dev); - - if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { - error_setg(errp, "multifunction not supported in s390"); - return; - } - } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { + if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) { S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev); if (!s390_pci_alloc_idx(s, pbdev)) { @@ -1069,6 +1062,11 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev, } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { pdev = PCI_DEVICE(dev); + if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { + error_setg(errp, "multifunction not supported in s390"); + return; + } + if (!dev->id) { /* In the case the PCI device does not define an id */ /* we generate one based on the PCI address */
The SR-IOV PFs set the multifunction bits during device realization so check them after that. This forbids adding SR-IOV devices to s390x. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/s390x/s390-pci-bus.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)