diff mbox series

[v6,5/5] hw/pci: ensure PCIE devices are plugged into only slot 0 of PCIE port

Message ID 20230629040707.115656-6-anisinha@redhat.com
State New
Headers show
Series test and QEMU fixes to ensure proper PCIE device usage | expand

Commit Message

Ani Sinha June 29, 2023, 4:07 a.m. UTC
PCI Express ports only have one slot, so PCI Express devices can only be
plugged into slot 0 on a PCIE port. Enforce it.

The change has been tested to not break ARI by instantiating seven vfs on an
emulated igb device (the maximum number of vfs the linux igb driver supports).
The vfs are seen to have non-zero device/slot numbers in the conventional
PCI BDF representation.

CC: jusual@redhat.com
CC: imammedo@redhat.com
CC: akihiko.odaki@daynix.com

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Julia Suvorova <jusual@redhat.com>
---
 hw/pci/pci.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Akihiko Odaki June 29, 2023, 6:47 a.m. UTC | #1
On 2023/06/29 13:07, Ani Sinha wrote:
> PCI Express ports only have one slot, so PCI Express devices can only be
> plugged into slot 0 on a PCIE port. Enforce it.
> 
> The change has been tested to not break ARI by instantiating seven vfs on an
> emulated igb device (the maximum number of vfs the linux igb driver supports).
> The vfs are seen to have non-zero device/slot numbers in the conventional
> PCI BDF representation.
> 
> CC: jusual@redhat.com
> CC: imammedo@redhat.com
> CC: akihiko.odaki@daynix.com
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> ---
>   hw/pci/pci.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..0320ac2bb3 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -65,6 +65,7 @@ bool pci_available = true;
>   static char *pcibus_get_dev_path(DeviceState *dev);
>   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>   static void pcibus_reset(BusState *qbus);
> +static bool pcie_has_upstream_port(PCIDevice *dev);
>   
>   static Property pci_props[] = {
>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> @@ -1190,6 +1191,20 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
>                      name);
>   
>          return NULL;
> +    } /*
> +       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> +       * PCI interpretation as all five bits reserved for slot addresses are
> +       * also used for function bits for the various vfs. Ignore that case.
> +       * It is too early here to check for ARI capabilities in the PCI config
> +       * space. Hence, we check for a vf device instead.
> +       */

Why don't just perform this check after the capabilities are set?

Regards,
Akihiko Odaki

> +    else if (!pci_is_vf(pci_dev) &&
> +             pcie_has_upstream_port(pci_dev) &&
> +             PCI_SLOT(devfn)) {
> +        error_setg(errp, "PCI: slot %d is not valid for %s,"
> +                   " parent device only allows plugging into slot 0.",
> +                   PCI_SLOT(devfn), name);
> +        return NULL;
>       }
>   
>       pci_dev->devfn = devfn;
Ani Sinha June 29, 2023, 8:05 a.m. UTC | #2
On Thu, 29 Jun, 2023, 12:17 pm Akihiko Odaki, <akihiko.odaki@daynix.com>
wrote:

> On 2023/06/29 13:07, Ani Sinha wrote:
> > PCI Express ports only have one slot, so PCI Express devices can only be
> > plugged into slot 0 on a PCIE port. Enforce it.
> >
> > The change has been tested to not break ARI by instantiating seven vfs
> on an
> > emulated igb device (the maximum number of vfs the linux igb driver
> supports).
> > The vfs are seen to have non-zero device/slot numbers in the conventional
> > PCI BDF representation.
> >
> > CC: jusual@redhat.com
> > CC: imammedo@redhat.com
> > CC: akihiko.odaki@daynix.com
> >
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> > Signed-off-by: Ani Sinha <anisinha@redhat.com>
> > Reviewed-by: Julia Suvorova <jusual@redhat.com>
> > ---
> >   hw/pci/pci.c | 15 +++++++++++++++
> >   1 file changed, 15 insertions(+)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index e2eb4c3b4a..0320ac2bb3 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -65,6 +65,7 @@ bool pci_available = true;
> >   static char *pcibus_get_dev_path(DeviceState *dev);
> >   static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >   static void pcibus_reset(BusState *qbus);
> > +static bool pcie_has_upstream_port(PCIDevice *dev);
> >
> >   static Property pci_props[] = {
> >       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> > @@ -1190,6 +1191,20 @@ static PCIDevice
> *do_pci_register_device(PCIDevice *pci_dev,
> >                      name);
> >
> >          return NULL;
> > +    } /*
> > +       * With SRIOV and ARI, vfs can have non-zero slot in the
> conventional
> > +       * PCI interpretation as all five bits reserved for slot
> addresses are
> > +       * also used for function bits for the various vfs. Ignore that
> case.
> > +       * It is too early here to check for ARI capabilities in the PCI
> config
> > +       * space. Hence, we check for a vf device instead.
> > +       */
>
> Why don't just perform this check after the capabilities are set?
>

We don't want to allocate resources for wrong device parameters. We want to
error out early. Other checks also are performed at the same place .
Show quoted text

>
>
>
> Regards,
> Akihiko Odaki
>
> > +    else if (!pci_is_vf(pci_dev) &&
> > +             pcie_has_upstream_port(pci_dev) &&
> > +             PCI_SLOT(devfn)) {
> > +        error_setg(errp, "PCI: slot %d is not valid for %s,"
> > +                   " parent device only allows plugging into slot 0.",
> > +                   PCI_SLOT(devfn), name);
> > +        return NULL;
> >       }
> >
> >       pci_dev->devfn = devfn;
>
>
Akihiko Odaki June 29, 2023, 8:49 a.m. UTC | #3
On 2023/06/29 17:05, Ani Sinha wrote:
> 
> 
> On Thu, 29 Jun, 2023, 12:17 pm Akihiko Odaki, <akihiko.odaki@daynix.com 
> <mailto:akihiko.odaki@daynix.com>> wrote:
> 
>     On 2023/06/29 13:07, Ani Sinha wrote:
>      > PCI Express ports only have one slot, so PCI Express devices can
>     only be
>      > plugged into slot 0 on a PCIE port. Enforce it.
>      >
>      > The change has been tested to not break ARI by instantiating
>     seven vfs on an
>      > emulated igb device (the maximum number of vfs the linux igb
>     driver supports).
>      > The vfs are seen to have non-zero device/slot numbers in the
>     conventional
>      > PCI BDF representation.
>      >
>      > CC: jusual@redhat.com <mailto:jusual@redhat.com>
>      > CC: imammedo@redhat.com <mailto:imammedo@redhat.com>
>      > CC: akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>      >
>      > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>     <https://bugzilla.redhat.com/show_bug.cgi?id=2128929>
>      > Signed-off-by: Ani Sinha <anisinha@redhat.com
>     <mailto:anisinha@redhat.com>>
>      > Reviewed-by: Julia Suvorova <jusual@redhat.com
>     <mailto:jusual@redhat.com>>
>      > ---
>      >   hw/pci/pci.c | 15 +++++++++++++++
>      >   1 file changed, 15 insertions(+)
>      >
>      > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>      > index e2eb4c3b4a..0320ac2bb3 100644
>      > --- a/hw/pci/pci.c
>      > +++ b/hw/pci/pci.c
>      > @@ -65,6 +65,7 @@ bool pci_available = true;
>      >   static char *pcibus_get_dev_path(DeviceState *dev);
>      >   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>      >   static void pcibus_reset(BusState *qbus);
>      > +static bool pcie_has_upstream_port(PCIDevice *dev);
>      >
>      >   static Property pci_props[] = {
>      >       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>      > @@ -1190,6 +1191,20 @@ static PCIDevice
>     *do_pci_register_device(PCIDevice *pci_dev,
>      >                      name);
>      >
>      >          return NULL;
>      > +    } /*
>      > +       * With SRIOV and ARI, vfs can have non-zero slot in the
>     conventional
>      > +       * PCI interpretation as all five bits reserved for slot
>     addresses are
>      > +       * also used for function bits for the various vfs. Ignore
>     that case.
>      > +       * It is too early here to check for ARI capabilities in
>     the PCI config
>      > +       * space. Hence, we check for a vf device instead.
>      > +       */
> 
>     Why don't just perform this check after the capabilities are set?
> 
> 
> We don't want to allocate resources for wrong device parameters. We want 
> to error out early. Other checks also are performed at the same place .

It is indeed better to raise an error as early as possible so that we 
can avoid allocation and other operations that will be reverted and may 
go wrong due to the invalid condition. That should be the reason why 
other checks for the address are performed here.

However, in this particular case, we cannot confidently perform the 
check here because it is unknown if the ARI capability will be 
advertised until the device realization code runs. This can justify 
delaying the check after the device realization, unlike the other checks.

> Show quoted text
> 
> 
> 
> 
>     Regards,
>     Akihiko Odaki
> 
>      > +    else if (!pci_is_vf(pci_dev) &&
>      > +             pcie_has_upstream_port(pci_dev) &&
>      > +             PCI_SLOT(devfn)) {
>      > +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>      > +                   " parent device only allows plugging into
>     slot 0.",
>      > +                   PCI_SLOT(devfn), name);
>      > +        return NULL;
>      >       }
>      >
>      >       pci_dev->devfn = devfn;
>
Ani Sinha June 29, 2023, 2:18 p.m. UTC | #4
> On 29-Jun-2023, at 2:19 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/06/29 17:05, Ani Sinha wrote:
>> On Thu, 29 Jun, 2023, 12:17 pm Akihiko Odaki, <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>> wrote:
>>    On 2023/06/29 13:07, Ani Sinha wrote:
>>     > PCI Express ports only have one slot, so PCI Express devices can
>>    only be
>>     > plugged into slot 0 on a PCIE port. Enforce it.
>>     >
>>     > The change has been tested to not break ARI by instantiating
>>    seven vfs on an
>>     > emulated igb device (the maximum number of vfs the linux igb
>>    driver supports).
>>     > The vfs are seen to have non-zero device/slot numbers in the
>>    conventional
>>     > PCI BDF representation.
>>     >
>>     > CC: jusual@redhat.com <mailto:jusual@redhat.com>
>>     > CC: imammedo@redhat.com <mailto:imammedo@redhat.com>
>>     > CC: akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>>     >
>>     > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>    <https://bugzilla.redhat.com/show_bug.cgi?id=2128929>
>>     > Signed-off-by: Ani Sinha <anisinha@redhat.com
>>    <mailto:anisinha@redhat.com>>
>>     > Reviewed-by: Julia Suvorova <jusual@redhat.com
>>    <mailto:jusual@redhat.com>>
>>     > ---
>>     >   hw/pci/pci.c | 15 +++++++++++++++
>>     >   1 file changed, 15 insertions(+)
>>     >
>>     > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>     > index e2eb4c3b4a..0320ac2bb3 100644
>>     > --- a/hw/pci/pci.c
>>     > +++ b/hw/pci/pci.c
>>     > @@ -65,6 +65,7 @@ bool pci_available = true;
>>     >   static char *pcibus_get_dev_path(DeviceState *dev);
>>     >   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>     >   static void pcibus_reset(BusState *qbus);
>>     > +static bool pcie_has_upstream_port(PCIDevice *dev);
>>     >
>>     >   static Property pci_props[] = {
>>     >       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>     > @@ -1190,6 +1191,20 @@ static PCIDevice
>>    *do_pci_register_device(PCIDevice *pci_dev,
>>     >                      name);
>>     >
>>     >          return NULL;
>>     > +    } /*
>>     > +       * With SRIOV and ARI, vfs can have non-zero slot in the
>>    conventional
>>     > +       * PCI interpretation as all five bits reserved for slot
>>    addresses are
>>     > +       * also used for function bits for the various vfs. Ignore
>>    that case.
>>     > +       * It is too early here to check for ARI capabilities in
>>    the PCI config
>>     > +       * space. Hence, we check for a vf device instead.
>>     > +       */
>>    Why don't just perform this check after the capabilities are set?
>> We don't want to allocate resources for wrong device parameters. We want to error out early. Other checks also are performed at the same place .
> 
> It is indeed better to raise an error as early as possible so that we can avoid allocation and other operations that will be reverted and may go wrong due to the invalid condition. That should be the reason why other checks for the address are performed here.
> 
> However, in this particular case, we cannot confidently perform the check here because it is unknown if the ARI capability will be advertised until the device realization code runs. This can justify delaying the check after the device realization, unlike the other checks.

Ok so are you proposing that the check we have right before (the check for unoccupied function 0) be also moved? It also uses the same vf approximation for seemingly to support ARI.
Also where do you propose we move the check?

> 
>> Show quoted text
>>    Regards,
>>    Akihiko Odaki
>>     > +    else if (!pci_is_vf(pci_dev) &&
>>     > +             pcie_has_upstream_port(pci_dev) &&
>>     > +             PCI_SLOT(devfn)) {
>>     > +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>>     > +                   " parent device only allows plugging into
>>    slot 0.",
>>     > +                   PCI_SLOT(devfn), name);
>>     > +        return NULL;
>>     >       }
>>     >
>>     >       pci_dev->devfn = devfn;
Michael S. Tsirkin June 29, 2023, 2:24 p.m. UTC | #5
On Thu, Jun 29, 2023 at 09:37:07AM +0530, Ani Sinha wrote:
> PCI Express ports only have one slot, so PCI Express devices can only be
> plugged into slot 0 on a PCIE port. Enforce it.
> 
> The change has been tested to not break ARI by instantiating seven vfs on an
> emulated igb device (the maximum number of vfs the linux igb driver supports).

I guess we need to test with some other device then? 7 VFs is same
slot so hardly a good test.

> The vfs are seen to have non-zero device/slot numbers in the conventional
> PCI BDF representation.
> 
> CC: jusual@redhat.com
> CC: imammedo@redhat.com
> CC: akihiko.odaki@daynix.com
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> ---
>  hw/pci/pci.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..0320ac2bb3 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -65,6 +65,7 @@ bool pci_available = true;
>  static char *pcibus_get_dev_path(DeviceState *dev);
>  static char *pcibus_get_fw_dev_path(DeviceState *dev);
>  static void pcibus_reset(BusState *qbus);
> +static bool pcie_has_upstream_port(PCIDevice *dev);
>  
>  static Property pci_props[] = {
>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> @@ -1190,6 +1191,20 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
>                     name);
>  
>         return NULL;
> +    } /*
> +       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> +       * PCI interpretation as all five bits reserved for slot addresses are
> +       * also used for function bits for the various vfs. Ignore that case.
> +       * It is too early here to check for ARI capabilities in the PCI config
> +       * space. Hence, we check for a vf device instead.
> +       */
> +    else if (!pci_is_vf(pci_dev) &&
> +             pcie_has_upstream_port(pci_dev) &&
> +             PCI_SLOT(devfn)) {
> +        error_setg(errp, "PCI: slot %d is not valid for %s,"
> +                   " parent device only allows plugging into slot 0.",
> +                   PCI_SLOT(devfn), name);
> +        return NULL;
>      }
>  
>      pci_dev->devfn = devfn;
> -- 
> 2.39.1
Ani Sinha June 29, 2023, 2:37 p.m. UTC | #6
> On 29-Jun-2023, at 7:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Jun 29, 2023 at 09:37:07AM +0530, Ani Sinha wrote:
>> PCI Express ports only have one slot, so PCI Express devices can only be
>> plugged into slot 0 on a PCIE port. Enforce it.
>> 
>> The change has been tested to not break ARI by instantiating seven vfs on an
>> emulated igb device (the maximum number of vfs the linux igb driver supports).
> 
> I guess we need to test with some other device then? 7 VFs is same
> slot so hardly a good test.

No its not the same slot. Its using different slots/device numbers. I checked that.
The same patch was failing without the vf check.

> 
>> The vfs are seen to have non-zero device/slot numbers in the conventional
>> PCI BDF representation.
>> 
>> CC: jusual@redhat.com
>> CC: imammedo@redhat.com
>> CC: akihiko.odaki@daynix.com
>> 
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>> ---
>> hw/pci/pci.c | 15 +++++++++++++++
>> 1 file changed, 15 insertions(+)
>> 
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index e2eb4c3b4a..0320ac2bb3 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -65,6 +65,7 @@ bool pci_available = true;
>> static char *pcibus_get_dev_path(DeviceState *dev);
>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>> static void pcibus_reset(BusState *qbus);
>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>> 
>> static Property pci_props[] = {
>>     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>> @@ -1190,6 +1191,20 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
>>                    name);
>> 
>>        return NULL;
>> +    } /*
>> +       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>> +       * PCI interpretation as all five bits reserved for slot addresses are
>> +       * also used for function bits for the various vfs. Ignore that case.
>> +       * It is too early here to check for ARI capabilities in the PCI config
>> +       * space. Hence, we check for a vf device instead.
>> +       */
>> +    else if (!pci_is_vf(pci_dev) &&
>> +             pcie_has_upstream_port(pci_dev) &&
>> +             PCI_SLOT(devfn)) {
>> +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>> +                   " parent device only allows plugging into slot 0.",
>> +                   PCI_SLOT(devfn), name);
>> +        return NULL;
>>     }
>> 
>>     pci_dev->devfn = devfn;
>> -- 
>> 2.39.1
>
Michael S. Tsirkin June 29, 2023, 3:32 p.m. UTC | #7
On Thu, Jun 29, 2023 at 08:07:57PM +0530, Ani Sinha wrote:
> 
> 
> > On 29-Jun-2023, at 7:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Thu, Jun 29, 2023 at 09:37:07AM +0530, Ani Sinha wrote:
> >> PCI Express ports only have one slot, so PCI Express devices can only be
> >> plugged into slot 0 on a PCIE port. Enforce it.
> >> 
> >> The change has been tested to not break ARI by instantiating seven vfs on an
> >> emulated igb device (the maximum number of vfs the linux igb driver supports).
> > 
> > I guess we need to test with some other device then? 7 VFs is same
> > slot so hardly a good test.
> 
> No its not the same slot. Its using different slots/device numbers. I checked that.
> The same patch was failing without the vf check.

Ah, playing with VF stride? Could you show the command line please?

> > 
> >> The vfs are seen to have non-zero device/slot numbers in the conventional
> >> PCI BDF representation.
> >> 
> >> CC: jusual@redhat.com
> >> CC: imammedo@redhat.com
> >> CC: akihiko.odaki@daynix.com
> >> 
> >> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> >> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> >> ---
> >> hw/pci/pci.c | 15 +++++++++++++++
> >> 1 file changed, 15 insertions(+)
> >> 
> >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >> index e2eb4c3b4a..0320ac2bb3 100644
> >> --- a/hw/pci/pci.c
> >> +++ b/hw/pci/pci.c
> >> @@ -65,6 +65,7 @@ bool pci_available = true;
> >> static char *pcibus_get_dev_path(DeviceState *dev);
> >> static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >> static void pcibus_reset(BusState *qbus);
> >> +static bool pcie_has_upstream_port(PCIDevice *dev);
> >> 
> >> static Property pci_props[] = {
> >>     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> >> @@ -1190,6 +1191,20 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
> >>                    name);
> >> 
> >>        return NULL;
> >> +    } /*
> >> +       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> >> +       * PCI interpretation as all five bits reserved for slot addresses are
> >> +       * also used for function bits for the various vfs. Ignore that case.
> >> +       * It is too early here to check for ARI capabilities in the PCI config
> >> +       * space. Hence, we check for a vf device instead.
> >> +       */
> >> +    else if (!pci_is_vf(pci_dev) &&
> >> +             pcie_has_upstream_port(pci_dev) &&
> >> +             PCI_SLOT(devfn)) {
> >> +        error_setg(errp, "PCI: slot %d is not valid for %s,"
> >> +                   " parent device only allows plugging into slot 0.",
> >> +                   PCI_SLOT(devfn), name);
> >> +        return NULL;
> >>     }
> >> 
> >>     pci_dev->devfn = devfn;
> >> -- 
> >> 2.39.1
> >
Ani Sinha June 29, 2023, 3:57 p.m. UTC | #8
> On 29-Jun-2023, at 9:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Jun 29, 2023 at 08:07:57PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 29-Jun-2023, at 7:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Thu, Jun 29, 2023 at 09:37:07AM +0530, Ani Sinha wrote:
>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>> plugged into slot 0 on a PCIE port. Enforce it.
>>>> 
>>>> The change has been tested to not break ARI by instantiating seven vfs on an
>>>> emulated igb device (the maximum number of vfs the linux igb driver supports).
>>> 
>>> I guess we need to test with some other device then? 7 VFs is same
>>> slot so hardly a good test.
>> 
>> No its not the same slot. Its using different slots/device numbers. I checked that.
>> The same patch was failing without the vf check.
> 
> Ah, playing with VF stride? Could you show the command line please?

Akhido mentioned this in the other thread. Basically For QEMU:

-device pcie-root-port,id=p -device igb,bus=p

Then from within the guest (in my case RHEL 9.2):

$ echo 7 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs

You’ll find that if you use something more than 7 there will be ERANGE from the guest kernel because the driver can create maximum 7 vfs.
This above command line will fail if we do not check for !vfs in the patch with the following error from QEMU:

(qemu) qemu-system-x86_64: PCI: slot 16 is not valid for igbvf, parent device only allows plugging into slot 0.

and an IO error on the write from the guest kernel.

In the current version of the patch with the vf check, you will find the vfs created with the addresses:

01:10.{2,4,6,8} and 01.11.{2,4,6} , that is bus 1 for the root port, devices 10 and 11, functions 2,4,6,8 etc.

There would be no error from QEMU.

> 
>>> 
>>>> The vfs are seen to have non-zero device/slot numbers in the conventional
>>>> PCI BDF representation.
>>>> 
>>>> CC: jusual@redhat.com
>>>> CC: imammedo@redhat.com
>>>> CC: akihiko.odaki@daynix.com
>>>> 
>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>> ---
>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>> 1 file changed, 15 insertions(+)
>>>> 
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index e2eb4c3b4a..0320ac2bb3 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>> static void pcibus_reset(BusState *qbus);
>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>> 
>>>> static Property pci_props[] = {
>>>>    DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>> @@ -1190,6 +1191,20 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
>>>>                   name);
>>>> 
>>>>       return NULL;
>>>> +    } /*
>>>> +       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>> +       * PCI interpretation as all five bits reserved for slot addresses are
>>>> +       * also used for function bits for the various vfs. Ignore that case.
>>>> +       * It is too early here to check for ARI capabilities in the PCI config
>>>> +       * space. Hence, we check for a vf device instead.
>>>> +       */
>>>> +    else if (!pci_is_vf(pci_dev) &&
>>>> +             pcie_has_upstream_port(pci_dev) &&
>>>> +             PCI_SLOT(devfn)) {
>>>> +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>>>> +                   " parent device only allows plugging into slot 0.",
>>>> +                   PCI_SLOT(devfn), name);
>>>> +        return NULL;
>>>>    }
>>>> 
>>>>    pci_dev->devfn = devfn;
>>>> -- 
>>>> 2.39.1
Ani Sinha June 29, 2023, 4:45 p.m. UTC | #9
> On 29-Jun-2023, at 9:27 PM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 29-Jun-2023, at 9:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> 
>> On Thu, Jun 29, 2023 at 08:07:57PM +0530, Ani Sinha wrote:
>>> 
>>> 
>>>> On 29-Jun-2023, at 7:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> 
>>>> On Thu, Jun 29, 2023 at 09:37:07AM +0530, Ani Sinha wrote:
>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>> plugged into slot 0 on a PCIE port. Enforce it.
>>>>> 
>>>>> The change has been tested to not break ARI by instantiating seven vfs on an
>>>>> emulated igb device (the maximum number of vfs the linux igb driver supports).
>>>> 
>>>> I guess we need to test with some other device then? 7 VFs is same
>>>> slot so hardly a good test.
>>> 
>>> No its not the same slot. Its using different slots/device numbers. I checked that.
>>> The same patch was failing without the vf check.
>> 
>> Ah, playing with VF stride?

Indeed. You’ll see IGB_VF_STRIDE is 2. pcie_sriov_pf_init() uses this to initialise the PCIE config space attributes. register_vfs() uses this to increment the devfn values :-) 


>> Could you show the command line please?
> 
> Akhido mentioned this in the other thread. Basically For QEMU:
> 
> -device pcie-root-port,id=p -device igb,bus=p
> 
> Then from within the guest (in my case RHEL 9.2):
> 
> $ echo 7 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
> 
> You’ll find that if you use something more than 7 there will be ERANGE from the guest kernel because the driver can create maximum 7 vfs.
> This above command line will fail if we do not check for !vfs in the patch with the following error from QEMU:
> 
> (qemu) qemu-system-x86_64: PCI: slot 16 is not valid for igbvf, parent device only allows plugging into slot 0.
> 
> and an IO error on the write from the guest kernel.
> 
> In the current version of the patch with the vf check, you will find the vfs created with the addresses:
> 
> 01:10.{2,4,6,8} and 01.11.{2,4,6} , that is bus 1 for the root port, devices 10 and 11, functions 2,4,6,8 etc.
> 
> There would be no error from QEMU.
> 
>> 
>>>> 
>>>>> The vfs are seen to have non-zero device/slot numbers in the conventional
>>>>> PCI BDF representation.
>>>>> 
>>>>> CC: jusual@redhat.com
>>>>> CC: imammedo@redhat.com
>>>>> CC: akihiko.odaki@daynix.com
>>>>> 
>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>> ---
>>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>>> 1 file changed, 15 insertions(+)
>>>>> 
>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>> index e2eb4c3b4a..0320ac2bb3 100644
>>>>> --- a/hw/pci/pci.c
>>>>> +++ b/hw/pci/pci.c
>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>> static void pcibus_reset(BusState *qbus);
>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>> 
>>>>> static Property pci_props[] = {
>>>>>   DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>> @@ -1190,6 +1191,20 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
>>>>>                  name);
>>>>> 
>>>>>      return NULL;
>>>>> +    } /*
>>>>> +       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>> +       * PCI interpretation as all five bits reserved for slot addresses are
>>>>> +       * also used for function bits for the various vfs. Ignore that case.
>>>>> +       * It is too early here to check for ARI capabilities in the PCI config
>>>>> +       * space. Hence, we check for a vf device instead.
>>>>> +       */
>>>>> +    else if (!pci_is_vf(pci_dev) &&
>>>>> +             pcie_has_upstream_port(pci_dev) &&
>>>>> +             PCI_SLOT(devfn)) {
>>>>> +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>>>>> +                   " parent device only allows plugging into slot 0.",
>>>>> +                   PCI_SLOT(devfn), name);
>>>>> +        return NULL;
>>>>>   }
>>>>> 
>>>>>   pci_dev->devfn = devfn;
>>>>> -- 
>>>>> 2.39.1
Akihiko Odaki June 30, 2023, 2:43 a.m. UTC | #10
On 2023/06/29 23:18, Ani Sinha wrote:
> 
> 
>> On 29-Jun-2023, at 2:19 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/06/29 17:05, Ani Sinha wrote:
>>> On Thu, 29 Jun, 2023, 12:17 pm Akihiko Odaki, <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>> wrote:
>>>     On 2023/06/29 13:07, Ani Sinha wrote:
>>>      > PCI Express ports only have one slot, so PCI Express devices can
>>>     only be
>>>      > plugged into slot 0 on a PCIE port. Enforce it.
>>>      >
>>>      > The change has been tested to not break ARI by instantiating
>>>     seven vfs on an
>>>      > emulated igb device (the maximum number of vfs the linux igb
>>>     driver supports).
>>>      > The vfs are seen to have non-zero device/slot numbers in the
>>>     conventional
>>>      > PCI BDF representation.
>>>      >
>>>      > CC: jusual@redhat.com <mailto:jusual@redhat.com>
>>>      > CC: imammedo@redhat.com <mailto:imammedo@redhat.com>
>>>      > CC: akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>>>      >
>>>      > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>     <https://bugzilla.redhat.com/show_bug.cgi?id=2128929>
>>>      > Signed-off-by: Ani Sinha <anisinha@redhat.com
>>>     <mailto:anisinha@redhat.com>>
>>>      > Reviewed-by: Julia Suvorova <jusual@redhat.com
>>>     <mailto:jusual@redhat.com>>
>>>      > ---
>>>      >   hw/pci/pci.c | 15 +++++++++++++++
>>>      >   1 file changed, 15 insertions(+)
>>>      >
>>>      > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>      > index e2eb4c3b4a..0320ac2bb3 100644
>>>      > --- a/hw/pci/pci.c
>>>      > +++ b/hw/pci/pci.c
>>>      > @@ -65,6 +65,7 @@ bool pci_available = true;
>>>      >   static char *pcibus_get_dev_path(DeviceState *dev);
>>>      >   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>      >   static void pcibus_reset(BusState *qbus);
>>>      > +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>      >
>>>      >   static Property pci_props[] = {
>>>      >       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>      > @@ -1190,6 +1191,20 @@ static PCIDevice
>>>     *do_pci_register_device(PCIDevice *pci_dev,
>>>      >                      name);
>>>      >
>>>      >          return NULL;
>>>      > +    } /*
>>>      > +       * With SRIOV and ARI, vfs can have non-zero slot in the
>>>     conventional
>>>      > +       * PCI interpretation as all five bits reserved for slot
>>>     addresses are
>>>      > +       * also used for function bits for the various vfs. Ignore
>>>     that case.
>>>      > +       * It is too early here to check for ARI capabilities in
>>>     the PCI config
>>>      > +       * space. Hence, we check for a vf device instead.
>>>      > +       */
>>>     Why don't just perform this check after the capabilities are set?
>>> We don't want to allocate resources for wrong device parameters. We want to error out early. Other checks also are performed at the same place .
>>
>> It is indeed better to raise an error as early as possible so that we can avoid allocation and other operations that will be reverted and may go wrong due to the invalid condition. That should be the reason why other checks for the address are performed here.
>>
>> However, in this particular case, we cannot confidently perform the check here because it is unknown if the ARI capability will be advertised until the device realization code runs. This can justify delaying the check after the device realization, unlike the other checks.
> 
> Ok so are you proposing that the check we have right before (the check for unoccupied function 0) be also moved? It also uses the same vf approximation for seemingly to support ARI.

No, I don't think the check for function 0 is required to be disabled 
because of the change of addressing caused by ARI, but it is required 
because SR-IOV VF can be added and removed while the PF (function 0) 
remains. I think this check should be performed also when SR-IOV is 
disabled and ARI is enabled.

Thus the check for unoccupied function 0 needs to use pci_is_vf() 
instead of checking ARI capability, and that can happen in 
do_pci_register_device().

> Also where do you propose we move the check?

In pci_qdev_realize(), somewhere after pc->realize() and before option 
ROM loading. See the check for failover pair as an example. I guess it's 
also placed in this region because it needs capability information.

> 
>>
>>> Show quoted text
>>>     Regards,
>>>     Akihiko Odaki
>>>      > +    else if (!pci_is_vf(pci_dev) &&
>>>      > +             pcie_has_upstream_port(pci_dev) &&
>>>      > +             PCI_SLOT(devfn)) {
>>>      > +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>>>      > +                   " parent device only allows plugging into
>>>     slot 0.",
>>>      > +                   PCI_SLOT(devfn), name);
>>>      > +        return NULL;
>>>      >       }
>>>      >
>>>      >       pci_dev->devfn = devfn;
>
Michael S. Tsirkin June 30, 2023, 6:02 a.m. UTC | #11
On Fri, Jun 30, 2023 at 11:43:15AM +0900, Akihiko Odaki wrote:
> On 2023/06/29 23:18, Ani Sinha wrote:
> > 
> > 
> > > On 29-Jun-2023, at 2:19 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > 
> > > On 2023/06/29 17:05, Ani Sinha wrote:
> > > > On Thu, 29 Jun, 2023, 12:17 pm Akihiko Odaki, <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>> wrote:
> > > >     On 2023/06/29 13:07, Ani Sinha wrote:
> > > >      > PCI Express ports only have one slot, so PCI Express devices can
> > > >     only be
> > > >      > plugged into slot 0 on a PCIE port. Enforce it.
> > > >      >
> > > >      > The change has been tested to not break ARI by instantiating
> > > >     seven vfs on an
> > > >      > emulated igb device (the maximum number of vfs the linux igb
> > > >     driver supports).
> > > >      > The vfs are seen to have non-zero device/slot numbers in the
> > > >     conventional
> > > >      > PCI BDF representation.
> > > >      >
> > > >      > CC: jusual@redhat.com <mailto:jusual@redhat.com>
> > > >      > CC: imammedo@redhat.com <mailto:imammedo@redhat.com>
> > > >      > CC: akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
> > > >      >
> > > >      > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> > > >     <https://bugzilla.redhat.com/show_bug.cgi?id=2128929>
> > > >      > Signed-off-by: Ani Sinha <anisinha@redhat.com
> > > >     <mailto:anisinha@redhat.com>>
> > > >      > Reviewed-by: Julia Suvorova <jusual@redhat.com
> > > >     <mailto:jusual@redhat.com>>
> > > >      > ---
> > > >      >   hw/pci/pci.c | 15 +++++++++++++++
> > > >      >   1 file changed, 15 insertions(+)
> > > >      >
> > > >      > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > >      > index e2eb4c3b4a..0320ac2bb3 100644
> > > >      > --- a/hw/pci/pci.c
> > > >      > +++ b/hw/pci/pci.c
> > > >      > @@ -65,6 +65,7 @@ bool pci_available = true;
> > > >      >   static char *pcibus_get_dev_path(DeviceState *dev);
> > > >      >   static char *pcibus_get_fw_dev_path(DeviceState *dev);
> > > >      >   static void pcibus_reset(BusState *qbus);
> > > >      > +static bool pcie_has_upstream_port(PCIDevice *dev);
> > > >      >
> > > >      >   static Property pci_props[] = {
> > > >      >       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> > > >      > @@ -1190,6 +1191,20 @@ static PCIDevice
> > > >     *do_pci_register_device(PCIDevice *pci_dev,
> > > >      >                      name);
> > > >      >
> > > >      >          return NULL;
> > > >      > +    } /*
> > > >      > +       * With SRIOV and ARI, vfs can have non-zero slot in the
> > > >     conventional
> > > >      > +       * PCI interpretation as all five bits reserved for slot
> > > >     addresses are
> > > >      > +       * also used for function bits for the various vfs. Ignore
> > > >     that case.
> > > >      > +       * It is too early here to check for ARI capabilities in
> > > >     the PCI config
> > > >      > +       * space. Hence, we check for a vf device instead.
> > > >      > +       */
> > > >     Why don't just perform this check after the capabilities are set?
> > > > We don't want to allocate resources for wrong device parameters. We want to error out early. Other checks also are performed at the same place .
> > > 
> > > It is indeed better to raise an error as early as possible so that we can avoid allocation and other operations that will be reverted and may go wrong due to the invalid condition. That should be the reason why other checks for the address are performed here.
> > > 
> > > However, in this particular case, we cannot confidently perform the check here because it is unknown if the ARI capability will be advertised until the device realization code runs. This can justify delaying the check after the device realization, unlike the other checks.
> > 
> > Ok so are you proposing that the check we have right before (the check for unoccupied function 0) be also moved? It also uses the same vf approximation for seemingly to support ARI.
> 
> No, I don't think the check for function 0 is required to be disabled
> because of the change of addressing caused by ARI, but it is required
> because SR-IOV VF can be added and removed while the PF (function 0)
> remains. I think this check should be performed also when SR-IOV is disabled
> and ARI is enabled.
> 
> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of
> checking ARI capability, and that can happen in do_pci_register_device().
> 
> > Also where do you propose we move the check?
> 
> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM
> loading. See the check for failover pair as an example. I guess it's also
> placed in this region because it needs capability information.

How about instead of spending so much time on working around
incomplete ARI support we actually complete ARI support?



> > 
> > > 
> > > > Show quoted text
> > > >     Regards,
> > > >     Akihiko Odaki
> > > >      > +    else if (!pci_is_vf(pci_dev) &&
> > > >      > +             pcie_has_upstream_port(pci_dev) &&
> > > >      > +             PCI_SLOT(devfn)) {
> > > >      > +        error_setg(errp, "PCI: slot %d is not valid for %s,"
> > > >      > +                   " parent device only allows plugging into
> > > >     slot 0.",
> > > >      > +                   PCI_SLOT(devfn), name);
> > > >      > +        return NULL;
> > > >      >       }
> > > >      >
> > > >      >       pci_dev->devfn = devfn;
> >
Ani Sinha June 30, 2023, 7:41 a.m. UTC | #12
> On 30-Jun-2023, at 8:13 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/06/29 23:18, Ani Sinha wrote:
>>> On 29-Jun-2023, at 2:19 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>> 
>>> On 2023/06/29 17:05, Ani Sinha wrote:
>>>> On Thu, 29 Jun, 2023, 12:17 pm Akihiko Odaki, <akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>> wrote:
>>>>    On 2023/06/29 13:07, Ani Sinha wrote:
>>>>     > PCI Express ports only have one slot, so PCI Express devices can
>>>>    only be
>>>>     > plugged into slot 0 on a PCIE port. Enforce it.
>>>>     >
>>>>     > The change has been tested to not break ARI by instantiating
>>>>    seven vfs on an
>>>>     > emulated igb device (the maximum number of vfs the linux igb
>>>>    driver supports).
>>>>     > The vfs are seen to have non-zero device/slot numbers in the
>>>>    conventional
>>>>     > PCI BDF representation.
>>>>     >
>>>>     > CC: jusual@redhat.com <mailto:jusual@redhat.com>
>>>>     > CC: imammedo@redhat.com <mailto:imammedo@redhat.com>
>>>>     > CC: akihiko.odaki@daynix.com <mailto:akihiko.odaki@daynix.com>
>>>>     >
>>>>     > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>    <https://bugzilla.redhat.com/show_bug.cgi?id=2128929>
>>>>     > Signed-off-by: Ani Sinha <anisinha@redhat.com
>>>>    <mailto:anisinha@redhat.com>>
>>>>     > Reviewed-by: Julia Suvorova <jusual@redhat.com
>>>>    <mailto:jusual@redhat.com>>
>>>>     > ---
>>>>     >   hw/pci/pci.c | 15 +++++++++++++++
>>>>     >   1 file changed, 15 insertions(+)
>>>>     >
>>>>     > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>     > index e2eb4c3b4a..0320ac2bb3 100644
>>>>     > --- a/hw/pci/pci.c
>>>>     > +++ b/hw/pci/pci.c
>>>>     > @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>     >   static char *pcibus_get_dev_path(DeviceState *dev);
>>>>     >   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>     >   static void pcibus_reset(BusState *qbus);
>>>>     > +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>     >
>>>>     >   static Property pci_props[] = {
>>>>     >       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>     > @@ -1190,6 +1191,20 @@ static PCIDevice
>>>>    *do_pci_register_device(PCIDevice *pci_dev,
>>>>     >                      name);
>>>>     >
>>>>     >          return NULL;
>>>>     > +    } /*
>>>>     > +       * With SRIOV and ARI, vfs can have non-zero slot in the
>>>>    conventional
>>>>     > +       * PCI interpretation as all five bits reserved for slot
>>>>    addresses are
>>>>     > +       * also used for function bits for the various vfs. Ignore
>>>>    that case.
>>>>     > +       * It is too early here to check for ARI capabilities in
>>>>    the PCI config
>>>>     > +       * space. Hence, we check for a vf device instead.
>>>>     > +       */
>>>>    Why don't just perform this check after the capabilities are set?
>>>> We don't want to allocate resources for wrong device parameters. We want to error out early. Other checks also are performed at the same place .
>>> 
>>> It is indeed better to raise an error as early as possible so that we can avoid allocation and other operations that will be reverted and may go wrong due to the invalid condition. That should be the reason why other checks for the address are performed here.
>>> 
>>> However, in this particular case, we cannot confidently perform the check here because it is unknown if the ARI capability will be advertised until the device realization code runs. This can justify delaying the check after the device realization, unlike the other checks.
>> Ok so are you proposing that the check we have right before (the check for unoccupied function 0) be also moved? It also uses the same vf approximation for seemingly to support ARI.
> 
> No, I don't think the check for function 0 is required to be disabled because of the change of addressing caused by ARI, but it is required because SR-IOV VF can be added and removed while the PF (function 0) remains. I think this check should be performed also when SR-IOV is disabled and ARI is enabled.

OK I am a little confused with your explanation here. I understand the non-ARI case - to detect the PF we need to have function 0 available. Looking at the comment in pci_init_multifunction() it seems in ARI case, we can simply have a vf in function 0 (conventional interpretation) as well. Hence we need to ignore vfs both in ARI and non-ARI cases. This is what I understood.

> 
> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> 
>> Also where do you propose we move the check?
> 
> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.

Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:

-device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0

The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.

> See the check for failover pair as an example. I guess it's also placed in this region because it needs capability information.
> 
>>> 
>>>> Show quoted text
>>>>    Regards,
>>>>    Akihiko Odaki
>>>>     > +    else if (!pci_is_vf(pci_dev) &&
>>>>     > +             pcie_has_upstream_port(pci_dev) &&
>>>>     > +             PCI_SLOT(devfn)) {
>>>>     > +        error_setg(errp, "PCI: slot %d is not valid for %s,"
>>>>     > +                   " parent device only allows plugging into
>>>>    slot 0.",
>>>>     > +                   PCI_SLOT(devfn), name);
>>>>     > +        return NULL;
>>>>     >       }
>>>>     >
>>>>     >       pci_dev->devfn = devfn;
Michael S. Tsirkin June 30, 2023, 8:32 a.m. UTC | #13
On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
> > 
> > Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> > 
> >> Also where do you propose we move the check?
> > 
> > In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
> 
> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> 
> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> 
> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.

I think it's allowed because it expects you to hotplug function 0 later,
no?

I am quite worried about all this work going into blocking
what we think is disallowed configurations. We should have
maybe blocked them originally, but now that we didn't
there's a non zero chance of regressions, and the benefit
is not guaranteed.
Ani Sinha June 30, 2023, 8:36 a.m. UTC | #14
> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>> 
>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>> 
>>>> Also where do you propose we move the check?
>>> 
>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>> 
>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>> 
>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>> 
>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
> 
> I think it's allowed because it expects you to hotplug function 0 later,

This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.

> no?
> 
> I am quite worried about all this work going into blocking
> what we think is disallowed configurations. We should have
> maybe blocked them originally, but now that we didn't
> there's a non zero chance of regressions,

Sigh, no medals here for being brave :-)

> and the benefit
> is not guaranteed.
> 
> -- 
> MST
>
Michael S. Tsirkin June 30, 2023, 8:43 a.m. UTC | #15
On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
> 
> 
> > On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
> >>> 
> >>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> >>> 
> >>>> Also where do you propose we move the check?
> >>> 
> >>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
> >> 
> >> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> >> 
> >> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> >> 
> >> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
> > 
> > I think it's allowed because it expects you to hotplug function 0 later,
> 
> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.

yes but if you later add a device with ARI and with next field pointing
slot 2 guest will suddently find both.

> > no?
> > 
> > I am quite worried about all this work going into blocking
> > what we think is disallowed configurations. We should have
> > maybe blocked them originally, but now that we didn't
> > there's a non zero chance of regressions,
> 
> Sigh,

There's value in patches 1-4 I think - the last patch helped you find
these. so there's value in this work.

> no medals here for being brave :-)

Try removing support for a 3.5mm jack next. Oh wait ...

> > and the benefit
> > is not guaranteed.
> > 
> > -- 
> > MST
> >
Ani Sinha June 30, 2023, 9:22 a.m. UTC | #16
> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>> 
>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>> 
>>>>>> Also where do you propose we move the check?
>>>>> 
>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>> 
>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>> 
>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>> 
>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>> 
>>> I think it's allowed because it expects you to hotplug function 0 later,
>> 
>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
> 
> yes but if you later add a device with ARI and with next field pointing
> slot 2 guest will suddently find both.

Hmm, I tried this:

-device pcie-root-port,id=p \
-device igb,bus=p,addr=0x2.0x0 \
-device igb,bus=p,addr=0x0.0x0 \

The guest only found the second igb device not the first. You can try too.

> 
>>> no?
>>> 
>>> I am quite worried about all this work going into blocking
>>> what we think is disallowed configurations. We should have
>>> maybe blocked them originally, but now that we didn't
>>> there's a non zero chance of regressions,
>> 
>> Sigh,
> 
> There's value in patches 1-4 I think - the last patch helped you find
> these. so there's value in this work.
> 
>> no medals here for being brave :-)
> 
> Try removing support for a 3.5mm jack next. Oh wait ...

Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
 
> 
>>> and the benefit
>>> is not guaranteed.
>>> 
>>> -- 
>>> MST
Michael S. Tsirkin June 30, 2023, 10 a.m. UTC | #17
On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
> 
> 
> > On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
> >> 
> >> 
> >>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> 
> >>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
> >>>>> 
> >>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> >>>>> 
> >>>>>> Also where do you propose we move the check?
> >>>>> 
> >>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
> >>>> 
> >>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> >>>> 
> >>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> >>>> 
> >>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
> >>> 
> >>> I think it's allowed because it expects you to hotplug function 0 later,
> >> 
> >> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
> > 
> > yes but if you later add a device with ARI and with next field pointing
> > slot 2 guest will suddently find both.
> 
> Hmm, I tried this:
> 
> -device pcie-root-port,id=p \
> -device igb,bus=p,addr=0x2.0x0 \
> -device igb,bus=p,addr=0x0.0x0 \
> 
> The guest only found the second igb device not the first. You can try too.

Because next parameter in pcie_ari_init does not match.


> > 
> >>> no?
> >>> 
> >>> I am quite worried about all this work going into blocking
> >>> what we think is disallowed configurations. We should have
> >>> maybe blocked them originally, but now that we didn't
> >>> there's a non zero chance of regressions,
> >> 
> >> Sigh,
> > 
> > There's value in patches 1-4 I think - the last patch helped you find
> > these. so there's value in this work.
> > 
> >> no medals here for being brave :-)
> > 
> > Try removing support for a 3.5mm jack next. Oh wait ...
> 
> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>  
> > 
> >>> and the benefit
> >>> is not guaranteed.
> >>> 
> >>> -- 
> >>> MST
Michael S. Tsirkin June 30, 2023, 10:25 a.m. UTC | #18
On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
> Indeed. Everyone uses bluetooth these days. I for one is happy that
> the jack is gone (and they were bold enough to do it while Samsung and
> others still carry the useless port ) :-)

🤦🏻
Ani Sinha June 30, 2023, 10:37 a.m. UTC | #19
> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>> 
>>>> 
>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> 
>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>> 
>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>> 
>>>>>>>> Also where do you propose we move the check?
>>>>>>> 
>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>> 
>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>> 
>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>> 
>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>> 
>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>> 
>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>> 
>>> yes but if you later add a device with ARI and with next field pointing
>>> slot 2 guest will suddently find both.
>> 
>> Hmm, I tried this:
>> 
>> -device pcie-root-port,id=p \
>> -device igb,bus=p,addr=0x2.0x0 \
>> -device igb,bus=p,addr=0x0.0x0 \
>> 
>> The guest only found the second igb device not the first. You can try too.
> 
> Because next parameter in pcie_ari_init does not match.

OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.

> 
> 
>>> 
>>>>> no?
>>>>> 
>>>>> I am quite worried about all this work going into blocking
>>>>> what we think is disallowed configurations. We should have
>>>>> maybe blocked them originally, but now that we didn't
>>>>> there's a non zero chance of regressions,
>>>> 
>>>> Sigh,
>>> 
>>> There's value in patches 1-4 I think - the last patch helped you find
>>> these. so there's value in this work.
>>> 
>>>> no medals here for being brave :-)
>>> 
>>> Try removing support for a 3.5mm jack next. Oh wait ...
>> 
>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>> 
>>> 
>>>>> and the benefit
>>>>> is not guaranteed.
>>>>> 
>>>>> -- 
>>>>> MST
Michael S. Tsirkin June 30, 2023, 10:40 a.m. UTC | #20
On Fri, Jun 30, 2023 at 04:07:32PM +0530, Ani Sinha wrote:
> 
> 
> > On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
> >> 
> >> 
> >>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> 
> >>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
> >>>> 
> >>>> 
> >>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>> 
> >>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
> >>>>>>> 
> >>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> >>>>>>> 
> >>>>>>>> Also where do you propose we move the check?
> >>>>>>> 
> >>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
> >>>>>> 
> >>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> >>>>>> 
> >>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> >>>>>> 
> >>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
> >>>>> 
> >>>>> I think it's allowed because it expects you to hotplug function 0 later,
> >>>> 
> >>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
> >>> 
> >>> yes but if you later add a device with ARI and with next field pointing
> >>> slot 2 guest will suddently find both.
> >> 
> >> Hmm, I tried this:
> >> 
> >> -device pcie-root-port,id=p \
> >> -device igb,bus=p,addr=0x2.0x0 \
> >> -device igb,bus=p,addr=0x0.0x0 \
> >> 
> >> The guest only found the second igb device not the first. You can try too.
> > 
> > Because next parameter in pcie_ari_init does not match.
> 
> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.


you need to patch igb to pass 2 as next parameter.
maybe add a property to make it easier to play with.

> > 
> > 
> >>> 
> >>>>> no?
> >>>>> 
> >>>>> I am quite worried about all this work going into blocking
> >>>>> what we think is disallowed configurations. We should have
> >>>>> maybe blocked them originally, but now that we didn't
> >>>>> there's a non zero chance of regressions,
> >>>> 
> >>>> Sigh,
> >>> 
> >>> There's value in patches 1-4 I think - the last patch helped you find
> >>> these. so there's value in this work.
> >>> 
> >>>> no medals here for being brave :-)
> >>> 
> >>> Try removing support for a 3.5mm jack next. Oh wait ...
> >> 
> >> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
> >> 
> >>> 
> >>>>> and the benefit
> >>>>> is not guaranteed.
> >>>>> 
> >>>>> -- 
> >>>>> MST
Ani Sinha June 30, 2023, 10:45 a.m. UTC | #21
> On 30-Jun-2023, at 4:10 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Jun 30, 2023 at 04:07:32PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>> 
>>>> 
>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> 
>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>> 
>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>> 
>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>> 
>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>> 
>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>> 
>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>> 
>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>> 
>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>> 
>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>> 
>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>> slot 2 guest will suddently find both.
>>>> 
>>>> Hmm, I tried this:
>>>> 
>>>> -device pcie-root-port,id=p \
>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>> 
>>>> The guest only found the second igb device not the first. You can try too.
>>> 
>>> Because next parameter in pcie_ari_init does not match.
>> 
>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
> 
> 
> you need to patch igb to pass 2 as next parameter.
> maybe add a property to make it easier to play with.

Yes but without patching, could I not change the device parameters like

-device pcie-root-port,id=p \
-device igb,bus=p,addr=0x1.0x0 \
-device igb,bus=p,addr=0x0.0x0 \

I tried the above and it did not work.

> 
>>> 
>>> 
>>>>> 
>>>>>>> no?
>>>>>>> 
>>>>>>> I am quite worried about all this work going into blocking
>>>>>>> what we think is disallowed configurations. We should have
>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>> there's a non zero chance of regressions,
>>>>>> 
>>>>>> Sigh,
>>>>> 
>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>> these. so there's value in this work.
>>>>> 
>>>>>> no medals here for being brave :-)
>>>>> 
>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>> 
>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>>> 
>>>>> 
>>>>>>> and the benefit
>>>>>>> is not guaranteed.
>>>>>>> 
>>>>>>> -- 
>>>>>>> MST
Ani Sinha June 30, 2023, 10:49 a.m. UTC | #22
> On 30-Jun-2023, at 4:10 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Fri, Jun 30, 2023 at 04:07:32PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>> 
>>>> 
>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> 
>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>> 
>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>> 
>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>> 
>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>> 
>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>> 
>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>> 
>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>> 
>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>> 
>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>> 
>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>> slot 2 guest will suddently find both.
>>>> 
>>>> Hmm, I tried this:
>>>> 
>>>> -device pcie-root-port,id=p \
>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>> 
>>>> The guest only found the second igb device not the first. You can try too.
>>> 
>>> Because next parameter in pcie_ari_init does not match.
>> 
>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
> 
> 
> you need to patch igb to pass 2 as next parameter.

OK I tried this too along with

-device pcie-root-port,id=p \
-device igb,bus=p,addr=0x2.0x0 \
-device igb,bus=p,addr=0x0.0x0 \

Still same result. The guest only detects the last one.

> maybe add a property to make it easier to play with.
> 
>>> 
>>> 
>>>>> 
>>>>>>> no?
>>>>>>> 
>>>>>>> I am quite worried about all this work going into blocking
>>>>>>> what we think is disallowed configurations. We should have
>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>> there's a non zero chance of regressions,
>>>>>> 
>>>>>> Sigh,
>>>>> 
>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>> these. so there's value in this work.
>>>>> 
>>>>>> no medals here for being brave :-)
>>>>> 
>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>> 
>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>>> 
>>>>> 
>>>>>>> and the benefit
>>>>>>> is not guaranteed.
>>>>>>> 
>>>>>>> -- 
>>>>>>> MST
Akihiko Odaki June 30, 2023, 11:36 a.m. UTC | #23
On 2023/06/30 19:37, Ani Sinha wrote:
> 
> 
>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>
>>>
>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>
>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>
>>>>>
>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>
>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>
>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>
>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>
>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>
>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>
>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>
>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>
>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>
>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>
>>>> yes but if you later add a device with ARI and with next field pointing
>>>> slot 2 guest will suddently find both.
>>>
>>> Hmm, I tried this:
>>>
>>> -device pcie-root-port,id=p \
>>> -device igb,bus=p,addr=0x2.0x0 \
>>> -device igb,bus=p,addr=0x0.0x0 \
>>>
>>> The guest only found the second igb device not the first. You can try too.
>>
>> Because next parameter in pcie_ari_init does not match.
> 
> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.

I don't think there is one because the code for PCI multifunction does 
not care ARI. In my opinion, we need yet another check to make 
non-SR-IOV multifunction and ARI capability mutually exclusive; if a 
function has the ARI capability and it is not a VF, an attempt to assign 
non-zero function number for it should fail.

But it should be a distinct check as it will need to check the function 
number bits.

> 
>>
>>
>>>>
>>>>>> no?
>>>>>>
>>>>>> I am quite worried about all this work going into blocking
>>>>>> what we think is disallowed configurations. We should have
>>>>>> maybe blocked them originally, but now that we didn't
>>>>>> there's a non zero chance of regressions,
>>>>>
>>>>> Sigh,
>>>>
>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>> these. so there's value in this work.
>>>>
>>>>> no medals here for being brave :-)
>>>>
>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>
>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)

Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack 
with a 100-yen earphone. Even people who ported Linux to this machine 
spent efforts to get the jack to work on Linux ;)

>>>
>>>>
>>>>>> and the benefit
>>>>>> is not guaranteed.
>>>>>>
>>>>>> -- 
>>>>>> MST
>
Ani Sinha June 30, 2023, 11:47 a.m. UTC | #24
> On 30-Jun-2023, at 5:06 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/06/30 19:37, Ani Sinha wrote:
>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>> 
>>>> 
>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> 
>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>> 
>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>> 
>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>> 
>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>> 
>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>> 
>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>> 
>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>> 
>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>> 
>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>> 
>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>> slot 2 guest will suddently find both.
>>>> 
>>>> Hmm, I tried this:
>>>> 
>>>> -device pcie-root-port,id=p \
>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>> 
>>>> The guest only found the second igb device not the first. You can try too.
>>> 
>>> Because next parameter in pcie_ari_init does not match.
>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
> 
> I don't think there is one because the code for PCI multifunction does not care ARI. In my opinion, we need yet another check to make non-SR-IOV multifunction and ARI capability mutually exclusive; if a function has the ARI capability and it is not a VF, an attempt to assign non-zero function number for it should fail.
> 
> But it should be a distinct check as it will need to check the function number bits.

I will leave this for mst to comment.

> 
>>> 
>>> 
>>>>> 
>>>>>>> no?
>>>>>>> 
>>>>>>> I am quite worried about all this work going into blocking
>>>>>>> what we think is disallowed configurations. We should have
>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>> there's a non zero chance of regressions,
>>>>>> 
>>>>>> Sigh,
>>>>> 
>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>> these. so there's value in this work.
>>>>> 
>>>>>> no medals here for being brave :-)
>>>>> 
>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>> 
>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
> 
> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with a 100-yen earphone. Even people who ported Linux to this machine spent efforts to get the jack to work on Linux ;)

Yes that is because the jack exists. If they did not make the jack work, it would be a broken device. With bluetooth being so pervasive, its mostly a matter of time before most devices stop shipping with a 3.5mm.

All I am saying is sometimes one needs to be bold to bring changes. Otherwise changes never come. Status quo is not an option IMHO. 

> 
>>>> 
>>>>> 
>>>>>>> and the benefit
>>>>>>> is not guaranteed.
>>>>>>> 
>>>>>>> -- 
>>>>>>> MST
Akihiko Odaki June 30, 2023, 11:55 a.m. UTC | #25
On 2023/06/30 20:36, Akihiko Odaki wrote:
> On 2023/06/30 19:37, Ani Sinha wrote:
>>
>>
>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>
>>>>
>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>
>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>
>>>>>>
>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>
>>>>>>>>> Thus the check for unoccupied function 0 needs to use 
>>>>>>>>> pci_is_vf() instead of checking ARI capability, and that can 
>>>>>>>>> happen in do_pci_register_device().
>>>>>>>>>
>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>
>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before 
>>>>>>>>> option ROM loading.
>>>>>>>>
>>>>>>>> Hmm, I tried this. The issue here is something like this would 
>>>>>>>> be now allowed since the PF has ARI capability:
>>>>>>>>
>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>
>>>>>>>> The above should not be allowed and when used, we do not see the 
>>>>>>>> igb ethernet device from the guest OS.
>>>>>>>
>>>>>>> I think it's allowed because it expects you to hotplug function 0 
>>>>>>> later,
>>>>>>
>>>>>> This is about the igb device being plugged into the non-zero slot 
>>>>>> of the pci-root-port. The guest OS ignores it.
>>>>>
>>>>> yes but if you later add a device with ARI and with next field 
>>>>> pointing
>>>>> slot 2 guest will suddently find both.
>>>>
>>>> Hmm, I tried this:
>>>>
>>>> -device pcie-root-port,id=p \
>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>
>>>> The guest only found the second igb device not the first. You can 
>>>> try too.
>>>
>>> Because next parameter in pcie_ari_init does not match.
>>
>> OK send me a command line that I can test it with. I can’t come up 
>> with a case that actually works in practice.
> 
> I don't think there is one because the code for PCI multifunction does 
> not care ARI. In my opinion, we need yet another check to make 
> non-SR-IOV multifunction and ARI capability mutually exclusive; if a 
> function has the ARI capability and it is not a VF, an attempt to assign 
> non-zero function number for it should fail.

No, the more straightforward way to fix this problem is to check the 
device function number is less than the next function number advertised 
with ARI.

> 
> But it should be a distinct check as it will need to check the function 
> number bits.
Ani Sinha June 30, 2023, 1:56 p.m. UTC | #26
> On 30-Jun-2023, at 5:25 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/06/30 20:36, Akihiko Odaki wrote:
>> On 2023/06/30 19:37, Ani Sinha wrote:
>>> 
>>> 
>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> 
>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>> 
>>>>> 
>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>> 
>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>> 
>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>> 
>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>> 
>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>> 
>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>> 
>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>> 
>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>> 
>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>> 
>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>> 
>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>> slot 2 guest will suddently find both.
>>>>> 
>>>>> Hmm, I tried this:
>>>>> 
>>>>> -device pcie-root-port,id=p \
>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>> 
>>>>> The guest only found the second igb device not the first. You can try too.
>>>> 
>>>> Because next parameter in pcie_ari_init does not match.
>>> 
>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>> I don't think there is one because the code for PCI multifunction does not care ARI. In my opinion, we need yet another check to make non-SR-IOV multifunction and ARI capability mutually exclusive; if a function has the ARI capability and it is not a VF, an attempt to assign non-zero function number for it should fail.
> 
> No, the more straightforward way to fix this problem is to check the device function number is less than the next function number advertised with ARI.
> 

Personally I would leave the check for ARI capable devices to someone else. I am ok with moving the check to pci_qdev_realize() and I verified that both unit tests and the breaking test case for BZ 2128929 is caught. I have also verified that the change does not break igb vf generation. 
If there is any interest to push this change, I will spin a new version with tags for test fixes added and the rework for this patch with the check moved to the new location as you had suggested.


>> But it should be a distinct check as it will need to check the function number bits.
Michael S. Tsirkin June 30, 2023, 3:29 p.m. UTC | #27
On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:
> On 2023/06/30 19:37, Ani Sinha wrote:
> > 
> > 
> > > On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > 
> > > On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
> > > > 
> > > > 
> > > > > On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > 
> > > > > On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
> > > > > > 
> > > > > > 
> > > > > > > On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > 
> > > > > > > On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
> > > > > > > > > 
> > > > > > > > > Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> > > > > > > > > 
> > > > > > > > > > Also where do you propose we move the check?
> > > > > > > > > 
> > > > > > > > > In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
> > > > > > > > 
> > > > > > > > Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> > > > > > > > 
> > > > > > > > -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> > > > > > > > 
> > > > > > > > The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
> > > > > > > 
> > > > > > > I think it's allowed because it expects you to hotplug function 0 later,
> > > > > > 
> > > > > > This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
> > > > > 
> > > > > yes but if you later add a device with ARI and with next field pointing
> > > > > slot 2 guest will suddently find both.
> > > > 
> > > > Hmm, I tried this:
> > > > 
> > > > -device pcie-root-port,id=p \
> > > > -device igb,bus=p,addr=0x2.0x0 \
> > > > -device igb,bus=p,addr=0x0.0x0 \
> > > > 
> > > > The guest only found the second igb device not the first. You can try too.
> > > 
> > > Because next parameter in pcie_ari_init does not match.
> > 
> > OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
> 
> I don't think there is one because the code for PCI multifunction does not
> care ARI. In my opinion, we need yet another check to make non-SR-IOV
> multifunction and ARI capability mutually exclusive; if a function has the
> ARI capability and it is not a VF, an attempt to assign non-zero function
> number for it should fail.

Why is that? My understanding is that ARI capable devices should also
set the multifunction bit in the header. It's not terribly clear from
the spec though.

> But it should be a distinct check as it will need to check the function
> number bits.
> 
> > 
> > > 
> > > 
> > > > > 
> > > > > > > no?
> > > > > > > 
> > > > > > > I am quite worried about all this work going into blocking
> > > > > > > what we think is disallowed configurations. We should have
> > > > > > > maybe blocked them originally, but now that we didn't
> > > > > > > there's a non zero chance of regressions,
> > > > > > 
> > > > > > Sigh,
> > > > > 
> > > > > There's value in patches 1-4 I think - the last patch helped you find
> > > > > these. so there's value in this work.
> > > > > 
> > > > > > no medals here for being brave :-)
> > > > > 
> > > > > Try removing support for a 3.5mm jack next. Oh wait ...
> > > > 
> > > > Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
> 
> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
> a 100-yen earphone. Even people who ported Linux to this machine spent
> efforts to get the jack to work on Linux ;)
> 
> > > > 
> > > > > 
> > > > > > > and the benefit
> > > > > > > is not guaranteed.
> > > > > > > 
> > > > > > > -- 
> > > > > > > MST
> >
Akihiko Odaki July 1, 2023, 7:09 a.m. UTC | #28
On 2023/06/30 22:56, Ani Sinha wrote:
> 
> 
>> On 30-Jun-2023, at 5:25 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/06/30 20:36, Akihiko Odaki wrote:
>>> On 2023/06/30 19:37, Ani Sinha wrote:
>>>>
>>>>
>>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>
>>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>>>
>>>>>>
>>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>
>>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>>>
>>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>>>
>>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>>>
>>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>>>
>>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>>>
>>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>>>
>>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>>>
>>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>>>
>>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>>> slot 2 guest will suddently find both.
>>>>>>
>>>>>> Hmm, I tried this:
>>>>>>
>>>>>> -device pcie-root-port,id=p \
>>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>>>
>>>>>> The guest only found the second igb device not the first. You can try too.
>>>>>
>>>>> Because next parameter in pcie_ari_init does not match.
>>>>
>>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>>> I don't think there is one because the code for PCI multifunction does not care ARI. In my opinion, we need yet another check to make non-SR-IOV multifunction and ARI capability mutually exclusive; if a function has the ARI capability and it is not a VF, an attempt to assign non-zero function number for it should fail.
>>
>> No, the more straightforward way to fix this problem is to check the device function number is less than the next function number advertised with ARI.
>>
> 
> Personally I would leave the check for ARI capable devices to someone else. I am ok with moving the check to pci_qdev_realize() and I verified that both unit tests and the breaking test case for BZ 2128929 is caught. I have also verified that the change does not break igb vf generation.
> If there is any interest to push this change, I will spin a new version with tags for test fixes added and the rework for this patch with the check moved to the new location as you had suggested.

I sent a patch series to add ARI next function number check:
https://lore.kernel.org/qemu-devel/20230701070133.24877-1-akihiko.odaki@daynix.com/

Yes, I want the slot number restriction to be enforced. If it worries 
you too much for regressions, you may implement it as a warning first 
and then turn it a hard error when the next development phase starts.
Akihiko Odaki July 1, 2023, 7:28 a.m. UTC | #29
On 2023/07/01 0:29, Michael S. Tsirkin wrote:
> On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:
>> On 2023/06/30 19:37, Ani Sinha wrote:
>>>
>>>
>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>
>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>>
>>>>>
>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>
>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>>
>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>>
>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>>
>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>>
>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>>
>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>>
>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>>
>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>>
>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>>
>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>> slot 2 guest will suddently find both.
>>>>>
>>>>> Hmm, I tried this:
>>>>>
>>>>> -device pcie-root-port,id=p \
>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>>
>>>>> The guest only found the second igb device not the first. You can try too.
>>>>
>>>> Because next parameter in pcie_ari_init does not match.
>>>
>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>>
>> I don't think there is one because the code for PCI multifunction does not
>> care ARI. In my opinion, we need yet another check to make non-SR-IOV
>> multifunction and ARI capability mutually exclusive; if a function has the
>> ARI capability and it is not a VF, an attempt to assign non-zero function
>> number for it should fail.
> 
> Why is that? My understanding is that ARI capable devices should also
> set the multifunction bit in the header. It's not terribly clear from
> the spec though.

Something like the following will not work properly with ARI-capable 
device (think of a as an ARI-capable device):
-device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1

This is because the next function numbers advertised with ARI are not 
updated with the multifunction configuration, but they are hardcoded in 
the device implementation. In this sense, the traditional (non-SR/IOV) 
multifunction mechanism QEMU has will not work with ARI-capable devices.

> 
>> But it should be a distinct check as it will need to check the function
>> number bits.
>>
>>>
>>>>
>>>>
>>>>>>
>>>>>>>> no?
>>>>>>>>
>>>>>>>> I am quite worried about all this work going into blocking
>>>>>>>> what we think is disallowed configurations. We should have
>>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>>> there's a non zero chance of regressions,
>>>>>>>
>>>>>>> Sigh,
>>>>>>
>>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>>> these. so there's value in this work.
>>>>>>
>>>>>>> no medals here for being brave :-)
>>>>>>
>>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>>>
>>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>
>> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
>> a 100-yen earphone. Even people who ported Linux to this machine spent
>> efforts to get the jack to work on Linux ;)
>>
>>>>>
>>>>>>
>>>>>>>> and the benefit
>>>>>>>> is not guaranteed.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> MST
>>>
>
Michael S. Tsirkin July 2, 2023, 4:59 a.m. UTC | #30
On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
> Yes, I want the slot number restriction to be enforced. If it worries you
> too much for regressions, you may implement it as a warning first and then
> turn it a hard error when the next development phase starts.

That's not a bad idea.
Ani Sinha July 3, 2023, 6:08 a.m. UTC | #31
> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
>> Yes, I want the slot number restriction to be enforced. If it worries you
>> too much for regressions, you may implement it as a warning first and then
>> turn it a hard error when the next development phase starts.
> 
> That's not a bad idea.

If we had not enforced the check strongly, the tests that we fixed would not get noticed.
Ani Sinha July 3, 2023, 1:31 p.m. UTC | #32
> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
>> Yes, I want the slot number restriction to be enforced. If it worries you
>> too much for regressions, you may implement it as a warning first and then
>> turn it a hard error when the next development phase starts.
> 
> That's not a bad idea.

This is with just the warning - the device still gets added:

(qemu) netdev_add socket,id=hostnet1,listen=:1234
(qemu)  device_add e1000e,netdev=hostnet1,mac=00:11:22:33:44:03,id=net1,bus=pci.6,addr=0x2.0x5
warning: PCI: slot 2 is not valid for e1000e, parent device only allows plugging into slot 0.
(qemu) info network
igb.0: index=0,type=nic,model=igb,macaddr=52:54:00:12:34:56
igb.1: index=0,type=nic,model=igb,macaddr=52:54:00:12:34:57
net1: index=0,type=nic,model=e1000e,macaddr=00:11:22:33:44:03
 \ hostnet1: index=0,type=socket,

device_remove won’t be able to remove the nic unless guest cooperates.
Akihiko Odaki July 4, 2023, 5:01 a.m. UTC | #33
On 2023/07/03 15:08, Ani Sinha wrote:
> 
> 
>> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
>>> Yes, I want the slot number restriction to be enforced. If it worries you
>>> too much for regressions, you may implement it as a warning first and then
>>> turn it a hard error when the next development phase starts.
>>
>> That's not a bad idea.
> 
> If we had not enforced the check strongly, the tests that we fixed would not get noticed.
> 

Perhaps so, but we don't have much time before feature freeze. I rather 
want to see the check implemented as warning in 8.1 instead of delaying 
the initial implementation of the check after 8.1 (though I worry if 
it's already too late for 8.1.)
Ani Sinha July 4, 2023, 5:39 a.m. UTC | #34
> On 04-Jul-2023, at 10:31 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/07/03 15:08, Ani Sinha wrote:
>>> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
>>>> Yes, I want the slot number restriction to be enforced. If it worries you
>>>> too much for regressions, you may implement it as a warning first and then
>>>> turn it a hard error when the next development phase starts.
>>> 
>>> That's not a bad idea.
>> If we had not enforced the check strongly, the tests that we fixed would not get noticed.
> 
> Perhaps so, but we don't have much time before feature freeze. I rather want to see the check implemented as warning in 8.1 instead of delaying the initial implementation of the check after 8.1 (though I worry if it's already too late for 8.1.)

The feature hard freeze window starts from 12th of next week. So I am still debating whether to keep the hard check or just have a warning. If the hard check causes regressions, we can always revert it to a warning later.
Ani Sinha July 4, 2023, 10:33 a.m. UTC | #35
> On 04-Jul-2023, at 11:09 AM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 04-Jul-2023, at 10:31 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>> 
>> On 2023/07/03 15:08, Ani Sinha wrote:
>>>> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> 
>>>> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
>>>>> Yes, I want the slot number restriction to be enforced. If it worries you
>>>>> too much for regressions, you may implement it as a warning first and then
>>>>> turn it a hard error when the next development phase starts.
>>>> 
>>>> That's not a bad idea.
>>> If we had not enforced the check strongly, the tests that we fixed would not get noticed.
>> 
>> Perhaps so, but we don't have much time before feature freeze. I rather want to see the check implemented as warning in 8.1 instead of delaying the initial implementation of the check after 8.1 (though I worry if it's already too late for 8.1.)
> 
> The feature hard freeze window starts from 12th of next week. So I am still debating whether to keep the hard check or just have a warning. If the hard check causes regressions, we can always revert it to a warning later.

mst?
Michael S. Tsirkin July 4, 2023, 10:36 a.m. UTC | #36
On Tue, Jul 04, 2023 at 04:03:54PM +0530, Ani Sinha wrote:
> 
> 
> > On 04-Jul-2023, at 11:09 AM, Ani Sinha <anisinha@redhat.com> wrote:
> > 
> > 
> > 
> >> On 04-Jul-2023, at 10:31 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >> 
> >> On 2023/07/03 15:08, Ani Sinha wrote:
> >>>> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>> 
> >>>> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
> >>>>> Yes, I want the slot number restriction to be enforced. If it worries you
> >>>>> too much for regressions, you may implement it as a warning first and then
> >>>>> turn it a hard error when the next development phase starts.
> >>>> 
> >>>> That's not a bad idea.
> >>> If we had not enforced the check strongly, the tests that we fixed would not get noticed.
> >> 
> >> Perhaps so, but we don't have much time before feature freeze. I rather want to see the check implemented as warning in 8.1 instead of delaying the initial implementation of the check after 8.1 (though I worry if it's already too late for 8.1.)
> > 
> > The feature hard freeze window starts from 12th of next week. So I am still debating whether to keep the hard check or just have a warning. If the hard check causes regressions, we can always revert it to a warning later.
> 
> mst?

I'd go for a warning now. Let's see what triggers for users without
actually breaking things too badly for them.
Ani Sinha July 4, 2023, 11:10 a.m. UTC | #37
> On 04-Jul-2023, at 4:06 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Tue, Jul 04, 2023 at 04:03:54PM +0530, Ani Sinha wrote:
>> 
>> 
>>> On 04-Jul-2023, at 11:09 AM, Ani Sinha <anisinha@redhat.com> wrote:
>>> 
>>> 
>>> 
>>>> On 04-Jul-2023, at 10:31 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>> 
>>>> On 2023/07/03 15:08, Ani Sinha wrote:
>>>>>> On 02-Jul-2023, at 10:29 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>> 
>>>>>> On Sat, Jul 01, 2023 at 04:09:31PM +0900, Akihiko Odaki wrote:
>>>>>>> Yes, I want the slot number restriction to be enforced. If it worries you
>>>>>>> too much for regressions, you may implement it as a warning first and then
>>>>>>> turn it a hard error when the next development phase starts.
>>>>>> 
>>>>>> That's not a bad idea.
>>>>> If we had not enforced the check strongly, the tests that we fixed would not get noticed.
>>>> 
>>>> Perhaps so, but we don't have much time before feature freeze. I rather want to see the check implemented as warning in 8.1 instead of delaying the initial implementation of the check after 8.1 (though I worry if it's already too late for 8.1.)
>>> 
>>> The feature hard freeze window starts from 12th of next week. So I am still debating whether to keep the hard check or just have a warning. If the hard check causes regressions, we can always revert it to a warning later.
>> 
>> mst?
> 
> I'd go for a warning now. Let's see what triggers for users without
> actually breaking things too badly for them.

🤦🏻
Igor Mammedov July 4, 2023, 11:38 a.m. UTC | #38
On Sat, 1 Jul 2023 16:28:30 +0900
Akihiko Odaki <akihiko.odaki@daynix.com> wrote:

> On 2023/07/01 0:29, Michael S. Tsirkin wrote:
> > On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:  
> >> On 2023/06/30 19:37, Ani Sinha wrote:  
> >>>
> >>>  
> >>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>
> >>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:  
> >>>>>
> >>>>>  
> >>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>>
> >>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:  
> >>>>>>>
> >>>>>>>  
> >>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:  
> >>>>>>>>>>
> >>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> >>>>>>>>>>  
> >>>>>>>>>>> Also where do you propose we move the check?  
> >>>>>>>>>>
> >>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.  
> >>>>>>>>>
> >>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> >>>>>>>>>
> >>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> >>>>>>>>>
> >>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.  
> >>>>>>>>
> >>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,  
> >>>>>>>
> >>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.  
> >>>>>>
> >>>>>> yes but if you later add a device with ARI and with next field pointing
> >>>>>> slot 2 guest will suddently find both.  
> >>>>>
> >>>>> Hmm, I tried this:
> >>>>>
> >>>>> -device pcie-root-port,id=p \
> >>>>> -device igb,bus=p,addr=0x2.0x0 \
> >>>>> -device igb,bus=p,addr=0x0.0x0 \
> >>>>>
> >>>>> The guest only found the second igb device not the first. You can try too.  
> >>>>
> >>>> Because next parameter in pcie_ari_init does not match.  
> >>>
> >>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.  
> >>
> >> I don't think there is one because the code for PCI multifunction does not
> >> care ARI. In my opinion, we need yet another check to make non-SR-IOV
> >> multifunction and ARI capability mutually exclusive; if a function has the
> >> ARI capability and it is not a VF, an attempt to assign non-zero function
> >> number for it should fail.  

is it stated somewhere in spec(s) that ARI and !SR-IOV are mutually exclusive?

> > 
> > Why is that? My understanding is that ARI capable devices should also
> > set the multifunction bit in the header. It's not terribly clear from
> > the spec though.  
> 
> Something like the following will not work properly with ARI-capable 
> device (think of a as an ARI-capable device):
> -device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1
(I had a crazy idea, to use it like that so we could put more devices
on port without resorting to adding extra bridges)

Can you elaborate some more why it won't work?

> This is because the next function numbers advertised with ARI are not 
> updated with the multifunction configuration, but they are hardcoded in 
> the device implementation. In this sense, the traditional (non-SR/IOV) 
> multifunction mechanism QEMU has will not work with ARI-capable devices.
> 
> >   
> >> But it should be a distinct check as it will need to check the function
> >> number bits.
> >>  
> >>>  
> >>>>
> >>>>  
> >>>>>>  
> >>>>>>>> no?
> >>>>>>>>
> >>>>>>>> I am quite worried about all this work going into blocking
> >>>>>>>> what we think is disallowed configurations. We should have
> >>>>>>>> maybe blocked them originally, but now that we didn't
> >>>>>>>> there's a non zero chance of regressions,  
> >>>>>>>
> >>>>>>> Sigh,  
> >>>>>>
> >>>>>> There's value in patches 1-4 I think - the last patch helped you find
> >>>>>> these. so there's value in this work.
> >>>>>>  
> >>>>>>> no medals here for being brave :-)  
> >>>>>>
> >>>>>> Try removing support for a 3.5mm jack next. Oh wait ...  
> >>>>>
> >>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)  
> >>
> >> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
> >> a 100-yen earphone. Even people who ported Linux to this machine spent
> >> efforts to get the jack to work on Linux ;)
> >>  
> >>>>>  
> >>>>>>  
> >>>>>>>> and the benefit
> >>>>>>>> is not guaranteed.
> >>>>>>>>
> >>>>>>>> -- 
> >>>>>>>> MST  
> >>>  
> >   
>
Akihiko Odaki July 4, 2023, 11:50 a.m. UTC | #39
On 2023/07/04 20:38, Igor Mammedov wrote:
> On Sat, 1 Jul 2023 16:28:30 +0900
> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
>> On 2023/07/01 0:29, Michael S. Tsirkin wrote:
>>> On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:
>>>> On 2023/06/30 19:37, Ani Sinha wrote:
>>>>>
>>>>>   
>>>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>
>>>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>>>>
>>>>>>>   
>>>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>>>>
>>>>>>>>>   
>>>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>>>>   
>>>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>>>>
>>>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>>>>
>>>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>>>>
>>>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>>>>
>>>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>>>>
>>>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>>>>
>>>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>>>>
>>>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>>>> slot 2 guest will suddently find both.
>>>>>>>
>>>>>>> Hmm, I tried this:
>>>>>>>
>>>>>>> -device pcie-root-port,id=p \
>>>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>>>>
>>>>>>> The guest only found the second igb device not the first. You can try too.
>>>>>>
>>>>>> Because next parameter in pcie_ari_init does not match.
>>>>>
>>>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>>>>
>>>> I don't think there is one because the code for PCI multifunction does not
>>>> care ARI. In my opinion, we need yet another check to make non-SR-IOV
>>>> multifunction and ARI capability mutually exclusive; if a function has the
>>>> ARI capability and it is not a VF, an attempt to assign non-zero function
>>>> number for it should fail.
> 
> is it stated somewhere in spec(s) that ARI and !SR-IOV are mutually exclusive?
> 
>>>
>>> Why is that? My understanding is that ARI capable devices should also
>>> set the multifunction bit in the header. It's not terribly clear from
>>> the spec though.
>>
>> Something like the following will not work properly with ARI-capable
>> device (think of a as an ARI-capable device):
>> -device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1
> (I had a crazy idea, to use it like that so we could put more devices
> on port without resorting to adding extra bridges)
> 
> Can you elaborate some more why it won't work?

It won't work because the ARI next function number field is fixed. In 
this case, the field of the Function at 0x1.0x0 should point to 0x1.0x1, 
but it doesn't. As the result, the Function at 0x1.0x1 won't be recognized.

It's more problematic if some of the Functions are ARI-capable but 
others are not. In my understanding, all Functions in a ARI-capable 
device need to have ARI capability, but that's not enforced.

> 
>> This is because the next function numbers advertised with ARI are not
>> updated with the multifunction configuration, but they are hardcoded in
>> the device implementation. In this sense, the traditional (non-SR/IOV)
>> multifunction mechanism QEMU has will not work with ARI-capable devices.
>>
>>>    
>>>> But it should be a distinct check as it will need to check the function
>>>> number bits.
>>>>   
>>>>>   
>>>>>>
>>>>>>   
>>>>>>>>   
>>>>>>>>>> no?
>>>>>>>>>>
>>>>>>>>>> I am quite worried about all this work going into blocking
>>>>>>>>>> what we think is disallowed configurations. We should have
>>>>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>>>>> there's a non zero chance of regressions,
>>>>>>>>>
>>>>>>>>> Sigh,
>>>>>>>>
>>>>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>>>>> these. so there's value in this work.
>>>>>>>>   
>>>>>>>>> no medals here for being brave :-)
>>>>>>>>
>>>>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>>>>>
>>>>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>>>
>>>> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
>>>> a 100-yen earphone. Even people who ported Linux to this machine spent
>>>> efforts to get the jack to work on Linux ;)
>>>>   
>>>>>>>   
>>>>>>>>   
>>>>>>>>>> and the benefit
>>>>>>>>>> is not guaranteed.
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> MST
>>>>>   
>>>    
>>
>
Igor Mammedov July 4, 2023, 12:36 p.m. UTC | #40
On Tue, 4 Jul 2023 20:50:49 +0900
Akihiko Odaki <akihiko.odaki@daynix.com> wrote:

> On 2023/07/04 20:38, Igor Mammedov wrote:
> > On Sat, 1 Jul 2023 16:28:30 +0900
> > Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >   
> >> On 2023/07/01 0:29, Michael S. Tsirkin wrote:  
> >>> On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:  
> >>>> On 2023/06/30 19:37, Ani Sinha wrote:  
> >>>>>
> >>>>>     
> >>>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>>
> >>>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:  
> >>>>>>>
> >>>>>>>     
> >>>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:  
> >>>>>>>>>
> >>>>>>>>>     
> >>>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:  
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
> >>>>>>>>>>>>     
> >>>>>>>>>>>>> Also where do you propose we move the check?  
> >>>>>>>>>>>>
> >>>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.  
> >>>>>>>>>>>
> >>>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
> >>>>>>>>>>>
> >>>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
> >>>>>>>>>>>
> >>>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.  
> >>>>>>>>>>
> >>>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,  
> >>>>>>>>>
> >>>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.  
> >>>>>>>>
> >>>>>>>> yes but if you later add a device with ARI and with next field pointing
> >>>>>>>> slot 2 guest will suddently find both.  
> >>>>>>>
> >>>>>>> Hmm, I tried this:
> >>>>>>>
> >>>>>>> -device pcie-root-port,id=p \
> >>>>>>> -device igb,bus=p,addr=0x2.0x0 \
> >>>>>>> -device igb,bus=p,addr=0x0.0x0 \
> >>>>>>>
> >>>>>>> The guest only found the second igb device not the first. You can try too.  
> >>>>>>
> >>>>>> Because next parameter in pcie_ari_init does not match.  
> >>>>>
> >>>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.  
> >>>>
> >>>> I don't think there is one because the code for PCI multifunction does not
> >>>> care ARI. In my opinion, we need yet another check to make non-SR-IOV
> >>>> multifunction and ARI capability mutually exclusive; if a function has the
> >>>> ARI capability and it is not a VF, an attempt to assign non-zero function
> >>>> number for it should fail.  
> > 
> > is it stated somewhere in spec(s) that ARI and !SR-IOV are mutually exclusive?
> >   
> >>>
> >>> Why is that? My understanding is that ARI capable devices should also
> >>> set the multifunction bit in the header. It's not terribly clear from
> >>> the spec though.  
> >>
> >> Something like the following will not work properly with ARI-capable
> >> device (think of a as an ARI-capable device):
> >> -device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1  
> > (I had a crazy idea, to use it like that so we could put more devices
> > on port without resorting to adding extra bridges)
> > 
> > Can you elaborate some more why it won't work?  
> 
> It won't work because the ARI next function number field is fixed. In 
> this case, the field of the Function at 0x1.0x0 should point to 0x1.0x1, 
> but it doesn't. As the result, the Function at 0x1.0x1 won't be recognized.
> 
> It's more problematic if some of the Functions are ARI-capable but 
> others are not. In my understanding, all Functions in a ARI-capable 
> device need to have ARI capability, but that's not enforced.

that doesn't look to me as an real issue but rather as
a QEMU problem that we could fix and handle it properly.

> >> This is because the next function numbers advertised with ARI are not
> >> updated with the multifunction configuration, but they are hardcoded in
> >> the device implementation. In this sense, the traditional (non-SR/IOV)
> >> multifunction mechanism QEMU has will not work with ARI-capable devices.
> >>  
> >>>      
> >>>> But it should be a distinct check as it will need to check the function
> >>>> number bits.
> >>>>     
> >>>>>     
> >>>>>>
> >>>>>>     
> >>>>>>>>     
> >>>>>>>>>> no?
> >>>>>>>>>>
> >>>>>>>>>> I am quite worried about all this work going into blocking
> >>>>>>>>>> what we think is disallowed configurations. We should have
> >>>>>>>>>> maybe blocked them originally, but now that we didn't
> >>>>>>>>>> there's a non zero chance of regressions,  
> >>>>>>>>>
> >>>>>>>>> Sigh,  
> >>>>>>>>
> >>>>>>>> There's value in patches 1-4 I think - the last patch helped you find
> >>>>>>>> these. so there's value in this work.
> >>>>>>>>     
> >>>>>>>>> no medals here for being brave :-)  
> >>>>>>>>
> >>>>>>>> Try removing support for a 3.5mm jack next. Oh wait ...  
> >>>>>>>
> >>>>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)  
> >>>>
> >>>> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
> >>>> a 100-yen earphone. Even people who ported Linux to this machine spent
> >>>> efforts to get the jack to work on Linux ;)
> >>>>     
> >>>>>>>     
> >>>>>>>>     
> >>>>>>>>>> and the benefit
> >>>>>>>>>> is not guaranteed.
> >>>>>>>>>>
> >>>>>>>>>> -- 
> >>>>>>>>>> MST  
> >>>>>     
> >>>      
> >>  
> >   
>
Akihiko Odaki July 4, 2023, 12:51 p.m. UTC | #41
On 2023/07/04 21:36, Igor Mammedov wrote:
> On Tue, 4 Jul 2023 20:50:49 +0900
> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
>> On 2023/07/04 20:38, Igor Mammedov wrote:
>>> On Sat, 1 Jul 2023 16:28:30 +0900
>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>    
>>>> On 2023/07/01 0:29, Michael S. Tsirkin wrote:
>>>>> On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:
>>>>>> On 2023/06/30 19:37, Ani Sinha wrote:
>>>>>>>
>>>>>>>      
>>>>>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>>>>>>
>>>>>>>>>      
>>>>>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>>>>>>
>>>>>>>>>>>      
>>>>>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>>>>>>      
>>>>>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>>>>>>
>>>>>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>>>>>>
>>>>>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>>>>>>
>>>>>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>>>>>>
>>>>>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>>>>>>
>>>>>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>>>>>> slot 2 guest will suddently find both.
>>>>>>>>>
>>>>>>>>> Hmm, I tried this:
>>>>>>>>>
>>>>>>>>> -device pcie-root-port,id=p \
>>>>>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>>>>>>
>>>>>>>>> The guest only found the second igb device not the first. You can try too.
>>>>>>>>
>>>>>>>> Because next parameter in pcie_ari_init does not match.
>>>>>>>
>>>>>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>>>>>>
>>>>>> I don't think there is one because the code for PCI multifunction does not
>>>>>> care ARI. In my opinion, we need yet another check to make non-SR-IOV
>>>>>> multifunction and ARI capability mutually exclusive; if a function has the
>>>>>> ARI capability and it is not a VF, an attempt to assign non-zero function
>>>>>> number for it should fail.
>>>
>>> is it stated somewhere in spec(s) that ARI and !SR-IOV are mutually exclusive?
>>>    
>>>>>
>>>>> Why is that? My understanding is that ARI capable devices should also
>>>>> set the multifunction bit in the header. It's not terribly clear from
>>>>> the spec though.
>>>>
>>>> Something like the following will not work properly with ARI-capable
>>>> device (think of a as an ARI-capable device):
>>>> -device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1
>>> (I had a crazy idea, to use it like that so we could put more devices
>>> on port without resorting to adding extra bridges)
>>>
>>> Can you elaborate some more why it won't work?
>>
>> It won't work because the ARI next function number field is fixed. In
>> this case, the field of the Function at 0x1.0x0 should point to 0x1.0x1,
>> but it doesn't. As the result, the Function at 0x1.0x1 won't be recognized.
>>
>> It's more problematic if some of the Functions are ARI-capable but
>> others are not. In my understanding, all Functions in a ARI-capable
>> device need to have ARI capability, but that's not enforced.
> 
> that doesn't look to me as an real issue but rather as
> a QEMU problem that we could fix and handle it properly.

Yes, it's certainly possible to fix QEMU so that non-SR/IOV 
multifunction works with ARI-capable Functions.

> 
>>>> This is because the next function numbers advertised with ARI are not
>>>> updated with the multifunction configuration, but they are hardcoded in
>>>> the device implementation. In this sense, the traditional (non-SR/IOV)
>>>> multifunction mechanism QEMU has will not work with ARI-capable devices.
>>>>   
>>>>>       
>>>>>> But it should be a distinct check as it will need to check the function
>>>>>> number bits.
>>>>>>      
>>>>>>>      
>>>>>>>>
>>>>>>>>      
>>>>>>>>>>      
>>>>>>>>>>>> no?
>>>>>>>>>>>>
>>>>>>>>>>>> I am quite worried about all this work going into blocking
>>>>>>>>>>>> what we think is disallowed configurations. We should have
>>>>>>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>>>>>>> there's a non zero chance of regressions,
>>>>>>>>>>>
>>>>>>>>>>> Sigh,
>>>>>>>>>>
>>>>>>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>>>>>>> these. so there's value in this work.
>>>>>>>>>>      
>>>>>>>>>>> no medals here for being brave :-)
>>>>>>>>>>
>>>>>>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>>>>>>>
>>>>>>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>>>>>
>>>>>> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
>>>>>> a 100-yen earphone. Even people who ported Linux to this machine spent
>>>>>> efforts to get the jack to work on Linux ;)
>>>>>>      
>>>>>>>>>      
>>>>>>>>>>      
>>>>>>>>>>>> and the benefit
>>>>>>>>>>>> is not guaranteed.
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> MST
>>>>>>>      
>>>>>       
>>>>   
>>>    
>>
>
Ani Sinha July 4, 2023, 2:03 p.m. UTC | #42
> On 04-Jul-2023, at 5:20 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/07/04 20:38, Igor Mammedov wrote:
>> On Sat, 1 Jul 2023 16:28:30 +0900
>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>> On 2023/07/01 0:29, Michael S. Tsirkin wrote:
>>>> On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:
>>>>> On 2023/06/30 19:37, Ani Sinha wrote:
>>>>>> 
>>>>>>  
>>>>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> 
>>>>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>>>>> 
>>>>>>>>  
>>>>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>>>>> 
>>>>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>>>>> 
>>>>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>>>>> 
>>>>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>>>>> 
>>>>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>>>>> 
>>>>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>>>>> 
>>>>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>>>>> slot 2 guest will suddently find both.
>>>>>>>> 
>>>>>>>> Hmm, I tried this:
>>>>>>>> 
>>>>>>>> -device pcie-root-port,id=p \
>>>>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>>>>> 
>>>>>>>> The guest only found the second igb device not the first. You can try too.
>>>>>>> 
>>>>>>> Because next parameter in pcie_ari_init does not match.
>>>>>> 
>>>>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>>>>> 
>>>>> I don't think there is one because the code for PCI multifunction does not
>>>>> care ARI. In my opinion, we need yet another check to make non-SR-IOV
>>>>> multifunction and ARI capability mutually exclusive; if a function has the
>>>>> ARI capability and it is not a VF, an attempt to assign non-zero function
>>>>> number for it should fail.
>> is it stated somewhere in spec(s) that ARI and !SR-IOV are mutually exclusive?
>>>> 
>>>> Why is that? My understanding is that ARI capable devices should also
>>>> set the multifunction bit in the header. It's not terribly clear from
>>>> the spec though.
>>> 
>>> Something like the following will not work properly with ARI-capable
>>> device (think of a as an ARI-capable device):
>>> -device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1
>> (I had a crazy idea, to use it like that so we could put more devices
>> on port without resorting to adding extra bridges)
>> Can you elaborate some more why it won't work?
> 
> It won't work because the ARI next function number field is fixed. In this case, the field of the Function at 0x1.0x0 should point to 0x1.0x1, but it doesn’t.

Where does it point to in this case then? 0x1.0x2 becasue of the stride?

> As the result, the Function at 0x1.0x1 won't be recognized.
> 
> It's more problematic if some of the Functions are ARI-capable but others are not. In my understanding, all Functions in a ARI-capable device need to have ARI capability, but that's not enforced.
> 
>>> This is because the next function numbers advertised with ARI are not
>>> updated with the multifunction configuration, but they are hardcoded in
>>> the device implementation. In this sense, the traditional (non-SR/IOV)
>>> multifunction mechanism QEMU has will not work with ARI-capable devices.
>>> 
>>>>   
>>>>> But it should be a distinct check as it will need to check the function
>>>>> number bits.
>>>>>  
>>>>>>  
>>>>>>> 
>>>>>>>  
>>>>>>>>>  
>>>>>>>>>>> no?
>>>>>>>>>>> 
>>>>>>>>>>> I am quite worried about all this work going into blocking
>>>>>>>>>>> what we think is disallowed configurations. We should have
>>>>>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>>>>>> there's a non zero chance of regressions,
>>>>>>>>>> 
>>>>>>>>>> Sigh,
>>>>>>>>> 
>>>>>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>>>>>> these. so there's value in this work.
>>>>>>>>>  
>>>>>>>>>> no medals here for being brave :-)
>>>>>>>>> 
>>>>>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>>>>>> 
>>>>>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>>>> 
>>>>> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
>>>>> a 100-yen earphone. Even people who ported Linux to this machine spent
>>>>> efforts to get the jack to work on Linux ;)
>>>>>  
>>>>>>>>  
>>>>>>>>>  
>>>>>>>>>>> and the benefit
>>>>>>>>>>> is not guaranteed.
>>>>>>>>>>> 
>>>>>>>>>>> -- 
>>>>>>>>>>> MST
Akihiko Odaki July 5, 2023, 2:30 a.m. UTC | #43
On 2023/07/04 23:03, Ani Sinha wrote:
> 
> 
>> On 04-Jul-2023, at 5:20 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/07/04 20:38, Igor Mammedov wrote:
>>> On Sat, 1 Jul 2023 16:28:30 +0900
>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>> On 2023/07/01 0:29, Michael S. Tsirkin wrote:
>>>>> On Fri, Jun 30, 2023 at 08:36:38PM +0900, Akihiko Odaki wrote:
>>>>>> On 2023/06/30 19:37, Ani Sinha wrote:
>>>>>>>
>>>>>>>   
>>>>>>>> On 30-Jun-2023, at 3:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Jun 30, 2023 at 02:52:52PM +0530, Ani Sinha wrote:
>>>>>>>>>
>>>>>>>>>   
>>>>>>>>>> On 30-Jun-2023, at 2:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 30, 2023 at 02:06:59PM +0530, Ani Sinha wrote:
>>>>>>>>>>>
>>>>>>>>>>>   
>>>>>>>>>>>> On 30-Jun-2023, at 2:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 30, 2023 at 01:11:33PM +0530, Ani Sinha wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thus the check for unoccupied function 0 needs to use pci_is_vf() instead of checking ARI capability, and that can happen in do_pci_register_device().
>>>>>>>>>>>>>>   
>>>>>>>>>>>>>>> Also where do you propose we move the check?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In pci_qdev_realize(), somewhere after pc->realize() and before option ROM loading.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmm, I tried this. The issue here is something like this would be now allowed since the PF has ARI capability:
>>>>>>>>>>>>>
>>>>>>>>>>>>> -device pcie-root-port,id=p -device igb,bus=p,addr=0x2.0x0
>>>>>>>>>>>>>
>>>>>>>>>>>>> The above should not be allowed and when used, we do not see the igb ethernet device from the guest OS.
>>>>>>>>>>>>
>>>>>>>>>>>> I think it's allowed because it expects you to hotplug function 0 later,
>>>>>>>>>>>
>>>>>>>>>>> This is about the igb device being plugged into the non-zero slot of the pci-root-port. The guest OS ignores it.
>>>>>>>>>>
>>>>>>>>>> yes but if you later add a device with ARI and with next field pointing
>>>>>>>>>> slot 2 guest will suddently find both.
>>>>>>>>>
>>>>>>>>> Hmm, I tried this:
>>>>>>>>>
>>>>>>>>> -device pcie-root-port,id=p \
>>>>>>>>> -device igb,bus=p,addr=0x2.0x0 \
>>>>>>>>> -device igb,bus=p,addr=0x0.0x0 \
>>>>>>>>>
>>>>>>>>> The guest only found the second igb device not the first. You can try too.
>>>>>>>>
>>>>>>>> Because next parameter in pcie_ari_init does not match.
>>>>>>>
>>>>>>> OK send me a command line that I can test it with. I can’t come up with a case that actually works in practice.
>>>>>>
>>>>>> I don't think there is one because the code for PCI multifunction does not
>>>>>> care ARI. In my opinion, we need yet another check to make non-SR-IOV
>>>>>> multifunction and ARI capability mutually exclusive; if a function has the
>>>>>> ARI capability and it is not a VF, an attempt to assign non-zero function
>>>>>> number for it should fail.
>>> is it stated somewhere in spec(s) that ARI and !SR-IOV are mutually exclusive?
>>>>>
>>>>> Why is that? My understanding is that ARI capable devices should also
>>>>> set the multifunction bit in the header. It's not terribly clear from
>>>>> the spec though.
>>>>
>>>> Something like the following will not work properly with ARI-capable
>>>> device (think of a as an ARI-capable device):
>>>> -device a,addr=0x1.0x0,multifunction=on -device a,addr=0x1.0x1
>>> (I had a crazy idea, to use it like that so we could put more devices
>>> on port without resorting to adding extra bridges)
>>> Can you elaborate some more why it won't work?
>>
>> It won't work because the ARI next function number field is fixed. In this case, the field of the Function at 0x1.0x0 should point to 0x1.0x1, but it doesn’t.
> 
> Where does it point to in this case then? 0x1.0x2 becasue of the stride?

With "[PATCH v5 0/2] pcie: Fix ARI next function numbers"*, it will 
point to 0, which ends the linked list formed with the ARI next number 
fields and make it only contain Function 0.

* 
https://lore.kernel.org/qemu-devel/20230705022421.13115-1-akihiko.odaki@daynix.com/

> 
>> As the result, the Function at 0x1.0x1 won't be recognized.
>>
>> It's more problematic if some of the Functions are ARI-capable but others are not. In my understanding, all Functions in a ARI-capable device need to have ARI capability, but that's not enforced.
>>
>>>> This is because the next function numbers advertised with ARI are not
>>>> updated with the multifunction configuration, but they are hardcoded in
>>>> the device implementation. In this sense, the traditional (non-SR/IOV)
>>>> multifunction mechanism QEMU has will not work with ARI-capable devices.
>>>>
>>>>>    
>>>>>> But it should be a distinct check as it will need to check the function
>>>>>> number bits.
>>>>>>   
>>>>>>>   
>>>>>>>>
>>>>>>>>   
>>>>>>>>>>   
>>>>>>>>>>>> no?
>>>>>>>>>>>>
>>>>>>>>>>>> I am quite worried about all this work going into blocking
>>>>>>>>>>>> what we think is disallowed configurations. We should have
>>>>>>>>>>>> maybe blocked them originally, but now that we didn't
>>>>>>>>>>>> there's a non zero chance of regressions,
>>>>>>>>>>>
>>>>>>>>>>> Sigh,
>>>>>>>>>>
>>>>>>>>>> There's value in patches 1-4 I think - the last patch helped you find
>>>>>>>>>> these. so there's value in this work.
>>>>>>>>>>   
>>>>>>>>>>> no medals here for being brave :-)
>>>>>>>>>>
>>>>>>>>>> Try removing support for a 3.5mm jack next. Oh wait ...
>>>>>>>>>
>>>>>>>>> Indeed. Everyone uses bluetooth these days. I for one is happy that the jack is gone (and they were bold enough to do it while Samsung and others still carry the useless port ) :-)
>>>>>>
>>>>>> Hello from a guy using a shiny M2 Macbook Air carrying the legacy jack with
>>>>>> a 100-yen earphone. Even people who ported Linux to this machine spent
>>>>>> efforts to get the jack to work on Linux ;)
>>>>>>   
>>>>>>>>>   
>>>>>>>>>>   
>>>>>>>>>>>> and the benefit
>>>>>>>>>>>> is not guaranteed.
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> MST
>
diff mbox series

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..0320ac2bb3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -65,6 +65,7 @@  bool pci_available = true;
 static char *pcibus_get_dev_path(DeviceState *dev);
 static char *pcibus_get_fw_dev_path(DeviceState *dev);
 static void pcibus_reset(BusState *qbus);
+static bool pcie_has_upstream_port(PCIDevice *dev);
 
 static Property pci_props[] = {
     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
@@ -1190,6 +1191,20 @@  static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
                    name);
 
        return NULL;
+    } /*
+       * With SRIOV and ARI, vfs can have non-zero slot in the conventional
+       * PCI interpretation as all five bits reserved for slot addresses are
+       * also used for function bits for the various vfs. Ignore that case.
+       * It is too early here to check for ARI capabilities in the PCI config
+       * space. Hence, we check for a vf device instead.
+       */
+    else if (!pci_is_vf(pci_dev) &&
+             pcie_has_upstream_port(pci_dev) &&
+             PCI_SLOT(devfn)) {
+        error_setg(errp, "PCI: slot %d is not valid for %s,"
+                   " parent device only allows plugging into slot 0.",
+                   PCI_SLOT(devfn), name);
+        return NULL;
     }
 
     pci_dev->devfn = devfn;