[v7,5/6] hw/pci: ensure PCIE devices are plugged into only slot 0 of PCIE port

Message ID 20230704112555.5629-6-anisinha@redhat.com
State New
Series test and QEMU fixes to ensure proper PCIE device usage

Commit Message

Ani Sinha July 4, 2023, 11:25 a.m. UTC
PCI Express ports only have one slot, so PCI Express devices can only be
plugged into slot 0 on a PCIE port. Add a warning to let users know when the
invalid configuration is used. We may enforce this more strongly later on once
we get more clarity on whether we are introducing a bad regression for users
currently using the wrong configuration.

The change has been tested to not break or alter behaviors of ARI capable
devices by instantiating seven vfs on an emulated igb device (the maximum
number of vfs the linux igb driver supports). The vfs instantiated correctly
and are seen to have non-zero device/slot numbers in the conventional PCI BDF
representation.

CC: jusual@redhat.com
CC: imammedo@redhat.com
CC: mst@redhat.com
CC: akihiko.odaki@daynix.com

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Julia Suvorova <jusual@redhat.com>
---
 hw/pci/pci.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
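
The condition described in the commit message can be modelled as a small predicate. This is only a sketch under stated assumptions: `struct dev_model` and `should_warn_nonzero_slot()` are hypothetical names that do not exist in QEMU, the boolean fields stand in for the results of `pci_is_express()`, `pcie_find_capability(..., PCI_EXT_CAP_ID_ARI)` and `pcie_has_upstream_port()` in the patch, and `PCI_SLOT` is redefined locally so the snippet stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conventional devfn layout: upper 5 bits are the slot (device) number. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Hypothetical, simplified stand-in for the state the check consults. */
struct dev_model {
    bool is_express;        /* device is a PCI Express device */
    bool has_ari;           /* device exposes the ARI extended capability */
    bool has_upstream_port; /* assumed meaning: device sits below a PCIe port */
    uint8_t devfn;          /* 8-bit device/function address */
};

/*
 * Returns true when the warning from the patch would be expected:
 * a non-ARI PCIe device below a PCIe port with a non-zero slot number.
 */
bool should_warn_nonzero_slot(const struct dev_model *d)
{
    return d->is_express &&
           !d->has_ari &&
           d->has_upstream_port &&
           PCI_SLOT(d->devfn) != 0;
}
```

Under this model, a non-ARI device at devfn 0x08 (slot 1, function 0) below a port triggers the warning, while the same address on an ARI-capable device does not.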

Comments

Ani Sinha July 4, 2023, 11:38 a.m. UTC | #1
> On 04-Jul-2023, at 4:55 PM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> PCI Express ports only have one slot, so PCI Express devices can only be
> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
> invalid configuration is used. We may enforce this more strongly later on once
> we get more clarity on whether we are introducing a bad regression for users
> currently using the wrong configuration.
> 
> The change has been tested to not break or alter behaviors of ARI capable
> devices by instantiating seven vfs on an emulated igb device (the maximum
> number of vfs the linux igb driver supports). The vfs instantiated correctly
> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
> representation.

I will send a v8 with the patch subject line changed to something like

"hw/pci: warn when PCIE devices are plugged into non-zero slot of PCIE port"

which is more appropriate. Meanwhile, I will wait to get more comments/tags on v7.

> 
> CC: jusual@redhat.com
> CC: imammedo@redhat.com
> CC: mst@redhat.com
> CC: akihiko.odaki@daynix.com
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> ---
> hw/pci/pci.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..47517ba3db 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -65,6 +65,7 @@ bool pci_available = true;
> static char *pcibus_get_dev_path(DeviceState *dev);
> static char *pcibus_get_fw_dev_path(DeviceState *dev);
> static void pcibus_reset(BusState *qbus);
> +static bool pcie_has_upstream_port(PCIDevice *dev);
> 
> static Property pci_props[] = {
>     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>         }
>     }
> 
> +    /*
> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> +     * PCI interpretation as all five bits reserved for slot addresses are
> +     * also used for function bits for the various vfs. Ignore that case.
> +     */
> +    if (pci_is_express(pci_dev) &&
> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
> +        pcie_has_upstream_port(pci_dev) &&
> +        PCI_SLOT(pci_dev->devfn)) {
> +        warn_report("PCI: slot %d is not valid for %s,"
> +                    " parent device only allows plugging into slot 0.",
> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
> +    }
> +
>     if (pci_dev->failover_pair_id) {
>         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>             error_setg(errp, "failover primary device must be on "
> -- 
> 2.39.1
>
Akihiko Odaki July 4, 2023, 11:54 a.m. UTC | #2
On 2023/07/04 20:25, Ani Sinha wrote:
> PCI Express ports only have one slot, so PCI Express devices can only be
> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
> invalid configuration is used. We may enforce this more strongly later on once
> we get more clarity on whether we are introducing a bad regression for users
> currently using the wrong configuration.
> 
> The change has been tested to not break or alter behaviors of ARI capable
> devices by instantiating seven vfs on an emulated igb device (the maximum
> number of vfs the linux igb driver supports). The vfs instantiated correctly
> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
> representation.
> 
> CC: jusual@redhat.com
> CC: imammedo@redhat.com
> CC: mst@redhat.com
> CC: akihiko.odaki@daynix.com
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> ---
>   hw/pci/pci.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..47517ba3db 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -65,6 +65,7 @@ bool pci_available = true;
>   static char *pcibus_get_dev_path(DeviceState *dev);
>   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>   static void pcibus_reset(BusState *qbus);
> +static bool pcie_has_upstream_port(PCIDevice *dev);
>   
>   static Property pci_props[] = {
>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>           }
>       }
>   
> +    /*
> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> +     * PCI interpretation as all five bits reserved for slot addresses are
> +     * also used for function bits for the various vfs. Ignore that case.

You don't have to mention SR-IOV; this affects all ARI-capable devices. A
PF can also have a non-zero slot number in the conventional interpretation,
so you shouldn't call it a VF either.

> +     */
> +    if (pci_is_express(pci_dev) &&
> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
> +        pcie_has_upstream_port(pci_dev) &&
> +        PCI_SLOT(pci_dev->devfn)) {
> +        warn_report("PCI: slot %d is not valid for %s,"
> +                    " parent device only allows plugging into slot 0.",
> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
> +    }
> +
>       if (pci_dev->failover_pair_id) {
>           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>               error_setg(errp, "failover primary device must be on "
Ani Sinha July 4, 2023, 11:59 a.m. UTC | #3
> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/07/04 20:25, Ani Sinha wrote:
>> PCI Express ports only have one slot, so PCI Express devices can only be
>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>> invalid configuration is used. We may enforce this more strongly later on once
>> we get more clarity on whether we are introducing a bad regression for users
>> currently using the wrong configuration.
>> The change has been tested to not break or alter behaviors of ARI capable
>> devices by instantiating seven vfs on an emulated igb device (the maximum
>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>> representation.
>> CC: jusual@redhat.com
>> CC: imammedo@redhat.com
>> CC: mst@redhat.com
>> CC: akihiko.odaki@daynix.com
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>> ---
>>  hw/pci/pci.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index e2eb4c3b4a..47517ba3db 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>  static char *pcibus_get_dev_path(DeviceState *dev);
>>  static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>  static void pcibus_reset(BusState *qbus);
>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>    static Property pci_props[] = {
>>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>          }
>>      }
>>  +    /*
>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>> +     * PCI interpretation as all five bits reserved for slot addresses are
>> +     * also used for function bits for the various vfs. Ignore that case.
> 
> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.

Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.

> 
>> +     */
>> +    if (pci_is_express(pci_dev) &&
>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>> +        pcie_has_upstream_port(pci_dev) &&
>> +        PCI_SLOT(pci_dev->devfn)) {
>> +        warn_report("PCI: slot %d is not valid for %s,"
>> +                    " parent device only allows plugging into slot 0.",
>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>> +    }
>> +
>>      if (pci_dev->failover_pair_id) {
>>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>              error_setg(errp, "failover primary device must be on "
>
Akihiko Odaki July 4, 2023, 12:02 p.m. UTC | #4
On 2023/07/04 20:59, Ani Sinha wrote:
> 
> 
>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/07/04 20:25, Ani Sinha wrote:
>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>> invalid configuration is used. We may enforce this more strongly later on once
>>> we get more clarity on whether we are introducing a bad regression for users
>>> currently using the wrong configuration.
>>> The change has been tested to not break or alter behaviors of ARI capable
>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>> representation.
>>> CC: jusual@redhat.com
>>> CC: imammedo@redhat.com
>>> CC: mst@redhat.com
>>> CC: akihiko.odaki@daynix.com
>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>> ---
>>>   hw/pci/pci.c | 15 +++++++++++++++
>>>   1 file changed, 15 insertions(+)
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index e2eb4c3b4a..47517ba3db 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>   static char *pcibus_get_dev_path(DeviceState *dev);
>>>   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>   static void pcibus_reset(BusState *qbus);
>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>     static Property pci_props[] = {
>>>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>           }
>>>       }
>>>   +    /*
>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>> +     * also used for function bits for the various vfs. Ignore that case.
>>
>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.
> 
> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.

Simply, you can say:
With ARI, the slot number field in the conventional PCI interpretation 
can have a non-zero value as the field bits are reused to extend the 
function number bits. Ignore that case.
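
To make the bit reuse concrete, here is a minimal sketch of how the same 8-bit devfn decodes under the two interpretations. The `PCI_SLOT` and `PCI_FUNC` macros follow the conventional 5-bit/3-bit split and are redefined locally so the snippet stands alone; `ari_function()` is a hypothetical helper name used only for illustration and is not a QEMU API.

```c
#include <stdint.h>

/* Conventional interpretation: 5 bits of slot (device), 3 bits of function. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * ARI interpretation (hypothetical helper): the device number is implicitly
 * 0 and all 8 bits of devfn form the function number.
 */
unsigned ari_function(uint8_t devfn)
{
    return devfn;
}

/*
 * Example: devfn 0x10
 *   conventional view: PCI_SLOT(0x10) == 2, PCI_FUNC(0x10) == 0
 *   ARI view:          ari_function(0x10) == 16
 * so an ARI function numbered 8 or higher always shows a non-zero "slot"
 * when read through the conventional macros.
 */
```

This is why a check based on `PCI_SLOT(devfn)` alone would misfire on ARI-capable devices and has to skip them.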

> 
>>
>>> +     */
>>> +    if (pci_is_express(pci_dev) &&
>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>> +        pcie_has_upstream_port(pci_dev) &&
>>> +        PCI_SLOT(pci_dev->devfn)) {
>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>> +                    " parent device only allows plugging into slot 0.",
>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>> +    }
>>> +
>>>       if (pci_dev->failover_pair_id) {
>>>           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>               error_setg(errp, "failover primary device must be on "
>>
>
Ani Sinha July 4, 2023, 12:08 p.m. UTC | #5
> On 04-Jul-2023, at 5:32 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> On 2023/07/04 20:59, Ani Sinha wrote:
>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>> 
>>> On 2023/07/04 20:25, Ani Sinha wrote:
>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>> we get more clarity on whether we are introducing a bad regression for users
>>>> currently using the wrong configuration.
>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>> representation.
>>>> CC: jusual@redhat.com
>>>> CC: imammedo@redhat.com
>>>> CC: mst@redhat.com
>>>> CC: akihiko.odaki@daynix.com
>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>> ---
>>>>  hw/pci/pci.c | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index e2eb4c3b4a..47517ba3db 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>  static char *pcibus_get_dev_path(DeviceState *dev);
>>>>  static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>  static void pcibus_reset(BusState *qbus);
>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>    static Property pci_props[] = {
>>>>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>          }
>>>>      }
>>>>  +    /*
>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>> +     * also used for function bits for the various vfs. Ignore that case.
>>> 
>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.
>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.
> 
> Simply, you can say:
> With ARI, the slot number field in the conventional PCI interpretation can have a non-zero value as the field bits are reused to extend the function number bits. Ignore that case.

But we are not checking for ARI capability here in the code, so the comment is confusing.

> 
>>> 
>>>> +     */
>>>> +    if (pci_is_express(pci_dev) &&
>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>> +                    " parent device only allows plugging into slot 0.",
>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>> +    }
>>>> +
>>>>      if (pci_dev->failover_pair_id) {
>>>>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>              error_setg(errp, "failover primary device must be on "
Akihiko Odaki July 4, 2023, 12:09 p.m. UTC | #6
On 2023/07/04 21:08, Ani Sinha wrote:
> 
> 
>> On 04-Jul-2023, at 5:32 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/07/04 20:59, Ani Sinha wrote:
>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>
>>>> On 2023/07/04 20:25, Ani Sinha wrote:
>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>> currently using the wrong configuration.
>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>> representation.
>>>>> CC: jusual@redhat.com
>>>>> CC: imammedo@redhat.com
>>>>> CC: mst@redhat.com
>>>>> CC: akihiko.odaki@daynix.com
>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>> ---
>>>>>   hw/pci/pci.c | 15 +++++++++++++++
>>>>>   1 file changed, 15 insertions(+)
>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>> --- a/hw/pci/pci.c
>>>>> +++ b/hw/pci/pci.c
>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>   static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>   static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>   static void pcibus_reset(BusState *qbus);
>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>     static Property pci_props[] = {
>>>>>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>           }
>>>>>       }
>>>>>   +    /*
>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>> +     * also used for function bits for the various vfs. Ignore that case.
>>>>
>>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.
>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.
>>
>> Simply, you can say:
>> With ARI, the slot number field in the conventional PCI interpretation can have a non-zero value as the field bits are reused to extend the function number bits. Ignore that case.
> 
> but we are not checking for ARI capability here in the code. So the comment is confusing.

Don't we? We check for:
!pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)

> 
>>
>>>>
>>>>> +     */
>>>>> +    if (pci_is_express(pci_dev) &&
>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>> +    }
>>>>> +
>>>>>       if (pci_dev->failover_pair_id) {
>>>>>           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>               error_setg(errp, "failover primary device must be on "
>
Ani Sinha July 4, 2023, 12:28 p.m. UTC | #7
On Tue, 4 Jul, 2023, 5:39 pm Akihiko Odaki, <akihiko.odaki@daynix.com>
wrote:

> On 2023/07/04 21:08, Ani Sinha wrote:
> >
> >
> >> On 04-Jul-2023, at 5:32 PM, Akihiko Odaki <akihiko.odaki@daynix.com>
> wrote:
> >>
> >> On 2023/07/04 20:59, Ani Sinha wrote:
> >>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com>
> wrote:
> >>>>
> >>>> On 2023/07/04 20:25, Ani Sinha wrote:
> >>>>> PCI Express ports only have one slot, so PCI Express devices can
> only be
> >>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know
> when the
> >>>>> invalid configuration is used. We may enforce this more strongly
> later on once
> >>>>> we get more clarity on whether we are introducing a bad regression
> for users
> >>>>> currently using the wrong configuration.
> >>>>> The change has been tested to not break or alter behaviors of ARI
> capable
> >>>>> devices by instantiating seven vfs on an emulated igb device (the
> maximum
> >>>>> number of vfs the linux igb driver supports). The vfs instantiated
> correctly
> >>>>> and are seen to have non-zero device/slot numbers in the
> conventional PCI BDF
> >>>>> representation.
> >>>>> CC: jusual@redhat.com
> >>>>> CC: imammedo@redhat.com
> >>>>> CC: mst@redhat.com
> >>>>> CC: akihiko.odaki@daynix.com
> >>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> >>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> >>>>> ---
> >>>>>   hw/pci/pci.c | 15 +++++++++++++++
> >>>>>   1 file changed, 15 insertions(+)
> >>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>>> index e2eb4c3b4a..47517ba3db 100644
> >>>>> --- a/hw/pci/pci.c
> >>>>> +++ b/hw/pci/pci.c
> >>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
> >>>>>   static char *pcibus_get_dev_path(DeviceState *dev);
> >>>>>   static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >>>>>   static void pcibus_reset(BusState *qbus);
> >>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
> >>>>>     static Property pci_props[] = {
> >>>>>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> >>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState
> *qdev, Error **errp)
> >>>>>           }
> >>>>>       }
> >>>>>   +    /*
> >>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the
> conventional
> >>>>> +     * PCI interpretation as all five bits reserved for slot
> addresses are
> >>>>> +     * also used for function bits for the various vfs. Ignore that
> case.
> >>>>
> >>>> You don't have to mention SR/IOV; it affects all ARI-capable devices.
> A PF can also have non-zero slot number in the conventional interpretation
> so you shouldn't call it vf either.
> >>> Can you please help write a comment that explains this properly for
> all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear
> and correct, I will re-spin.
> >>
> >> Simply, you can say:
> >> With ARI, the slot number field in the conventional PCI interpretation
> can have a non-zero value as the field bits are reused to extend the
> function number bits. Ignore that case.
> >
> > but we are not checking for ARI capability here in the code. So the
> comment is confusing.
>
> Don't we? We check for:
> !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)
>

Yes, I was thinking of patch 6 in the series, which also adds a comment for
ARI.

I'll wait to see what others think of your suggestion before respinning
patch 5.


>
> >>
> >>>>
> >>>>> +     */
> >>>>> +    if (pci_is_express(pci_dev) &&
> >>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
> >>>>> +        pcie_has_upstream_port(pci_dev) &&
> >>>>> +        PCI_SLOT(pci_dev->devfn)) {
> >>>>> +        warn_report("PCI: slot %d is not valid for %s,"
> >>>>> +                    " parent device only allows plugging into slot
> 0.",
> >>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
> >>>>> +    }
> >>>>> +
> >>>>>       if (pci_dev->failover_pair_id) {
> >>>>>           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> >>>>>               error_setg(errp, "failover primary device must be on "
> >
>
>
Igor Mammedov July 4, 2023, 12:48 p.m. UTC | #8
On Tue, 4 Jul 2023 21:02:09 +0900
Akihiko Odaki <akihiko.odaki@daynix.com> wrote:

> On 2023/07/04 20:59, Ani Sinha wrote:
> > 
> >   
> >> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >>
> >> On 2023/07/04 20:25, Ani Sinha wrote:  
> >>> PCI Express ports only have one slot, so PCI Express devices can only be
> >>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
> >>> invalid configuration is used. We may enforce this more strongly later on once
> >>> we get more clarity on whether we are introducing a bad regression for users
> >>> currently using the wrong configuration.
> >>> The change has been tested to not break or alter behaviors of ARI capable
> >>> devices by instantiating seven vfs on an emulated igb device (the maximum
> >>> number of vfs the linux igb driver supports). The vfs instantiated correctly
> >>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
> >>> representation.
> >>> CC: jusual@redhat.com
> >>> CC: imammedo@redhat.com
> >>> CC: mst@redhat.com
> >>> CC: akihiko.odaki@daynix.com
> >>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> >>> ---
> >>>   hw/pci/pci.c | 15 +++++++++++++++
> >>>   1 file changed, 15 insertions(+)
> >>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>> index e2eb4c3b4a..47517ba3db 100644
> >>> --- a/hw/pci/pci.c
> >>> +++ b/hw/pci/pci.c
> >>> @@ -65,6 +65,7 @@ bool pci_available = true;
> >>>   static char *pcibus_get_dev_path(DeviceState *dev);
> >>>   static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >>>   static void pcibus_reset(BusState *qbus);
> >>> +static bool pcie_has_upstream_port(PCIDevice *dev);
> >>>     static Property pci_props[] = {
> >>>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> >>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >>>           }
> >>>       }
> >>>   +    /*
> >>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> >>> +     * PCI interpretation as all five bits reserved for slot addresses are
> >>> +     * also used for function bits for the various vfs. Ignore that case.  
> >>
> >> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.  
> > 
> > Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.
> 
> Simply, you can say:
> With ARI, the slot number field in the conventional PCI interpretation 
> can have a non-zero value as the field bits are reused to extend the 
> function number bits. Ignore that case.

Mentioning 'conventional PCI interpretation' in the comment and then immediately
checking 'pci_is_express(pci_dev)' is confusing. Since the comment belongs
only to the PCIe branch, it would be better to talk only about PCIe
and refer to the relevant portions of the spec
(for example, see how it's done in kernel code: only_one_child(...)).

PS:
The kernel can be forced to scan for !0 device numbers, but that's rather
a hack, so we shouldn't really care about that.

> 
> >   
> >>  
> >>> +     */
> >>> +    if (pci_is_express(pci_dev) &&
> >>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
> >>> +        pcie_has_upstream_port(pci_dev) &&
> >>> +        PCI_SLOT(pci_dev->devfn)) {
> >>> +        warn_report("PCI: slot %d is not valid for %s,"
> >>> +                    " parent device only allows plugging into slot 0.",
> >>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
> >>> +    }
> >>> +
> >>>       if (pci_dev->failover_pair_id) {
> >>>           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> >>>               error_setg(errp, "failover primary device must be on "  
> >>  
> >   
>
Ani Sinha July 4, 2023, 1:50 p.m. UTC | #9
> On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> 
> On Tue, 4 Jul 2023 21:02:09 +0900
> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
>> On 2023/07/04 20:59, Ani Sinha wrote:
>>> 
>>> 
>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>> 
>>>> On 2023/07/04 20:25, Ani Sinha wrote:  
>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>> currently using the wrong configuration.
>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>> representation.
>>>>> CC: jusual@redhat.com
>>>>> CC: imammedo@redhat.com
>>>>> CC: mst@redhat.com
>>>>> CC: akihiko.odaki@daynix.com
>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>> ---
>>>>>  hw/pci/pci.c | 15 +++++++++++++++
>>>>>  1 file changed, 15 insertions(+)
>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>> --- a/hw/pci/pci.c
>>>>> +++ b/hw/pci/pci.c
>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>  static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>  static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>  static void pcibus_reset(BusState *qbus);
>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>    static Property pci_props[] = {
>>>>>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>          }
>>>>>      }
>>>>>  +    /*
>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>> +     * also used for function bits for the various vfs. Ignore that case.  
>>>> 
>>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.  
>>> 
>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that it's clear and correct, I will re-spin.
>> 
>> Simply, you can say:
>> With ARI, the slot number field in the conventional PCI interpretation 
>> can have a non-zero value as the field bits are reused to extend the 
>> function number bits. Ignore that case.
> 
> mentioning 'conventional PCI interpretation' in comment and then immediately
> checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
> only to PCIE branch it would be better to talk in only about PCIe stuff
> and referring to relevant portions of spec.

Ok so how about this?

     * With ARI, devices can have non-zero slot in the traditional BDF
     * representation as all five bits reserved for slot addresses are
     * also used for function bits. Ignore that case.


> (for example see how it's done in kernel code: only_one_child(...)
> 
> PS:
> kernel can be forced  to scan for !0 device numbers, but that's rather
> a hack, so we shouldn't really care about that.
> 
>> 
>>> 
>>>> 
>>>>> +     */
>>>>> +    if (pci_is_express(pci_dev) &&
>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>> +    }
>>>>> +
>>>>>      if (pci_dev->failover_pair_id) {
>>>>>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>              error_setg(errp, "failover primary device must be on "
Igor Mammedov July 4, 2023, 2:28 p.m. UTC | #10
On Tue, 4 Jul 2023 19:20:00 +0530
Ani Sinha <anisinha@redhat.com> wrote:

> > On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > On Tue, 4 Jul 2023 21:02:09 +0900
> > Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >   
> >> On 2023/07/04 20:59, Ani Sinha wrote:  
> >>> 
> >>>   
> >>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >>>> 
> >>>> On 2023/07/04 20:25, Ani Sinha wrote:    
> >>>>> PCI Express ports only have one slot, so PCI Express devices can only be
> >>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
> >>>>> invalid configuration is used. We may enforce this more strongly later on once
> >>>>> we get more clarity on whether we are introducing a bad regression for users
> >>>>> currenly using the wrong configuration.
> >>>>> The change has been tested to not break or alter behaviors of ARI capable
> >>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
> >>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
> >>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
> >>>>> representation.
> >>>>> CC: jusual@redhat.com
> >>>>> CC: imammedo@redhat.com
> >>>>> CC: mst@redhat.com
> >>>>> CC: akihiko.odaki@daynix.com
> >>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> >>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
> >>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
> >>>>> ---
> >>>>>  hw/pci/pci.c | 15 +++++++++++++++
> >>>>>  1 file changed, 15 insertions(+)
> >>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>>> index e2eb4c3b4a..47517ba3db 100644
> >>>>> --- a/hw/pci/pci.c
> >>>>> +++ b/hw/pci/pci.c
> >>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
> >>>>>  static char *pcibus_get_dev_path(DeviceState *dev);
> >>>>>  static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >>>>>  static void pcibus_reset(BusState *qbus);
> >>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
> >>>>>    static Property pci_props[] = {
> >>>>>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> >>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >>>>>          }
> >>>>>      }
> >>>>>  +    /*
> >>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
> >>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
> >>>>> +     * also used for function bits for the various vfs. Ignore that case.    
> >>>> 
> >>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.    
> >>> 
> >>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that its clear and correct, I will re-spin.    
> >> 
> >> Simply, you can say:
> >> With ARI, the slot number field in the conventional PCI interpretation 
> >> can have a non-zero value as the field bits are reused to extend the 
> >> function number bits. Ignore that case.  
> > 
> > mentioning 'conventional PCI interpretation' in comment and then immediately
> > checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
> > only to PCIE branch it would be better to talk in only about PCIe stuff
> > and referring to relevant portions of spec.  
> 
> Ok so how about this?
> 
>    * With ARI, devices can have non-zero slot in the traditional BDF                                                                                  
>      * representation as all five bits reserved for slot addresses are                                                                                  
>      * also used for function bits. Ignore that case.  

you still refer to 'traditional' (which I misread as 'conventional').
Steal the Linux comment and augment it with ARI if necessary,
something like this (probably needs some more massaging):


        /*
         * A PCIe Downstream Port normally leads to a Link with only Device
         * 0 on it (PCIe spec r3.1, sec 7.3.1).
         * However PCI_SLOT() is broken if ARI is enabled, hence work around it
         * by skipping the check if the latter cap is present.
         */
                     
> 
> 
> > (for example see how it's done in kernel code: only_one_child(...)
> > 
> > PS:
> > kernel can be forced  to scan for !0 device numbers, but that's rather
> > a hack, so we shouldn't really care about that.
> >   
> >>   
> >>>   
> >>>>   
> >>>>> +     */
> >>>>> +    if (pci_is_express(pci_dev) &&
> >>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
> >>>>> +        pcie_has_upstream_port(pci_dev) &&
> >>>>> +        PCI_SLOT(pci_dev->devfn)) {
> >>>>> +        warn_report("PCI: slot %d is not valid for %s,"
> >>>>> +                    " parent device only allows plugging into slot 0.",
> >>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
> >>>>> +    }
> >>>>> +
> >>>>>      if (pci_dev->failover_pair_id) {
> >>>>>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> >>>>>              error_setg(errp, "failover primary device must be on "    
>
Ani Sinha July 4, 2023, 3:07 p.m. UTC | #11
> On 04-Jul-2023, at 7:58 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> 
> On Tue, 4 Jul 2023 19:20:00 +0530
> Ani Sinha <anisinha@redhat.com> wrote:
> 
>>> On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>> 
>>> On Tue, 4 Jul 2023 21:02:09 +0900
>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>> 
>>>> On 2023/07/04 20:59, Ani Sinha wrote:  
>>>>> 
>>>>> 
>>>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>> 
>>>>>> On 2023/07/04 20:25, Ani Sinha wrote:    
>>>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>>>> currenly using the wrong configuration.
>>>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>>>> representation.
>>>>>>> CC: jusual@redhat.com
>>>>>>> CC: imammedo@redhat.com
>>>>>>> CC: mst@redhat.com
>>>>>>> CC: akihiko.odaki@daynix.com
>>>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>>>> ---
>>>>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>>>> --- a/hw/pci/pci.c
>>>>>>> +++ b/hw/pci/pci.c
>>>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>>> static void pcibus_reset(BusState *qbus);
>>>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>>>   static Property pci_props[] = {
>>>>>>>     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>>>         }
>>>>>>>     }
>>>>>>> +    /*
>>>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>>>> +     * also used for function bits for the various vfs. Ignore that case.    
>>>>>> 
>>>>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.    
>>>>> 
>>>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that its clear and correct, I will re-spin.    
>>>> 
>>>> Simply, you can say:
>>>> With ARI, the slot number field in the conventional PCI interpretation 
>>>> can have a non-zero value as the field bits are reused to extend the 
>>>> function number bits. Ignore that case.  
>>> 
>>> mentioning 'conventional PCI interpretation' in comment and then immediately
>>> checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
>>> only to PCIE branch it would be better to talk in only about PCIe stuff
>>> and referring to relevant portions of spec.  
>> 
>> Ok so how about this?
>> 
>>   * With ARI, devices can have non-zero slot in the traditional BDF                                                                                  
>>     * representation as all five bits reserved for slot addresses are                                                                                  
>>     * also used for function bits. Ignore that case.  
> 
> you still refer to traditional (which I misread as 'conventional'),
> steal the linux comment and argument it with ARI if necessary,
> something like this (probably needs some more massaging):

The comment massaging in these patches seems to exceed the value of the patch itself :-)

How about this?

    /*
     * A PCIe Downstream Port normally leads to a Link with only Device
     * 0 on it (PCIe spec r3.1, sec 7.3.1).
     * With ARI, PCI_SLOT() can return non-zero value as all five bits
     * reserved for slot addresses are also used for function bits.
     * Hence, ignore ARI capable devices.
     */
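
To make the bit layout concrete, here is a small standalone sketch (not part of the patch). The SKETCH_* macros mirror what I understand QEMU's PCI_SLOT()/PCI_FUNC()/PCI_DEVFN() to do; the two helper functions are made-up names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Traditional layout of the 8-bit devfn: dddddfff (5-bit device, 3-bit function). */
#define SKETCH_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn)       ((devfn) & 0x07)
#define SKETCH_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/*
 * Hypothetical helper: with ARI the Device Number is implied to be 0 and
 * all eight bits are read as a single Function Number.
 */
static inline unsigned sketch_ari_function(uint8_t devfn)
{
    return devfn;
}

/* Hypothetical helper: build a devfn from an ARI function number (0..255). */
static inline uint8_t sketch_devfn_from_ari_function(unsigned fn)
{
    assert(fn <= 0xff);
    return (uint8_t)fn;
}
```

For example, ARI function 16 is devfn 0x10, which the traditional macros decode as slot 2, function 0, even though the device sits at Device 0 on the link. That is why a plain PCI_SLOT() check cannot be applied to ARI capable devices.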

> 
> 
>         /*                                                                       
>         * A PCIe Downstream Port normally leads to a Link with only Device      
>         * 0 on it (PCIe spec r3.1, sec 7.3.1). 
>          However PCI_SLOT() is broken if ARI is enabled, hence work around it
>          by skipping check if the later cap is present.                                  
>         */
> 
>> 
>> 
>>> (for example see how it's done in kernel code: only_one_child(...)
>>> 
>>> PS:
>>> kernel can be forced  to scan for !0 device numbers, but that's rather
>>> a hack, so we shouldn't really care about that.
>>> 
>>>> 
>>>>> 
>>>>>> 
>>>>>>> +     */
>>>>>>> +    if (pci_is_express(pci_dev) &&
>>>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>>>> +    }
>>>>>>> +
>>>>>>>     if (pci_dev->failover_pair_id) {
>>>>>>>         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>>>             error_setg(errp, "failover primary device must be on "    
>> 
>
Akihiko Odaki July 5, 2023, 1:39 a.m. UTC | #12
On 2023/07/05 0:07, Ani Sinha wrote:
> 
> 
>> On 04-Jul-2023, at 7:58 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>
>> On Tue, 4 Jul 2023 19:20:00 +0530
>> Ani Sinha <anisinha@redhat.com> wrote:
>>
>>>> On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>>>
>>>> On Tue, 4 Jul 2023 21:02:09 +0900
>>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>
>>>>> On 2023/07/04 20:59, Ani Sinha wrote:
>>>>>>
>>>>>>
>>>>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>>
>>>>>>> On 2023/07/04 20:25, Ani Sinha wrote:
>>>>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>>>>> currenly using the wrong configuration.
>>>>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>>>>> representation.
>>>>>>>> CC: jusual@redhat.com
>>>>>>>> CC: imammedo@redhat.com
>>>>>>>> CC: mst@redhat.com
>>>>>>>> CC: akihiko.odaki@daynix.com
>>>>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>>>>> ---
>>>>>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>>>>> --- a/hw/pci/pci.c
>>>>>>>> +++ b/hw/pci/pci.c
>>>>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>>>> static void pcibus_reset(BusState *qbus);
>>>>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>>>>    static Property pci_props[] = {
>>>>>>>>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>> +    /*
>>>>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>>>>> +     * also used for function bits for the various vfs. Ignore that case.
>>>>>>>
>>>>>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.
>>>>>>
>>>>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that its clear and correct, I will re-spin.
>>>>>
>>>>> Simply, you can say:
>>>>> With ARI, the slot number field in the conventional PCI interpretation
>>>>> can have a non-zero value as the field bits are reused to extend the
>>>>> function number bits. Ignore that case.
>>>>
>>>> mentioning 'conventional PCI interpretation' in comment and then immediately
>>>> checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
>>>> only to PCIE branch it would be better to talk in only about PCIe stuff
>>>> and referring to relevant portions of spec.
>>>
>>> Ok so how about this?
>>>
>>>    * With ARI, devices can have non-zero slot in the traditional BDF
>>>      * representation as all five bits reserved for slot addresses are
>>>      * also used for function bits. Ignore that case.
>>
>> you still refer to traditional (which I misread as 'conventional'),
>> steal the linux comment and argument it with ARI if necessary,
>> something like this (probably needs some more massaging):
> 
> The comment messaging in these patches seems to exceed the value of the patch itself :-)
> 
> How about this?
> 
>      /*
>       * A PCIe Downstream Port normally leads to a Link with only Device
>       * 0 on it (PCIe spec r3.1, sec 7.3.1).
>       * With ARI, PCI_SLOT() can return non-zero value as all five bits
>       * reserved for slot addresses are also used for function bits.
>       * Hence, ignore ARI capable devices.
>       */

Perhaps: s/normally leads to/must lead to/

From the kernel's perspective, it may need to deal with quirky
hardware that does not conform to the specification, but from QEMU's
perspective, the specification is what we *must* conform to.

Otherwise looks good to me.

> 
>>
>>
>>          /*
>>          * A PCIe Downstream Port normally leads to a Link with only Device
>>          * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>           However PCI_SLOT() is broken if ARI is enabled, hence work around it
>>           by skipping check if the later cap is present.
>>          */
>>
>>>
>>>
>>>> (for example see how it's done in kernel code: only_one_child(...)
>>>>
>>>> PS:
>>>> kernel can be forced  to scan for !0 device numbers, but that's rather
>>>> a hack, so we shouldn't really care about that.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> +     */
>>>>>>>> +    if (pci_is_express(pci_dev) &&
>>>>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>      if (pci_dev->failover_pair_id) {
>>>>>>>>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>>>>              error_setg(errp, "failover primary device must be on "
>>>
>>
>
Ani Sinha July 5, 2023, 5:43 a.m. UTC | #13
> On 05-Jul-2023, at 7:09 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> 
> 
> On 2023/07/05 0:07, Ani Sinha wrote:
>>> On 04-Jul-2023, at 7:58 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>> 
>>> On Tue, 4 Jul 2023 19:20:00 +0530
>>> Ani Sinha <anisinha@redhat.com> wrote:
>>> 
>>>>> On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>>>> 
>>>>> On Tue, 4 Jul 2023 21:02:09 +0900
>>>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>> 
>>>>>> On 2023/07/04 20:59, Ani Sinha wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>>> 
>>>>>>>> On 2023/07/04 20:25, Ani Sinha wrote:
>>>>>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>>>>>> currenly using the wrong configuration.
>>>>>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>>>>>> representation.
>>>>>>>>> CC: jusual@redhat.com
>>>>>>>>> CC: imammedo@redhat.com
>>>>>>>>> CC: mst@redhat.com
>>>>>>>>> CC: akihiko.odaki@daynix.com
>>>>>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>>>>>> ---
>>>>>>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>>>>>> --- a/hw/pci/pci.c
>>>>>>>>> +++ b/hw/pci/pci.c
>>>>>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>>>>> static void pcibus_reset(BusState *qbus);
>>>>>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>>>>>   static Property pci_props[] = {
>>>>>>>>>     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> +    /*
>>>>>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>>>>>> +     * also used for function bits for the various vfs. Ignore that case.
>>>>>>>> 
>>>>>>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.
>>>>>>> 
>>>>>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that its clear and correct, I will re-spin.
>>>>>> 
>>>>>> Simply, you can say:
>>>>>> With ARI, the slot number field in the conventional PCI interpretation
>>>>>> can have a non-zero value as the field bits are reused to extend the
>>>>>> function number bits. Ignore that case.
>>>>> 
>>>>> mentioning 'conventional PCI interpretation' in comment and then immediately
>>>>> checking 'pci_is_express(pci_dev)' is confusing. Since comment belongs
>>>>> only to PCIE branch it would be better to talk in only about PCIe stuff
>>>>> and referring to relevant portions of spec.
>>>> 
>>>> Ok so how about this?
>>>> 
>>>>   * With ARI, devices can have non-zero slot in the traditional BDF
>>>>     * representation as all five bits reserved for slot addresses are
>>>>     * also used for function bits. Ignore that case.
>>> 
>>> you still refer to traditional (which I misread as 'conventional'),
>>> steal the linux comment and argument it with ARI if necessary,
>>> something like this (probably needs some more massaging):
>> The comment messaging in these patches seems to exceed the value of the patch itself :-)
>> How about this?
>>     /*
>>      * A PCIe Downstream Port normally leads to a Link with only Device
>>      * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>      * With ARI, PCI_SLOT() can return non-zero value as all five bits
>>      * reserved for slot addresses are also used for function bits.
>>      * Hence, ignore ARI capable devices.
>>      */
> 
> Perhaps: s/normally leads to/must lead to/
> 
> From the kernel perspective, they may need to deal with a quirky hardware that does not conform with the specification, but from QEMU perspective, it is what we *must* conform with.

PCI base spec 4.0, rev 3, section 7.3.1 says:

"
Downstream Ports that do not have ARI Forwarding enabled must associate only Device 0 with the device attached to the Logical Bus representing the Link from the Port. Configuration Requests targeting the Bus Number associated with a Link specifying Device Number 0 are delivered to the device attached to the Link; Configuration Requests specifying all other Device Numbers (1-31) must be terminated by the Switch Downstream Port or the Root Port with an Unsupported Request Completion Status (equivalent to Master Abort in PCI). Non-ARI Devices must not assume that Device Number 0 is associated with their Upstream Port, but must capture their assigned Device Number as discussed in Section 2.2.6.2. Non-ARI Devices must respond to all Type 0 Configuration Read Requests, regardless of the Device Number specified in the Request.

…

With an ARI Device, its Device Number is implied to be 0 rather than specified by a field within an ID. The traditional 5-bit Device Number and 3-bit Function Number fields in its associated Routing IDs, Requester IDs, and Completer IDs are interpreted as a single 8-bit Function Number. See Section 6.13. Any Type 0 Configuration Request targeting an unimplemented Function in an ARI Device must be handled as an Unsupported Request.

"

So it seems they do indeed use the “must” clause. I prefer to use the wording from the spec as close to verbatim as possible. Hence, this is what I am going with, to be done with this patchset:

    /*
     * A PCIe Downstream Port that does not have ARI Forwarding enabled must
     * associate only Device 0 with the device attached to the bus
     * representing the Link from the Port (PCIe base spec rev 4.0 ver 0.3,
     * sec 7.3.1).
     * With ARI, PCI_SLOT() can return non-zero value as the traditional
     * 5-bit Device Number and 3-bit Function Number fields in its associated
     * Routing IDs, Requester IDs and Completer IDs are interpreted as a
     * single 8-bit Function Number. Hence, ignore ARI capable devices.
     */
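
For anyone who wants to poke at the rule outside of QEMU, below is a rough standalone sketch of the condition this comment documents. It is not the patch itself: struct sketch_dev and sketch_slot_is_invalid() are hypothetical stand-ins, and the three boolean fields only approximate what pci_is_express(), the ARI capability lookup and pcie_has_upstream_port() report in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the device state the check looks at. */
struct sketch_dev {
    bool is_express;      /* device is a PCIe device */
    bool has_ari_cap;     /* device exposes the ARI extended capability */
    bool behind_port;     /* device is plugged into a PCIe downstream/root port */
    uint8_t devfn;        /* traditional 5-bit device + 3-bit function encoding */
};

/*
 * Returns true when the configuration should trigger the warning:
 * a non-ARI PCIe device behind a port, at a device number other than 0.
 */
static bool sketch_slot_is_invalid(const struct sketch_dev *d)
{
    assert(d);
    return d->is_express &&
           !d->has_ari_cap &&
           d->behind_port &&
           ((d->devfn >> 3) & 0x1f) != 0;
}
```

With this sketch, a non-ARI device at devfn 0x08 (slot 1) behind a port is flagged, while the same devfn on an ARI capable device is not, since there the upper five bits belong to the function number.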


> 
> Otherwise looks good to me.
> 
>>> 
>>> 
>>>         /*
>>>         * A PCIe Downstream Port normally leads to a Link with only Device
>>>         * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>>          However PCI_SLOT() is broken if ARI is enabled, hence work around it
>>>          by skipping check if the later cap is present.
>>>         */
>>> 
>>>> 
>>>> 
>>>>> (for example see how it's done in kernel code: only_one_child(...)
>>>>> 
>>>>> PS:
>>>>> kernel can be forced  to scan for !0 device numbers, but that's rather
>>>>> a hack, so we shouldn't really care about that.
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>>> +     */
>>>>>>>>> +    if (pci_is_express(pci_dev) &&
>>>>>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>     if (pci_dev->failover_pair_id) {
>>>>>>>>>         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>>>>>             error_setg(errp, "failover primary device must be on "
>>>> 
>>> 
>
Akihiko Odaki July 5, 2023, 10:42 a.m. UTC | #14
On 2023/07/05 14:43, Ani Sinha wrote:
> 
> 
>> On 05-Jul-2023, at 7:09 AM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>>
>>
>> On 2023/07/05 0:07, Ani Sinha wrote:
>>>> On 04-Jul-2023, at 7:58 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>>>
>>>> On Tue, 4 Jul 2023 19:20:00 +0530
>>>> Ani Sinha <anisinha@redhat.com> wrote:
>>>>
>>>>>> On 04-Jul-2023, at 6:18 PM, Igor Mammedov <imammedo@redhat.com> wrote:
>>>>>>
>>>>>> On Tue, 4 Jul 2023 21:02:09 +0900
>>>>>> Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>
>>>>>>> On 2023/07/04 20:59, Ani Sinha wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 04-Jul-2023, at 5:24 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>>>>
>>>>>>>>> On 2023/07/04 20:25, Ani Sinha wrote:
>>>>>>>>>> PCI Express ports only have one slot, so PCI Express devices can only be
>>>>>>>>>> plugged into slot 0 on a PCIE port. Add a warning to let users know when the
>>>>>>>>>> invalid configuration is used. We may enforce this more strongly later on once
>>>>>>>>>> we get more clarity on whether we are introducing a bad regression for users
>>>>>>>>>> currenly using the wrong configuration.
>>>>>>>>>> The change has been tested to not break or alter behaviors of ARI capable
>>>>>>>>>> devices by instantiating seven vfs on an emulated igb device (the maximum
>>>>>>>>>> number of vfs the linux igb driver supports). The vfs instantiated correctly
>>>>>>>>>> and are seen to have non-zero device/slot numbers in the conventional PCI BDF
>>>>>>>>>> representation.
>>>>>>>>>> CC: jusual@redhat.com
>>>>>>>>>> CC: imammedo@redhat.com
>>>>>>>>>> CC: mst@redhat.com
>>>>>>>>>> CC: akihiko.odaki@daynix.com
>>>>>>>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>>>>>>>>>> Signed-off-by: Ani Sinha <anisinha@redhat.com>
>>>>>>>>>> Reviewed-by: Julia Suvorova <jusual@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>> hw/pci/pci.c | 15 +++++++++++++++
>>>>>>>>>> 1 file changed, 15 insertions(+)
>>>>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>>>>> index e2eb4c3b4a..47517ba3db 100644
>>>>>>>>>> --- a/hw/pci/pci.c
>>>>>>>>>> +++ b/hw/pci/pci.c
>>>>>>>>>> @@ -65,6 +65,7 @@ bool pci_available = true;
>>>>>>>>>> static char *pcibus_get_dev_path(DeviceState *dev);
>>>>>>>>>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>>>>>>>>>> static void pcibus_reset(BusState *qbus);
>>>>>>>>>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>>>>>>>>>>    static Property pci_props[] = {
>>>>>>>>>>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>>>>>>>>> @@ -2121,6 +2122,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>> +    /*
>>>>>>>>>> +     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
>>>>>>>>>> +     * PCI interpretation as all five bits reserved for slot addresses are
>>>>>>>>>> +     * also used for function bits for the various vfs. Ignore that case.
>>>>>>>>>
>>>>>>>>> You don't have to mention SR/IOV; it affects all ARI-capable devices. A PF can also have non-zero slot number in the conventional interpretation so you shouldn't call it vf either.
>>>>>>>>
>>>>>>>> Can you please help write a comment that explains this properly for all cases - ARI/non-ARI, PFs and VFs? Once everyone agrees that its clear and correct, I will re-spin.
>>>>>>>
>>>>>>> Simply, you can say:
>>>>>>> With ARI, the slot number field in the conventional PCI interpretation
>>>>>>> can have a non-zero value as the field bits are reused to extend the
>>>>>>> function number bits. Ignore that case.
>>>>>>
>>>>>> Mentioning 'conventional PCI interpretation' in the comment and then immediately
>>>>>> checking 'pci_is_express(pci_dev)' is confusing. Since the comment belongs
>>>>>> only to the PCIe branch, it would be better to talk only about PCIe
>>>>>> and refer to the relevant portions of the spec.
>>>>>
>>>>> Ok so how about this?
>>>>>
>>>>>    * With ARI, devices can have non-zero slot in the traditional BDF
>>>>>      * representation as all five bits reserved for slot addresses are
>>>>>      * also used for function bits. Ignore that case.
>>>>
>>>> You still refer to 'traditional' (which I misread as 'conventional').
>>>> Steal the Linux comment and augment it with ARI if necessary,
>>>> something like this (probably needs some more massaging):
>>> The comment messaging in these patches seems to exceed the value of the patch itself :-)
>>> How about this?
>>>      /*
>>>       * A PCIe Downstream Port normally leads to a Link with only Device
>>>       * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>>       * With ARI, PCI_SLOT() can return non-zero value as all five bits
>>>       * reserved for slot addresses are also used for function bits.
>>>       * Hence, ignore ARI capable devices.
>>>       */
>>
>> Perhaps: s/normally leads to/must lead to/
>>
>>  From the kernel's perspective, they may need to deal with quirky hardware that does not conform to the specification, but from QEMU's perspective, it is what we *must* conform to.
> 
> PCI base spec 4.0, rev 3, section 7.3.1 says:
> 
> "
> Downstream Ports that do not have ARI Forwarding enabled must associate only Device 0 with the device attached to the Logical Bus representing the Link from the Port. Configuration Requests targeting the Bus Number associated with a Link specifying Device Number 0 are delivered to the device attached to the Link; Configuration Requests specifying all other Device Numbers (1-31) must be terminated by the Switch Downstream Port or the Root Port with an Unsupported Request Completion Status (equivalent to Master Abort in PCI). Non-ARI Devices must not assume that Device Number 0 is associated with their Upstream Port, but must capture their assigned Device Number as discussed in Section 2.2.6.2. Non-ARI Devices must respond to all Type 0 Configuration Read Requests, regardless of the Device Number specified in the Request.
> 
> …
> 
> With an ARI Device, its Device Number is implied to be 0 rather than specified by a field within an ID. The traditional 5-bit Device Number and 3-bit Function Number fields in its associated Routing IDs, Requester IDs, and Completer IDs are interpreted as a single 8-bit Function Number. See Section 6.13. Any Type 0 Configuration Request targeting an unimplemented Function in an ARI Device must be handled as an Unsupported Request.
> 
> "
> 
> So it seems they do indeed use the “must” clause. I prefer to use the line from the spec as verbatim as possible. Hence, this is what I am going with, so that we can be done with this patchset:
> 
>      /*
>       * A PCIe Downstream Port that does not have ARI Forwarding enabled must
>       * associate only Device 0 with the device attached to the bus
>       * representing the Link from the Port (PCIe base spec rev 4.0 ver 0.3,
>       * sec 7.3.1).
>       * With ARI, PCI_SLOT() can return non-zero value as the traditional
>       * 5-bit Device Number and 3-bit Function Number fields in its associated
>       * Routing IDs, Requester IDs and Completer IDs are interpreted as a
>       * single 8-bit Function Number. Hence, ignore ARI capable devices.
>       */

Looks perfect.
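
For anyone reading the thread later, here is a minimal standalone sketch of why PCI_SLOT() can return a non-zero value for an ARI device. The two macros follow the usual devfn layout (device in bits 7:3, function in bits 2:0). ari_function_number() and slot_warning_applies() are hypothetical helpers written only for this illustration; they are not QEMU functions, and the boolean parameters stand in for the pci_is_express(), pcie_find_capability() and pcie_has_upstream_port() checks in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conventional devfn layout: bits 7:3 = device (slot), bits 2:0 = function. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * Hypothetical helper: with ARI, the device number is implied to be 0 and
 * the whole byte is interpreted as a single 8-bit function number.
 */
static inline uint8_t ari_function_number(uint8_t devfn)
{
    return devfn;
}

/*
 * Hypothetical helper mirroring the condition in the patch: the warning is
 * only relevant for a PCIe device without the ARI capability that sits
 * below a PCIe port and has a non-zero slot in its devfn.
 */
static inline bool slot_warning_applies(bool is_express, bool has_ari,
                                        bool has_upstream_port, uint8_t devfn)
{
    return is_express && !has_ari && has_upstream_port &&
           PCI_SLOT(devfn) != 0;
}
```

With devfn 0x08, the conventional reading gives slot 1, function 0, while the ARI reading gives function 8 of device 0. That is why the check has to skip ARI-capable devices instead of looking at PCI_SLOT() alone.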

> 
> 
>>
>> Otherwise looks good to me.
>>
>>>>
>>>>
>>>>          /*
>>>>          * A PCIe Downstream Port normally leads to a Link with only Device
>>>>          * 0 on it (PCIe spec r3.1, sec 7.3.1).
>>>>          * However, PCI_SLOT() is broken if ARI is enabled, hence work around it
>>>>          * by skipping the check if the latter cap is present.
>>>>          */
>>>>
>>>>>
>>>>>
>>>>>> (for example, see how it's done in kernel code: only_one_child(...))
>>>>>>
>>>>>> PS:
>>>>>> The kernel can be forced to scan for !0 device numbers, but that's rather
>>>>>> a hack, so we shouldn't really care about that.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> +     */
>>>>>>>>>> +    if (pci_is_express(pci_dev) &&
>>>>>>>>>> +        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
>>>>>>>>>> +        pcie_has_upstream_port(pci_dev) &&
>>>>>>>>>> +        PCI_SLOT(pci_dev->devfn)) {
>>>>>>>>>> +        warn_report("PCI: slot %d is not valid for %s,"
>>>>>>>>>> +                    " parent device only allows plugging into slot 0.",
>>>>>>>>>> +                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>>      if (pci_dev->failover_pair_id) {
>>>>>>>>>>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>>>>>>>>>              error_setg(errp, "failover primary device must be on "
>>>>>
>>>>
>>
>

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..47517ba3db 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -65,6 +65,7 @@  bool pci_available = true;
 static char *pcibus_get_dev_path(DeviceState *dev);
 static char *pcibus_get_fw_dev_path(DeviceState *dev);
 static void pcibus_reset(BusState *qbus);
+static bool pcie_has_upstream_port(PCIDevice *dev);
 
 static Property pci_props[] = {
     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
@@ -2121,6 +2122,20 @@  static void pci_qdev_realize(DeviceState *qdev, Error **errp)
         }
     }
 
+    /*
+     * With SRIOV and ARI, vfs can have non-zero slot in the conventional
+     * PCI interpretation as all five bits reserved for slot addresses are
+     * also used for function bits for the various vfs. Ignore that case.
+     */
+    if (pci_is_express(pci_dev) &&
+        !pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI) &&
+        pcie_has_upstream_port(pci_dev) &&
+        PCI_SLOT(pci_dev->devfn)) {
+        warn_report("PCI: slot %d is not valid for %s,"
+                    " parent device only allows plugging into slot 0.",
+                    PCI_SLOT(pci_dev->devfn), pci_dev->name);
+    }
+
     if (pci_dev->failover_pair_id) {
         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
             error_setg(errp, "failover primary device must be on "