[4/4] pci: Compare function number and ARI next function number

Message ID 20230701070133.24877-5-akihiko.odaki@daynix.com
State New
Series pci: Compare function number and ARI next function number

Commit Message

Akihiko Odaki July 1, 2023, 7:01 a.m. UTC
The function number must be lower than the next function number
advertised with ARI.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pci.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Michael S. Tsirkin July 2, 2023, 4:58 a.m. UTC | #1
On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
> The function number must be lower than the next function number
> advertised with ARI.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>

I don't get this logic at all - where is the limitation coming from?

All I see in the spec is:
	Next Function Number - With non-VFs, this field indicates the Function Number of the next higher
	numbered Function in the Device, or 00h if there are no higher numbered Functions. Function 0 starts
	this linked list of Functions.
	The presence of Shadow Functions does not affect this field.
	For VFs, this field is undefined since VFs are located using First VF Offset (see Section 9.3.3.9) and VF
	Stride (see Section 9.3.3.10).

and

	 To improve the enumeration performance and create a more deterministic solution, software can
	enumerate Functions through a linked list of Function Numbers. The next linked list element is
	communicated through each Function’s ARI Capability Register.
	i. Function 0 acts as the head of a linked list of Function Numbers. Software detects a
	non-Zero Next Function Number field within the ARI Capability Register as the next
	Function within the linked list. Software issues a configuration probe using the Bus Number
	captured by the Device and the Function Number derived from the ARI Capability Register
	to locate the next associated Function’s configuration space.
	ii. Function Numbers may be sparse and non-sequential in their consumption by an ARI
	Device.
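
To illustrate the enumeration the spec text describes, here is a minimal sketch in C. The next_fn[] table and ari_walk() are hypothetical names used only for this example: the table models the Next Function Number each function would advertise in its ARI capability, and none of this is QEMU or guest OS code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: next_fn[f] holds the Next Function Number that
 * function f advertises (0 terminates the list). Walks the list from
 * function 0 and stores the visited function numbers in found[].
 * Returns how many functions were visited.
 */
static size_t ari_walk(const uint8_t next_fn[256], uint8_t found[256])
{
    size_t count = 0;
    uint8_t fn = 0;                 /* Function 0 is the head of the list */

    do {
        found[count++] = fn;
        fn = next_fn[fn];           /* follow the link to the next function */
    } while (fn != 0 && count < 256); /* bound the walk in case of a cycle */

    return count;
}
```

With a table where function 0 points to 2 and function 2 points to 5, the walk visits 0, 2 and 5, which matches the "sparse and non-sequential" wording above.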





> ---
>  hw/pci/pci.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..568665ee42 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>      Error *local_err = NULL;
>      bool is_default_rom;
>      uint16_t class_id;
> +    uint16_t ari;
> +    uint16_t nextfn;
>  
>      /*
>       * capped by systemd (see: udev-builtin-net_id.c)
> @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>          }
>      }
>  
> +    if (pci_is_express(pci_dev)) {
> +        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> +        if (ari) {
> +            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
> +            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> +                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
> +                           pci_dev->devfn & 0xff, nextfn);
> +                pci_qdev_unrealize(DEVICE(pci_dev));
> +                return;
> +            }
> +        }
> +    }
> +
>      if (pci_dev->failover_pair_id) {
>          if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>              error_setg(errp, "failover primary device must be on "
> -- 
> 2.41.0
Akihiko Odaki July 2, 2023, 8:46 a.m. UTC | #2
On 2023/07/02 13:58, Michael S. Tsirkin wrote:
> On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
>> The function number must be lower than the next function number
>> advertised with ARI.
>>
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> 
> I don't get this logic at all - where is the limitation coming from?
> 
> All I see in the spec is:
> 	Next Function Number - With non-VFs, this field indicates the Function Number of the next higher
> 	numbered Function in the Device, or 00h if there are no higher numbered Functions. Function 0 starts
> 	this linked list of Functions.
> 	The presence of Shadow Functions does not affect this field.
> 	For VFs, this field is undefined since VFs are located using First VF Offset (see Section 9.3.3.9) and VF
> 	Stride (see Section 9.3.3.10).
> 
> and
> 
> 	 To improve the enumeration performance and create a more deterministic solution, software can
> 	enumerate Functions through a linked list of Function Numbers. The next linked list element is
> 	communicated through each Function’s ARI Capability Register.
> 	i. Function 0 acts as the head of a linked list of Function Numbers. Software detects a
> 	non-Zero Next Function Number field within the ARI Capability Register as the next
> 	Function within the linked list. Software issues a configuration probe using the Bus Number
> 	captured by the Device and the Function Number derived from the ARI Capability Register
> 	to locate the next associated Function’s configuration space.
> 	ii. Function Numbers may be sparse and non-sequential in their consumption by an ARI
> 	Device.

The statement "With non-VFs, this field indicates the Function Number of 
the next higher numbered Function in the Device, or 00h if there are no 
higher numbered Functions." implies the Function Number of the device 
should be lower than the value advertised by the field (for non-VFs; 
this patch does not check if it's VF or not.)
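
The implied constraint can be written as a small predicate. This is only a sketch: ari_nextfn_ok() is a hypothetical helper, not a function in QEMU, and like the patch it does not distinguish VFs from non-VFs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if nextfn is an acceptable Next Function Number for a
 * function numbered fn: either 0 (no higher numbered function) or a
 * value strictly greater than fn.
 */
static bool ari_nextfn_ok(uint8_t fn, uint8_t nextfn)
{
    return nextfn == 0 || fn < nextfn;
}
```

The patch rejects exactly the cases where this predicate would be false.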

> 
> 
> 
> 
> 
>> ---
>>   hw/pci/pci.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index e2eb4c3b4a..568665ee42 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>       Error *local_err = NULL;
>>       bool is_default_rom;
>>       uint16_t class_id;
>> +    uint16_t ari;
>> +    uint16_t nextfn;
>>   
>>       /*
>>        * capped by systemd (see: udev-builtin-net_id.c)
>> @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>           }
>>       }
>>   
>> +    if (pci_is_express(pci_dev)) {
>> +        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
>> +        if (ari) {
>> +            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
>> +            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
>> +                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
>> +                           pci_dev->devfn & 0xff, nextfn);
>> +                pci_qdev_unrealize(DEVICE(pci_dev));
>> +                return;
>> +            }
>> +        }
>> +    }
>> +
>>       if (pci_dev->failover_pair_id) {
>>           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>>               error_setg(errp, "failover primary device must be on "
>> -- 
>> 2.41.0
>
Michael S. Tsirkin July 2, 2023, 8:55 a.m. UTC | #3
On Sun, Jul 02, 2023 at 05:46:38PM +0900, Akihiko Odaki wrote:
> On 2023/07/02 13:58, Michael S. Tsirkin wrote:
> > On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
> > > The function number must be lower than the next function number
> > > advertised with ARI.
> > > 
> > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > 
> > I don't get this logic at all - where is the limitation coming from?
> > 
> > All I see in the spec is:
> > 	Next Function Number - With non-VFs, this field indicates the Function Number of the next higher
> > 	numbered Function in the Device, or 00h if there are no higher numbered Functions. Function 0 starts
> > 	this linked list of Functions.
> > 	The presence of Shadow Functions does not affect this field.
> > 	For VFs, this field is undefined since VFs are located using First VF Offset (see § Section 9.3.3.9 ) and VF
> > 	Stride (see § Section 9.3.3.10 ).
> > 
> > and
> > 
> > 	 To improve the enumeration performance and create a more deterministic solution, software can
> > 	enumerate Functions through a linked list of Function Numbers. The next linked list element is
> > 	communicated through each Function’s ARI Capability Register.
> > 	i. Function 0 acts as the head of a linked list of Function Numbers. Software detects a
> > 	non-Zero Next Function Number field within the ARI Capability Register as the next
> > 	Function within the linked list. Software issues a configuration probe using the Bus Number
> > 	captured by the Device and the Function Number derived from the ARI Capability Register
> > 	to locate the next associated Function’s configuration space.
> > 	ii. Function Numbers may be sparse and non-sequential in their consumption by an ARI
> > 	Device.
> 
> The statement "With non-VFs, this field indicates the Function Number of the
> next higher numbered Function in the Device, or 00h if there are no higher
> numbered Functions." implies the Function Number of the device should be
> lower than the value advertised by the field (for non-VFs; this patch does
> not check if it's VF or not.)


Now I get it. Good point! I'd say if we want this check we should add
it in pcie_ari_init, making that return int.
But for now it's dead code since you are changing it to 0.
So maybe a comment in pcie_ari_init is enough:

/*
 * Note: nextfn must be the Function Number of the
 * next higher numbered Function in the Device, or 00h if there are no higher
 * numbered Functions.
 * TODO: validate this.
 */
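
For reference, an init routine that validates nextfn and returns int could look roughly like the sketch below. Everything here is hypothetical: struct ari_cap and ari_cap_init() are stand-ins invented for this example and are not the actual pcie_ari_init() code. The only detail taken from the patch is that the Next Function Number sits in bits 15:8 of the register it reads.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a function's ARI capability register. */
struct ari_cap {
    uint32_t reg;   /* Next Function Number in bits 15:8, as the patch reads it */
};

/*
 * Stores nextfn for the function numbered fn.
 * Returns 0 on success, or -EINVAL (leaving cap untouched) if nextfn is
 * neither 0 nor higher than fn.
 */
static int ari_cap_init(struct ari_cap *cap, uint8_t fn, uint8_t nextfn)
{
    if (nextfn != 0 && fn >= nextfn) {
        return -EINVAL;
    }
    cap->reg = (uint32_t)nextfn << 8;
    return 0;
}
```

Returning an error code rather than asserting would let a caller report the problem if the value ever came from user input.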

> > 
> > 
> > 
> > 
> > 
> > > ---
> > >   hw/pci/pci.c | 15 +++++++++++++++
> > >   1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index e2eb4c3b4a..568665ee42 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> > >       Error *local_err = NULL;
> > >       bool is_default_rom;
> > >       uint16_t class_id;
> > > +    uint16_t ari;
> > > +    uint16_t nextfn;
> > >       /*
> > >        * capped by systemd (see: udev-builtin-net_id.c)
> > > @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> > >           }
> > >       }
> > > +    if (pci_is_express(pci_dev)) {
> > > +        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> > > +        if (ari) {
> > > +            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
> > > +            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> > > +                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
> > > +                           pci_dev->devfn & 0xff, nextfn);
> > > +                pci_qdev_unrealize(DEVICE(pci_dev));
> > > +                return;
> > > +            }
> > > +        }
> > > +    }
> > > +
> > >       if (pci_dev->failover_pair_id) {
> > >           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> > >               error_setg(errp, "failover primary device must be on "
> > > -- 
> > > 2.41.0
> >
Michael S. Tsirkin July 2, 2023, 8:57 a.m. UTC | #4
On Sun, Jul 02, 2023 at 04:55:48AM -0400, Michael S. Tsirkin wrote:
> On Sun, Jul 02, 2023 at 05:46:38PM +0900, Akihiko Odaki wrote:
> > On 2023/07/02 13:58, Michael S. Tsirkin wrote:
> > > On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
> > > > The function number must be lower than the next function number
> > > > advertised with ARI.
> > > > 
> > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > 
> > > I don't get this logic at all - where is the limitation coming from?
> > > 
> > > All I see in the spec is:
> > > 	Next Function Number - With non-VFs, this field indicates the Function Number of the next higher
> > > 	numbered Function in the Device, or 00h if there are no higher numbered Functions. Function 0 starts
> > > 	this linked list of Functions.
> > > 	The presence of Shadow Functions does not affect this field.
> > > 	For VFs, this field is undefined since VFs are located using First VF Offset (see § Section 9.3.3.9 ) and VF
> > > 	Stride (see § Section 9.3.3.10 ).
> > > 
> > > and
> > > 
> > > 	 To improve the enumeration performance and create a more deterministic solution, software can
> > > 	enumerate Functions through a linked list of Function Numbers. The next linked list element is
> > > 	communicated through each Function’s ARI Capability Register.
> > > 	i. Function 0 acts as the head of a linked list of Function Numbers. Software detects a
> > > 	non-Zero Next Function Number field within the ARI Capability Register as the next
> > > 	Function within the linked list. Software issues a configuration probe using the Bus Number
> > > 	captured by the Device and the Function Number derived from the ARI Capability Register
> > > 	to locate the next associated Function’s configuration space.
> > > 	ii. Function Numbers may be sparse and non-sequential in their consumption by an ARI
> > > 	Device.
> > 
> > The statement "With non-VFs, this field indicates the Function Number of the
> > next higher numbered Function in the Device, or 00h if there are no higher
> > numbered Functions." implies the Function Number of the device should be
> > lower than the value advertised by the field (for non-VFs; this patch does
> > not check if it's VF or not.)
> 
> 
> Now I get it. Good point! I'd say if we want this check we should add
> it in pcie_ari_init, making that return int.
> But for now it's dead code since you are changing it to 0.
> So maybe a comment in pcie_ari_init is enough:
> 
> /*
>  * Note: nextfn must be the Function Number of the
>  * next higher numbered Function in the Device, or 00h if there are no higher
>  * numbered Functions.
>  * TODO: validate this.
>  */

Or add an assert, and
	TODO: in case this can ever come from command line, we'll have
	to replace the assert below with a runtime check.


> > > 
> > > 
> > > 
> > > 
> > > 
> > > > ---
> > > >   hw/pci/pci.c | 15 +++++++++++++++
> > > >   1 file changed, 15 insertions(+)
> > > > 
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index e2eb4c3b4a..568665ee42 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> > > >       Error *local_err = NULL;
> > > >       bool is_default_rom;
> > > >       uint16_t class_id;
> > > > +    uint16_t ari;
> > > > +    uint16_t nextfn;
> > > >       /*
> > > >        * capped by systemd (see: udev-builtin-net_id.c)
> > > > @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> > > >           }
> > > >       }
> > > > +    if (pci_is_express(pci_dev)) {
> > > > +        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> > > > +        if (ari) {
> > > > +            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
> > > > +            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> > > > +                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
> > > > +                           pci_dev->devfn & 0xff, nextfn);
> > > > +                pci_qdev_unrealize(DEVICE(pci_dev));
> > > > +                return;
> > > > +            }
> > > > +        }
> > > > +    }
> > > > +
> > > >       if (pci_dev->failover_pair_id) {
> > > >           if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> > > >               error_setg(errp, "failover primary device must be on "
> > > > -- 
> > > > 2.41.0
> > >
Ani Sinha July 11, 2023, 7:10 a.m. UTC | #5
> On 01-Jul-2023, at 12:31 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> The function number must be lower than the next function number
> advertised with ARI.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> hw/pci/pci.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..568665ee42 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>     Error *local_err = NULL;
>     bool is_default_rom;
>     uint16_t class_id;
> +    uint16_t ari;
> +    uint16_t nextfn;
> 
>     /*
>      * capped by systemd (see: udev-builtin-net_id.c)
> @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>         }
>     }
> 
> +    if (pci_is_express(pci_dev)) {
> +        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> +        if (ari) {
> +            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
> +            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> +                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
> +                           pci_dev->devfn & 0xff, nextfn);
> +                pci_qdev_unrealize(DEVICE(pci_dev));
> +                return;
> +            }
> +        }
> +    }
> +

So I kind of got lost in all the patches. What was the ultimate decision regarding checking this?

>     if (pci_dev->failover_pair_id) {
>         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>             error_setg(errp, "failover primary device must be on "
> -- 
> 2.41.0
>
Michael S. Tsirkin July 11, 2023, 8:33 a.m. UTC | #6
On Tue, Jul 11, 2023 at 12:40:47PM +0530, Ani Sinha wrote:
> 
> 
> > On 01-Jul-2023, at 12:31 PM, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > 
> > The function number must be lower than the next function number
> > advertised with ARI.
> > 
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> > hw/pci/pci.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index e2eb4c3b4a..568665ee42 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >     Error *local_err = NULL;
> >     bool is_default_rom;
> >     uint16_t class_id;
> > +    uint16_t ari;
> > +    uint16_t nextfn;
> > 
> >     /*
> >      * capped by systemd (see: udev-builtin-net_id.c)
> > @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >         }
> >     }
> > 
> > +    if (pci_is_express(pci_dev)) {
> > +        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> > +        if (ari) {
> > +            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
> > +            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> > +                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
> > +                           pci_dev->devfn & 0xff, nextfn);
> > +                pci_qdev_unrealize(DEVICE(pci_dev));
> > +                return;
> > +            }
> > +        }
> > +    }
> > +
> 
> So I kind of got lost in all the patches. What was the ultimate decision regarding checking this?

We still need to fix ARI for multi-function PFs.
I feel the right thing to do is to init Next Function in the ARI
capability, automatically.
For now, we have merely changed the ARI setup to set Next Function to 0.
At least that's more correct for the common case of ARI PF with VFs.
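
One way to derive Next Function automatically is sketched below, under the assumption that the set of present non-VF function numbers is available as a simple table. present[] and ari_next_fn() are hypothetical names for illustration, not existing QEMU interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * present[i] is true if function i exists in the device.
 * Returns the next higher numbered present function after fn,
 * or 0 if there is none.
 */
static uint8_t ari_next_fn(const bool present[256], uint8_t fn)
{
    for (unsigned int i = (unsigned int)fn + 1; i < 256; i++) {
        if (present[i]) {
            return (uint8_t)i;
        }
    }
    return 0;
}
```

A value computed this way is either 0 or strictly greater than fn, so it satisfies the constraint the patch checks by construction.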


> >     if (pci_dev->failover_pair_id) {
> >         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> >             error_setg(errp, "failover primary device must be on "
> > -- 
> > 2.41.0
> >
Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..568665ee42 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2059,6 +2059,8 @@  static void pci_qdev_realize(DeviceState *qdev, Error **errp)
     Error *local_err = NULL;
     bool is_default_rom;
     uint16_t class_id;
+    uint16_t ari;
+    uint16_t nextfn;
 
     /*
      * capped by systemd (see: udev-builtin-net_id.c)
@@ -2121,6 +2123,19 @@  static void pci_qdev_realize(DeviceState *qdev, Error **errp)
         }
     }
 
+    if (pci_is_express(pci_dev)) {
+        ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
+        if (ari) {
+            nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) & 0xff;
+            if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
+                error_setg(errp, "PCI: function number %u is not lower than ARI next function number %u",
+                           pci_dev->devfn & 0xff, nextfn);
+                pci_qdev_unrealize(DEVICE(pci_dev));
+                return;
+            }
+        }
+    }
+
     if (pci_dev->failover_pair_id) {
         if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
             error_setg(errp, "failover primary device must be on "