diff mbox series

[net-next,2/3] devlink: Consider other controller while building phys_port_name

Message ID 20200825135839.106796-3-parav@mellanox.com
State Changes Requested
Delegated to: David Miller
Headers show
Series devlink show controller info | expand

Commit Message

Parav Pandit Aug. 25, 2020, 1:58 p.m. UTC
A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on other system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system. In this case
devlink instance of the eswitch only have access to ports of the
controller.

When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.

To differentiate ports of a controller external/remote to the devlink
instance, consider external controller number while forming the
phys_port_name.
Also return this optional controller number of the port via netlink
interface.

An example output:

$ devlink port show pci/0000:00:08.0/2
pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0 vfnum 1 splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show -jp pci/0000:00:08.0/2
{
    "port": {
        "pci/0000:00:08.0/1": {
            "type": "eth",
            "netdev": "eth7",
            "controller": 0,
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

An example representor netdev udev name consist of controller
annotation for external controller with controller number = 0,
for PF 0 and VF 1:

udevadm test-builtin net_id /sys/class/net/eth7
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=enp0s8f0nc0pf0vf1
Unload module index
Unloaded link configuration context.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 include/net/devlink.h        |  7 ++++++-
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 29 +++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski Aug. 26, 2020, 12:32 a.m. UTC | #1
On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:
> A devlink port may be for a controller consist of PCI device.
> A devlink instance holds ports of two types of controllers.
> (1) controller discovered on same system where eswitch resides
> This is the case where PCI PF/VF of a controller and devlink eswitch
> instance both are located on a single system.
> (2) controller located on other system.
> This is the case where a controller is located in one system and its
> devlink eswitch ports are located in a different system. In this case
> devlink instance of the eswitch only have access to ports of the
> controller.
> 
> When a devlink eswitch instance serves the devlink ports of both
> controllers together, PCI PF/VF numbers may overlap.
> Due to this a unique phys_port_name cannot be constructed.

This description is clear as mud to me. Is it just me? Can someone
understand this?
Parav Pandit Aug. 26, 2020, 4:27 a.m. UTC | #2
Hi Jakub,

> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org>
> On Behalf Of Jakub Kicinski
> Sent: Wednesday, August 26, 2020 6:02 AM
> 
> On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:
> > A devlink port may be for a controller consist of PCI device.
> > A devlink instance holds ports of two types of controllers.
> > (1) controller discovered on same system where eswitch resides This is
> > the case where PCI PF/VF of a controller and devlink eswitch instance
> > both are located on a single system.
> > (2) controller located on other system.
> > This is the case where a controller is located in one system and its
> > devlink eswitch ports are located in a different system. In this case
> > devlink instance of the eswitch only have access to ports of the
> > controller.
> >
> > When a devlink eswitch instance serves the devlink ports of both
> > controllers together, PCI PF/VF numbers may overlap.
> > Due to this a unique phys_port_name cannot be constructed.
> 
> This description is clear as mud to me. Is it just me? Can someone understand
> this?

I would like to improve this description.
Do you have an input to describe these two different controllers, each has same PF and VF numbers?

$ devlink port show looks like below without a controller annotation.
pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
                                                                                                   ^^^^^^^
Jakub Kicinski Aug. 26, 2020, 8:07 p.m. UTC | #3
On Wed, 26 Aug 2020 04:27:35 +0000 Parav Pandit wrote:
> > On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:  
> > > A devlink port may be for a controller consist of PCI device.
> > > A devlink instance holds ports of two types of controllers.
> > > (1) controller discovered on same system where eswitch resides This is
> > > the case where PCI PF/VF of a controller and devlink eswitch instance
> > > both are located on a single system.
> > > (2) controller located on other system.
> > > This is the case where a controller is located in one system and its
> > > devlink eswitch ports are located in a different system. In this case
> > > devlink instance of the eswitch only have access to ports of the
> > > controller.
> > >
> > > When a devlink eswitch instance serves the devlink ports of both
> > > controllers together, PCI PF/VF numbers may overlap.
> > > Due to this a unique phys_port_name cannot be constructed.  
> > 
> > This description is clear as mud to me. Is it just me? Can someone understand
> > this?  
> 
> I would like to improve this description.
> Do you have an input to describe these two different controllers,
> each has same PF and VF numbers?

Not yet, I'm just trying to figure out how things come together.

Are some VFs of the same PF under one controller and other ones under 
a different controller? 

> $ devlink port show looks like below without a controller annotation.
> pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0

How can you have two PF 0? Aaah - by controller you mean hardware IP,
not whoever is controlling the switching! So the chip has multiple HW
controllers, each of which can have multiple PFs?

Definitely please make that more clear.

Why is @controller_num not under PCI port attrs, but a separate field
without even a mention of PCI? Are some of the controllers a different
bus?
Parav Pandit Aug. 27, 2020, 4:31 a.m. UTC | #4
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 27, 2020 1:38 AM
> 
> On Wed, 26 Aug 2020 04:27:35 +0000 Parav Pandit wrote:
> > > On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:
> > > > A devlink port may be for a controller consist of PCI device.
> > > > A devlink instance holds ports of two types of controllers.
> > > > (1) controller discovered on same system where eswitch resides
> > > > This is the case where PCI PF/VF of a controller and devlink
> > > > eswitch instance both are located on a single system.
> > > > (2) controller located on other system.
> > > > This is the case where a controller is located in one system and
> > > > its devlink eswitch ports are located in a different system. In
> > > > this case devlink instance of the eswitch only have access to
> > > > ports of the controller.
> > > >
> > > > When a devlink eswitch instance serves the devlink ports of both
> > > > controllers together, PCI PF/VF numbers may overlap.
> > > > Due to this a unique phys_port_name cannot be constructed.
> > >
> > > This description is clear as mud to me. Is it just me? Can someone
> > > understand this?
> >
> > I would like to improve this description.
> > Do you have an input to describe these two different controllers, each
> > has same PF and VF numbers?
> 
> Not yet, I'm just trying to figure out how things come together.
> 
> Are some VFs of the same PF under one controller and other ones under a
> different controller?
> 
Correct.

> > $ devlink port show looks like below without a controller annotation.
> > pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> 
> How can you have two PF 0? Aaah - by controller you mean hardware IP, not
> whoever is controlling the switching! So the chip has multiple HW controllers,
> each of which can have multiple PFs?
> 
Hardware IP is one. This IP is plugged into two PCI root complexes.
One is eswitch PF, this PF has its own VFs/SFs.
Other PF(s) plugged into an second PCI Root complex serving the server system.
So you are right there are multiple PFs. Both the PFs have same PCI BDF.

> Definitely please make that more clear.
> 
Ok. yeah.
I should add above details.
I think a text diagram might be more useful.

> Why is @controller_num not under PCI port attrs, but a separate field
> without even a mention of PCI? Are some of the controllers a different bus?
I think It can be added under PCI attribute, but than we need to add it for PF and VF and in future for SF too.
So added it as generic. But yes, keeping it as part of PCI port attributes is good too.
Jakub Kicinski Aug. 27, 2020, 6:32 p.m. UTC | #5
On Thu, 27 Aug 2020 04:31:43 +0000 Parav Pandit wrote:
> > > $ devlink port show looks like below without a controller annotation.
> > > pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0  
> > 
> > How can you have two PF 0? Aaah - by controller you mean hardware IP, not
> > whoever is controlling the switching! So the chip has multiple HW controllers,
> > each of which can have multiple PFs?
> >   
> Hardware IP is one. This IP is plugged into two PCI root complexes.
> One is eswitch PF, this PF has its own VFs/SFs.
> Other PF(s) plugged into an second PCI Root complex serving the server system.
> So you are right there are multiple PFs.

I find it strange that you have pfnum 0 everywhere but then different
controllers. For MultiHost at Netronome we've used pfnum to distinguish
between the hosts. ASIC must have some unique identifiers for each PF.

I'm not aware of any practical reason for creating PFs on one RC
without reinitializing all the others.

I can see how having multiple controllers may make things clearer, but
adding another layer of IDs while the one under it is unused (pfnum=0)
feels very unnecessary.

> Both the PFs have same PCI BDF.

BDFs are irrelevant.
Parav Pandit Aug. 27, 2020, 8:15 p.m. UTC | #6
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, August 28, 2020 12:02 AM
> 
> On Thu, 27 Aug 2020 04:31:43 +0000 Parav Pandit wrote:
> > > > $ devlink port show looks like below without a controller annotation.
> > > > pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> > > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > >
> > > How can you have two PF 0? Aaah - by controller you mean hardware
> > > IP, not whoever is controlling the switching! So the chip has
> > > multiple HW controllers, each of which can have multiple PFs?
> > >
> > Hardware IP is one. This IP is plugged into two PCI root complexes.
> > One is eswitch PF, this PF has its own VFs/SFs.
> > Other PF(s) plugged into an second PCI Root complex serving the server
> system.
> > So you are right there are multiple PFs.
> 
> I find it strange that you have pfnum 0 everywhere but then different
> controllers.
There are multiple PFs, connected to different PCI RC. So device has same pfnum for both the PFs.

> For MultiHost at Netronome we've used pfnum to distinguish
> between the hosts. ASIC must have some unique identifiers for each PF.
> 
Yes. there is. It is identified by a unique controller number; internally it is called host_number.
But internal host_number is misleading term as multiple cables of same physical card can be plugged into single host.
So identifying based on a unique (controller) number and matching that up on external cable is desired.

> I'm not aware of any practical reason for creating PFs on one RC without
> reinitializing all the others.
I may be misunderstanding, but how is initialization is related multiple PFs?

> 
> I can see how having multiple controllers may make things clearer, but adding
> another layer of IDs while the one under it is unused (pfnum=0) feels very
> unnecessary.
pfnum=0 is used today. not sure I understand your comment about being unused.
Can you please explain?

Hierarchical naming kind of make sense, but if you have other ideas to annotate the controller, without changing the hardware pfnum, lets discuss.

> 
> > Both the PFs have same PCI BDF.
> 
> BDFs are irrelevant.
Jakub Kicinski Aug. 27, 2020, 9:42 p.m. UTC | #7
On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> >
> > I find it strange that you have pfnum 0 everywhere but then different
> > controllers.  
> There are multiple PFs, connected to different PCI RC. So device has
> same pfnum for both the PFs.
> 
> > For MultiHost at Netronome we've used pfnum to distinguish
> > between the hosts. ASIC must have some unique identifiers for each
> > PF. 
> Yes. there is. It is identified by a unique controller number;
> internally it is called host_number. But internal host_number is
> misleading term as multiple cables of same physical card can be
> plugged into single host. So identifying based on a unique
> (controller) number and matching that up on external cable is desired.
> 
> > I'm not aware of any practical reason for creating PFs on one RC
> > without reinitializing all the others.  
> I may be misunderstanding, but how is initialization is related
> multiple PFs?

If the number of PFs is static it should be possible to understand
which one is on which system.

> > I can see how having multiple controllers may make things clearer,
> > but adding another layer of IDs while the one under it is unused
> > (pfnum=0) feels very unnecessary.  
> pfnum=0 is used today. not sure I understand your comment about being
> unused. Can you please explain?

You examples only ever have pfnum 0:

From patch 2:

$ devlink port show pci/0000:00:08.0/2
pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0 vfnum 1 splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show -jp pci/0000:00:08.0/2
{
    "port": {
        "pci/0000:00:08.0/1": {
            "type": "eth",
            "netdev": "eth7",
            "controller": 0,
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

From earlier email:

pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0

If you never use pfnum, you can just put the controller ID there, 
like Netronome.

> Hierarchical naming kind of make sense, but if you have other ideas
> to annotate the controller, without changing the hardware pfnum, lets
> discuss.
Parav Pandit Aug. 28, 2020, 4:27 a.m. UTC | #8
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, August 28, 2020 3:12 AM
> 
> On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > >
> > > I find it strange that you have pfnum 0 everywhere but then
> > > different controllers.
> > There are multiple PFs, connected to different PCI RC. So device has
> > same pfnum for both the PFs.
> >
> > > For MultiHost at Netronome we've used pfnum to distinguish between
> > > the hosts. ASIC must have some unique identifiers for each PF.
> > Yes. there is. It is identified by a unique controller number;
> > internally it is called host_number. But internal host_number is
> > misleading term as multiple cables of same physical card can be
> > plugged into single host. So identifying based on a unique
> > (controller) number and matching that up on external cable is desired.
> >
> > > I'm not aware of any practical reason for creating PFs on one RC
> > > without reinitializing all the others.
> > I may be misunderstanding, but how is initialization is related
> > multiple PFs?
> 
> If the number of PFs is static it should be possible to understand which one is on
> which system.
>
How? How do we tell that pfnum A means external system.
Want to avoid such 'implicit' notion.
 
> > > I can see how having multiple controllers may make things clearer,
> > > but adding another layer of IDs while the one under it is unused
> > > (pfnum=0) feels very unnecessary.
> > pfnum=0 is used today. not sure I understand your comment about being
> > unused. Can you please explain?
> 
> You examples only ever have pfnum 0:
> 
Because both controllers have pfnum 0.

> From patch 2:
> 
> $ devlink port show pci/0000:00:08.0/2
> pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0
> vfnum 1 splittable false
>   function:
>     hw_addr 00:00:00:00:00:00
> 
> $ devlink port show -jp pci/0000:00:08.0/2 {
>     "port": {
>         "pci/0000:00:08.0/1": {
>             "type": "eth",
>             "netdev": "eth7",
>             "controller": 0,
>             "flavour": "pcivf",
>             "pfnum": 0,
>             "vfnum": 1,
>             "splittable": false,
>             "function": {
>                 "hw_addr": "00:00:00:00:00:00"
>             }
>         }
>     }
> }
> 
> From earlier email:
> 
> pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> 
> If you never use pfnum, you can just put the controller ID there, like Netronome.
>
It likely not going to work for us. Because pfnum is not some randomly generated number.
It is linked to the underlying PCI pf number. {pf0, pf1...}
Orchestration sw uses this to identify representor of a PF-VF pair.

Replacing pfnum with controller number breaks this; and it still doesn't tell user that it's the pf on other_host.

So it is used, and would like to continue to use even if there are multiple PFs port (that has same pfnum) under the same eswitch.

In an alternative,
Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
There can be better name than epcipf. I just put epcipf to differentiate it.
However these ports have same attributes as pcipf, pcivf, pcisf flavours.

> > Hierarchical naming kind of make sense, but if you have other ideas to
> > annotate the controller, without changing the hardware pfnum, lets
> > discuss.
Parav Pandit Aug. 28, 2020, 5:08 a.m. UTC | #9
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> Sent: Friday, August 28, 2020 9:57 AM
> 
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, August 28, 2020 3:12 AM
> >
> > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > >
> > > > I find it strange that you have pfnum 0 everywhere but then
> > > > different controllers.
> > > There are multiple PFs, connected to different PCI RC. So device has
> > > same pfnum for both the PFs.
> > >
> > > > For MultiHost at Netronome we've used pfnum to distinguish between
> > > > the hosts. ASIC must have some unique identifiers for each PF.
> > > Yes. there is. It is identified by a unique controller number;
> > > internally it is called host_number. But internal host_number is
> > > misleading term as multiple cables of same physical card can be
> > > plugged into single host. So identifying based on a unique
> > > (controller) number and matching that up on external cable is desired.
> > >
> > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > without reinitializing all the others.
> > > I may be misunderstanding, but how is initialization is related
> > > multiple PFs?
> >
> > If the number of PFs is static it should be possible to understand
> > which one is on which system.
> >
> How? How do we tell that pfnum A means external system.
> Want to avoid such 'implicit' notion.
> 
> > > > I can see how having multiple controllers may make things clearer,
> > > > but adding another layer of IDs while the one under it is unused
> > > > (pfnum=0) feels very unnecessary.
> > > pfnum=0 is used today. not sure I understand your comment about
> > > being unused. Can you please explain?
> >
> > You examples only ever have pfnum 0:
> >
> Because both controllers have pfnum 0.
> 
> > From patch 2:
> >
> > $ devlink port show pci/0000:00:08.0/2
> > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
> > pfnum 0 vfnum 1 splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00
> >
> > $ devlink port show -jp pci/0000:00:08.0/2 {
> >     "port": {
> >         "pci/0000:00:08.0/1": {
> >             "type": "eth",
> >             "netdev": "eth7",
> >             "controller": 0,
> >             "flavour": "pcivf",
> >             "pfnum": 0,
> >             "vfnum": 1,
> >             "splittable": false,
> >             "function": {
> >                 "hw_addr": "00:00:00:00:00:00"
> >             }
> >         }
> >     }
> > }
> >
> > From earlier email:
> >
> > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> >
> > If you never use pfnum, you can just put the controller ID there, like
> Netronome.
> >
> It likely not going to work for us. Because pfnum is not some randomly
> generated number.
> It is linked to the underlying PCI pf number. {pf0, pf1...} Orchestration sw uses
> this to identify representor of a PF-VF pair.
> 
> Replacing pfnum with controller number breaks this; and it still doesn't tell user
> that it's the pf on other_host.
> 
> So it is used, and would like to continue to use even if there are multiple PFs port
> (that has same pfnum) under the same eswitch.
> 
> In an alternative,
> Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> There can be better name than epcipf. I just put epcipf to differentiate it.
> However these ports have same attributes as pcipf, pcivf, pcisf flavours.
> 
I pressed the send button without an example of an alternative.
Changed eth dev name to be more readable as phys_port_name.

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_epf0 flavour epcipf pfnum 0

This naming only go far as long as each multi-host controller is in different eswitch.
When user prefers them under same eswitch, we will again have collision.
Hence, I suggest to use controller number that addressed both the use cases.

I do not know when Mellanox plan to support this mode, but I was told that this is likely.
What about Netronome? Is each host in different eswitch?

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0

This combines the idea of building phys_port_name of pcipf and external_pcipf ports in same way.
At the same time it has the ability to use the controller number.

Secondly, 
I forgot to mention previously that each controller (in multi host) setup consist of 2 PFs on same cable.
So pfnum!=host_number either.

Hence using controller number covering both use cases looks better to me.

> > > Hierarchical naming kind of make sense, but if you have other ideas
> > > to annotate the controller, without changing the hardware pfnum,
> > > lets discuss.
Jakub Kicinski Aug. 28, 2020, 4:43 p.m. UTC | #10
On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, August 28, 2020 3:12 AM
> > 
> > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:  
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > >
> > > > I find it strange that you have pfnum 0 everywhere but then
> > > > different controllers.  
> > > There are multiple PFs, connected to different PCI RC. So device has
> > > same pfnum for both the PFs.
> > >  
> > > > For MultiHost at Netronome we've used pfnum to distinguish between
> > > > the hosts. ASIC must have some unique identifiers for each PF.  
> > > Yes. there is. It is identified by a unique controller number;
> > > internally it is called host_number. But internal host_number is
> > > misleading term as multiple cables of same physical card can be
> > > plugged into single host. So identifying based on a unique
> > > (controller) number and matching that up on external cable is desired.
> > >  
> > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > without reinitializing all the others.  
> > > I may be misunderstanding, but how is initialization is related
> > > multiple PFs?  
> > 
> > If the number of PFs is static it should be possible to understand which one is on
> > which system.
>
> How? How do we tell that pfnum A means external system.
> Want to avoid such 'implicit' notion.

How do you tell that controller A means external system?

> > > > I can see how having multiple controllers may make things clearer,
> > > > but adding another layer of IDs while the one under it is unused
> > > > (pfnum=0) feels very unnecessary.  
> > > pfnum=0 is used today. not sure I understand your comment about being
> > > unused. Can you please explain?  
> > 
> > You examples only ever have pfnum 0:
> >   
> Because both controllers have pfnum 0.
> 
> > From patch 2:
> > 
> > $ devlink port show pci/0000:00:08.0/2
> > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0
> > vfnum 1 splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00
> > 
> > $ devlink port show -jp pci/0000:00:08.0/2 {
> >     "port": {
> >         "pci/0000:00:08.0/1": {
> >             "type": "eth",
> >             "netdev": "eth7",
> >             "controller": 0,
> >             "flavour": "pcivf",
> >             "pfnum": 0,
> >             "vfnum": 1,
> >             "splittable": false,
> >             "function": {
> >                 "hw_addr": "00:00:00:00:00:00"
> >             }
> >         }
> >     }
> > }
> > 
> > From earlier email:
> > 
> > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > 
> > If you never use pfnum, you can just put the controller ID there, like Netronome.
> >  
> It likely not going to work for us. Because pfnum is not some randomly generated number.
> It is linked to the underlying PCI pf number. {pf0, pf1...}
> Orchestration sw uses this to identify representor of a PF-VF pair.

For orchestration software which is unaware of controllers ports will
still alias on pf/vf nums.

Besides you have one devlink instance per port currently so I'm guessing
there is no pf1 ever, in your case...

> Replacing pfnum with controller number breaks this; and it still doesn't tell user that it's the pf on other_host.

Neither does the opaque controller id. Maybe now you understand better
why I wanted peer objects :/

> So it is used, and would like to continue to use even if there are multiple PFs port (that has same pfnum) under the same eswitch.
> 
> In an alternative,
> Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> There can be better name than epcipf. I just put epcipf to differentiate it.
> However these ports have same attributes as pcipf, pcivf, pcisf flavours.

I don't think the controllers are a terrible idea. Seems like a fairly
reasonable extension. But MLX don't seem to need them. And you have a
history of trying to make the Linux APIs look like your FW API.

Jiri, would you mind chiming in? What's your take?
Parav Pandit Aug. 29, 2020, 3:43 a.m. UTC | #11
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, August 28, 2020 10:14 PM
> 
> On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > > Sent: Friday, August 28, 2020 3:12 AM
> > >
> > > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > >
> > > > > I find it strange that you have pfnum 0 everywhere but then
> > > > > different controllers.
> > > > There are multiple PFs, connected to different PCI RC. So device
> > > > has same pfnum for both the PFs.
> > > >
> > > > > For MultiHost at Netronome we've used pfnum to distinguish
> > > > > between the hosts. ASIC must have some unique identifiers for each PF.
> > > > Yes. there is. It is identified by a unique controller number;
> > > > internally it is called host_number. But internal host_number is
> > > > misleading term as multiple cables of same physical card can be
> > > > plugged into single host. So identifying based on a unique
> > > > (controller) number and matching that up on external cable is desired.
> > > >
> > > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > > without reinitializing all the others.
> > > > I may be misunderstanding, but how is initialization is related
> > > > multiple PFs?
> > >
> > > If the number of PFs is static it should be possible to understand
> > > which one is on which system.
> >
> > How? How do we tell that pfnum A means external system.
> > Want to avoid such 'implicit' notion.
> 
> How do you tell that controller A means external system?
Which is why I started with annotating only external controllers, mainly to avoid renaming and breaking current scheme for non_smartnic cases which possibly is the most user base.

But probably external pcipf/vf/sf port flavours are more intuitive combined with controller number.
More below.

> 
> > > > > I can see how having multiple controllers may make things
> > > > > clearer, but adding another layer of IDs while the one under it
> > > > > is unused
> > > > > (pfnum=0) feels very unnecessary.
> > > > pfnum=0 is used today. not sure I understand your comment about
> > > > being unused. Can you please explain?
> > >
> > > You examples only ever have pfnum 0:
> > >
> > Because both controllers have pfnum 0.
> >
> > > From patch 2:
> > >
> > > $ devlink port show pci/0000:00:08.0/2
> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
> > > pfnum 0 vfnum 1 splittable false
> > >   function:
> > >     hw_addr 00:00:00:00:00:00
> > >
> > > $ devlink port show -jp pci/0000:00:08.0/2 {
> > >     "port": {
> > >         "pci/0000:00:08.0/1": {
> > >             "type": "eth",
> > >             "netdev": "eth7",
> > >             "controller": 0,
> > >             "flavour": "pcivf",
> > >             "pfnum": 0,
> > >             "vfnum": 1,
> > >             "splittable": false,
> > >             "function": {
> > >                 "hw_addr": "00:00:00:00:00:00"
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > From earlier email:
> > >
> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > >
> > > If you never use pfnum, you can just put the controller ID there, like
> Netronome.
> > >
> > It likely not going to work for us. Because pfnum is not some randomly
> generated number.
> > It is linked to the underlying PCI pf number. {pf0, pf1...}
> > Orchestration sw uses this to identify representor of a PF-VF pair.
> 
> For orchestration software which is unaware of controllers ports will still alias
> on pf/vf nums.
>
Yes.
Orchestration which will be aware of controller, will use it.
 
> Besides you have one devlink instance per port currently so I'm guessing there is
> no pf1 ever, in your case...
>
Currently there are multiple devlink instance. One for pf0, other for pf1.
Ports of both instances have the same switch id.
 
> > Replacing pfnum with controller number breaks this; and it still doesn't tell user
> that it's the pf on other_host.
> 
> Neither does the opaque controller id. 
Which is why I tossed the epcipf (external pci pf) port flavour that fits in current model.
But doesn't allow multiple external hosts under same eswitch for those devices which has same pci pf, vf numbers among those hosts. (and it is the case for mlnx).

> Maybe now you understand better why I wanted peer objects :/
>
I wasn't against peer object. But showing netdev of peer object assumed no_smartnic, it also assume other_side is also similar Linux kernel.
Anyways, I make humble request get over the past to move forward. :-)

> > So it is used, and would like to continue to use even if there are multiple PFs
> port (that has same pfnum) under the same eswitch.
> >
> > In an alternative,
> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> > There can be better name than epcipf. I just put epcipf to differentiate it.
> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
> 
> I don't think the controllers are a terrible idea. Seems like a fairly reasonable
> extension.
Ok. 
> But MLX don't seem to need them. And you have a history of trying to
> make the Linux APIs look like your FW API.
> 
Because there are two devlink instances for each PF?
I think for now an epcipf, epcivf flavour would just suffice due to lack of multiple devlink instances.
But in long run it is better to have the controller covering few topologies.
Otherwise we will break the rep naming later when multiple controllers are managed by single eswitch (without notion of controller).

Sometime my text is confusing. :-) so adding example of the thoughts below.
Example: Eswitch side devlink port show for multi-host setup considering the smartnic.

$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
                                                                                                             ^^^^^ new port flavour.
pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum 1

Here one controller has two pci pfs (0,1}. Eswitch shows that they are external pci ports.
Whenever (not sure when), mlnx converts to single devlink instance, this will continue to work.
It will also work when multiple controller(s) (of external host) ports have same switch_id (for orchestration).
And this doesn't break any backward compatibility for non multihost, non smatnic users.

> Jiri, would you mind chiming in? What's your take?

Will wait for his inputs..
Jiri Pirko Sept. 1, 2020, 8:19 a.m. UTC | #12
Sat, Aug 29, 2020 at 05:43:58AM CEST, parav@nvidia.com wrote:
>
>
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Friday, August 28, 2020 10:14 PM
>> 
>> On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
>> > > From: Jakub Kicinski <kuba@kernel.org>
>> > > Sent: Friday, August 28, 2020 3:12 AM
>> > >
>> > > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
>> > > > > From: Jakub Kicinski <kuba@kernel.org>
>> > > > >
>> > > > > I find it strange that you have pfnum 0 everywhere but then
>> > > > > different controllers.
>> > > > There are multiple PFs, connected to different PCI RC. So device
>> > > > has same pfnum for both the PFs.
>> > > >
>> > > > > For MultiHost at Netronome we've used pfnum to distinguish
>> > > > > between the hosts. ASIC must have some unique identifiers for each PF.
>> > > > Yes. there is. It is identified by a unique controller number;
>> > > > internally it is called host_number. But internal host_number is
>> > > > misleading term as multiple cables of same physical card can be
>> > > > plugged into single host. So identifying based on a unique
>> > > > (controller) number and matching that up on external cable is desired.
>> > > >
>> > > > > I'm not aware of any practical reason for creating PFs on one RC
>> > > > > without reinitializing all the others.
>> > > > I may be misunderstanding, but how is initialization is related
>> > > > multiple PFs?
>> > >
>> > > If the number of PFs is static it should be possible to understand
>> > > which one is on which system.
>> >
>> > How? How do we tell that pfnum A means external system.
>> > Want to avoid such 'implicit' notion.
>> 
>> How do you tell that controller A means external system?

Perhaps the attr name could be explicitly containing "external" word?
Like:
"ext_controller" or "extnum" (similar to "pfnum" and "vfnum") something
like that.


>Which is why I started with annotating only external controllers, mainly to avoid renaming and breaking current scheme for non_smartnic cases which possibly is the most user base.
>
>But probably external pcipf/vf/sf port flavours are more intuitive combined with controller number.
>More below.
>
>> 
>> > > > > I can see how having multiple controllers may make things
>> > > > > clearer, but adding another layer of IDs while the one under it
>> > > > > is unused
>> > > > > (pfnum=0) feels very unnecessary.
>> > > > pfnum=0 is used today. not sure I understand your comment about
>> > > > being unused. Can you please explain?
>> > >
>> > > You examples only ever have pfnum 0:
>> > >
>> > Because both controllers have pfnum 0.
>> >
>> > > From patch 2:
>> > >
>> > > $ devlink port show pci/0000:00:08.0/2
>> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
>> > > pfnum 0 vfnum 1 splittable false
>> > >   function:
>> > >     hw_addr 00:00:00:00:00:00
>> > >
>> > > $ devlink port show -jp pci/0000:00:08.0/2 {
>> > >     "port": {
>> > >         "pci/0000:00:08.0/1": {
>> > >             "type": "eth",
>> > >             "netdev": "eth7",
>> > >             "controller": 0,
>> > >             "flavour": "pcivf",
>> > >             "pfnum": 0,
>> > >             "vfnum": 1,
>> > >             "splittable": false,
>> > >             "function": {
>> > >                 "hw_addr": "00:00:00:00:00:00"
>> > >             }
>> > >         }
>> > >     }
>> > > }
>> > >
>> > > From earlier email:
>> > >
>> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
>> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
>> > >
>> > > If you never use pfnum, you can just put the controller ID there, like
>> Netronome.
>> > >
>> > It likely not going to work for us. Because pfnum is not some randomly
>> generated number.
>> > It is linked to the underlying PCI pf number. {pf0, pf1...}
>> > Orchestration sw uses this to identify representor of a PF-VF pair.
>> 
>> For orchestration software which is unaware of controllers ports will still alias
>> on pf/vf nums.
>>
>Yes.
>Orchestration which will be aware of controller, will use it.
> 
>> Besides you have one devlink instance per port currently so I'm guessing there is
>> no pf1 ever, in your case...
>>
>Currently there are multiple devlink instance. One for pf0, other for pf1.
>Ports of both instances have the same switch id.
> 
>> > Replacing pfnum with controller number breaks this; and it still doesn't tell user
>> that it's the pf on other_host.
>> 
>> Neither does the opaque controller id. 
>Which is why I tossed the epcipf (external pci pf) port flavour that fits in current model.
>But doesn't allow multiple external hosts under same eswitch for those devices which has same pci pf, vf numbers among those hosts. (and it is the case for mlnx).
>
>> Maybe now you understand better why I wanted peer objects :/
>>
>I wasn't against peer object. But showing netdev of peer object assumed no_smartnic, it also assume other_side is also similar Linux kernel.
>Anyways, I make humble request get over the past to move forward. :-)
>
>> > So it is used, and would like to continue to use even if there are multiple PFs
>> port (that has same pfnum) under the same eswitch.
>> >
>> > In an alternative,
>> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
>> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
>> > There can be better name than epcipf. I just put epcipf to differentiate it.
>> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
>> 
>> I don't think the controllers are a terrible idea. Seems like a fairly reasonable
>> extension.
>Ok. 
>> But MLX don't seem to need them. And you have a history of trying to
>> make the Linux APIs look like your FW API.
>> 
>Because there are two devlink instances for each PF?
>I think for now an epcipf, epcivf flavour would just suffice due to lack of multiple devlink instances.
>But in long run it is better to have the controller covering few topologies.
>Otherwise we will break the rep naming later when multiple controllers are managed by single eswitch (without notion of controller).
>
>Sometime my text is confusing. :-) so adding example of the thoughts below.
>Example: Eswitch side devlink port show for multi-host setup considering the smartnic.
>
>$ devlink port show
>pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
>pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
>                                                                                                             ^^^^^ new port flavour.
>pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
>pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
>pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum 1
>
>Here one controller has two pci pfs (0,1}. Eswitch shows that they are external pci ports.
>Whenever (not sure when), mlnx converts to single devlink instance, this will continue to work.
>It will also work when multiple controller(s) (of external host) ports have same switch_id (for orchestration).
>And this doesn't break any backward compatibility for non multihost, non smatnic users.
>
>> Jiri, would you mind chiming in? What's your take?
>
>Will wait for his inputs..

I don't see the need for new flavour. The port is still pf same as the
local pf, it only resides on a different host. We just need to make sure
to resolve the conflict between PFX and PFX on 2 different hosts
(local/ext or ext/ext)

So I think that for local PFs, no change is needed.
The external PFs need to have an extra attribute with "external
enumeration" what would be used for the representor netdev name as well.

pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0 pfnum 0
pci/0000:00:08.0/3: type eth netdev enp0s8f0_e1pf0 flavour pcipf extnum 1 pfnum 0
Parav Pandit Sept. 1, 2020, 8:53 a.m. UTC | #13
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Tuesday, September 1, 2020 1:49 PM
> 
> >> > How? How do we tell that pfnum A means external system.
> >> > Want to avoid such 'implicit' notion.
> >>
> >> How do you tell that controller A means external system?
> 
> Perhaps the attr name could be explicitly containing "external" word?
> Like:
> "ext_controller" or "extnum" (similar to "pfnum" and "vfnum") something
> like that.

How about ecnum "external controller number"?
Tiny change in the phys_port_name below example.

> 
> 
> >Which is why I started with annotating only external controllers, mainly to
> avoid renaming and breaking current scheme for non_smartnic cases which
> possibly is the most user base.
> >
> >But probably external pcipf/vf/sf port flavours are more intuitive combined
> with controller number.
> >More below.
> >
> >>
> >> > > > > I can see how having multiple controllers may make things
> >> > > > > clearer, but adding another layer of IDs while the one under
> >> > > > > it is unused
> >> > > > > (pfnum=0) feels very unnecessary.
> >> > > > pfnum=0 is used today. not sure I understand your comment about
> >> > > > being unused. Can you please explain?
> >> > >
> >> > > You examples only ever have pfnum 0:
> >> > >
> >> > Because both controllers have pfnum 0.
> >> >
> >> > > From patch 2:
> >> > >
> >> > > $ devlink port show pci/0000:00:08.0/2
> >> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour
> >> > > pcivf pfnum 0 vfnum 1 splittable false
> >> > >   function:
> >> > >     hw_addr 00:00:00:00:00:00
> >> > >
> >> > > $ devlink port show -jp pci/0000:00:08.0/2 {
> >> > >     "port": {
> >> > >         "pci/0000:00:08.0/1": {
> >> > >             "type": "eth",
> >> > >             "netdev": "eth7",
> >> > >             "controller": 0,
> >> > >             "flavour": "pcivf",
> >> > >             "pfnum": 0,
> >> > >             "vfnum": 1,
> >> > >             "splittable": false,
> >> > >             "function": {
> >> > >                 "hw_addr": "00:00:00:00:00:00"
> >> > >             }
> >> > >         }
> >> > >     }
> >> > > }
> >> > >
> >> > > From earlier email:
> >> > >
> >> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> >> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> >> > >
> >> > > If you never use pfnum, you can just put the controller ID there,
> >> > > like
> >> Netronome.
> >> > >
> >> > It likely not going to work for us. Because pfnum is not some
> >> > randomly
> >> generated number.
> >> > It is linked to the underlying PCI pf number. {pf0, pf1...}
> >> > Orchestration sw uses this to identify representor of a PF-VF pair.
> >>
> >> For orchestration software which is unaware of controllers ports will
> >> still alias on pf/vf nums.
> >>
> >Yes.
> >Orchestration which will be aware of controller, will use it.
> >
> >> Besides you have one devlink instance per port currently so I'm
> >> guessing there is no pf1 ever, in your case...
> >>
> >Currently there are multiple devlink instance. One for pf0, other for pf1.
> >Ports of both instances have the same switch id.
> >
> >> > Replacing pfnum with controller number breaks this; and it still
> >> > doesn't tell user
> >> that it's the pf on other_host.
> >>
> >> Neither does the opaque controller id.
> >Which is why I tossed the epcipf (external pci pf) port flavour that fits in
> current model.
> >But doesn't allow multiple external hosts under same eswitch for those
> devices which has same pci pf, vf numbers among those hosts. (and it is the
> case for mlnx).
> >
> >> Maybe now you understand better why I wanted peer objects :/
> >>
> >I wasn't against peer object. But showing netdev of peer object assumed
> no_smartnic, it also assume other_side is also similar Linux kernel.
> >Anyways, I make humble request get over the past to move forward. :-)
> >
> >> > So it is used, and would like to continue to use even if there are
> >> > multiple PFs
> >> port (that has same pfnum) under the same eswitch.
> >> >
> >> > In an alternative,
> >> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we
> >> > introduce new
> >> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> >> > There can be better name than epcipf. I just put epcipf to differentiate
> it.
> >> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
> >>
> >> I don't think the controllers are a terrible idea. Seems like a
> >> fairly reasonable extension.
> >Ok.
> >> But MLX don't seem to need them. And you have a history of trying to
> >> make the Linux APIs look like your FW API.
> >>
> >Because there are two devlink instances for each PF?
> >I think for now an epcipf, epcivf flavour would just suffice due to lack of
> multiple devlink instances.
> >But in long run it is better to have the controller covering few topologies.
> >Otherwise we will break the rep naming later when multiple controllers are
> managed by single eswitch (without notion of controller).
> >
> >Sometime my text is confusing. :-) so adding example of the thoughts
> below.
> >Example: Eswitch side devlink port show for multi-host setup considering
> the smartnic.
> >
> >$ devlink port show
> >pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> >pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
> >pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
> >                                                                                                             ^^^^^ new port
> flavour.
> >pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
> >pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
> >pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum
> >1
> >
> >Here one controller has two pci pfs (0,1}. Eswitch shows that they are
> external pci ports.
> >Whenever (not sure when), mlnx converts to single devlink instance, this
> will continue to work.
> >It will also work when multiple controller(s) (of external host) ports have
> same switch_id (for orchestration).
> >And this doesn't break any backward compatibility for non multihost, non
> smatnic users.
> >
> >> Jiri, would you mind chiming in? What's your take?
> >
> >Will wait for his inputs..
> 
> I don't see the need for new flavour. The port is still pf same as the local pf, it
> only resides on a different host. We just need to make sure to resolve the
> conflict between PFX and PFX on 2 different hosts (local/ext or ext/ext)
> 
Yes. I agree. I do not have strong opinion on new flavour as long as we make clear that this is for the external controller.

> So I think that for local PFs, no change is needed.
Yep.

> The external PFs need to have an extra attribute with "external
> enumeration" what would be used for the representor netdev name as well.
> 
> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0
> pfnum 0

How about a prefix of "ec" instead of "e", like?
pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf ecnum 0 pfnum 0
                                                                                     ^^^^
> pci/0000:00:08.0/3: type eth netdev enp0s8f0_e1pf0 flavour pcipf extnum 1
> pfnum 0
Jiri Pirko Sept. 1, 2020, 9:17 a.m. UTC | #14
Tue, Sep 01, 2020 at 10:53:23AM CEST, parav@nvidia.com wrote:
>
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Tuesday, September 1, 2020 1:49 PM
>> 
>> >> > How? How do we tell that pfnum A means external system.
>> >> > Want to avoid such 'implicit' notion.
>> >>
>> >> How do you tell that controller A means external system?
>> 
>> Perhaps the attr name could be explicitly containing "external" word?
>> Like:
>> "ext_controller" or "extnum" (similar to "pfnum" and "vfnum") something
>> like that.
>
>How about ecnum "external controller number"?
>Tiny change in the phys_port_name below example.
>
>> 
>> 
>> >Which is why I started with annotating only external controllers, mainly to
>> avoid renaming and breaking current scheme for non_smartnic cases which
>> possibly is the most user base.
>> >
>> >But probably external pcipf/vf/sf port flavours are more intuitive combined
>> with controller number.
>> >More below.
>> >
>> >>
>> >> > > > > I can see how having multiple controllers may make things
>> >> > > > > clearer, but adding another layer of IDs while the one under
>> >> > > > > it is unused
>> >> > > > > (pfnum=0) feels very unnecessary.
>> >> > > > pfnum=0 is used today. not sure I understand your comment about
>> >> > > > being unused. Can you please explain?
>> >> > >
>> >> > > You examples only ever have pfnum 0:
>> >> > >
>> >> > Because both controllers have pfnum 0.
>> >> >
>> >> > > From patch 2:
>> >> > >
>> >> > > $ devlink port show pci/0000:00:08.0/2
>> >> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour
>> >> > > pcivf pfnum 0 vfnum 1 splittable false
>> >> > >   function:
>> >> > >     hw_addr 00:00:00:00:00:00
>> >> > >
>> >> > > $ devlink port show -jp pci/0000:00:08.0/2 {
>> >> > >     "port": {
>> >> > >         "pci/0000:00:08.0/1": {
>> >> > >             "type": "eth",
>> >> > >             "netdev": "eth7",
>> >> > >             "controller": 0,
>> >> > >             "flavour": "pcivf",
>> >> > >             "pfnum": 0,
>> >> > >             "vfnum": 1,
>> >> > >             "splittable": false,
>> >> > >             "function": {
>> >> > >                 "hw_addr": "00:00:00:00:00:00"
>> >> > >             }
>> >> > >         }
>> >> > >     }
>> >> > > }
>> >> > >
>> >> > > From earlier email:
>> >> > >
>> >> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
>> >> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
>> >> > >
>> >> > > If you never use pfnum, you can just put the controller ID there,
>> >> > > like
>> >> Netronome.
>> >> > >
>> >> > It likely not going to work for us. Because pfnum is not some
>> >> > randomly
>> >> generated number.
>> >> > It is linked to the underlying PCI pf number. {pf0, pf1...}
>> >> > Orchestration sw uses this to identify representor of a PF-VF pair.
>> >>
>> >> For orchestration software which is unaware of controllers ports will
>> >> still alias on pf/vf nums.
>> >>
>> >Yes.
>> >Orchestration which will be aware of controller, will use it.
>> >
>> >> Besides you have one devlink instance per port currently so I'm
>> >> guessing there is no pf1 ever, in your case...
>> >>
>> >Currently there are multiple devlink instance. One for pf0, other for pf1.
>> >Ports of both instances have the same switch id.
>> >
>> >> > Replacing pfnum with controller number breaks this; and it still
>> >> > doesn't tell user
>> >> that it's the pf on other_host.
>> >>
>> >> Neither does the opaque controller id.
>> >Which is why I tossed the epcipf (external pci pf) port flavour that fits in
>> current model.
>> >But doesn't allow multiple external hosts under same eswitch for those
>> devices which has same pci pf, vf numbers among those hosts. (and it is the
>> case for mlnx).
>> >
>> >> Maybe now you understand better why I wanted peer objects :/
>> >>
>> >I wasn't against peer object. But showing netdev of peer object assumed
>> no_smartnic, it also assume other_side is also similar Linux kernel.
>> >Anyways, I make humble request get over the past to move forward. :-)
>> >
>> >> > So it is used, and would like to continue to use even if there are
>> >> > multiple PFs
>> >> port (that has same pfnum) under the same eswitch.
>> >> >
>> >> > In an alternative,
>> >> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we
>> >> > introduce new
>> >> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
>> >> > There can be better name than epcipf. I just put epcipf to differentiate
>> it.
>> >> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
>> >>
>> >> I don't think the controllers are a terrible idea. Seems like a
>> >> fairly reasonable extension.
>> >Ok.
>> >> But MLX don't seem to need them. And you have a history of trying to
>> >> make the Linux APIs look like your FW API.
>> >>
>> >Because there are two devlink instances for each PF?
>> >I think for now an epcipf, epcivf flavour would just suffice due to lack of
>> multiple devlink instances.
>> >But in long run it is better to have the controller covering few topologies.
>> >Otherwise we will break the rep naming later when multiple controllers are
>> managed by single eswitch (without notion of controller).
>> >
>> >Sometime my text is confusing. :-) so adding example of the thoughts
>> below.
>> >Example: Eswitch side devlink port show for multi-host setup considering
>> the smartnic.
>> >
>> >$ devlink port show
>> >pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>> >pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
>> >pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
>> >                                                                                                             ^^^^^ new port
>> flavour.
>> >pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
>> >pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
>> >pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum
>> >1
>> >
>> >Here one controller has two pci pfs (0,1}. Eswitch shows that they are
>> external pci ports.
>> >Whenever (not sure when), mlnx converts to single devlink instance, this
>> will continue to work.
>> >It will also work when multiple controller(s) (of external host) ports have
>> same switch_id (for orchestration).
>> >And this doesn't break any backward compatibility for non multihost, non
>> smatnic users.
>> >
>> >> Jiri, would you mind chiming in? What's your take?
>> >
>> >Will wait for his inputs..
>> 
>> I don't see the need for new flavour. The port is still pf same as the local pf, it
>> only resides on a different host. We just need to make sure to resolve the
>> conflict between PFX and PFX on 2 different hosts (local/ext or ext/ext)
>> 
>Yes. I agree. I do not have strong opinion on new flavour as long as we make clear that this is for the external controller.
>
>> So I think that for local PFs, no change is needed.
>Yep.
>
>> The external PFs need to have an extra attribute with "external
>> enumeration" what would be used for the representor netdev name as well.
>> 
>> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
>> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0
>> pfnum 0
>
>How about a prefix of "ec" instead of "e", like?
>pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf ecnum 0 pfnum 0

Yeah, looks fine to me. Jakub?


>                                                                                     ^^^^
>> pci/0000:00:08.0/3: type eth netdev enp0s8f0_e1pf0 flavour pcipf extnum 1
>> pfnum 0
>
Jakub Kicinski Sept. 1, 2020, 9:28 p.m. UTC | #15
On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
> >> The external PFs need to have an extra attribute with "external
> >> enumeration" what would be used for the representor netdev name as well.
> >> 
> >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
> >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0
> >> pfnum 0  
> >
> >How about a prefix of "ec" instead of "e", like?
> >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf ecnum 0 pfnum 0  
> 
> Yeah, looks fine to me. Jakub?

I don't like that local port doesn't have the controller ID.

Whether PCI port is external or not is best described by a the peer
relation. Failing that, at the very least "external" should be a
separate attribute/flag from the controller ID.

I didn't quite get the fact that you want to not show controller ID 
on the local port, initially.
Parav Pandit Sept. 2, 2020, 4:26 a.m. UTC | #16
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, September 2, 2020 2:59 AM
> 
> On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
> > >> The external PFs need to have an extra attribute with "external
> > >> enumeration" what would be used for the representor netdev name as well.
> > >>
> > >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> > >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf
> > >> pfnum 0
> > >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf
> > >> extnum 0 pfnum 0
> > >
> > >How about a prefix of "ec" instead of "e", like?
> > >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf
> > >ecnum 0 pfnum 0
> >
> > Yeah, looks fine to me. Jakub?
> 
> I don't like that local port doesn't have the controller ID.
> 
Adding controller ID to local port will change name for all non smartnic deployments that affects current vast user base :-(

> Whether PCI port is external or not is best described by a the peer relation.

How about adding an attribute something like below in addition to controller id.

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 ecnum 0 external true splitable false
                                                                                                                                                    ^^^^^^^^^^^

> Failing that, at the very least "external" should be a separate attribute/flag from
> the controller ID.
>
Ok. Looks fine to me.

Jiri?

> I didn't quite get the fact that you want to not show controller ID on the local
> port, initially.
Mainly to not_break current users.
Parav Pandit Sept. 2, 2020, 4:44 a.m. UTC | #17
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, September 2, 2020 2:59 AM
> >
> > On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
> > > >> The external PFs need to have an extra attribute with "external
> > > >> enumeration" what would be used for the representor netdev name as
> well.
> > > >>
> > > >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> > > >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf
> > > >> pfnum 0
> > > >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf
> > > >> extnum 0 pfnum 0
> > > >
> > > >How about a prefix of "ec" instead of "e", like?
> > > >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf
> > > >ecnum 0 pfnum 0
> > >
> > > Yeah, looks fine to me. Jakub?
> >
> > I don't like that local port doesn't have the controller ID.
> >
> Adding controller ID to local port will change name for all non smartnic
> deployments that affects current vast user base :-(
> 
> > Whether PCI port is external or not is best described by a the peer relation.
> 
> How about adding an attribute something like below in addition to controller id.
> 
> $ devlink port show
> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 ecnum
> 0 external true splitable false
> 
> ^^^^^^^^^^^
> 
I am sorry for messing up the example in previous email.
Please find below examples with controller number and external attribute flag.

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 external false splittable false
pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf pfnum 0 ecnum 0 external true splittable false

> > Failing that, at the very least "external" should be a separate
> > attribute/flag from the controller ID.
> >
> Ok. Looks fine to me.
> 
> Jiri?
> 
> > I didn't quite get the fact that you want to not show controller ID on
> > the local port, initially.
> Mainly to not_break current users.
Jiri Pirko Sept. 2, 2020, 8 a.m. UTC | #18
Wed, Sep 02, 2020 at 06:26:12AM CEST, parav@nvidia.com wrote:
>
>
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Wednesday, September 2, 2020 2:59 AM
>> 
>> On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
>> > >> The external PFs need to have an extra attribute with "external
>> > >> enumeration" what would be used for the representor netdev name as well.
>> > >>
>> > >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>> > >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf
>> > >> pfnum 0
>> > >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf
>> > >> extnum 0 pfnum 0
>> > >
>> > >How about a prefix of "ec" instead of "e", like?
>> > >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf
>> > >ecnum 0 pfnum 0
>> >
>> > Yeah, looks fine to me. Jakub?
>> 
>> I don't like that local port doesn't have the controller ID.
>> 
>Adding controller ID to local port will change name for all non smartnic deployments that affects current vast user base :-(
>
>> Whether PCI port is external or not is best described by a the peer relation.
>
>How about adding an attribute something like below in addition to controller id.
>
>$ devlink port show
>pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 ecnum 0 external true splitable false
>                                                                                                                                                    ^^^^^^^^^^^
>
>> Failing that, at the very least "external" should be a separate attribute/flag from
>> the controller ID.
>>
>Ok. Looks fine to me.
>
>Jiri?

Yeah, why not.

>
>> I didn't quite get the fact that you want to not show controller ID on the local
>> port, initially.
>Mainly to not_break current users.

You don't have to take it to the name, unless "external" flag is set.

But I don't really see the point of showing !external, cause such
controller number would be always 0. Jakub, why do you think it is
needed?
Jakub Kicinski Sept. 2, 2020, 3:23 p.m. UTC | #19
On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
>>> I didn't quite get the fact that you want to not show controller ID on the local
>>> port, initially.  
>> Mainly to not_break current users.  
> 
> You don't have to take it to the name, unless "external" flag is set.
> 
> But I don't really see the point of showing !external, cause such
> controller number would be always 0. Jakub, why do you think it is
> needed?

It may seem reasonable for a smartNIC where there are only two
controllers, and all you really need is that external flag. 

In a general case when users are trying to figure out the topology
not knowing which controller they are sitting at looks like a serious
limitation.

Example - multi-host system and you want to know which controller you
are to run power cycle from the BMC side.

We won't be able to change that because it'd change the names for you.
Parav Pandit Sept. 2, 2020, 4:18 p.m. UTC | #20
Hi Jakub,

> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, September 2, 2020 8:54 PM
> 
> On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
> >>> I didn't quite get the fact that you want to not show controller ID
> >>> on the local port, initially.
> >> Mainly to not_break current users.
> >
> > You don't have to take it to the name, unless "external" flag is set.
> >
> > But I don't really see the point of showing !external, cause such
> > controller number would be always 0. Jakub, why do you think it is
> > needed?
> 
> It may seem reasonable for a smartNIC where there are only two controllers,
> and all you really need is that external flag.
> 
> In a general case when users are trying to figure out the topology not
> knowing which controller they are sitting at looks like a serious limitation.
> 
> Example - multi-host system and you want to know which controller you are
> to run power cycle from the BMC side.
> 
> We won't be able to change that because it'd change the names for you.

Is BMC controller running devlink instance?
How the power outlet and devlink instance are connected?
Can you please provide the example to understand the relation?
Parav Pandit Sept. 2, 2020, 8:10 p.m. UTC | #21
> From: Parav Pandit <parav@nvidia.com>
> Sent: Wednesday, September 2, 2020 9:49 PM
> 
> Hi Jakub,
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, September 2, 2020 8:54 PM
> >
> > On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
> > >>> I didn't quite get the fact that you want to not show controller
> > >>> ID on the local port, initially.
> > >> Mainly to not_break current users.
> > >
> > > You don't have to take it to the name, unless "external" flag is set.
> > >
> > > But I don't really see the point of showing !external, cause such
> > > controller number would be always 0. Jakub, why do you think it is
> > > needed?
> >
> > It may seem reasonable for a smartNIC where there are only two
> > controllers, and all you really need is that external flag.
> >
With only an external flag how shall we differentiate more than one external controllers?

This is the example of on a single physical smartnic device which is serving two hosts.
Each external host has one controller (c1, c2).
Each controller consist of two PCI PFs. (marked with external = true)
Devlink instance is service the local controller too.

cnum = controller number.
external = true/false describes if its external port or local.

Below naming scheme enables one or more controllers, one or more PFs to be managed by individual devlink instance per controller or one devlink instance for all controller without any ambiguity.
Does this look good?

$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 external false
pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true

pci/0000:00:08.1/0: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1 external false
pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true

pci/0000:00:08.2/0: type eth netdev enp0s8f2_pf0 flavour pcipf pfnum 0 external false
pci/0000:00:08.2/1: type eth netdev enp0s8f2_c2pf0 flavour pcipf pfnum 0 cnum 2 external true

pci/0000:00:08.3/0: type eth netdev enp0s8f3_pf1 flavour pcipf pfnum 1 external false
pci/0000:00:08.3/1: type eth netdev enp0s8f3_c2pf1 flavour pcipf pfnum 1 cnum 2 external true

> > In a general case when users are trying to figure out the topology not
> > knowing which controller they are sitting at looks like a serious limitation.
> >
> > Example - multi-host system and you want to know which controller you
> > are to run power cycle from the BMC side.
> >
Did you mean a controller inside the host itself (not in smrtnic) needs to know what is its controller identifier?

> > We won't be able to change that because it'd change the names for you.
> 
> Is BMC controller running devlink instance?
> How the power outlet and devlink instance are connected?
> Can you please provide the example to understand the relation?
Jiri Pirko Sept. 3, 2020, 5:54 a.m. UTC | #22
Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
>On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
>>>> I didn't quite get the fact that you want to not show controller ID on the local
>>>> port, initially.  
>>> Mainly to not_break current users.  
>> 
>> You don't have to take it to the name, unless "external" flag is set.
>> 
>> But I don't really see the point of showing !external, cause such
>> controller number would be always 0. Jakub, why do you think it is
>> needed?
>
>It may seem reasonable for a smartNIC where there are only two
>controllers, and all you really need is that external flag. 
>
>In a general case when users are trying to figure out the topology
>not knowing which controller they are sitting at looks like a serious
>limitation.

I think we misunderstood each other. I never proposed just "external"
flag. What I propose is either:
1) ecnum attribute absent for local
   ecnum attribute absent set to 0 for external controller X
   ecnum attribute absent set to 1 for external controller Y
   ...

or:
2) ecnum attribute absent for local, external flag set to false
   ecnum attribute absent set to 0 for external controller X, external flag set to true
   ecnum attribute absent set to 1 for external controller Y, external flag set to true

>
>Example - multi-host system and you want to know which controller you
>are to run power cycle from the BMC side.
>
>We won't be able to change that because it'd change the names for you.
Jakub Kicinski Sept. 3, 2020, 7:31 p.m. UTC | #23
On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:  
> >>>> I didn't quite get the fact that you want to not show controller ID on the local
> >>>> port, initially.    
> >>> Mainly to not_break current users.    
> >> 
> >> You don't have to take it to the name, unless "external" flag is set.
> >> 
> >> But I don't really see the point of showing !external, cause such
> >> controller number would be always 0. Jakub, why do you think it is
> >> needed?  
> >
> >It may seem reasonable for a smartNIC where there are only two
> >controllers, and all you really need is that external flag. 
> >
> >In a general case when users are trying to figure out the topology
> >not knowing which controller they are sitting at looks like a serious
> >limitation.  
> 
> I think we misunderstood each other. I never proposed just "external"
> flag.

Sorry, I was just saying that assuming a single host SmartNIC the
controller ID is not necessary at all. You never suggested that, I did. 
Looks like I just confused everyone with that comment :(

Different controller ID for different PFs but the same PCIe link would
be very wrong. So please clarify - if I have a 2 port smartNIC, with on
PCIe link to the host, and the embedded controller - what would I see?

> What I propose is either:
> 1) ecnum attribute absent for local
>    ecnum attribute absent set to 0 for external controller X
>    ecnum attribute absent set to 1 for external controller Y
>    ...
> 
> or:
> 2) ecnum attribute absent for local, external flag set to false
>    ecnum attribute absent set to 0 for external controller X, external flag set to true
>    ecnum attribute absent set to 1 for external controller Y, external flag set to true

I'm saying that I do want to see the the controller ID for all ports.

So:

3) local:   { "controller ID": x }
   remote1: { "controller ID": y, "external": true }
   remote1: { "controller ID": z, "external": true }

We don't have to put the controller ID in the name for local ports, but
the attribute should be reported. AFAIU name was your main concern, no?

> >Example - multi-host system and you want to know which controller you
> >are to run power cycle from the BMC side.
> >
> >We won't be able to change that because it'd change the names for you.
Jiri Pirko Sept. 4, 2020, 8:43 a.m. UTC | #24
Thu, Sep 03, 2020 at 09:31:23PM CEST, kuba@kernel.org wrote:
>On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
>> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
>> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:  
>> >>>> I didn't quite get the fact that you want to not show controller ID on the local
>> >>>> port, initially.    
>> >>> Mainly to not_break current users.    
>> >> 
>> >> You don't have to take it to the name, unless "external" flag is set.
>> >> 
>> >> But I don't really see the point of showing !external, cause such
>> >> controller number would be always 0. Jakub, why do you think it is
>> >> needed?  
>> >
>> >It may seem reasonable for a smartNIC where there are only two
>> >controllers, and all you really need is that external flag. 
>> >
>> >In a general case when users are trying to figure out the topology
>> >not knowing which controller they are sitting at looks like a serious
>> >limitation.  
>> 
>> I think we misunderstood each other. I never proposed just "external"
>> flag.
>
>Sorry, I was just saying that assuming a single host SmartNIC the
>controller ID is not necessary at all. You never suggested that, I did. 
>Looks like I just confused everyone with that comment :(
>
>Different controller ID for different PFs but the same PCIe link would
>be very wrong. So please clarify - if I have a 2 port smartNIC, with on
>PCIe link to the host, and the embedded controller - what would I see?

Parav?


>
>> What I propose is either:
>> 1) ecnum attribute absent for local
>>    ecnum attribute absent set to 0 for external controller X
>>    ecnum attribute absent set to 1 for external controller Y
>>    ...
>> 
>> or:
>> 2) ecnum attribute absent for local, external flag set to false
>>    ecnum attribute absent set to 0 for external controller X, external flag set to true
>>    ecnum attribute absent set to 1 for external controller Y, external flag set to true
>
>I'm saying that I do want to see the the controller ID for all ports.
>
>So:
>
>3) local:   { "controller ID": x }
>   remote1: { "controller ID": y, "external": true }
>   remote1: { "controller ID": z, "external": true }
>
>We don't have to put the controller ID in the name for local ports, but
>the attribute should be reported. AFAIU name was your main concern, no?

Okay. Sounds fine. Let's put the controller number there for all ports.
ctrlnum X external true
ctrlnum Y external false

if (!external)
	ignore the ctrlnum when generating the name


>
>> >Example - multi-host system and you want to know which controller you
>> >are to run power cycle from the BMC side.
>> >
>> >We won't be able to change that because it'd change the names for you.
Parav Pandit Sept. 6, 2020, 3:08 a.m. UTC | #25
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Friday, September 4, 2020 2:13 PM
> 
> Thu, Sep 03, 2020 at 09:31:23PM CEST, kuba@kernel.org wrote:
> >On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
> >> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
> >> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
> >> >>>> I didn't quite get the fact that you want to not show controller ID on the
> local
> >> >>>> port, initially.
> >> >>> Mainly to not_break current users.
> >> >>
> >> >> You don't have to take it to the name, unless "external" flag is set.
> >> >>
> >> >> But I don't really see the point of showing !external, cause such
> >> >> controller number would be always 0. Jakub, why do you think it is
> >> >> needed?
> >> >
> >> >It may seem reasonable for a smartNIC where there are only two
> >> >controllers, and all you really need is that external flag.
> >> >
> >> >In a general case when users are trying to figure out the topology
> >> >not knowing which controller they are sitting at looks like a
> >> >serious limitation.
> >>
> >> I think we misunderstood each other. I never proposed just "external"
> >> flag.
> >
> >Sorry, I was just saying that assuming a single host SmartNIC the
> >controller ID is not necessary at all. You never suggested that, I did.
> >Looks like I just confused everyone with that comment :(
> >
> >Different controller ID for different PFs but the same PCIe link would
> >be very wrong. So please clarify - if I have a 2 port smartNIC, with on
> >PCIe link to the host, and the embedded controller - what would I see?
> 
> Parav?
>
One controller id for both such PFs.
I liked the idea of putting controller number for all the ports but not embedded for local ports.

> 
> >
> >> What I propose is either:
> >> 1) ecnum attribute absent for local
> >>    ecnum attribute absent set to 0 for external controller X
> >>    ecnum attribute absent set to 1 for external controller Y
> >>    ...
> >>
> >> or:
> >> 2) ecnum attribute absent for local, external flag set to false
> >>    ecnum attribute absent set to 0 for external controller X, external flag set
> to true
> >>    ecnum attribute absent set to 1 for external controller Y,
> >> external flag set to true
> >
> >I'm saying that I do want to see the the controller ID for all ports.
> >
> >So:
> >
> >3) local:   { "controller ID": x }
> >   remote1: { "controller ID": y, "external": true }
> >   remote1: { "controller ID": z, "external": true }
> >
> >We don't have to put the controller ID in the name for local ports, but
> >the attribute should be reported. AFAIU name was your main concern, no?
> 
> Okay. Sounds fine. Let's put the controller number there for all ports.
> ctrlnum X external true
> ctrlnum Y external false
> 
> if (!external)
> 	ignore the ctrlnum when generating the name
> 

Putting little more realistic example for Jakub's and your suggestion below.

Below is the output for 3 controllers. ( 2 external + 1 local )
Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
Each local controller consist of 1 PCI PF.

$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true

Looks ok?

> 
> >
> >> >Example - multi-host system and you want to know which controller
> >> >you are to run power cycle from the BMC side.
> >> >
> >> >We won't be able to change that because it'd change the names for you.
Jakub Kicinski Sept. 6, 2020, 4:46 p.m. UTC | #26
On Sun, 6 Sep 2020 03:08:45 +0000 Parav Pandit wrote:
> > >3) local:   { "controller ID": x }
> > >   remote1: { "controller ID": y, "external": true }
> > >   remote1: { "controller ID": z, "external": true }
> > >
> > >We don't have to put the controller ID in the name for local ports, but
> > >the attribute should be reported. AFAIU name was your main concern, no?  
> > 
> > Okay. Sounds fine. Let's put the controller number there for all ports.
> > ctrlnum X external true
> > ctrlnum Y external false
> > 
> > if (!external)
> > 	ignore the ctrlnum when generating the name
> >   
> 
> Putting little more realistic example for Jakub's and your suggestion below.
> 
> Below is the output for 3 controllers. ( 2 external + 1 local )
> Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
> Each local controller consist of 1 PCI PF.
> 
> $ devlink port show
> pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
> pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
> pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true
> 
> Looks ok?

Yup, looks good, thanks.
Jiri Pirko Sept. 7, 2020, 7:21 a.m. UTC | #27
Sun, Sep 06, 2020 at 05:08:45AM CEST, parav@nvidia.com wrote:
>
>
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Friday, September 4, 2020 2:13 PM
>> 
>> Thu, Sep 03, 2020 at 09:31:23PM CEST, kuba@kernel.org wrote:
>> >On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
>> >> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
>> >> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
>> >> >>>> I didn't quite get the fact that you want to not show controller ID on the
>> local
>> >> >>>> port, initially.
>> >> >>> Mainly to not_break current users.
>> >> >>
>> >> >> You don't have to take it to the name, unless "external" flag is set.
>> >> >>
>> >> >> But I don't really see the point of showing !external, cause such
>> >> >> controller number would be always 0. Jakub, why do you think it is
>> >> >> needed?
>> >> >
>> >> >It may seem reasonable for a smartNIC where there are only two
>> >> >controllers, and all you really need is that external flag.
>> >> >
>> >> >In a general case when users are trying to figure out the topology
>> >> >not knowing which controller they are sitting at looks like a
>> >> >serious limitation.
>> >>
>> >> I think we misunderstood each other. I never proposed just "external"
>> >> flag.
>> >
>> >Sorry, I was just saying that assuming a single host SmartNIC the
>> >controller ID is not necessary at all. You never suggested that, I did.
>> >Looks like I just confused everyone with that comment :(
>> >
>> >Different controller ID for different PFs but the same PCIe link would
>> >be very wrong. So please clarify - if I have a 2 port smartNIC, with on
>> >PCIe link to the host, and the embedded controller - what would I see?
>> 
>> Parav?
>>
>One controller id for both such PFs.
>I liked the idea of putting controller number for all the ports but not embedded for local ports.
>
>> 
>> >
>> >> What I propose is either:
>> >> 1) ecnum attribute absent for local
>> >>    ecnum attribute absent set to 0 for external controller X
>> >>    ecnum attribute absent set to 1 for external controller Y
>> >>    ...
>> >>
>> >> or:
>> >> 2) ecnum attribute absent for local, external flag set to false
>> >>    ecnum attribute absent set to 0 for external controller X, external flag set
>> to true
>> >>    ecnum attribute absent set to 1 for external controller Y,
>> >> external flag set to true
>> >
>> >I'm saying that I do want to see the the controller ID for all ports.
>> >
>> >So:
>> >
>> >3) local:   { "controller ID": x }
>> >   remote1: { "controller ID": y, "external": true }
>> >   remote1: { "controller ID": z, "external": true }
>> >
>> >We don't have to put the controller ID in the name for local ports, but
>> >the attribute should be reported. AFAIU name was your main concern, no?
>> 
>> Okay. Sounds fine. Let's put the controller number there for all ports.
>> ctrlnum X external true
>> ctrlnum Y external false
>> 
>> if (!external)
>> 	ignore the ctrlnum when generating the name
>> 
>
>Putting little more realistic example for Jakub's and your suggestion below.
>
>Below is the output for 3 controllers. ( 2 external + 1 local )
>Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
>Each local controller consist of 1 PCI PF.
>
>$ devlink port show
>pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
>pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
>pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true

I see cnum 0 and cnum 1, yet you talk about 3 controllers. What did I
miss?


>
>Looks ok?
>
>> 
>> >
>> >> >Example - multi-host system and you want to know which controller
>> >> >you are to run power cycle from the BMC side.
>> >> >
>> >> >We won't be able to change that because it'd change the names for you.
Jakub Kicinski Sept. 7, 2020, 4:18 p.m. UTC | #28
On Mon, 7 Sep 2020 09:21:53 +0200 Jiri Pirko wrote:
> >Putting little more realistic example for Jakub's and your suggestion below.
> >
> >Below is the output for 3 controllers. ( 2 external + 1 local )
> >Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
> >Each local controller consist of 1 PCI PF.
> >
> >$ devlink port show
> >pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
> >pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
> >pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true  
> 
> I see cnum 0 and cnum 1, yet you talk about 3 controllers. What did I
> miss?

Heh, good point. Please make sure to put this example in docs so folks
have a reference on how we expect a 2-port smartnic to look.
diff mbox series

Patch

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3c7ba3e1f490..612f107b94ab 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -71,16 +71,20 @@  struct devlink_port_pci_vf_attrs {
  * @flavour: flavour of the port
  * @split: indicates if this is split port
  * @splittable: indicates if the port can be split.
+ * @controller_valid: indicates if the controller_num field is valid.
  * @lanes: maximum number of lanes the port supports. 0 value is not passed to netlink.
  * @switch_id: if the port is part of switch, this is buffer with ID, otherwise this is NULL
  * @phys: physical port attributes
  * @pci_pf: PCI PF port attributes
  * @pci_vf: PCI VF port attributes
+ * @controller_num: Controller number if a port is for other controller.
  */
 struct devlink_port_attrs {
 	u8 split:1,
-	   splittable:1;
+	   splittable:1,
+	   controller_valid:1;
 	u32 lanes;
+	u32 controller_num;
 	enum devlink_port_flavour flavour;
 	struct netdev_phys_item_id switch_id;
 	union {
@@ -1206,6 +1210,7 @@  void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 void devlink_port_type_clear(struct devlink_port *devlink_port);
 void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    struct devlink_port_attrs *devlink_port_attrs);
+void devlink_port_attrs_controller_set(struct devlink_port *devlink_port, u32 controller);
 void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 				   u16 pf, u16 vf);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..886cddf6a0a9 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -458,6 +458,7 @@  enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,		/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 58c8bb07fa19..b9b71f119446 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -553,6 +553,11 @@  static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 	default:
 		break;
 	}
+
+	if (attrs->controller_valid &&
+	    nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, attrs->controller_num))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -7711,6 +7716,22 @@  void devlink_port_attrs_set(struct devlink_port *devlink_port,
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
 
+/**
+ *	devlink_port_attrs_controller_set - Set external controller identifier
+ *
+ *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
+ *	devlink_port_attrs_controller_set() Sets the external controller identifier
+ *	for the port. This should be called by the driver for a devlink port which is associated
+ *	with the controller which is external to the devlink instance.
+ */
+void devlink_port_attrs_controller_set(struct devlink_port *devlink_port, u32 controller)
+{
+	devlink_port->attrs.controller_valid = true;
+	devlink_port->attrs.controller_num = controller;
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_controller_set);
+
 /**
  *	devlink_port_attrs_pci_pf_set - Set PCI PF port attributes
  *
@@ -7762,6 +7783,14 @@  static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	if (!devlink_port->attrs_set)
 		return -EOPNOTSUPP;
 
+	if (attrs->controller_valid) {
+		n = snprintf(name, len, "c%u", attrs->controller_num);
+		if (n >= len)
+			return -EINVAL;
+		len -= n;
+		name += n;
+	}
+
 	switch (attrs->flavour) {
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_VIRTUAL: