diff mbox series

[net-next,v2,1/8] devlink: Introduce PCI SF port flavour and port attribute

Message ID 20200917172020.26484-2-parav@nvidia.com
State Changes Requested
Delegated to: David Miller
Headers show
Series devlink: Add SF add/delete devlink ops | expand

Commit Message

Parav Pandit Sept. 17, 2020, 5:20 p.m. UTC
A PCI sub-function (SF) represents a portion of the device similar
to PCI VF.

In an eswitch, PCI SF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with SF,
and its representor netdevice, introduce a PCI SF port flavour.

When devlink port flavour is PCI SF, fill up PCI SF attributes of the
port.

Extend port name creation using PCI PF and SF number scheme on best
effort basis, so that vendor drivers can skip defining their own
scheme.

An example view of a PCI SF port.

$ devlink port show netdevsim/netdevsim10/2
netdevsim/netdevsim10/2: type eth netdev eni10npf0sf44 flavour pcisf controller 0 pfnum 0 sfnum 44 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00

devlink port show netdevsim/netdevsim10/2 -jp
{
    "port": {
        "netdevsim/netdevsim10/2": {
            "type": "eth",
            "netdev": "eni10npf0sf44",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 44,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h        | 17 +++++++++++++++++
 include/uapi/linux/devlink.h |  7 +++++++
 net/core/devlink.c           | 37 ++++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+)

Comments

David Ahern Sept. 17, 2020, 8:01 p.m. UTC | #1
On 9/17/20 11:20 AM, Parav Pandit wrote:
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 48b1c1ef1ebd..1edb558125b0 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {
>  	u8 external:1;
>  };
>  
> +/**
> + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes
> + * @controller: Associated controller number
> + * @pf: Associated PCI PF number for this port.
> + * @sf: Associated PCI SF for of the PCI PF for this port.
> + * @external: when set, indicates if a port is for an external controller
> + */
> +struct devlink_port_pci_sf_attrs {
> +	u32 controller;
> +	u16 pf;
> +	u32 sf;

Why a u32? Do you expect to support that many SFs? Seems like even a u16
is more than you can adequately name within an IFNAMESZ buffer.


> +	u8 external:1;
> +};
> +
>  /**
>   * struct devlink_port_attrs - devlink port object
>   * @flavour: flavour of the port


> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index e5b71f3c2d4d..fada660fd515 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -7855,6 +7889,9 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>  		n = snprintf(name, len, "pf%uvf%u",
>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
>  		break;
> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs->pci_sf.sf);
> +		break;
>  	}
>  
>  	if (n >= len)
> 

And as I noted before, this function continues to grow device names and
it is going to spill over the IFNAMESZ buffer and EINVAL is going to be
confusing. It really needs better error handling back to users (not
kernel buffers).
Parav Pandit Sept. 18, 2020, 3:54 a.m. UTC | #2
> From: Jacob Keller <jacob.e.keller@intel.com>
> Sent: Friday, September 18, 2020 12:00 AM
> 
> 
> On 9/17/2020 10:20 AM, Parav Pandit wrote:
> > A PCI sub-function (SF) represents a portion of the device similar to
> > PCI VF.
> >
> > In an eswitch, PCI SF may have port which is normally represented
> > using a representor netdevice.
> > To have better visibility of eswitch port, its association with SF,
> > and its representor netdevice, introduce a PCI SF port flavour.
> >
> > When devlink port flavour is PCI SF, fill up PCI SF attributes of the
> > port.
> >
> > Extend port name creation using PCI PF and SF number scheme on best
> > effort basis, so that vendor drivers can skip defining their own
> > scheme.
> 
> What does this mean? What's the scheme used? 
>
Scheme used is equivalent as what is used for PCI VF ports. pfNvfM.
It is pfNsfM.
Below example shows the representor netdevice name as 'eni10npf0sf44' built by systemd/udev using phys_port_name.

> Do drivers still have the option to make their own scheme? If so, why?
Today we have two types of drivers (mlx5_core, netdevsim) which uses devlink core which creates the name.
Or other drivers (bnxt, nfp) which doesn't yet migrated to use devlink infra for PCI PF, VF ports.
Such drivers are phys_port_name and other ndos.
It is not the role of this patch to block those drivers, but any new implementation doesn't need to hand code switch_id and phys_port_name related ndos for SF.
For example, bnxt_vf_rep_get_phys_port_name().

> It's not obvious to me in this patch where the numbering scheme comes from. It
> looks like it's still up to the caller to set the numbers.
>
Naming scheme for PCI PF and PCI VF port flavours already exist.
Scheme is equivalent for PCI SF flavour.

I thought example is good enough to show that, but I will update commit message to describe this scheme to make it clear. pfNsfM.
 
> >
> > An example view of a PCI SF port.
> >
> > $ devlink port show netdevsim/netdevsim10/2
> > netdevsim/netdevsim10/2: type eth netdev eni10npf0sf44 flavour pcisf
> controller 0 pfnum 0 sfnum 44 external false splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00
> >
> > devlink port show netdevsim/netdevsim10/2 -jp {
> >     "port": {
> >         "netdevsim/netdevsim10/2": {
> >             "type": "eth",
> >             "netdev": "eni10npf0sf44",
> >             "flavour": "pcisf",
> >             "controller": 0,
> >             "pfnum": 0,
> >             "sfnum": 44,
> >             "external": false,
> >             "splittable": false,
> >             "function": {
> >                 "hw_addr": "00:00:00:00:00:00"
> >             }
> >         }
> >     }
> > }
> >
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > ---
> >  include/net/devlink.h        | 17 +++++++++++++++++
> >  include/uapi/linux/devlink.h |  7 +++++++
> >  net/core/devlink.c           | 37 ++++++++++++++++++++++++++++++++++++
> >  3 files changed, 61 insertions(+)
> >


> >  static int __devlink_port_phys_port_name_get(struct devlink_port
> *devlink_port,
> >  					     char *name, size_t len)
> >  {
> > @@ -7855,6 +7889,9 @@ static int
> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
> >  		n = snprintf(name, len, "pf%uvf%u",
> >  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
> >  		break;
> > +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> > +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-
> >pci_sf.sf);
> > +		break;
> >  	}
> >
This is where the naming scheme is done, like pcipf and pcivf port flavours.

> >  	if (n >= len)
> >
Parav Pandit Sept. 18, 2020, 4:18 a.m. UTC | #3
> From: David Ahern <dsahern@gmail.com>
> Sent: Friday, September 18, 2020 1:32 AM
> 
> On 9/17/20 11:20 AM, Parav Pandit wrote:
> > diff --git a/include/net/devlink.h b/include/net/devlink.h index
> > 48b1c1ef1ebd..1edb558125b0 100644
> > --- a/include/net/devlink.h
> > +++ b/include/net/devlink.h
> > @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {
> >  	u8 external:1;
> >  };
> >
> > +/**
> > + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF
> > +attributes
> > + * @controller: Associated controller number
> > + * @pf: Associated PCI PF number for this port.
> > + * @sf: Associated PCI SF for of the PCI PF for this port.
> > + * @external: when set, indicates if a port is for an external
> > +controller  */ struct devlink_port_pci_sf_attrs {
> > +	u32 controller;
> > +	u16 pf;
> > +	u32 sf;
> 
> Why a u32? Do you expect to support that many SFs? Seems like even a u16 is
> more than you can adequately name within an IFNAMESZ buffer.
>
I think u16 is likely enough, which let use creates 64K SF ports which is a lot. :-)
Was little concerned that it shouldn't fall short in few years. So picked u32. 
Users will be able to make use of alternative names so just because IFNAMESZ is 16 characters, do not want to limit sfnum size.
What do you think?

> 
> > +	u8 external:1;
> > +};
> > +
> >  /**
> >   * struct devlink_port_attrs - devlink port object
> >   * @flavour: flavour of the port
> 
> 
> > diff --git a/net/core/devlink.c b/net/core/devlink.c index
> > e5b71f3c2d4d..fada660fd515 100644
> > --- a/net/core/devlink.c
> > +++ b/net/core/devlink.c
> > @@ -7855,6 +7889,9 @@ static int
> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
> >  		n = snprintf(name, len, "pf%uvf%u",
> >  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
> >  		break;
> > +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> > +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-
> >pci_sf.sf);
> > +		break;
> >  	}
> >
> >  	if (n >= len)
> >
> 
> And as I noted before, this function continues to grow device names and it is
> going to spill over the IFNAMESZ buffer and EINVAL is going to be confusing. It
> really needs better error handling back to users (not kernel buffers).
Alternative names [1] should help to overcome the limitation of IFNAMESZ.
For error code EINVAL, should it be ENOSPC?
If so, should I insert a pre-patch in this series?

[1] ip link property add dev DEVICE [ altname NAME .. ]
David Ahern Sept. 18, 2020, 3:15 p.m. UTC | #4
On 9/17/20 10:18 PM, Parav Pandit wrote:
> 
>> From: David Ahern <dsahern@gmail.com>
>> Sent: Friday, September 18, 2020 1:32 AM
>>
>> On 9/17/20 11:20 AM, Parav Pandit wrote:
>>> diff --git a/include/net/devlink.h b/include/net/devlink.h index
>>> 48b1c1ef1ebd..1edb558125b0 100644
>>> --- a/include/net/devlink.h
>>> +++ b/include/net/devlink.h
>>> @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {
>>>  	u8 external:1;
>>>  };
>>>
>>> +/**
>>> + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF
>>> +attributes
>>> + * @controller: Associated controller number
>>> + * @pf: Associated PCI PF number for this port.
>>> + * @sf: Associated PCI SF for of the PCI PF for this port.
>>> + * @external: when set, indicates if a port is for an external
>>> +controller  */ struct devlink_port_pci_sf_attrs {
>>> +	u32 controller;
>>> +	u16 pf;
>>> +	u32 sf;
>>
>> Why a u32? Do you expect to support that many SFs? Seems like even a u16 is
>> more than you can adequately name within an IFNAMESZ buffer.
>>
> I think u16 is likely enough, which let use creates 64K SF ports which is a lot. :-)
> Was little concerned that it shouldn't fall short in few years. So picked u32. 
> Users will be able to make use of alternative names so just because IFNAMESZ is 16 characters, do not want to limit sfnum size.
> What do you think?
> 
>>
>>> +	u8 external:1;
>>> +};
>>> +
>>>  /**
>>>   * struct devlink_port_attrs - devlink port object
>>>   * @flavour: flavour of the port
>>
>>
>>> diff --git a/net/core/devlink.c b/net/core/devlink.c index
>>> e5b71f3c2d4d..fada660fd515 100644
>>> --- a/net/core/devlink.c
>>> +++ b/net/core/devlink.c
>>> @@ -7855,6 +7889,9 @@ static int
>> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>>>  		n = snprintf(name, len, "pf%uvf%u",
>>>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
>>>  		break;
>>> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
>>> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-
>>> pci_sf.sf);
>>> +		break;
>>>  	}
>>>
>>>  	if (n >= len)
>>>
>>
>> And as I noted before, this function continues to grow device names and it is
>> going to spill over the IFNAMESZ buffer and EINVAL is going to be confusing. It
>> really needs better error handling back to users (not kernel buffers).
> Alternative names [1] should help to overcome the limitation of IFNAMESZ.
> For error code EINVAL, should it be ENOSPC?
> If so, should I insert a pre-patch in this series?
> 
> [1] ip link property add dev DEVICE [ altname NAME .. ]
> 

You keep adding patches that extend the template based names. Those are
going to cause odd EINVAL failures (the absolute worst kind of
configuration failure) with no way for a user to understand why the
command is failing, and you need to handle that. Returning an extack
message back to the user is preferred.

Yes, the altnames provides a solution after the the netdevice has been
created, but I do not see how that works when the netdevice is created
as part of devlink commands using the template names approach.
Parav Pandit Sept. 18, 2020, 4:13 p.m. UTC | #5
Hi David,

> From: David Ahern <dsahern@gmail.com>
> Sent: Friday, September 18, 2020 8:45 PM
> 
> On 9/17/20 10:18 PM, Parav Pandit wrote:
> >
> >> From: David Ahern <dsahern@gmail.com>
> >> Sent: Friday, September 18, 2020 1:32 AM
> >>
> >> On 9/17/20 11:20 AM, Parav Pandit wrote:
> >>> diff --git a/include/net/devlink.h b/include/net/devlink.h index
> >>> 48b1c1ef1ebd..1edb558125b0 100644
> >>> --- a/include/net/devlink.h
> >>> +++ b/include/net/devlink.h
> >>> @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {
> >>>  	u8 external:1;
> >>>  };
> >>>
> >>> +/**
> >>> + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF
> >>> +attributes
> >>> + * @controller: Associated controller number
> >>> + * @pf: Associated PCI PF number for this port.
> >>> + * @sf: Associated PCI SF for of the PCI PF for this port.
> >>> + * @external: when set, indicates if a port is for an external
> >>> +controller  */ struct devlink_port_pci_sf_attrs {
> >>> +	u32 controller;
> >>> +	u16 pf;
> >>> +	u32 sf;
> >>
> >> Why a u32? Do you expect to support that many SFs? Seems like even a
> >> u16 is more than you can adequately name within an IFNAMESZ buffer.
> >>
> > I think u16 is likely enough, which let use creates 64K SF ports which
> > is a lot. :-) Was little concerned that it shouldn't fall short in few years. So
> picked u32.
> > Users will be able to make use of alternative names so just because
> IFNAMESZ is 16 characters, do not want to limit sfnum size.
> > What do you think?
> >
> >>
> >>> +	u8 external:1;
> >>> +};
> >>> +
> >>>  /**
> >>>   * struct devlink_port_attrs - devlink port object
> >>>   * @flavour: flavour of the port
> >>
> >>
> >>> diff --git a/net/core/devlink.c b/net/core/devlink.c index
> >>> e5b71f3c2d4d..fada660fd515 100644
> >>> --- a/net/core/devlink.c
> >>> +++ b/net/core/devlink.c
> >>> @@ -7855,6 +7889,9 @@ static int
> >> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
> >>>  		n = snprintf(name, len, "pf%uvf%u",
> >>>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
> >>>  		break;
> >>> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> >>> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-
> >>> pci_sf.sf);
> >>> +		break;
> >>>  	}
> >>>
> >>>  	if (n >= len)
> >>>
> >>
> >> And as I noted before, this function continues to grow device names
> >> and it is going to spill over the IFNAMESZ buffer and EINVAL is going
> >> to be confusing. It really needs better error handling back to users (not
> kernel buffers).
> > Alternative names [1] should help to overcome the limitation of IFNAMESZ.
> > For error code EINVAL, should it be ENOSPC?
> > If so, should I insert a pre-patch in this series?
> >
> > [1] ip link property add dev DEVICE [ altname NAME .. ]
> >
> 
> You keep adding patches that extend the template based names. Those are
> going to cause odd EINVAL failures (the absolute worst kind of configuration
> failure) with no way for a user to understand why the command is failing, and
> you need to handle that. Returning an extack message back to the user is
> preferred.
Sure, make sense.
I will run one short series after this one, to return extack in below code flow and other where extack return is possible.
In sysfs flow it is irrelevant anyway.
rtnl_getlink()
  rtnl_phys_port_name_fill()
     dev_get_phys_port_name()
       ndo_phys_port_name()

is that ok?
This series is not really making phys_port_name any worse except that sfnum field width is bit larger than vfnum.

Now coming back to phys_port_name not fitting in 15 characters which can trigger -EINVAL error is very slim in most sane cases.
Let's take an example.
A controller in valid range of 0 to 16,
PF in range of 0 to 7,
VF in range of 0 to 255,
SF in range of 0 to 65536,

Will generate VF phys_port_name=c16pf7vf255 (11 chars + 1 null)
SF phys_port name = c16pf7sf65535 (13 chars + 1 null)
So yes, eth dev name won't fit in but phys_port_name failure is almost nil.

> 
> Yes, the altnames provides a solution after the the netdevice has been
> created, but I do not see how that works when the netdevice is created as
> part of devlink commands using the template names approach.
systemd/udev version v245 understands the parent PCI device of netdev and phys_port_name to construct a unique netdevice name.

This is the example from this patch series for PF port.

udevadm test-builtin net_id /sys/class/net/eth0
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=eni10npf0
Unload module index
Unloaded link configuration context.

And for SF port,

udevadm test-builtin net_id /sys/class/net/eth1
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=eni10npf0sf44
Unload module index
Unloaded link configuration context.

Similar VF naming in places for I think more than a year now.
Keller, Jacob E Sept. 18, 2020, 11:04 p.m. UTC | #6
On 9/17/2020 8:54 PM, Parav Pandit wrote:
> 
> 
>> From: Jacob Keller <jacob.e.keller@intel.com>
>> Sent: Friday, September 18, 2020 12:00 AM
>>
>>
>> On 9/17/2020 10:20 AM, Parav Pandit wrote:
>>> A PCI sub-function (SF) represents a portion of the device similar to
>>> PCI VF.
>>>
>>> In an eswitch, PCI SF may have port which is normally represented
>>> using a representor netdevice.
>>> To have better visibility of eswitch port, its association with SF,
>>> and its representor netdevice, introduce a PCI SF port flavour.
>>>
>>> When devlink port flavour is PCI SF, fill up PCI SF attributes of the
>>> port.
>>>
>>> Extend port name creation using PCI PF and SF number scheme on best
>>> effort basis, so that vendor drivers can skip defining their own
>>> scheme.
>>
>> What does this mean? What's the scheme used? 
>>
> Scheme used is equivalent as what is used for PCI VF ports. pfNvfM.
> It is pfNsfM.
> Below example shows the representor netdevice name as 'eni10npf0sf44' built by systemd/udev using phys_port_name.
> 
>> Do drivers still have the option to make their own scheme? If so, why?
> Today we have two types of drivers (mlx5_core, netdevsim) which uses devlink core which creates the name.
> Or other drivers (bnxt, nfp) which doesn't yet migrated to use devlink infra for PCI PF, VF ports.
> Such drivers are phys_port_name and other ndos.
> It is not the role of this patch to block those drivers, but any new implementation doesn't need to hand code switch_id and phys_port_name related ndos for SF.
> For example, bnxt_vf_rep_get_phys_port_name().
> 


Ok, thanks for the explanation.

>> It's not obvious to me in this patch where the numbering scheme comes from. It
>> looks like it's still up to the caller to set the numbers.
>>
> Naming scheme for PCI PF and PCI VF port flavours already exist.
> Scheme is equivalent for PCI SF flavour.
> 
> I thought example is good enough to show that, but I will update commit message to describe this scheme to make it clear. pfNsfM.
>  

I think I just hadn't quite moved from "sf number" to "name of the
netdevice" and was thinking of scheme for how the sf number is selected,
which isn't really what the statement was about.

>>>>>> An example view of a PCI SF port.
>>>
>>> $ devlink port show netdevsim/netdevsim10/2
>>> netdevsim/netdevsim10/2: type eth netdev eni10npf0sf44 flavour pcisf
>> controller 0 pfnum 0 sfnum 44 external false splittable false
>>>   function:
>>>     hw_addr 00:00:00:00:00:00
>>>
>>> devlink port show netdevsim/netdevsim10/2 -jp {
>>>     "port": {
>>>         "netdevsim/netdevsim10/2": {
>>>             "type": "eth",
>>>             "netdev": "eni10npf0sf44",
>>>             "flavour": "pcisf",
>>>             "controller": 0,
>>>             "pfnum": 0,
>>>             "sfnum": 44,
>>>             "external": false,
>>>             "splittable": false,
>>>             "function": {
>>>                 "hw_addr": "00:00:00:00:00:00"
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> Signed-off-by: Parav Pandit <parav@nvidia.com>
>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>>> ---
>>>  include/net/devlink.h        | 17 +++++++++++++++++
>>>  include/uapi/linux/devlink.h |  7 +++++++
>>>  net/core/devlink.c           | 37 ++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 61 insertions(+)
>>>
> 
> 
>>>  static int __devlink_port_phys_port_name_get(struct devlink_port
>> *devlink_port,
>>>  					     char *name, size_t len)
>>>  {
>>> @@ -7855,6 +7889,9 @@ static int
>> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>>>  		n = snprintf(name, len, "pf%uvf%u",
>>>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
>>>  		break;
>>> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
>>> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-
>>> pci_sf.sf);
>>> +		break;
>>>  	}
>>>
> This is where the naming scheme is done, like pcipf and pcivf port flavours.
> 
>>>  	if (n >= len)
>>>
David Ahern Sept. 19, 2020, 4:49 a.m. UTC | #7
On 9/18/20 10:13 AM, Parav Pandit wrote:
>> You keep adding patches that extend the template based names. Those are
>> going to cause odd EINVAL failures (the absolute worst kind of configuration
>> failure) with no way for a user to understand why the command is failing, and
>> you need to handle that. Returning an extack message back to the user is
>> preferred.
> Sure, make sense.
> I will run one short series after this one, to return extack in below code flow and other where extack return is possible.
> In sysfs flow it is irrelevant anyway.

yes, sysfs is a different problem.

> rtnl_getlink()
>   rtnl_phys_port_name_fill()
>      dev_get_phys_port_name()
>        ndo_phys_port_name()
> 
> is that ok?

sure. The overflow is not going to happen today with 1-10 SFs; but you
are pushing ever close to the limit hence the push.


> This series is not really making phys_port_name any worse except that sfnum field width is bit larger than vfnum.
> 
> Now coming back to phys_port_name not fitting in 15 characters which can trigger -EINVAL error is very slim in most sane cases.
> Let's take an example.
> A controller in valid range of 0 to 16,
> PF in range of 0 to 7,
> VF in range of 0 to 255,
> SF in range of 0 to 65536,
> 
> Will generate VF phys_port_name=c16pf7vf255 (11 chars + 1 null)
> SF phys_port name = c16pf7sf65535 (13 chars + 1 null)
> So yes, eth dev name won't fit in but phys_port_name failure is almost nil.
> 

You lost me on that last sentence. Per your example for SF, adding 'eni'
or 'eth' as a prefix (which you commonly use in examples and which are
common prefixes) to that name and you overflow the IFNAMESZ limit.


(understand about the systemd integration and appreciate the forward
thinking about that).
Parav Pandit Sept. 19, 2020, 5:35 a.m. UTC | #8
> From: David Ahern <dsahern@gmail.com>
> Sent: Saturday, September 19, 2020 10:19 AM
> 
> On 9/18/20 10:13 AM, Parav Pandit wrote:
> >> You keep adding patches that extend the template based names. Those
> >> are going to cause odd EINVAL failures (the absolute worst kind of
> >> configuration
> >> failure) with no way for a user to understand why the command is
> >> failing, and you need to handle that. Returning an extack message
> >> back to the user is preferred.
> > Sure, make sense.
> > I will run one short series after this one, to return extack in below code flow
> and other where extack return is possible.
> > In sysfs flow it is irrelevant anyway.
> 
> yes, sysfs is a different problem.
> 
> > rtnl_getlink()
> >   rtnl_phys_port_name_fill()
> >      dev_get_phys_port_name()
> >        ndo_phys_port_name()
> >
> > is that ok?
> 
> sure. The overflow is not going to happen today with 1-10 SFs; but you are
> pushing ever close to the limit hence the push.
> 
Ok. yeah. got it.
> 
> > This series is not really making phys_port_name any worse except that sfnum
> field width is bit larger than vfnum.
> >
> > Now coming back to phys_port_name not fitting in 15 characters which can
> trigger -EINVAL error is very slim in most sane cases.
> > Let's take an example.
> > A controller in valid range of 0 to 16, PF in range of 0 to 7, VF in
> > range of 0 to 255, SF in range of 0 to 65536,
> >
> > Will generate VF phys_port_name=c16pf7vf255 (11 chars + 1 null) SF
> > phys_port name = c16pf7sf65535 (13 chars + 1 null) So yes, eth dev
> > name won't fit in but phys_port_name failure is almost nil.
> >
> 
> You lost me on that last sentence. Per your example for SF, adding 'eni'
> or 'eth' as a prefix (which you commonly use in examples and which are
> common prefixes) to that name and you overflow the IFNAMESZ limit.
>
Right. Got you.
Will run short series for returning extack.
 
> 
> (understand about the systemd integration and appreciate the forward thinking
> about that).
Thank you David.
diff mbox series

Patch

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 48b1c1ef1ebd..1edb558125b0 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -83,6 +83,20 @@  struct devlink_port_pci_vf_attrs {
 	u8 external:1;
 };
 
+/**
+ * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes
+ * @controller: Associated controller number
+ * @pf: Associated PCI PF number for this port.
+ * @sf: Associated PCI SF for of the PCI PF for this port.
+ * @external: when set, indicates if a port is for an external controller
+ */
+struct devlink_port_pci_sf_attrs {
+	u32 controller;
+	u16 pf;
+	u32 sf;
+	u8 external:1;
+};
+
 /**
  * struct devlink_port_attrs - devlink port object
  * @flavour: flavour of the port
@@ -104,6 +118,7 @@  struct devlink_port_attrs {
 		struct devlink_port_phys_attrs phys;
 		struct devlink_port_pci_pf_attrs pci_pf;
 		struct devlink_port_pci_vf_attrs pci_vf;
+		struct devlink_port_pci_sf_attrs pci_sf;
 	};
 };
 
@@ -1230,6 +1245,8 @@  void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
 				   u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
+void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, u32 sf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 631f5bdf1707..09c41b9ce407 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -195,6 +195,11 @@  enum devlink_port_flavour {
 				      * port that faces the PCI VF.
 				      */
 	DEVLINK_PORT_FLAVOUR_VIRTUAL, /* Any virtual port facing the user. */
+
+	DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port
+				      * for the PCI SF. It is an internal
+				      * port that faces the PCI SF.
+				      */
 };
 
 enum devlink_param_cmode {
@@ -462,6 +467,8 @@  enum devlink_attr {
 
 	DEVLINK_ATTR_PORT_EXTERNAL,		/* u8 */
 	DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,	/* u32 */
+
+	DEVLINK_ATTR_PORT_PCI_SF_NUMBER,	/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e5b71f3c2d4d..fada660fd515 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -539,6 +539,15 @@  static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
 			return -EMSGSIZE;
 		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_SF:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
+				attrs->pci_sf.controller) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_sf.pf) ||
+		    nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER, attrs->pci_sf.sf))
+			return -EMSGSIZE;
+		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
+			return -EMSGSIZE;
+		break;
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_CPU:
 	case DEVLINK_PORT_FLAVOUR_DSA:
@@ -7808,6 +7817,31 @@  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
 
+/**
+ *	devlink_port_attrs_pci_sf_set - Set PCI SF port attributes
+ *
+ *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
+ *	@pf: associated PF for the devlink port instance
+ *	@sf: associated SF of a PF for the devlink port instance
+ *	@external: indicates if the port is for an external controller
+ */
+void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, u32 sf, bool external)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+	int ret;
+
+	ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_SF);
+	if (ret)
+		return;
+	attrs->pci_sf.controller = controller;
+	attrs->pci_sf.pf = pf;
+	attrs->pci_sf.sf = sf;
+	attrs->pci_sf.external = external;
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set);
+
 static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 					     char *name, size_t len)
 {
@@ -7855,6 +7889,9 @@  static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 		n = snprintf(name, len, "pf%uvf%u",
 			     attrs->pci_vf.pf, attrs->pci_vf.vf);
 		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_SF:
+		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs->pci_sf.sf);
+		break;
 	}
 
 	if (n >= len)