diff mbox

[net-next-2.6,V6,1/2] Add netlink support for virtual port management (was iovnl)

Message ID 20100513201720.25579.51230.stgit@savbu-pc100.cisco.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Scott Feldman May 13, 2010, 8:17 p.m. UTC
From: Scott Feldman <scofeldm@cisco.com>

Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
RTM_GETLINK with new sub cmd called IFLA_VF_PORT (added to end of
IFLA_cmd list).

A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev.  A port-profile is an identifier known to the switch.  How port-
profiles are installed on the switch or how available port-profiles are
made know to the host is outside the scope of this patch.

There are two types of port-profiles specs in the netlink msg.  The first spec
is for 802.1Qbg (pre-)standard, VDP protocol.  The second spec is for devices
that run a similar protocol as VDP but in firmware, thus hiding the protocol
details.  In either case, the specs have much in common and makes sense to
define the netlink msg as the union of the two specs.  For example, both specs
have a notition of associating/deassociating a port-profile.  And both specs
require some information from the hypervisor manager, such as client port
instance ID.

The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile.  What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.

There is a RTM_GETLINK cmd to to return port-profile setting of an
interface and to also return the status of the last port-profile
association.

IFLA_VF_PORT is modeled after the existing IFLA_VF_* cmd where a
VF number is passed in to identify the virtual function (VF) of an SR-IOV-
capable device.  In this case, the target of IFLA_VF_PORT msg is the
netdev physical function (PF) device.  The PF will apply the port-profile
to the VF.  IFLA_VF_PORT can also be used for devices that don't
adhere to SR-IOV and can apply the port-profile directly to the netdev
target.  In this case, the VF number is ignored.

Passing in a NULL port-profile is used to delete the port-profile association.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 include/linux/if_link.h   |   52 +++++++++++++++++++
 include/linux/netdevice.h |   10 ++++
 net/core/rtnetlink.c      |  122 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 183 insertions(+), 1 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chris Wright May 13, 2010, 8:28 p.m. UTC | #1
* Scott Feldman (scofeldm@cisco.com) wrote:
> From: Scott Feldman <scofeldm@cisco.com>
> 
> Add new netdev ops ndo_{set|get}_vf_port to allow setting of
> port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
> RTM_GETLINK with new sub cmd called IFLA_VF_PORT (added to end of
> IFLA_cmd list).
> 
> A port-profile is used to configure/enable the external switch virtual port
> backing the netdev interface, not to configure the host-facing side of the
> netdev.  A port-profile is an identifier known to the switch.  How port-
> profiles are installed on the switch or how available port-profiles are
> made know to the host is outside the scope of this patch.
> 
> There are two types of port-profiles specs in the netlink msg.  The first spec
> is for 802.1Qbg (pre-)standard, VDP protocol.  The second spec is for devices
> that run a similar protocol as VDP but in firmware, thus hiding the protocol
> details.  In either case, the specs have much in common and makes sense to
> define the netlink msg as the union of the two specs.  For example, both specs
> have a notition of associating/deassociating a port-profile.  And both specs
> require some information from the hypervisor manager, such as client port
> instance ID.
> 
> The general flow is the port-profile is applied to a host netdev interface
> using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
> switch, and the switch virtual port backing the host netdev interface is
> configured/enabled based on the settings defined by the port-profile.  What
> those settings comprise, and how those settings are managed is again
> outside the scope of this patch, since this patch only deals with the
> first step in the flow.
> 
> There is a RTM_GETLINK cmd to to return port-profile setting of an
> interface and to also return the status of the last port-profile
> association.
> 
> IFLA_VF_PORT is modeled after the existing IFLA_VF_* cmd where a
> VF number is passed in to identify the virtual function (VF) of an SR-IOV-
> capable device.  In this case, the target of IFLA_VF_PORT msg is the
> netdev physical function (PF) device.  The PF will apply the port-profile
> to the VF.  IFLA_VF_PORT can also be used for devices that don't
> adhere to SR-IOV and can apply the port-profile directly to the netdev
> target.  In this case, the VF number is ignored.
> 
> Passing in a NULL port-profile is used to delete the port-profile association.
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Scott Feldman <scofeldm@cisco.com>
> Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>

Nice, this looks good to me.

Acked-by: Chris Wright <chrisw@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy May 13, 2010, 8:40 p.m. UTC | #2
Scott Feldman wrote:
> +struct ifla_vf_port_vsi {
> +	__u8 vsi_mgr_id;
> +	__u8 vsi_type_id[3];
> +	__u8 vsi_type_version;
> +	__u8 pad[3];
> +};

Where is this actually used? The only use I could find is in the
size calculation.

> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 23a71cb..de14d36 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> +static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
> +	int vf)

Please keep the style used in that file consistent and align arguments
to the beginning of the opening '('.

> +{
> +	struct nlattr *data;
> +	int err;
> +
> +	data = nla_nest_start(skb, IFLA_VF_PORT);
> +	if (!data)
> +		return -EMSGSIZE;
> +
> +	if (vf >= 0)
> +		nla_put_u32(skb, IFLA_VF_PORT_VF, vf);
> +
> +	err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
> +	if (err == -EMSGSIZE) {
> +		nla_nest_cancel(skb, data);
> +		return -EMSGSIZE;
> +	} else if (err) {
> +		nla_nest_cancel(skb, data);
> +		return 0;

Why is the error not returned in this case?

> +	}
> +
> +	nla_nest_end(skb, data);
> +
> +	return 0;
> +}
> +
> +static int rtnl_vf_port_fill(struct sk_buff *skb, struct net_device *dev)
> +{
> +	int num_vf;
> +	int err;
> +
> +	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
> +		return 0;
> +
> +	num_vf = dev_num_vf(dev->dev.parent);
> +
> +	if (num_vf) {
> +		int i;
> +
> +		for (i = 0; i < num_vf; i++) {
> +			err = rtnl_vf_port_fill_nest(skb, dev, i);
> +			if (err)
> +				goto nla_put_failure;
> +		}
> +	} else  {
> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);

What does -1 mean?

> +		if (err)
> +			goto nla_put_failure;
> +	}
> +
> +	return 0;
> +
> +nla_put_failure:
> +	return -EMSGSIZE;
> +}
> +
>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  			    int type, u32 pid, u32 seq, u32 change,
>  			    unsigned int flags)
> @@ -747,17 +825,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  		goto nla_put_failure;
>  	copy_rtnl_link_stats64(nla_data(attr), stats);
>  
> +	if (dev->dev.parent)
> +		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));

Should this attribute really be included even if the number is zero?

> +
>  	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
>  		int i;
>  		struct ifla_vf_info ivi;
>  
> -		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
>  		for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
>  			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
>  				break;
>  			NLA_PUT(skb, IFLA_VFINFO, sizeof(ivi), &ivi);
>  		}
>  	}
> +
> +	if (rtnl_vf_port_fill(skb, dev))
> +		goto nla_put_failure;
> +
>  	if (dev->rtnl_link_ops) {
>  		if (rtnl_link_fill(skb, dev) < 0)
>  			goto nla_put_failure;
> @@ -824,6 +908,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  				    .len = sizeof(struct ifla_vf_vlan) },
>  	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
>  				    .len = sizeof(struct ifla_vf_tx_rate) },
> +	[IFLA_VF_PORT]		= { .type = NLA_NESTED },
>  };
>  EXPORT_SYMBOL(ifla_policy);
>  
> @@ -832,6 +917,20 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>  	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
>  };
>  
> +static const struct nla_policy ifla_vf_port_policy[IFLA_VF_PORT_MAX+1] = {
> +	[IFLA_VF_PORT_VF]		= { .type = NLA_U32 },
> +	[IFLA_VF_PORT_PROFILE]		= { .type = NLA_STRING,
> +				.len = VF_PORT_PROFILE_MAX },

This is oddly indented, please align .len to .type as in the
existing attributes.

> +	[IFLA_VF_PORT_VSI_TYPE]		= { .type = NLA_BINARY,
> +				.len = sizeof(struct ifla_vf_port_vsi)},
> +	[IFLA_VF_PORT_INSTANCE_UUID]	= { .type = NLA_BINARY,
> +				.len = VF_PORT_UUID_MAX },
> +	[IFLA_VF_PORT_HOST_UUID]	= { .type = NLA_STRING,
> +				.len = VF_PORT_UUID_MAX },
> +	[IFLA_VF_PORT_REQUEST]		= { .type = NLA_U8, },
> +	[IFLA_VF_PORT_RESPONSE]		= { .type = NLA_U16, },
> +};
> +
>  struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
>  {
>  	struct net *net;
> @@ -1028,6 +1127,27 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
>  	}
>  	err = 0;
>  
> +	if (tb[IFLA_VF_PORT]) {
> +		struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
> +		int vf = -1;
> +
> +		err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
> +			tb[IFLA_VF_PORT], ifla_vf_port_policy);
> +		if (err < 0)
> +			goto errout;
> +
> +		if (vf_port[IFLA_VF_PORT_VF])
> +			vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
> +		err = -EOPNOTSUPP;
> +		if (ops->ndo_set_vf_port)
> +			err = ops->ndo_set_vf_port(dev, vf, vf_port);

This appears to be addressing a single VF to issue commands.
I already explained this during the last set of VF patches,
messages are supposed to by symetrical, since you're dumping
state for all existing VFs, you also need to accept configuration
for multiple VFs. Basically, the kernel must be able to receive
a message it created during a dump and fully recreate the state.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Wright May 13, 2010, 8:46 p.m. UTC | #3
* Patrick McHardy (kaber@trash.net) wrote:
> > +	} else  {
> > +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> 
> What does -1 mean?

It means no VFs.  Could be made a macro/enum constant

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy May 13, 2010, 8:49 p.m. UTC | #4
Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>>> +	} else  {
>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>> What does -1 mean?
> 
> It means no VFs.  Could be made a macro/enum constant

Why call rtnl_vg_port_fill_nest at all in that case? It even
calls the ndo_get_vf_port() callback.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Wright May 13, 2010, 9:08 p.m. UTC | #5
* Patrick McHardy (kaber@trash.net) wrote:
> Chris Wright wrote:
> > * Patrick McHardy (kaber@trash.net) wrote:
> >>> +	} else  {
> >>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> >> What does -1 mean?
> > 
> > It means no VFs.  Could be made a macro/enum constant
> 
> Why call rtnl_vg_port_fill_nest at all in that case? It even
> calls the ndo_get_vf_port() callback.

For the case where port profile is set on net dev that does not
have VFs (e.g. the enic case in 2/2).

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy May 13, 2010, 9:11 p.m. UTC | #6
Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>> Chris Wright wrote:
>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>>> +	} else  {
>>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>>>> What does -1 mean?
>>> It means no VFs.  Could be made a macro/enum constant
>> Why call rtnl_vg_port_fill_nest at all in that case? It even
>> calls the ndo_get_vf_port() callback.
> 
> For the case where port profile is set on net dev that does not
> have VFs (e.g. the enic case in 2/2).

Thanks for the explanation. I guess a enum constant would be nice
to have. But the bigger problem is the asymetrical message
parsing/construction.

BTW:

> +enum {
> +	VF_PORT_REQUEST_PREASSOCIATE = 0,
> +	VF_PORT_REQUEST_PREASSOCIATE_RR,
> +	VF_PORT_REQUEST_ASSOCIATE,
> +	VF_PORT_REQUEST_DISASSOCIATE,
> +};

Do multiple of these commands have to be issued in order to
reach "associated" state? That also wouldn't fit into the
rtnetlink design, which contains state, not commands.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Wright May 13, 2010, 9:18 p.m. UTC | #7
* Patrick McHardy (kaber@trash.net) wrote:
> Chris Wright wrote:
> > * Patrick McHardy (kaber@trash.net) wrote:
> >> Chris Wright wrote:
> >>> * Patrick McHardy (kaber@trash.net) wrote:
> >>>>> +	} else  {
> >>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> >>>> What does -1 mean?
> >>> It means no VFs.  Could be made a macro/enum constant
> >> Why call rtnl_vg_port_fill_nest at all in that case? It even
> >> calls the ndo_get_vf_port() callback.
> > 
> > For the case where port profile is set on net dev that does not
> > have VFs (e.g. the enic case in 2/2).
> 
> Thanks for the explanation. I guess a enum constant would be nice
> to have. But the bigger problem is the asymetrical message
> parsing/construction.

Yeah, what would you like to do there?  I think we have to keep the
existing, just break out symmtetic set/get?

> BTW:
> 
> > +enum {
> > +	VF_PORT_REQUEST_PREASSOCIATE = 0,
> > +	VF_PORT_REQUEST_PREASSOCIATE_RR,
> > +	VF_PORT_REQUEST_ASSOCIATE,
> > +	VF_PORT_REQUEST_DISASSOCIATE,
> > +};
> 
> Do multiple of these commands have to be issued in order to
> reach "associated" state? That also wouldn't fit into the
> rtnetlink design, which contains state, not commands.

It's optional.  At the very least, you need 1 associate/disassociate for
each logical link up/down.

For VM migration or (perhaps failover modes) you can optionally issue a
preassociate.  Preassociate has 2 flavors.  One which is purely advisory,
another which will reserve resources on the switch.  These all reprsent
state transitions in the switch, but only associate should allow final
logical link up and traffic to flow.

thanks,
-chris

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy May 13, 2010, 9:23 p.m. UTC | #8
Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>> Chris Wright wrote:
>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>> Chris Wright wrote:
>>>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>>>>> +	} else  {
>>>>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>>>>>> What does -1 mean?
>>>>> It means no VFs.  Could be made a macro/enum constant
>>>> Why call rtnl_vg_port_fill_nest at all in that case? It even
>>>> calls the ndo_get_vf_port() callback.
>>> For the case where port profile is set on net dev that does not
>>> have VFs (e.g. the enic case in 2/2).
>> Thanks for the explanation. I guess a enum constant would be nice
>> to have. But the bigger problem is the asymetrical message
>> parsing/construction.
> 
> Yeah, what would you like to do there?  I think we have to keep the
> existing, just break out symmtetic set/get?

Sure, that would be fine. I'll have a closer look at the exact
message layout tommorrow, its getting late here.

>> BTW:
>>
>>> +enum {
>>> +	VF_PORT_REQUEST_PREASSOCIATE = 0,
>>> +	VF_PORT_REQUEST_PREASSOCIATE_RR,
>>> +	VF_PORT_REQUEST_ASSOCIATE,
>>> +	VF_PORT_REQUEST_DISASSOCIATE,
>>> +};
>> Do multiple of these commands have to be issued in order to
>> reach "associated" state? That also wouldn't fit into the
>> rtnetlink design, which contains state, not commands.
> 
> It's optional.  At the very least, you need 1 associate/disassociate for
> each logical link up/down.
> 
> For VM migration or (perhaps failover modes) you can optionally issue a
> preassociate.  Preassociate has 2 flavors.  One which is purely advisory,
> another which will reserve resources on the switch.  These all reprsent
> state transitions in the switch, but only associate should allow final
> logical link up and traffic to flow.

I see, thanks. That seems fine.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman May 13, 2010, 9:30 p.m. UTC | #9
On 5/13/10 1:40 PM, "Patrick McHardy" <kaber@trash.net> wrote:

> Scott Feldman wrote:
>> +struct ifla_vf_port_vsi {
>> + __u8 vsi_mgr_id;
>> + __u8 vsi_type_id[3];
>> + __u8 vsi_type_version;
>> + __u8 pad[3];
>> +};
> 
> Where is this actually used? The only use I could find is in the
> size calculation.

This is provisioned for VDP use.  The enic implementation (patch 2/2)
doesn't use these members.
 
> Please keep the style used in that file consistent and align arguments
> to the beginning of the opening '('.

I'll fix.
 
>> +{
>> + struct nlattr *data;
>> + int err;
>> +
>> + data = nla_nest_start(skb, IFLA_VF_PORT);
>> + if (!data)
>> +  return -EMSGSIZE;
>> +
>> + if (vf >= 0)
>> +  nla_put_u32(skb, IFLA_VF_PORT_VF, vf);
>> +
>> + err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
>> + if (err == -EMSGSIZE) {
>> +  nla_nest_cancel(skb, data);
>> +  return -EMSGSIZE;
>> + } else if (err) {
>> +  nla_nest_cancel(skb, data);
>> +  return 0;
> 
> Why is the error not returned in this case?

I was think the netdev could fail the call if the operation wasn't
supported.  I better choice would be to not set the netdev->op in the first
place.  Let me fix this.

>> +  if (err)
>> +   goto nla_put_failure;
>> + }
>> +
>> + return 0;
>> +
>> +nla_put_failure:
>> + return -EMSGSIZE;
>> +}
>> +
>>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>    int type, u32 pid, u32 seq, u32 change,
>>    unsigned int flags)
>> @@ -747,17 +825,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct
>> net_device *dev,
>> goto nla_put_failure;
>> copy_rtnl_link_stats64(nla_data(attr), stats);
>>  
>> + if (dev->dev.parent)
>> +  NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
> 
> Should this attribute really be included even if the number is zero?

The previous code would write zero also.  I moved it out of the
get_vf_config check so it could be used for get_vf_port as well.
 
> This is oddly indented, please align .len to .type as in the
> existing attributes.

I'll fix, but bumping into 80 char issues...
 
>> + [IFLA_VF_PORT_VSI_TYPE]  = { .type = NLA_BINARY,
>> +    .len = sizeof(struct ifla_vf_port_vsi)},
>> + [IFLA_VF_PORT_INSTANCE_UUID] = { .type = NLA_BINARY,
>> +    .len = VF_PORT_UUID_MAX },
>> + [IFLA_VF_PORT_HOST_UUID] = { .type = NLA_STRING,
>> +    .len = VF_PORT_UUID_MAX },
>> + [IFLA_VF_PORT_REQUEST]  = { .type = NLA_U8, },
>> + [IFLA_VF_PORT_RESPONSE]  = { .type = NLA_U16, },
>> +};
>> +
>>  struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
>>  {
>> struct net *net;
>> @@ -1028,6 +1127,27 @@ static int do_setlink(struct net_device *dev, struct
>> ifinfomsg *ifm,
>> }
>> err = 0;
>>  
>> + if (tb[IFLA_VF_PORT]) {
>> +  struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
>> +  int vf = -1;
>> +
>> +  err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
>> +   tb[IFLA_VF_PORT], ifla_vf_port_policy);
>> +  if (err < 0)
>> +   goto errout;
>> +
>> +  if (vf_port[IFLA_VF_PORT_VF])
>> +   vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
>> +  err = -EOPNOTSUPP;
>> +  if (ops->ndo_set_vf_port)
>> +   err = ops->ndo_set_vf_port(dev, vf, vf_port);
> 
> This appears to be addressing a single VF to issue commands.
> I already explained this during the last set of VF patches,
> messages are supposed to by symetrical, since you're dumping
> state for all existing VFs, you also need to accept configuration
> for multiple VFs. Basically, the kernel must be able to receive
> a message it created during a dump and fully recreate the state.

This was modeled same as existing IFLA_VF_ cmd where single VF is addressed
on set, but all VFs for PF are dumped on get.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 14, 2010, 7:17 a.m. UTC | #10
On Thursday 13 May 2010, Patrick McHardy wrote:
> > +enum {
> > +     VF_PORT_REQUEST_PREASSOCIATE = 0,
> > +     VF_PORT_REQUEST_PREASSOCIATE_RR,
> > +     VF_PORT_REQUEST_ASSOCIATE,
> > +     VF_PORT_REQUEST_DISASSOCIATE,
> > +};
> 
> Do multiple of these commands have to be issued in order to
> reach "associated" state? That also wouldn't fit into the
> rtnetlink design, which contains state, not commands.

In 802.1Qbg (see http://www.ieee802.org/1/files/public/docs2010/bg-joint-evb-0410v1.pdf)
they are defined both as state and as commands. I suggested
using the same definition so that we can use the exact values
when forming or receiving a TLV for the VDP protocol. We can change
freely between the first three during guest migration, but the port
(identified by the UUID) can only be in associate state on one
machine, and the switches would use that information to redirect
the data flow to the target of the migration.

The disassociate state is a bit strange in netlink because it
essentially means that the other information is now invalid,
but I'm not sure if there is a good alternative. If the switch
tells us that it is setting the link into disassociate state,
that should still be visible in netlink.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy May 14, 2010, 10:47 a.m. UTC | #11
Scott Feldman wrote:
> On 5/13/10 1:40 PM, "Patrick McHardy" <kaber@trash.net> wrote:
> 
>>> +  if (vf_port[IFLA_VF_PORT_VF])
>>> +   vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
>>> +  err = -EOPNOTSUPP;
>>> +  if (ops->ndo_set_vf_port)
>>> +   err = ops->ndo_set_vf_port(dev, vf, vf_port);
>> This appears to be addressing a single VF to issue commands.
>> I already explained this during the last set of VF patches,
>> messages are supposed to by symetrical, since you're dumping
>> state for all existing VFs, you also need to accept configuration
>> for multiple VFs. Basically, the kernel must be able to receive
>> a message it created during a dump and fully recreate the state.
> 
> This was modeled same as existing IFLA_VF_ cmd where single VF is addressed
> on set, but all VFs for PF are dumped on get.

Yes, that one should have been done differently as well,
unfortunately my comments were ignored. So far rtnetlink
had two properties that are now broken:

- messages sent by the kernel could be sent back to the
  kernel to re-create an object in the same state

- the same parsing functions could be used in userspace for
  messages sent by the kernel and netlink error messages,
  which contain the original userspace message

I know at least one program I've written a few years ago which
relies on the second property. Anyways, this is easily fixable
by encapsulating all top-level VF attributes in a list and
invoking the ndo_set_vf_port() callback for each VF configuration.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index cfd420b..d93a4a5 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,6 +116,7 @@  enum {
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VFINFO,
 	IFLA_STATS64,
+	IFLA_VF_PORT,
 	__IFLA_MAX
 };
 
@@ -259,4 +260,55 @@  struct ifla_vf_info {
 	__u32 qos;
 	__u32 tx_rate;
 };
+
+/* VF Port management section */
+
+enum {
+	IFLA_VF_PORT_UNSPEC,
+	IFLA_VF_PORT_VF,		/* __u32 */
+	IFLA_VF_PORT_PROFILE,		/* string */
+	IFLA_VF_PORT_VSI_TYPE,		/* 802.1Qbg (pre-)standard VDP */
+	IFLA_VF_PORT_INSTANCE_UUID,	/* binary UUID */
+	IFLA_VF_PORT_HOST_UUID,		/* binary UUID */
+	IFLA_VF_PORT_REQUEST,		/* __u8 */
+	IFLA_VF_PORT_RESPONSE,		/* __u16, output only */
+	__IFLA_VF_PORT_MAX,
+};
+
+#define IFLA_VF_PORT_MAX (__IFLA_VF_PORT_MAX - 1)
+
+#define VF_PORT_PROFILE_MAX	40
+#define VF_PORT_UUID_MAX	16
+
+enum {
+	VF_PORT_REQUEST_PREASSOCIATE = 0,
+	VF_PORT_REQUEST_PREASSOCIATE_RR,
+	VF_PORT_REQUEST_ASSOCIATE,
+	VF_PORT_REQUEST_DISASSOCIATE,
+};
+
+enum {
+	VF_PORT_VDP_RESPONSE_SUCCESS = 0,
+	VF_PORT_VDP_RESPONSE_INVALID_FORMAT,
+	VF_PORT_VDP_RESPONSE_INSUFFICIENT_RESOURCES,
+	VF_PORT_VDP_RESPONSE_UNUSED_VTID,
+	VF_PORT_VDP_RESPONSE_VTID_VIOLATION,
+	VF_PORT_VDP_RESPONSE_VTID_VERSION_VIOALTION,
+	VF_PORT_VDP_RESPONSE_OUT_OF_SYNC,
+	/* 0x08-0xFF reserved for future VDP use */
+	VF_PORT_PROFILE_RESPONSE_SUCCESS = 0x100,
+	VF_PORT_PROFILE_RESPONSE_INPROGRESS,
+	VF_PORT_PROFILE_RESPONSE_INVALID,
+	VF_PORT_PROFILE_RESPONSE_BADSTATE,
+	VF_PORT_PROFILE_RESPONSE_INSUFFICIENT_RESOURCES,
+	VF_PORT_PROFILE_RESPONSE_ERROR,
+};
+
+struct ifla_vf_port_vsi {
+	__u8 vsi_mgr_id;
+	__u8 vsi_type_id[3];
+	__u8 vsi_type_version;
+	__u8 pad[3];
+};
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 69022d4..c2ba8d4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -686,6 +686,10 @@  struct netdev_rx_queue {
  * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
+ * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
+ *			  struct nlattr *vf_port[]);
+ * int (*ndo_get_vf_port)(struct net_device *dev, int vf,
+ *			  struct sk_buff *skb);
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -735,6 +739,12 @@  struct net_device_ops {
 	int			(*ndo_get_vf_config)(struct net_device *dev,
 						     int vf,
 						     struct ifla_vf_info *ivf);
+	int			(*ndo_set_vf_port)(struct net_device *dev,
+						   int vf,
+						   struct nlattr *vf_port[]);
+	int			(*ndo_get_vf_port)(struct net_device *dev,
+						   int vf,
+						   struct sk_buff *skb);
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 23a71cb..de14d36 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,6 +653,26 @@  static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		return 0;
 }
 
+static size_t rtnl_vf_port_size(const struct net_device *dev)
+{
+	size_t vf_port_size = nla_total_size(sizeof(struct nlattr))
+						     /* VF_PORT_VF */
+		+ nla_total_size(VF_PORT_PROFILE_MAX)/* VF_PORT_PROFILE */
+		+ nla_total_size(sizeof(struct ifla_vf_port_vsi))
+						     /* VF_PORT_VSI_TYPE */
+		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_VSI_INSTANCE */
+		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_HOST_UUID */
+		+ nla_total_size(1)		     /* VF_PROT_VDP_REQUEST */
+		+ nla_total_size(2);		     /* VF_PORT_VDP_RESPONSE */
+
+	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
+		return 0;
+	if (dev_num_vf(dev->dev.parent))
+		return vf_port_size * dev_num_vf(dev->dev.parent);
+	else
+		return vf_port_size;
+}
+
 static inline size_t if_nlmsg_size(const struct net_device *dev)
 {
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
@@ -673,9 +693,67 @@  static inline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_NUM_VF */
 	       + nla_total_size(rtnl_vfinfo_size(dev)) /* IFLA_VFINFO */
+	       + rtnl_vf_port_size(dev) /* IFLA_VF_PORT */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
 }
 
+static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
+	int vf)
+{
+	struct nlattr *data;
+	int err;
+
+	data = nla_nest_start(skb, IFLA_VF_PORT);
+	if (!data)
+		return -EMSGSIZE;
+
+	if (vf >= 0)
+		nla_put_u32(skb, IFLA_VF_PORT_VF, vf);
+
+	err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
+	if (err == -EMSGSIZE) {
+		nla_nest_cancel(skb, data);
+		return -EMSGSIZE;
+	} else if (err) {
+		nla_nest_cancel(skb, data);
+		return 0;
+	}
+
+	nla_nest_end(skb, data);
+
+	return 0;
+}
+
+static int rtnl_vf_port_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	int num_vf;
+	int err;
+
+	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
+		return 0;
+
+	num_vf = dev_num_vf(dev->dev.parent);
+
+	if (num_vf) {
+		int i;
+
+		for (i = 0; i < num_vf; i++) {
+			err = rtnl_vf_port_fill_nest(skb, dev, i);
+			if (err)
+				goto nla_put_failure;
+		}
+	} else  {
+		err = rtnl_vf_port_fill_nest(skb, dev, -1);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags)
@@ -747,17 +825,23 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		goto nla_put_failure;
 	copy_rtnl_link_stats64(nla_data(attr), stats);
 
+	if (dev->dev.parent)
+		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
+
 	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
 		int i;
 		struct ifla_vf_info ivi;
 
-		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
 		for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
 			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
 				break;
 			NLA_PUT(skb, IFLA_VFINFO, sizeof(ivi), &ivi);
 		}
 	}
+
+	if (rtnl_vf_port_fill(skb, dev))
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -824,6 +908,7 @@  const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 				    .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct ifla_vf_tx_rate) },
+	[IFLA_VF_PORT]		= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -832,6 +917,20 @@  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
 };
 
+static const struct nla_policy ifla_vf_port_policy[IFLA_VF_PORT_MAX+1] = {
+	[IFLA_VF_PORT_VF]		= { .type = NLA_U32 },
+	[IFLA_VF_PORT_PROFILE]		= { .type = NLA_STRING,
+				.len = VF_PORT_PROFILE_MAX },
+	[IFLA_VF_PORT_VSI_TYPE]		= { .type = NLA_BINARY,
+				.len = sizeof(struct ifla_vf_port_vsi)},
+	[IFLA_VF_PORT_INSTANCE_UUID]	= { .type = NLA_BINARY,
+				.len = VF_PORT_UUID_MAX },
+	[IFLA_VF_PORT_HOST_UUID]	= { .type = NLA_STRING,
+				.len = VF_PORT_UUID_MAX },
+	[IFLA_VF_PORT_REQUEST]		= { .type = NLA_U8, },
+	[IFLA_VF_PORT_RESPONSE]		= { .type = NLA_U16, },
+};
+
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;
@@ -1028,6 +1127,27 @@  static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	}
 	err = 0;
 
+	if (tb[IFLA_VF_PORT]) {
+		struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
+		int vf = -1;
+
+		err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
+			tb[IFLA_VF_PORT], ifla_vf_port_policy);
+		if (err < 0)
+			goto errout;
+
+		if (vf_port[IFLA_VF_PORT_VF])
+			vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_port)
+			err = ops->ndo_set_vf_port(dev, vf, vf_port);
+		if (err < 0)
+			goto errout;
+		modified = 1;
+	}
+	err = 0;
+
 errout:
 	if (err < 0 && modified && net_ratelimit())
 		printk(KERN_WARNING "A link change request failed with "