[net-next,1/2] add iovnl netlink support

Message ID	C7F60C62.2AC93%scofeldm@cisco.com
State	Changes Requested, archived
Delegated to:	David Miller
Headers
Return-Path: <netdev-owner@vger.kernel.org>
User-Agent: Microsoft-Entourage/12.24.0.100205
Date: Thu, 22 Apr 2010 14:23:30 -0700
Subject: Re: [net-next PATCH 1/2] add iovnl netlink support
From: Scott Feldman <scofeldm@cisco.com>
To: David Miller <davem@davemloft.net>
CC: <netdev@vger.kernel.org>, <chrisw@redhat.com>
Message-ID: <C7F60C62.2AC93%scofeldm@cisco.com>
Thread-Topic: [net-next PATCH 1/2] add iovnl netlink support
Thread-Index: AcriYg3Rt8vn4ct+oEikTJgaroim8A==
In-Reply-To: <20100421.234849.51685723.davem@davemloft.net>
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Scott Feldman April 22, 2010, 9:23 p.m. UTC

On 4/21/10 11:48 PM, "David Miller" <davem@davemloft.net> wrote:

> From: Scott Feldman <scofeldm@cisco.com>
> Date: Mon, 19 Apr 2010 12:18:07 -0700
> 
>> +#define IOVNL_PROTO_VERSION 1
>> +
> 
> Please delete this in the final version, the macro isn't even used by
> the code.
> 
> We don't do protocol versioning in netlink.  Instead we get the base
> stuff solid from the beginning, and then if something needs fixing up
> we handle this using new attributes in a way which is both backward
> and forward compatible.

Sounds good to me, was a cut-and-paste from dcbnl.h.  How about:


-scott

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
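
A minimal userspace sketch of the "new attributes instead of protocol versions" point quoted above. It is not code from the iovnl patch: the demo_* names and the simplified type/length header are assumptions made for illustration, and real netlink attributes (struct nlattr) are additionally padded to 4 bytes and parsed with the kernel's nla_* helpers. The idea shown is only that a receiver skips attribute types it does not know, so a newer sender can add attributes without breaking an older receiver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types, not taken from the iovnl patch. */
enum { DEMO_ATTR_VLAN = 1, DEMO_ATTR_TX_RATE = 2 };

/* Simplified TLV header; real netlink also pads attributes to 4 bytes. */
struct demo_attr {
	uint16_t len;	/* header plus payload, in bytes */
	uint16_t type;
};

struct demo_cfg {
	int has_vlan, has_tx_rate;
	uint32_t vlan, tx_rate;
};

/* Append one u32 attribute at offset off and return the new offset. */
static size_t demo_put_u32(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct demo_attr hdr;

	hdr.len = (uint16_t)(sizeof(hdr) + sizeof(val));
	hdr.type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + hdr.len;
}

/* Returns 0 on success, -1 on a malformed buffer. Unknown types are skipped. */
static int demo_parse(const uint8_t *buf, size_t len, struct demo_cfg *cfg)
{
	size_t off = 0;

	memset(cfg, 0, sizeof(*cfg));
	while (len - off >= sizeof(struct demo_attr)) {
		struct demo_attr hdr;
		uint32_t val = 0;
		int is_u32;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;
		is_u32 = (hdr.len == sizeof(hdr) + sizeof(val));
		if (is_u32)
			memcpy(&val, buf + off + sizeof(hdr), sizeof(val));

		switch (hdr.type) {
		case DEMO_ATTR_VLAN:
			if (!is_u32)
				return -1;
			cfg->has_vlan = 1;
			cfg->vlan = val;
			break;
		case DEMO_ATTR_TX_RATE:
			if (!is_u32)
				return -1;
			cfg->has_tx_rate = 1;
			cfg->tx_rate = val;
			break;
		default:
			break;	/* attribute from a newer sender: ignore it */
		}
		off += hdr.len;
	}
	return off == len ? 0 : -1;
}
```

An older receiver built with only these two types still accepts a message that carries a third, unknown attribute, which is the compatibility property a version macro would otherwise be asked to provide.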

David Miller April 22, 2010, 11:04 p.m. UTC | #1

From: Scott Feldman <scofeldm@cisco.com>
Date: Thu, 22 Apr 2010 14:23:30 -0700

> On 4/21/10 11:48 PM, "David Miller" <davem@davemloft.net> wrote:
> 
>> From: Scott Feldman <scofeldm@cisco.com>
>> Date: Mon, 19 Apr 2010 12:18:07 -0700
>> 
>>> +#define IOVNL_PROTO_VERSION 1
>>> +
>> 
>> Please delete this in the final version, the macro isn't even used by
>> the code.
>> 
>> We don't do protocol versioning in netlink.  Instead we get the base
>> stuff solid from the beginning, and then if something needs fixing up
>> we handle this using new attributes in a way which is both backward
>> and forward compatible.
> 
> Sounds good to me, was a cut-and-paste from dcbnl.h.  How about:

This is perfectly fine except it got whitespace damaged by your
email client and needs a proper commit message and signoff :-)

Anirban Chakraborty April 22, 2010, 11:16 p.m. UTC | #2

Hi All,

I am following the discussion of the iovnl patch closely. While it is going to take some time for the iovnl patch to be reviewed and accepted, what would be the interim approach to managing the eswitch in a NIC? We need to add support to the qlcnic driver for configuring the eswitch in our 10G NIC. Among the things we need to set on the switch are a port's VLAN, tx bandwidth, etc. We would like to set these parameters for a bunch of ports at the start of the day and apply them to the eswitch.
Can we expose sysfs nodes to manage the eswitch, or should we put netlink/ioctl support in the driver?  I am not sure we can do it via sysfs in a clean way. Netlink seems to be the ideal candidate for this.  What is an acceptable solution? Any suggestions or advice will be highly appreciated.

thanks much,
Anirban Chakraborty

Scott Feldman April 23, 2010, 12:47 a.m. UTC | #3

On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
wrote:

> I am following the discussions on iovnl patch closely. While it is going to
> take some time for iovnl patch to be reviewed and accepted, what would be the
> interim approach to manage the eswitch in NIC? We need to add support in
> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things that
> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
> would like to set these parameters for a bunch of ports at the start of the
> day and set it to the eswitch.

Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
get a start there?  Not sure not knowing your device requirements.

-scott


Scott Feldman April 23, 2010, 1:29 a.m. UTC | #4

On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:

> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
> wrote:
> 
>> I am following the discussions on iovnl patch closely. While it is going to
>> take some time for iovnl patch to be reviewed and accepted, what would be the
>> interim approach to manage the eswitch in NIC? We need to add support in
>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
>> that
>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
>> would like to set these parameters for a bunch of ports at the start of the
>> day and set it to the eswitch.
> 
> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
> get a start there?  Not sure not knowing your device requirements.

Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
seem like what you're looking for.  I'm looking at moving iovnl here as well
for port-profile.

-scott


Anirban Chakraborty April 23, 2010, 5:57 a.m. UTC | #5

On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:

> On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> 
>> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
>> wrote:
>> 
>>> I am following the discussions on iovnl patch closely. While it is going to
>>> take some time for iovnl patch to be reviewed and accepted, what would be the
>>> interim approach to manage the eswitch in NIC? We need to add support in
>>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
>>> that
>>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
>>> would like to set these parameters for a bunch of ports at the start of the
>>> day and set it to the eswitch.
>> 
>> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
>> get a start there?  Not sure not knowing your device requirements.
> 
> Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
> seem like what you're looking for.  I'm looking at moving iovnl here as well
> for port-profile.

It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?

Thanks a lot,
Anirban


Arnd Bergmann April 23, 2010, 12:42 p.m. UTC | #6

On Friday 23 April 2010, Anirban Chakraborty wrote:
> On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:
> > On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> >> 
> >> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
> >> get a start there?  Not sure not knowing your device requirements.
> > 
> > Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
> > seem like what you're looking for.  I'm looking at moving iovnl here as well
> > for port-profile.
> 
> It looks like ifla_vf_info does contain most of the data set. But if I use it, what
> NETLINK protocol family should I use in my driver to receive netlink messages? Do I
> need to create a private protocol family?

Your driver should implement the ndo_set_vf_*/ndo_get_vf_* callbacks, not
implement the netlink protocol itself. If there is anything missing in the
existing callbacks that you require for the operation of your driver, you
should send patches to extend the implementation in net/core/rtnetlink.c.

	Arnd
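
A minimal userspace sketch of the split described above, where the core owns the netlink handling and the driver only supplies callbacks. The mock_* names and struct layouts are assumptions for illustration, not the kernel's definitions; only the callback name ndo_set_vf_vlan and its argument list come from the thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_VFS 4

struct mock_netdev;

/* Mock of the VF-related slice of net_device_ops. */
struct mock_netdev_ops {
	int (*ndo_set_vf_vlan)(struct mock_netdev *dev, int vf,
			       uint16_t vlan, uint8_t qos);
};

struct mock_netdev {
	const struct mock_netdev_ops *ops;
	uint16_t vf_vlan[MOCK_NUM_VFS];	/* stand-in for driver private state */
};

/* Driver side: validate the VF index and record the setting. */
static int mock_drv_set_vf_vlan(struct mock_netdev *dev, int vf,
				uint16_t vlan, uint8_t qos)
{
	(void)qos;	/* ignored in this sketch */
	if (vf < 0 || vf >= MOCK_NUM_VFS)
		return -EINVAL;
	dev->vf_vlan[vf] = vlan;
	return 0;
}

static const struct mock_netdev_ops mock_drv_ops = {
	.ndo_set_vf_vlan = mock_drv_set_vf_vlan,
};

/* Core side: the only layer that would need to understand the message. */
static int mock_core_set_vf_vlan(struct mock_netdev *dev, int vf,
				 uint16_t vlan, uint8_t qos)
{
	if (!dev->ops || !dev->ops->ndo_set_vf_vlan)
		return -EOPNOTSUPP;
	return dev->ops->ndo_set_vf_vlan(dev, vf, vlan, qos);
}
```

A device whose driver leaves the callback unset reports -EOPNOTSUPP from the core, so the driver never needs its own netlink protocol family.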

Chris Wright April 23, 2010, 4:23 p.m. UTC | #7

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> 
> On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:
> 
> > On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> > 
> >> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
> >> wrote:
> >> 
> >>> I am following the discussions on iovnl patch closely. While it is going to
> >>> take some time for iovnl patch to be reviewed and accepted, what would be the
> >>> interim approach to manage the eswitch in NIC? We need to add support in
> >>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
> >>> that
> >>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
> >>> would like to set these parameters for a bunch of ports at the start of the
> >>> day and set it to the eswitch.
> >> 
> >> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
> >> get a start there?  Not sure not knowing your device requirements.
> > 
> > Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
> > seem like what you're looking for.  I'm looking at moving iovnl here as well
> > for port-profile.
> 
> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?

No, you don't need to use netlink in your driver.  You just need to fill
in the relevant net_device_ops in your driver init.  Specifically:

 *      SR-IOV management functions.
 * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
 * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
 * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
 * int (*ndo_get_vf_config)(struct net_device *dev,
 *                          int vf, struct ifla_vf_info *ivf);

These are all operating on a VF indexed internally w/in the driver, so it's
a little cumbersome to use from userspace.

thanks,
-chris
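
A rough userspace sketch of what a driver's side of these four callbacks might look like, with 'vf' treated purely as an index into driver-owned state. Everything named sketch_* is hypothetical: the struct below is only loosely modelled on struct ifla_vf_info, the adapter struct stands in for driver private data, and the VLAN/QoS range checks are an assumption about what a driver would validate.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_VFS  4
#define SKETCH_ETH_ALEN 6

/* Loosely modelled on struct ifla_vf_info; the layout is an assumption. */
struct sketch_vf_info {
	uint32_t vf;
	uint8_t  mac[SKETCH_ETH_ALEN];
	uint32_t vlan;
	uint32_t qos;
	uint32_t tx_rate;
};

/* Stand-in for the PF driver's private data. */
struct sketch_adapter {
	struct sketch_vf_info vfs[SKETCH_NUM_VFS];
};

/* 'vf' is only an index into state the driver owns. */
static struct sketch_vf_info *sketch_vf(struct sketch_adapter *ad, int vf)
{
	if (vf < 0 || vf >= SKETCH_NUM_VFS)
		return NULL;
	return &ad->vfs[vf];
}

static int sketch_set_vf_mac(struct sketch_adapter *ad, int vf, const uint8_t *mac)
{
	struct sketch_vf_info *v = sketch_vf(ad, vf);

	if (!v)
		return -EINVAL;
	memcpy(v->mac, mac, SKETCH_ETH_ALEN);
	return 0;
}

static int sketch_set_vf_vlan(struct sketch_adapter *ad, int vf,
			      uint16_t vlan, uint8_t qos)
{
	struct sketch_vf_info *v = sketch_vf(ad, vf);

	if (!v || vlan > 4095 || qos > 7)
		return -EINVAL;
	v->vlan = vlan;
	v->qos = qos;
	return 0;
}

static int sketch_set_vf_tx_rate(struct sketch_adapter *ad, int vf, int rate)
{
	struct sketch_vf_info *v = sketch_vf(ad, vf);

	if (!v || rate < 0)
		return -EINVAL;
	v->tx_rate = (uint32_t)rate;
	return 0;
}

/* Copy the stored settings out, filling in the index the caller asked for. */
static int sketch_get_vf_config(struct sketch_adapter *ad, int vf,
				struct sketch_vf_info *ivf)
{
	struct sketch_vf_info *v = sketch_vf(ad, vf);

	if (!v)
		return -EINVAL;
	*ivf = *v;
	ivf->vf = (uint32_t)vf;
	return 0;
}
```

Because the index means nothing outside the driver, userspace has to know the driver's numbering in advance, which is the cumbersome part noted above.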

Anirban Chakraborty April 23, 2010, 7 p.m. UTC | #8

On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> 
>> On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:
>> 
>>> On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
>>> 
>>>> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
>>>> wrote:
>>>> 
>>>>> I am following the discussions on iovnl patch closely. While it is going to
>>>>> take some time for iovnl patch to be reviewed and accepted, what would be the
>>>>> interim approach to manage the eswitch in NIC? We need to add support in
>>>>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
>>>>> that
>>>>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
>>>>> would like to set these parameters for a bunch of ports at the start of the
>>>>> day and set it to the eswitch.
>>>> 
>>>> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
>>>> get a start there?  Not sure not knowing your device requirements.
>>> 
>>> Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
>>> seem like what you're looking for.  I'm looking at moving iovnl here as well
>>> for port-profile.
>> 
>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> 
> No, you don't need to use netlink in your driver.  You just need to fill
> in the relevant net_device_ops in your driver init.  Specifically:
> 
> *      SR-IOV management functions.
> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> * int (*ndo_get_vf_config)(struct net_device *dev,
> *                          int vf, struct ifla_vf_info *ivf);
> 
> These are all operating on a VF indexed internally w/in the driver, so it's
> a little cumbersome to use from userspace.

These are all intended for VFs and are configurable from the PF. However, in our case, there are multiple physical NIC functions on a port, which are configurable via the eswitch. So what we are setting is essentially switch ports, rather than any setting on the physical functions. If netlink doesn't fly, is sysfs going to work? What if we allocate a buffer that user space tools fill in, and the driver then picks it up and does the configuration itself?

thanks,
Anirban



Chris Wright April 23, 2010, 7:44 p.m. UTC | #9

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
> > * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> > 
> > No, you don't need to use netlink in your driver.  You just need to fill
> > in the relevant net_device_ops in your driver init.  Specifically:
> > 
> > *      SR-IOV management functions.
> > * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> > * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> > * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> > * int (*ndo_get_vf_config)(struct net_device *dev,
> > *                          int vf, struct ifla_vf_info *ivf);
> > 
> > These are all operating on a VF indexed internally w/in the driver, so it's
> > a little cumbersome to use from userspace.
> 
> These are all intended for VFs and are configureable from PF.

Yes, and while the set of callbacks can change, they are always tied to
some net_device (typically the PF) that knows how to make hardware
settings on behalf of a VF.

> However, in our case, there are multiple physical NIC function on a
> port which are configureable by the eswitch.

Is there a PCI function that represents the switch?  Or a special PCI
NIC function that has VEB mgmt plane access?  And do you have examples
of configuration that you'll do here?

> So, what we are setting
> is essentially switch ports, rather than configuring any setting on the
> physical functions. If netlink doesn't fly, is sysfs going to work?

Before we go to implementation specifics (i.e. netlink vs. sysfs, and my
guess is sysfs isn't going to be the right fit), let's step back and
look at what needs setting.

> If
> we allocate a buffer and fill it up with user space tools that the driver
> grabs it and does the configuration itself?

One idea that has been discussed in the past is to create essentially
a pluggable set of bridge_ops.  The first step would be purely internal
shuffling, to make the existing sw bridge code go through the bridge_ops.
The second step would be making your driver for whichever PCI function
you have that supports managing the bridge create a net_device which is
a bridge during driver init.  And now normal brctl can call into your
VEB via the bridge_ops callbacks. </handwave>

But this too starts w/ looking at what the management requirements are
for your bridge.  Can you enumerate those?

thanks,
-chris
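
A very rough userspace sketch of the pluggable bridge_ops idea floated above. No such interface is defined anywhere in this thread, so every toy_* name, the two operations chosen, and the "hw_cmds" counter standing in for commands sent to a VEB are assumptions for illustration only. It shows the two steps: the software bridge goes through an ops table, and a hardware backend plugs into the same table so one management entry point serves both.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_PORTS 8

struct toy_bridge;

/* Hypothetical ops table; the real set of operations was left open. */
struct toy_bridge_ops {
	int (*add_port)(struct toy_bridge *br, int port);
	int (*del_port)(struct toy_bridge *br, int port);
};

struct toy_bridge {
	const struct toy_bridge_ops *ops;
	unsigned int port_mask;	/* which ports are attached */
	unsigned int hw_cmds;	/* commands a VEB backend would have issued */
};

/* Step 1: the software bridge implements the ops. */
static int toy_sw_add_port(struct toy_bridge *br, int port)
{
	if (port < 0 || port >= TOY_MAX_PORTS)
		return -EINVAL;
	if (br->port_mask & (1u << port))
		return -EEXIST;
	br->port_mask |= 1u << port;
	return 0;
}

static int toy_sw_del_port(struct toy_bridge *br, int port)
{
	if (port < 0 || port >= TOY_MAX_PORTS)
		return -EINVAL;
	if (!(br->port_mask & (1u << port)))
		return -ENOENT;
	br->port_mask &= ~(1u << port);
	return 0;
}

static const struct toy_bridge_ops toy_sw_ops = {
	.add_port = toy_sw_add_port,
	.del_port = toy_sw_del_port,
};

/* Step 2: a VEB backend reuses the bookkeeping and "programs hardware". */
static int toy_veb_add_port(struct toy_bridge *br, int port)
{
	int err = toy_sw_add_port(br, port);

	if (!err)
		br->hw_cmds++;
	return err;
}

static int toy_veb_del_port(struct toy_bridge *br, int port)
{
	int err = toy_sw_del_port(br, port);

	if (!err)
		br->hw_cmds++;
	return err;
}

static const struct toy_bridge_ops toy_veb_ops = {
	.add_port = toy_veb_add_port,
	.del_port = toy_veb_del_port,
};

/* Management entry points: identical for both kinds of bridge. */
static int toy_bridge_add_port(struct toy_bridge *br, int port)
{
	return br->ops->add_port(br, port);
}

static int toy_bridge_del_port(struct toy_bridge *br, int port)
{
	return br->ops->del_port(br, port);
}
```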

Anirban Chakraborty April 23, 2010, 9:08 p.m. UTC | #10

On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
>>> 
>>> No, you don't need to use netlink in your driver.  You just need to fill
>>> in the relevant net_device_ops in your driver init.  Specifically:
>>> 
>>> *      SR-IOV management functions.
>>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
>>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
>>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
>>> * int (*ndo_get_vf_config)(struct net_device *dev,
>>> *                          int vf, struct ifla_vf_info *ivf);
>>> 
>>> These are all operating on a VF indexed internally w/in the driver, so it's
>>> a little cumbersome to use from userspace.
>> 
>> These are all intended for VFs and are configureable from PF.
> 
> Yes, and while the set of callbacks can change, they are always tied to
> some net_device (typically the PF) that knows how to make hardware
> settings on behalf of a VF.
> 
>> However, in our case, there are multiple physical NIC function on a
>> port which are configureable by the eswitch.
> 
> Is there a PCI function that represents the switch?  Or a special PCI
> NIC function that has VEB mgmt plane access?  And do you have examples
> of configuration that you'll do here?

There is no PCI function that represents the switch. However, one of the NIC functions can act as a privileged function to configure the eswitch. Typically the first NIC function enumerated on the bus manages the eswitch. Typical configuration would be to set tx bandwidth, VLAN ID, MAC address, and promiscuous mode for each of these ports at the start of the day. This is useful in a virtualization scenario where we do PCI passthrough of the functions to the guest, and these settings for the guest are configured via the driver in the host.

<snip>
> 
> One idea that has been discussed in the past is to create essentially
> a pluggable set of bridge_ops.  The first step would be purely internal
> shuffling, to make the existing sw bridge code go through the bridge_ops.
> The second step would be making your driver for whichever PCI function
> you have that supports managing the bridge create a net_device which is
> a bridge during driver init.  And now normal brctl can call into your
> VEB via the bridge_ops callbacks. </handwave>
> 
I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.

thanks,
Anirban





Chris Wright April 23, 2010, 11:04 p.m. UTC | #11

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> 
> On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:
> 
> > * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
> >>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> >>> 
> >>> No, you don't need to use netlink in your driver.  You just need to fill
> >>> in the relevant net_device_ops in your driver init.  Specifically:
> >>> 
> >>> *      SR-IOV management functions.
> >>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> >>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> >>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> >>> * int (*ndo_get_vf_config)(struct net_device *dev,
> >>> *                          int vf, struct ifla_vf_info *ivf);
> >>> 
> >>> These are all operating on a VF indexed internally w/in the driver, so it's
> >>> a little cumbersome to use from userspace.
> >> 
> >> These are all intended for VFs and are configureable from PF.
> > 
> > Yes, and while the set of callbacks can change, they are always tied to
> > some net_device (typically the PF) that knows how to make hardware
> > settings on behalf of a VF.
> > 
> >> However, in our case, there are multiple physical NIC function on a
> >> port which are configureable by the eswitch.
> > 
> > Is there a PCI function that represents the switch?  Or a special PCI
> > NIC function that has VEB mgmt plane access?  And do you have examples
> > of configuration that you'll do here?
> 
> There is no PCI function that represents the switch. However, one
> of the NIC functions can act as a privileged function to configure the
> eswitch. Typically the first NIC function that is enumerated in the bus
> manages the eswitch. Typical configurations would be to set tx bandwidth,
> VLAN ID, MAC address, promiscuous mode setting for each of these ports
> at the start of the day. This is useful in virtualization scenario where
> we can do PCI passthru of the functions to the guest and these settings
> for the guest are configured via the driver in the host.

(btw, this is not uncommon; there are other adapters that have multiple
functions for a single physical port that are not SR-IOV based)

How does the privileged function identify the other functions?  IOW, the
existing SR-IOV ndo callbacks have most of the above (tx bw control, mac,
vlan id), and have an 'int vf' which is basically just a driver specific
identifier to a non-privileged function or set of hw resources.  It looks
like you can use the existing bits (just need to expand a little).

So far we have only:

- tx bw control
- set mac addr
- set vlan id

You've additionally identified:

- set promiscuous mode

I'm also aware of:

- setting port aggregation
- issuing a function reset
- setting port mirroring or bcast/mcast replication
- setting anti-spoofing (mac/vlan..)
- setting security/filtering
- getting port statistics
- ...whatever else I'm forgetting

> <snip>
> > 
> > One idea that has been discussed in the past is to create essentially
> > a pluggable set of bridge_ops.  The first step would be purely internal
> > shuffling, to make the existing sw bridge code go through the bridge_ops.
> > The second step would be making your driver for whichever PCI function
> > you have that supports managing the bridge create a net_device which is
> > a bridge during driver init.  And now normal brctl can call into your
> > VEB via the bridge_ops callbacks. </handwave>
> > 
> I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.

I don't quite follow you here.

thanks,
-chris

Anirban Chakraborty April 24, 2010, 6:21 a.m. UTC | #12

On Apr 23, 2010, at 4:04 PM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> 
>> On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:
>> 
>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
>>>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
>>>>> 
>>>>> No, you don't need to use netlink in your driver.  You just need to fill
>>>>> in the relevant net_device_ops in your driver init.  Specifically:
>>>>> 
>>>>> *      SR-IOV management functions.
>>>>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
>>>>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
>>>>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
>>>>> * int (*ndo_get_vf_config)(struct net_device *dev,
>>>>> *                          int vf, struct ifla_vf_info *ivf);
>>>>> 
>>>>> These are all operating on a VF indexed internally w/in the driver, so it's
>>>>> a little cumbersome to use from userspace.
>>>> 
>>>> These are all intended for VFs and are configureable from PF.
>>> 
>>> Yes, and while the set of callbacks can change, they are always tied to
>>> some net_device (typically the PF) that knows how to make hardware
>>> settings on behalf of a VF.
>>> 
>>>> However, in our case, there are multiple physical NIC function on a
>>>> port which are configureable by the eswitch.
>>> 
>>> Is there a PCI function that represents the switch?  Or a special PCI
>>> NIC function that has VEB mgmt plane access?  And do you have examples
>>> of configuration that you'll do here?
>> 
>> There is no PCI function that represents the switch. However, one
>> of the NIC functions can act as a privileged function to configure the
>> eswitch. Typically the first NIC function that is enumerated in the bus
>> manages the eswitch. Typical configurations would be to set tx bandwidth,
>> VLAN ID, MAC address, promiscuous mode setting for each of these ports
>> at the start of the day. This is useful in virtualization scenario where
>> we can do PCI passthru of the functions to the guest and these settings
>> for the guest are configured via the driver in the host.
> 
> (btw, this is not uncommon, there other adapters that have multiple
> functions for a single physical port that is not SR-IOV based)
> 
> How does the privileged function identify the other functions?  IOW, the
> existing SR-IOV ndo callbacks have most of the above (tx bw control, mac,
> vlan id), and have an 'int vf' which is basically just a driver specific
> identifier to a non-privileged function or set of hw resources.  It looks
> like you can use the existing bits (just need to expand a little).
> 
> So far we have only:
> 
> - tx bw control
> - set mac addr
> - set vlan id
> 
> You've additionally identified:
> 
> - set promiscuous mode
> 
> I'm also aware of:
> 
> - setting port aggregation
> - issuing a function reset
> - setting port mirroring or bcast/mcast replication
> - setting anti-spoofing (mac/vlan..)
> - setting security/filtering
> - getting port statistics
> - ...whatever else I'm forgetting

Scott's latest patch already addresses some of these. Maybe we should add the missing pieces from the above list, e.g. setting promiscuous mode, port mirroring, etc., to the ndo ops. Function reset should be handled via FLR.

> 
>> <snip>
>>> 
>>> One idea that has been discussed in the past is to create essentially
>>> a pluggable set of bridge_ops.  The first step would be purely internal
>>> shuffling, to make the existing sw bridge code go through the bridge_ops.
>>> The second step would be making your driver for whichever PCI function
>>> you have that supports managing the bridge create a net_device which is
>>> a bridge during driver init.  And now normal brctl can call into your
>>> VEB via the bridge_ops callbacks. </handwave>
>>> 
>> I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.
> 
> I don't quite follow you here.

If I am not mistaken, a port profile is supposed to hold the configuration data of a NIC port, and a software vswitch, typically residing in the host, uses it. When there are multiple physical NICs (on the same physical port) in the hypervisor, there are multiple vswitches created for each of the pNICs. The inter-VM traffic in this case goes via the eswitch, and that's where the eswitch configuration for these ports comes into the picture.

thanks,
Anirban
