[net-next,1/2] add iovnl netlink support

Message ID 20100421181021.GC25928@x200.localdomain
State RFC, archived
Delegated to: David Miller

Commit Message

Chris Wright April 21, 2010, 6:10 p.m. UTC
* Arnd Bergmann (arnd@arndb.de) wrote:
> On Wednesday 21 April 2010, Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > Since it seems what you really want to do is to do the exchange with the
> > > switch from here, maybe the hardware configuration part should be moved
> > > to the DCB interface?
> > 
> > I suppose this would work  (although it's a bit odd being out of scope
> > of DCB spec).
> 
> It could be anywhere, it doesn't have to be the DCB interface, but could
> be anything ranging from ethtool to iplink I guess. And we should define
> it in a way that works for any SR-IOV card, whether it's using Cisco's
> protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
> none of the above and just provides an internal switch like all the
> existing NICs.

Heh, that's exactly what iovnl does ;-)

> > I don't expect mgmt app to care about the implementation
> > specifics of an adapter, so it will always send this and iovnl message
> > too.  All as part of same setup.
> 
> Why? I really see these things as separate. Obviously a management
> tool like libvirt would need to do both these things eventually, but
> each of them has multiple options that can be combined in various
> ways:
> 
> 1. Setting up the slave device
>  a) create an SR-IOV VF to assign to a guest
>  b) create a macvtap device to pass to qemu or vhost
>  c) attach a tap device to a bridge
>  d) create a macvlan device and put it into a container
>  e) create a virtual interface for a VMDq adapter

OK, but iovnl isn't doing this.

> 2) Registering the slave with the switch
>  a) use Cisco protocol in enic firmware (see patch 2/2)
>  b) use standard VDP in lldpad
>  c) use reverse-engineered cisco protocol in some user tool for
>     non-enic adapters.
>  d) use standard VDP in firmware (hopefully this never happens)
>  e) do nothing at all (as we do today)

And this is the step that is the main purpose of iovnl.

Here's the simplest snippet of libvirt to show this.  It sends
set_port_profile netlink messages and then creates macvtap.  As simple
as it gets...


Comments

Arnd Bergmann April 21, 2010, 7:39 p.m. UTC | #1
On Wednesday 21 April 2010, Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Wednesday 21 April 2010, Chris Wright wrote:
> > > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > > Since it seems what you really want to do is to do the exchange with the
> > > > switch from here, maybe the hardware configuration part should be moved
> > > > to the DCB interface?
> > > 
> > > I suppose this would work  (although it's a bit odd being out of scope
> > > of DCB spec).
> > 
> > It could be anywhere, it doesn't have to be the DCB interface, but could
> > be anything ranging from ethtool to iplink I guess. And we should define
> > it in a way that works for any SR-IOV card, whether it's using Cisco's
> > protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
> > none of the above and just provides an internal switch like all the
> > existing NICs.
> 
> Heh, that's exactly what iovnl does ;-)

No, according to what you write below, it's exactly what iovnl does *not* do,
i.e. part 1 in my list.

> > 1. Setting up the slave device
> >  a) create an SR-IOV VF to assign to a guest
> >  b) create a macvtap device to pass to qemu or vhost
> >  c) attach a tap device to a bridge
> >  d) create a macvlan device and put it into a container
> >  e) create a virtual interface for a VMDq adapter
> 
> OK, but iovnl isn't doing this.

The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
as I can tell. Interestingly, this is not actually implemented in
the enic driver in patch 2/2. So if we all agree that this is out of the
scope of iovnl, let's just remove it from the interface and find another
way for it (ethtool, iplink, ..., as listed above).

Note that we still need to pass the MAC address and VLAN ID (or a list
of these) to the external switch, my point is just that this should be
separate from enforcing it in the hypervisor.

> > 2) Registering the slave with the switch
> >  a) use Cisco protocol in enic firmware (see patch 2/2)
> >  b) use standard VDP in lldpad
> >  c) use reverse-engineered cisco protocol in some user tool for
> >     non-enic adapters.
> >  d) use standard VDP in firmware (hopefully this never happens)
> >  e) do nothing at all (as we do today)
> 
> And this is the step that is the main purpose of iovnl.
> 
> Here's the simplest snippet of libvirt to show this.  It sends
> set_port_profile netlink messages and then creates macvtap.  As simple
> as it gets...
> 
> --- a/src/qemu/qemu_conf.c
> +++ b/src/qemu/qemu_conf.c
> @@ -1470,6 +1470,11 @@ qemudPhysIfaceConnect(virConnectPtr conn,
>          net->model && STREQ(net->model, "virtio"))
>          vnet_hdr = 1;
>  
> +    setPortProfileId(net->data.direct.linkdev,
> +                      net->data.direct.mode,
> +                      net->data.direct.profileid,
> +                      net->mac);
> +
>      rc = openMacvtapTap(net->ifname, net->mac, linkdev, brmode,
>                          &res_ifname, vnet_hdr);

Ok. In case of VDP, I guess this needs to be extended with the vlan ID
that has been configured, and possibly with a UUID, because we need to
pass the same one on the target machine if we migrate it.

Alternatively, the setPortProfileId could figure out the MAC address and
VLAN ID from the device itself, but then we don't need to pass either of
them.
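
To make the argument list concrete, here is a minimal sketch of what such a
request might carry. This is only an illustration: the structure, the field
names, the name-length limit and the "no VLAN" sentinel are all hypothetical
and are not taken from the patch or from libvirt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PP_NAME_MAX  40         /* hypothetical limit, not from the patch */
#define PP_VLAN_NONE 0xffff     /* hypothetical: VLAN implied by the profile */

/* Hypothetical port-profile request; names are illustrative only. */
struct port_profile_req {
    char     name[PP_NAME_MAX]; /* e.g. "general" */
    uint8_t  mac[6];            /* MAC of the guest interface */
    uint16_t vlan;              /* 1..4094, or PP_VLAN_NONE */
    uint8_t  uuid[16];          /* same value must be used after migration */
};

/* Fill a request; returns 0 on success, -1 on invalid arguments. */
int port_profile_req_init(struct port_profile_req *req, const char *name,
                          const uint8_t mac[6], uint16_t vlan,
                          const uint8_t uuid[16])
{
    size_t n;

    assert(req && name && mac && uuid);
    n = strlen(name);
    if (n == 0 || n >= PP_NAME_MAX)
        return -1;
    if (vlan != PP_VLAN_NONE && (vlan < 1 || vlan > 4094))
        return -1;

    memset(req, 0, sizeof(*req));
    memcpy(req->name, name, n);
    memcpy(req->mac, mac, 6);
    req->vlan = vlan;
    memcpy(req->uuid, uuid, 16);
    return 0;
}
```

If the MAC and VLAN ID are read from the device instead, the same structure
would shrink to just the profile name and the UUID.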

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman April 21, 2010, 8:25 p.m. UTC | #2
On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

>>> 1. Setting up the slave device
>>>  a) create an SR-IOV VF to assign to a guest
>>>  b) create a macvtap device to pass to qemu or vhost
>>>  c) attach a tap device to a bridge
>>>  d) create a macvlan device and put it into a container
>>>  e) create a virtual interface for a VMDq adapter
>> 
>> OK, but iovnl isn't doing this.
> 
> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> as I can tell. Interestingly, this is not actually implemented in
> the enic driver in patch 2/2. So if we all agree that this is out of the
> scope of iovnl, let's just remove it from the interface and find another
> way for it (ethtool, iplink, ..., as listed above).

You're right, not needed for enic since mac addr is included with
port-profile push and vlan membership is implied by port-profile.  So I put
set_mac_vlan in there basically to elicit feedback.

There really wouldn't be much difference between iplink and iovnl since
they're both rtnetlink...seems we should keep IOV-related APIs in one place.
Maybe there are other IOV APIs to add to iovnl in the future like:

    vf <- add_vf(pf)
    del_vf(pf, vf)

Ethtool doesn't seem the right place for this.

-scott

Arnd Bergmann April 21, 2010, 9:13 p.m. UTC | #3
On Wednesday 21 April 2010, Scott Feldman wrote:
> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> >>> 1. Setting up the slave device
> >>>  a) create an SR-IOV VF to assign to a guest
> >>>  b) create a macvtap device to pass to qemu or vhost
> >>>  c) attach a tap device to a bridge
> >>>  d) create a macvlan device and put it into a container
> >>>  e) create a virtual interface for a VMDq adapter
> >> 
> >> OK, but iovnl isn't doing this.
> > 
> > The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> > as I can tell. Interestingly, this is not actually implemented in
> > the enic driver in patch 2/2. So if we all agree that this is out of the
> > scope of iovnl, let's just remove it from the interface and find another
> > way for it (ethtool, iplink, ..., as listed above).
> 
> You're right, not needed for enic since mac addr is included with
> port-profile push and vlan membership is implied by port-profile.  So I put
> set_mac_vlan in there basically to elicit feedback.

Ok. Two points though:

- when you say that the mac address is included in the port-profile push,
  does that imply that the VF does not have a mac address prior to this?
  This would again mix the NIC configuration phase with the switch
  association, which I think we really need to avoid if we want to be
  able to implement the association in user space!

- The VLAN ID being implied in the port profile seems to be another
  difference between what enic is doing and the current draft VDP
  that will eventually become 802.1Qbg, and I fear that this difference
  will be visible in the iovnl protocol.

> There really wouldn't be much difference between iplink and iovnl since
> they're both rtnetlink...seems we should keep IOV-related APIs in one place.
> Maybe there are other IOV APIs to add to iovnl in the future like:
> 
>     vf <- add_vf(pf)
>     del_vf(pf, vf)
> 
> Ethtool doesn't seem the right place for this.

Right. My preference would probably be to make these a subcategory of
the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
This would make it resemble the existing interfaces and mean you can
use

ip link add link eth0 type macvlan    # for a container
ip link add link eth0 type macvtap    # for qemu/vhost
ip link add link eth0 type vf         # for device assignment

There are obviously significant differences between these three, but
they also share enough of their properties to let us treat them
in similar ways.

If we integrate the iovnl client into iproute2, the sequence for setting
up an enic VF and associating it to the port profile could be

# create vf0, pass mac and vlan id to HW, no association yet
ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78

# associate vf with port profile, mac address must match the one assigned
#  to the interface before.
ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
	 mac fe:dc:ba:12:34:56

	Arnd
Chris Wright April 21, 2010, 10:18 p.m. UTC | #4
* Arnd Bergmann (arnd@arndb.de) wrote:
> On Wednesday 21 April 2010, Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > On Wednesday 21 April 2010, Chris Wright wrote:
> > > > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > > > Since it seems what you really want to do is to do the exchange with the
> > > > > switch from here, maybe the hardware configuration part should be moved
> > > > > to the DCB interface?
> > > > 
> > > > I suppose this would work  (although it's a bit odd being out of scope
> > > > of DCB spec).
> > > 
> > > It could be anywhere, it doesn't have to be the DCB interface, but could
> > > be anything ranging from ethtool to iplink I guess. And we should define
> > > it in a way that works for any SR-IOV card, whether it's using Cisco's
> > > protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
> > > none of the above and just provides an internal switch like all the
> > > existing NICs.
> > 
> > Heh, that's exactly what iovnl does ;-)
> 
> No, according to what you write below, it's exactly what iovnl does *not* do,
> i.e. part 1 in my list.

OK, I see...in this case to me hw setup was the port profile for the
enic to initiate host<->switch negotiation, sorry for confusion.

> > > 1. Setting up the slave device
> > >  a) create an SR-IOV VF to assign to a guest
> > >  b) create a macvtap device to pass to qemu or vhost
> > >  c) attach a tap device to a bridge
> > >  d) create a macvlan device and put it into a container
> > >  e) create a virtual interface for a VMDq adapter
> > 
> > OK, but iovnl isn't doing this.
> 
> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> as I can tell. Interestingly, this is not actually implemented in
> the enic driver in patch 2/2. So if we all agree that this is out of the
> scope of iovnl, let's just remove it from the interface and find another
> way for it (ethtool, iplink, ..., as listed above).

Scott, any objection?  At least a way to keep moving forward on the port
profile bit.

> Note that we still need to pass the MAC address and VLAN ID (or a list
> of these) to the external switch, my point is just that this should be
> separate from enforcing it in the hypervisor.

Yup, we should focus on reconciling the diff of enic vs VDP port profile
needs.

> > > 2) Registering the slave with the switch
> > >  a) use Cisco protocol in enic firmware (see patch 2/2)
> > >  b) use standard VDP in lldpad
> > >  c) use reverse-engineered cisco protocol in some user tool for
> > >     non-enic adapters.
> > >  d) use standard VDP in firmware (hopefully this never happens)
> > >  e) do nothing at all (as we do today)
> > 
> > And this is the step that is the main purpose of iovnl.
> > 
> > Here's the simplest snippet of libvirt to show this.  It sends
> > set_port_profile netlink messages and then creates macvtap.  As simple
> > as it gets...
> > 
> > --- a/src/qemu/qemu_conf.c
> > +++ b/src/qemu/qemu_conf.c
> > @@ -1470,6 +1470,11 @@ qemudPhysIfaceConnect(virConnectPtr conn,
> >          net->model && STREQ(net->model, "virtio"))
> >          vnet_hdr = 1;
> >  
> > +    setPortProfileId(net->data.direct.linkdev,
> > +                      net->data.direct.mode,
> > +                      net->data.direct.profileid,
> > +                      net->mac);
> > +
> >      rc = openMacvtapTap(net->ifname, net->mac, linkdev, brmode,
> >                          &res_ifname, vnet_hdr);
> 
> Ok. In case of VDP, I guess this needs to be extended with the vlan ID
> that has been configured, and possibly with a UUID, because we need to
> pass the same one on the target machine if we migrate it.
> 
> Alternatively, the setPortProfileId could figure out the MAC address and
> VLAN ID from the device itself, but then we don't need to pass either of
> them.
Chris Wright April 21, 2010, 10:48 p.m. UTC | #5
* Arnd Bergmann (arnd@arndb.de) wrote:
> On Wednesday 21 April 2010, Scott Feldman wrote:
> > On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > 
> > >>> 1. Setting up the slave device
> > >>>  a) create an SR-IOV VF to assign to a guest
> > >>>  b) create a macvtap device to pass to qemu or vhost
> > >>>  c) attach a tap device to a bridge
> > >>>  d) create a macvlan device and put it into a container
> > >>>  e) create a virtual interface for a VMDq adapter
> > >> 
> > >> OK, but iovnl isn't doing this.
> > > 
> > > The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> > > as I can tell. Interestingly, this is not actually implemented in
> > > the enic driver in patch 2/2. So if we all agree that this is out of the
> > > scope of iovnl, let's just remove it from the interface and find another
> > > way for it (ethtool, iplink, ..., as listed above).
> > 
> > You're right, not needed for enic since mac addr is included with
> > port-profile push and vlan membership is implied by port-profile.  So I put
> > set_mac_vlan in there basically to elicit feedback.
> 
> Ok. Two points though:
> 
> - when you say that the mac address is included in the port-profile push,
>   does that imply that the VF does not have a mac address prior to this?
>   This would again mix the NIC configuration phase with the switch
>   association, which I think we really need to avoid if we want to be
>   able to implement the association in user space!
> 
> - The VLAN ID being implied in the port profile seems to be another
>   difference between what enic is doing and the current draft VDP
>   that will eventually become 802.1Qbg, and I fear that this difference
>   will be visible in the iovnl protocol.
> 
> > There really wouldn't be much difference between iplink and iovnl since
> > they're both rtnetlink...seems we should keep IOV-related APIs in one place.
> > Maybe there are other IOV APIs to add to iovnl in the future like:
> > 
> >     vf <- add_vf(pf)
> >     del_vf(pf, vf)
> > 
> > Ethtool doesn't seem the right place for this.
> 
> Right. My preference would probably be to make these a subcategory of
> the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> This would make it resemble the existing interfaces and mean you can
> use
> 
> ip link add link eth0 type macvlan    # for a container
> ip link add link eth0 type macvtap    # for qemu/vhost
> ip link add link eth0 type vf         # for device assignment

BTW, what do you mean by device assignment?

> There are obviously significant differences between these three, but
> they also share enough of their properties to let us treat them
> in similar ways.
> 
> If we integrate the iovnl client into iproute2, the sequence for setting
> up an enic VF and associating it to the port profile could be
> 
> # create vf0, pass mac and vlan id to HW, no association yet
> ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78

Just to clarify...right now, the normal SR-IOV VF is already there.
And, of course, can have its mac addr/vlan set already.

> # associate vf with port profile, mac address must match the one assigned
> #  to the interface before.
> ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> 	 mac fe:dc:ba:12:34:56

At that point you could just do s/mac fe:.*/link vf0/

thanks,
-chris
Scott Feldman April 21, 2010, 11:54 p.m. UTC | #6
On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Wednesday 21 April 2010, Scott Feldman wrote:
>> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> 
>>>>> 1. Setting up the slave device
>>>>>  a) create an SR-IOV VF to assign to a guest
>>>>>  b) create a macvtap device to pass to qemu or vhost
>>>>>  c) attach a tap device to a bridge
>>>>>  d) create a macvlan device and put it into a container
>>>>>  e) create a virtual interface for a VMDq adapter
>>>> 
>>>> OK, but iovnl isn't doing this.
>>> 
>>> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
>>> as I can tell. Interestingly, this is not actually implemented in
>>> the enic driver in patch 2/2. So if we all agree that this is out of the
>>> scope of iovnl, let's just remove it from the interface and find another
>>> way for it (ethtool, iplink, ..., as listed above).
>> 
>> You're right, not needed for enic since mac addr is included with
>> port-profile push and vlan membership is implied by port-profile.  So I put
>> set_mac_vlan in there basically to elicit feedback.
> 
> Ok. Two points though:
> 
> - when you say that the mac address is included in the port-profile push,
>   does that imply that the VF does not have a mac address prior to this?

Correct, VF has no mac addr prior to port-profile being applied.  The
mac_addr is the mac_addr of the VM guest interface that's to use the VF.  If
the port-profile defines L2 mac spoofing, for example, the switch wants to
know the mac address before i/o starts.   I/o doesn't start until
port-profile is applied and the switch virtual port is set up.

>   This would again mix the NIC configuration phase with the switch
>   association, which I think we really need to avoid if we want to be
>   able to implement the association in user space!
> 
> - The VLAN ID being implied in the port profile seems to be another
>   difference between what enic is doing and the current draft VDP
>   that will eventually become 802.1Qbg, and I fear that this difference
>   will be visible in the iovnl protocol.

It's not just a VLAN ID, but the entire VLAN membership for the switch
virtual port.  The port-profile may define a single native VLAN for access
mode on the switch port, or a trunk mode with a list of allowed vlans, with
one native vlan.

The key is the port-profile.  The port-profile resolves the configuration of
the switch virtual port.  The configuration of the switch virtual port
includes many settings, like I mentioned earlier: VLAN membership, QoS (rate
limits, priority class), L2 security, etc.
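
As a rough sketch of what "resolving" a port-profile could yield on the
switch side, assuming a hypothetical structure (the names, the fields and
the single rate-limit value are made up for illustration and do not
describe the actual enic or switch implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum port_mode { PORT_ACCESS, PORT_TRUNK };

/* Hypothetical resolved configuration of a switch virtual port. */
struct vport_config {
    enum port_mode  mode;
    uint16_t        native_vlan;     /* used in both access and trunk mode */
    const uint16_t *allowed;         /* trunk mode only: allowed VLAN list */
    size_t          n_allowed;
    uint32_t        rate_limit_kbps; /* stand-in for the QoS settings */
};

/* True if traffic on VLAN `vid` is permitted on this virtual port. */
bool vport_vlan_permitted(const struct vport_config *c, uint16_t vid)
{
    size_t i;

    assert(c != NULL);
    if (vid == c->native_vlan)
        return true;
    if (c->mode != PORT_TRUNK)
        return false;            /* access mode: native VLAN only */
    for (i = 0; i < c->n_allowed; i++)
        if (c->allowed[i] == vid)
            return true;
    return false;
}
```

The point of the sketch is that the host only names the profile; the VLAN
list and the other settings live with the switch.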
 
>> There really wouldn't be much difference between iplink and iovnl since
>> they're both rtnetlink...seems we should keep IOV-related APIs in one place.
>> Maybe there are other IOV APIs to add to iovnl in the future like:
>> 
>>     vf <- add_vf(pf)
>>     del_vf(pf, vf)
>> 
>> Ethtool doesn't seem the right place for this.
> 
> Right. My preference would probably be to make these a subcategory of
> the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> This would make it resemble the existing interfaces and mean you can
> use
>
> ip link add link eth0 type macvlan    # for a container
> ip link add link eth0 type macvtap    # for qemu/vhost
> ip link add link eth0 type vf         # for device assignment
> 
> There are obviously significant differences between these three, but
> they also share enough of their properties to let us treat them
> in similar ways.
> 

I don't have a strong preference for iovnl vs. extending if_link.  I thought I
had a reason against if_link, but I can't recall that now...it'll probably
come to me when I look at it again.  Let me look again...
 
> If we integrate the iovnl client into iproute2, the sequence for setting
> up an enic VF and associating it to the port profile could be
> 
> # create vf0, pass mac and vlan id to HW, no association yet
> ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> # associate vf with port profile, mac address must match the one assigned
> #  to the interface before.
> ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> 	 mac fe:dc:ba:12:34:56

Ya, that sounds pretty close.  I still want the flexibility to direct ops to
a PF link for a VF link.

-scott

Scott Feldman April 22, 2010, 12:01 a.m. UTC | #7
On 4/21/10 3:18 PM, "Chris Wright" <chrisw@redhat.com> wrote:

>> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
>> as I can tell. Interestingly, this is not actually implemented in
>> the enic driver in patch 2/2. So if we all agree that this is out of the
>> scope of iovnl, let's just remove it from the interface and find another
>> way for it (ethtool, iplink, ..., as listed above).
> 
> Scott, any objection?  At least a way to keep moving forward on the port
> profile bit.

Yes, that's fine with me, port-profile bit is the most important part.
 
>> Note that we still need to pass the MAC address and VLAN ID (or a list
>> of these) to the external switch, my point is just that this should be
>> separate from enforcing it in the hypervisor.
> 
> Yup, we should focus on reconciling the diff of enic vs VDP port profile
> needs.

-scott

Arnd Bergmann April 22, 2010, 6:51 a.m. UTC | #8
On Thursday 22 April 2010, Chris Wright wrote:
> > 
> > ip link add link eth0 type macvlan    # for a container
> > ip link add link eth0 type macvtap    # for qemu/vhost
> > ip link add link eth0 type vf         # for device assignment
> 
> BTW, what do you mean by device assignment?

I mean giving an SR-IOV VF to the guest as a native PCI device
rather than having qemu or vhost present a virtio-net to the
guest.

> > There are obviously significant differences between these three, but
> > they also share enough of their properties to let us treat them
> > in similar ways.
> > 
> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> Just to clarify...right now, the normal SR-IOV VF is already there.
> And, of course, can have its mac addr/vlan set already.

I don't have an SR-IOV card available for testing yet. How is this
configured now?

> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> >        mac fe:dc:ba:12:34:56
> 
> At that point you could just do s/mac fe:.*/link vf0/

My point was that this information should be irrelevant to the code doing the
association with the switch. It sort of makes sense when the receiver is enic,
but when we send the same data to lldpad, it doesn't care about the slave device
name but only about the mac address. Especially since the slave device might not
be in the root name space any more, meaning we have no way to find it.

	Arnd
David Miller April 22, 2010, 7:09 a.m. UTC | #9
From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 21 Apr 2010 23:13:04 +0200

> My preference would probably be to make these a subcategory of the
> if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.

I was going to suggest this as well.
Arnd Bergmann April 22, 2010, 12:49 p.m. UTC | #10
On Thursday 22 April 2010, Scott Feldman wrote:
> On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > On Wednesday 21 April 2010, Scott Feldman wrote:
> >> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> >> You're right, not needed for enic since mac addr is included with
> >> port-profile push and vlan membership is implied by port-profile.  So I put
> >> set_mac_vlan in there basically to elicit feedback.
> > 
> > Ok. Two points though:
> > 
> > - when you say that the mac address is included in the port-profile push,
> >   does that imply that the VF does not have a mac address prior to this?
> 
> Correct, VF has no mac addr prior to port-profile being applied.  The
> mac_addr is the mac_addr of the VM guest interface that's to use the VF.  If
> the port-profile defines L2 mac spoofing, for example, the switch wants to
> know the mac address before i/o starts.   I/o doesn't start until
> port-profile is applied and the switch virtual port is set up.

Is it possible to split this process, in order to make it more
closely resemble what we have when the registration is in user space?
This would mean that you assign a MAC address to the interface when the
interface gets created, and register the same MAC address at the switch
independent from the creation.

Obviously, if the port-profile (for enic) or the VSI list in the switch
enforces the mac address and you pass one that's different from the
one that's set in the VF, it won't be able to send any data, but it
remains the job of the switch to enforce that case.

> It's not just a VLAN ID, but the entire VLAN membership for the switch
> virtual port.  The port-profile may define a single native VLAN for access
> mode on the switch port, or a trunk mode with a list of allowed vlans, with
> one native vlan.
> 
> The key is the port-profile.  The port-profile resolves the configuration of
> the switch virtual port.  The configuration of the switch virtual port
> includes many settings, like I mentioned earlier: VLAN membership, QoS (rate
> limits, priority class), L2 security, etc.

Ok, I see.

> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> > 
> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> >        mac fe:dc:ba:12:34:56
> 
> Ya, that sounds pretty close.  I still want the flexibility to direct ops to
> a PF link for a VF link.

Does that mean you require passing both the PF and the VF name?

	Arnd
Chris Wright April 22, 2010, 5:47 p.m. UTC | #11
* Arnd Bergmann (arnd@arndb.de) wrote:
> On Thursday 22 April 2010, Chris Wright wrote:
> > > 
> > > ip link add link eth0 type macvlan    # for a container
> > > ip link add link eth0 type macvtap    # for qemu/vhost
> > > ip link add link eth0 type vf         # for device assignment
> > 
> > BTW, what do you mean by device assignment?
> 
> I mean giving an SR-IOV VF to the guest as a native PCI device
> rather than having qemu or vhost present a virtio-net to the
> guest.

OK, wasn't clear if you meant that or simply 100% dedicating the interface
via something like virtio.  The add_vf() idea, while neat, doesn't really
match how VFs are allocated.

> > > There are obviously significant differences between these three, but
> > > they also share enough of their properties to let us treat them
> > > in similar ways.
> > > 
> > > If we integrate the iovnl client into iproute2, the sequence for setting
> > > up an enic VF and associating it to the port profile could be
> > > 
> > > # create vf0, pass mac and vlan id to HW, no association yet
> > > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> > 
> > Just to clarify...right now, the normal SR-IOV VF is already there.
> > And, of course, can have its mac addr/vlan set already.
> 
> I don't have an SR-IOV card available for testing yet. How is this
> configured now?

The device shows up in the host as a normal network device, so mgmt tools
currently treat it as if it's no different from a PF.  So that's just
plain old:

SIOCSIFHWADDR or RTM_SETLINK (i.e. normal ->ndo_set_mac_addr)

There's also the possibility of configuring through the PF (although
this isn't really widely used ATM, and has the disadvantage of exposing
the VF number to userspace in a way that's difficult to use).  This is
also done via RTM_SETLINK (on the PF this time), and will result in
->ndo_set_vf_mac().
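
A rough sketch of what that PF-side request looks like on the wire. Note
the assumptions: the nesting used here (IFLA_VFINFO_LIST containing
IFLA_VF_INFO containing IFLA_VF_MAC) is the one in current
<linux/if_link.h> and may not match the kernel under discussion, and the
helper names are made up. The function only builds the message; sending it
over a NETLINK_ROUTE socket is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Append one attribute at the tail of the message; NULL if it won't fit. */
static struct rtattr *put_attr(struct nlmsghdr *nlh, size_t maxlen,
                               unsigned short type, const void *data,
                               size_t len)
{
    size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
    struct rtattr *rta;

    if (off + RTA_SPACE(len) > maxlen)
        return NULL;
    rta = (struct rtattr *)((char *)nlh + off);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)RTA_LENGTH(len);
    if (len)
        memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = (uint32_t)(off + RTA_SPACE(len));
    return rta;
}

/* Close a nested attribute: its length covers everything appended since. */
static void end_nest(struct nlmsghdr *nlh, struct rtattr *nest)
{
    nest->rta_len = (unsigned short)((char *)nlh + nlh->nlmsg_len -
                                     (char *)nest);
}

/* Build an RTM_SETLINK request on the PF that sets the MAC of VF `vf`.
 * Returns the message length, or 0 if `buf` is too small. */
size_t build_set_vf_mac(void *buf, size_t buflen, int pf_ifindex,
                        uint32_t vf, const uint8_t mac[6])
{
    struct nlmsghdr *nlh = buf;
    struct ifinfomsg *ifi;
    struct ifla_vf_mac ivm;
    struct rtattr *vflist, *vfinfo;

    assert(buf && mac);
    if (buflen < NLMSG_SPACE(sizeof(*ifi)))
        return 0;
    memset(buf, 0, buflen);
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifi));
    nlh->nlmsg_type = RTM_SETLINK;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    ifi = NLMSG_DATA(nlh);
    ifi->ifi_family = AF_UNSPEC;
    ifi->ifi_index = pf_ifindex;     /* the request targets the PF */

    memset(&ivm, 0, sizeof(ivm));
    ivm.vf = vf;                     /* VF number, as seen by the PF driver */
    memcpy(ivm.mac, mac, 6);

    vflist = put_attr(nlh, buflen, IFLA_VFINFO_LIST, NULL, 0);
    vfinfo = vflist ? put_attr(nlh, buflen, IFLA_VF_INFO, NULL, 0) : NULL;
    if (!vfinfo || !put_attr(nlh, buflen, IFLA_VF_MAC, &ivm, sizeof(ivm)))
        return 0;
    end_nest(nlh, vfinfo);
    end_nest(nlh, vflist);
    return nlh->nlmsg_len;
}
```

The VF number in the payload is exactly the part that is awkward for
userspace, since nothing in the message ties it to a netdev name.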

> > > # associate vf with port profile, mac address must match the one assigned
> > > #  to the interface before.
> > > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> > >        mac fe:dc:ba:12:34:56
> > 
> > At that point you could just do s/mac fe:.*/link vf0/
> 
> My point was that this information should be irrelevant to the code doing the
> association with the switch. It sort of makes sense when the receiver is enic,
> but when we send the same data to lldpad, it doesn't care about the slave device
> name but only about the mac address. Especially since the slave device might not
> be in the root name space any more, meaning we have no way to find it.

Yeah, w/ namespaces I think you'd normally do all setup before handing
the device into a new namespace.

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann April 22, 2010, 6:48 p.m. UTC | #12
On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> OK, wasn't clear if you meant that or simply 100% dedicating the interface
> via something like virtio.  The add_vf() idea, while neat, doesn't really
> match how VF's are allocated.

But we still need something like that for allocating queues in VMDq
and similar cases where we do not have pass-through, right?

As far as I can tell we don't have an interface for that yet, but
we have drivers for a number of cards that could do this.

> > I don't have an SR-IOV card available for testing yet. How is this
> > configured now?
> 
> The device shows up in the host as a normal network device, so mgmt tools
> currently treat it as if it's no different from a PF.  So that's just
> plain old:
> 
> SIOCSIFHWADDR or RTM_SETLINK (i.e. normal ->ndo_set_mac_addr)

Ok, but that only works for a fixed number of VFs and you can only
configure the VF before it's assigned to the guest, right?

Neither is a serious limitation, but it would be nice to
have an easy way around them. In particular, for assigning
the mac address and vlan id (VF in access mode), there needs
to be some interface that allows the host but not the guest
to change the settings after assigning the card to the guest.

This is a fundamental requirement for VEPA, because the switch
applies its forwarding rules based on the mac address and trusts
the hypervisor to make sure it cannot be faked by the guest.

	Arnd
Chris Wright April 22, 2010, 7:02 p.m. UTC | #13
* Arnd Bergmann (arnd@arndb.de) wrote:
> On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> > OK, wasn't clear if you meant that or simply 100% dedicating the interface
> > via something like virtio.  The add_vf() idea, while neat, doesn't really
> > match how VF's are allocated.
> 
> But we still need something like that for allocating queues in VMDq
> and similar cases where we do not have pass-through, right?

Iff we care about VMDq w/out SR-IOV (since SR-IOV hardware is VMDq
capable and already has a queue-pair + interrupt + net_dev), yes.

And it's not just VMDq, it's any multi-queue card that can do mac/vlan
filter in hw + header/data split (for direct data DMA to guest buffers).
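
To make the "mac/vlan filter in hw" part concrete, here is a toy software
model of what such a filter does: map (dest mac, vlan) to a queue, and
fall back to a default queue otherwise.  The struct layout, the names and
DEFAULT_QUEUE are all invented for this sketch; real cards do this in
hardware tables with their own formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEFAULT_QUEUE 0	/* assumption: unmatched frames go to the host queue */

/* One (mac, vlan) -> queue rule; hypothetical layout. */
struct rx_filter {
	uint8_t  mac[6];
	uint16_t vlan;
	int      queue;
};

/* Return the queue for a frame's destination mac and vlan id. */
int rx_classify(const struct rx_filter *tbl, size_t n,
		const uint8_t dmac[6], uint16_t vlan)
{
	size_t i;

	assert(tbl || n == 0);
	for (i = 0; i < n; i++)
		if (tbl[i].vlan == vlan && memcmp(tbl[i].mac, dmac, 6) == 0)
			return tbl[i].queue;
	return DEFAULT_QUEUE;
}
```

With one such queue per guest, frames can be DMAed straight into that
guest's buffers, which is where header/data split comes in.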

> As far as I can tell we don't have an interface for that yet, but
> we have drivers for a number of cards that could do this.
> 
> > > I don't have an SR-IOV card available for testing yet. How is this
> > > configured now?
> > 
> > The device shows up in the host as a normal network device, so mgmt tools
> > currently treat it as if it's no different from a PF.  So that's just
> > plain old:
> > 
> > SIOCSIFHWADDR or RTM_SETLINK (i.e. normal ->ndo_set_mac_addr)
> 
> Ok, but that only works for a fixed number of VFs and you can only
> configure the VF before it's assigned to the guest, right?

Depends on what "assign" means.

If assign means it's still visible in the host, but only one guest is using
it via virtio (e.g. vhost-net), then no, it can be changed anytime (although
it's not typically changed during the VM lifecycle).

If assign means direct PCI device assignment of the VF to the guest,
then yes, only while the device has a driver in the host.

> Neither is a serious limitation, but it would be nice to
> have an easy way around them. In particular, for assigning
> the mac address and vlan id (VF in access mode), there needs
> to be some interface that allows the host but not the guest
> to change the settings after assigning the card to the guest.
> 
> This is a fundamental requirement for VEPA, because the switch
> applies its forwarding rules based on the mac address and trusts
> the hypervisor to make sure it cannot be faked by the guest.

Sure, but the VF (when directly assigned to the guest) is going to (at
least it should, for security reasons) always trap to privileged code if
the guest tries to do something like set mac or vlan id.  All the SR-IOV
cards I've seen do this.  The "set VF mac addr" is really a message to
the PF.
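
As a sketch of that idea (not taken from any real driver): the VF's "set
mac" arrives at the PF as a request, and the PF decides whether to honour
it.  The names and the policy here (refuse a different address once the
host has pinned one) are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-VF state kept by the PF driver. */
struct vf_state {
	uint8_t mac[6];
	bool    host_pinned;	/* set once the host configured the mac */
};

/* Host path, e.g. reached through ->ndo_set_vf_mac() on the PF. */
void pf_host_set_vf_mac(struct vf_state *vf, const uint8_t mac[6])
{
	assert(vf && mac);
	memcpy(vf->mac, mac, 6);
	vf->host_pinned = true;
}

/* Guest path: the VF driver's "set mac" shows up as a message to the PF. */
int pf_handle_vf_set_mac(struct vf_state *vf, const uint8_t mac[6])
{
	assert(vf && mac);
	if (vf->host_pinned && memcmp(vf->mac, mac, 6) != 0)
		return -EPERM;	/* guest may not override the host's choice */
	memcpy(vf->mac, mac, 6);
	return 0;
}
```

The point is only that the decision sits in privileged code on the host
side, so the guest can ask but cannot force the change.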

thanks,
-chris
Arnd Bergmann April 22, 2010, 7:36 p.m. UTC | #14
On Thursday 22 April 2010 21:02:30 Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> > > OK, wasn't clear if you meant that or simply 100% dedicating the interface
> > > via something like virtio.  The add_vf() idea, while neat, doesn't really
> > > match how VF's are allocated.
> > 
> > But we still need something like that for allocating queues in VMDq
> > and similar cases where we do not have pass-through, right?
> 
> Iff we care about VMDq w/out SR-IOV (since SR-IOV hardware is VMDq
> capable and already has a queue-pair + interrupt + net_dev), yes.
> 
> And it's not just VMDq, it's any multi-queue card that can do mac/vlan
> filter in hw + header/data split (for direct data DMA to guest buffers).

Right, that's what I meant by VMDq. Do we have a better term to describe
this class of devices, i.e. VMDq and other cards that also have the
features you listed?

	Arnd
Chris Wright April 22, 2010, 9:03 p.m. UTC | #15
* Arnd Bergmann (arnd@arndb.de) wrote:
> On Thursday 22 April 2010 21:02:30 Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> > > > OK, wasn't clear if you meant that or simply 100% dedicating the interface
> > > > via something like virtio.  The add_vf() idea, while neat, doesn't really
> > > > match how VF's are allocated.
> > > 
> > > But we still need something like that for allocating queues in VMDq
> > > and similar cases where we do not have pass-through, right?
> > 
> > Iff we care about VMDq w/out SR-IOV (since SR-IOV hardware is VMDq
> > capable and already has a queue-pair + interrupt + net_dev), yes.
> > 
> > And it's not just VMDq, it's any multi-queue card that can do mac/vlan
> > filter in hw + header/data split (for direct data DMA to guest buffers).
> 
> Right, that's what I meant by VMDq. Do we have a better term to describe
> this class of devices, i.e. VMDq and other cards that also have the
> features you listed?

I don't have a good term.  Some of these devices can already surface
multiple netdevs.

thanks,
-chris
Patch

--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -1470,6 +1470,11 @@  qemudPhysIfaceConnect(virConnectPtr conn,
         net->model && STREQ(net->model, "virtio"))
         vnet_hdr = 1;
 
+    setPortProfileId(net->data.direct.linkdev,
+                      net->data.direct.mode,
+                      net->data.direct.profileid,
+                      net->mac);
+
     rc = openMacvtapTap(net->ifname, net->mac, linkdev, brmode,
                         &res_ifname, vnet_hdr);