diff mbox series

[net-next,v9,3/4] virtio_net: Extend virtio to use VF datapath when available

Message ID 1524848820-42258-4-git-send-email-sridhar.samudrala@intel.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series Enable virtio_net to act as a standby for a passthru device | expand

Commit Message

Samudrala, Sridhar April 27, 2018, 5:06 p.m. UTC
This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

It uses the generic failover framework that provides 2 functions to create
and destroy a master failover netdev. When STANDBY feature is enabled, an
additional netdev(failover netdev) is created that acts as a master device
and tracks the state of the 2 lower netdevs. The original virtio_net netdev
is marked as 'standby' netdev and a passthru device with the same MAC is
registered as 'primary' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/Kconfig      |  1 +
 drivers/net/virtio_net.c | 35 ++++++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

Comments

Jiri Pirko April 28, 2018, 8:24 a.m. UTC | #1
Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>This patch enables virtio_net to switch over to a VF datapath when a VF
>netdev is present with the same MAC address. It allows live migration
>of a VM with a direct attached VF without the need to setup a bond/team
>between a VF and virtio net device in the guest.
>
>The hypervisor needs to enable only one datapath at any time so that
>packets don't get looped back to the VM over the other datapath. When a VF

Why? Both datapaths could be enabled at a time. Why the loop on
hypervisor side would be a problem. This in not an issue for
bonding/team as well.


>is plugged, the virtio datapath link state can be marked as down. The
>hypervisor needs to unplug the VF device from the guest on the source host
>and reset the MAC filter of the VF to initiate failover of datapath to

"reset the MAC filter of the VF" - you mean "set the VF mac"?


>virtio before starting the migration. After the migration is completed,
>the destination hypervisor sets the MAC filter on the VF and plugs it back
>to the guest to switch over to VF datapath.
>
>It uses the generic failover framework that provides 2 functions to create
>and destroy a master failover netdev. When STANDBY feature is enabled, an
>additional netdev(failover netdev) is created that acts as a master device
>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>is marked as 'standby' netdev and a passthru device with the same MAC is
>registered as 'primary' netdev.
>
>This patch is based on the discussion initiated by Jesse on this thread.
>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

[...]
Jiri Pirko April 28, 2018, 9:42 a.m. UTC | #2
Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>This patch enables virtio_net to switch over to a VF datapath when a VF
>netdev is present with the same MAC address. It allows live migration
>of a VM with a direct attached VF without the need to setup a bond/team
>between a VF and virtio net device in the guest.
>
>The hypervisor needs to enable only one datapath at any time so that
>packets don't get looped back to the VM over the other datapath. When a VF
>is plugged, the virtio datapath link state can be marked as down. The
>hypervisor needs to unplug the VF device from the guest on the source host
>and reset the MAC filter of the VF to initiate failover of datapath to
>virtio before starting the migration. After the migration is completed,
>the destination hypervisor sets the MAC filter on the VF and plugs it back
>to the guest to switch over to VF datapath.
>
>It uses the generic failover framework that provides 2 functions to create
>and destroy a master failover netdev. When STANDBY feature is enabled, an
>additional netdev(failover netdev) is created that acts as a master device
>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>is marked as 'standby' netdev and a passthru device with the same MAC is
>registered as 'primary' netdev.
>
>This patch is based on the discussion initiated by Jesse on this thread.
>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>

When I enabled the standby feature (hardcoded), I have 2 netdevices now:
4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feb2:a7f1/64 scope link 
       valid_lft forever preferred_lft forever
5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:feb2:a7f1/64 scope link 
       valid_lft forever preferred_lft forever

However, it seems to confuse my initscripts on Fedora:
[root@test1 ~]# ifup ens3
./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected

Determining IP information for ens3
ens3n_sby...Cannot find device "ens3n_sby.pid"
Cannot find device "ens3n_sby.lease"
 failed.

I tried to change the standby device mac:
ip link set ens3n_sby addr 52:54:00:b2:a7:f2
[root@test1 ~]# ifup ens3

Determining IP information for ens3... done.
[root@test1 ~]#

But now the network does not work. I think that the mac change on
standby device should be probably refused, no?

When I change the mac back, all works fine.


Now I try to change mac of the failover master:
[root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
RTNETLINK answers: Operation not supported

That I did expect to work. I would expect this would change the mac of
the master and both standby and primary slaves.


Now I tried to add a primary pci device. I don't have any fancy VF on my
test setup, but I expected the good old 8139cp to work:
[root@test1 ~]# ethtool -i ens9
driver: 8139cp
....
[root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1

I see no message in dmesg, so I guess the failover module did not
enslave this netdev. The mac change is not monitored. I would expect
that it is and whenever a device changes mac to the failover one, it
should be enslaved and whenever it changes mac back to something else,
it should be released - the primary one ofcourse.



[...]

>+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>+				      size_t len)
>+{
>+	struct virtnet_info *vi = netdev_priv(dev);
>+	int ret;
>+
>+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>+		return -EOPNOTSUPP;
>+
>+	ret = snprintf(buf, len, "_sby");

please avoid the "_".

[...]
Siwei Liu April 29, 2018, 8:56 a.m. UTC | #3
On Sat, Apr 28, 2018 at 2:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>This patch enables virtio_net to switch over to a VF datapath when a VF
>>netdev is present with the same MAC address. It allows live migration
>>of a VM with a direct attached VF without the need to setup a bond/team
>>between a VF and virtio net device in the guest.
>>
>>The hypervisor needs to enable only one datapath at any time so that
>>packets don't get looped back to the VM over the other datapath. When a VF
>>is plugged, the virtio datapath link state can be marked as down. The
>>hypervisor needs to unplug the VF device from the guest on the source host
>>and reset the MAC filter of the VF to initiate failover of datapath to
>>virtio before starting the migration. After the migration is completed,
>>the destination hypervisor sets the MAC filter on the VF and plugs it back
>>to the guest to switch over to VF datapath.
>>
>>It uses the generic failover framework that provides 2 functions to create
>>and destroy a master failover netdev. When STANDBY feature is enabled, an
>>additional netdev(failover netdev) is created that acts as a master device
>>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>>is marked as 'standby' netdev and a passthru device with the same MAC is
>>registered as 'primary' netdev.
>>
>>This patch is based on the discussion initiated by Jesse on this thread.
>>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>
>
> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>        valid_lft forever preferred_lft forever
> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>        valid_lft forever preferred_lft forever
>
> However, it seems to confuse my initscripts on Fedora:
> [root@test1 ~]# ifup ens3
> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>
You should teach Fedora and all cloud vendors to upgrade their
initscripts and other userspace tools, no?

> Determining IP information for ens3
> ens3n_sby...Cannot find device "ens3n_sby.pid"
> Cannot find device "ens3n_sby.lease"
>  failed.
>
> I tried to change the standby device mac:
> ip link set ens3n_sby addr 52:54:00:b2:a7:f2
> [root@test1 ~]# ifup ens3
>
> Determining IP information for ens3... done.
> [root@test1 ~]#
>
> But now the network does not work. I think that the mac change on
> standby device should be probably refused, no?
>
> When I change the mac back, all works fine.
>
>
> Now I try to change mac of the failover master:
> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
> RTNETLINK answers: Operation not supported
>
> That I did expect to work. I would expect this would change the mac of
> the master and both standby and primary slaves.
>
>
> Now I tried to add a primary pci device. I don't have any fancy VF on my
> test setup, but I expected the good old 8139cp to work:
> [root@test1 ~]# ethtool -i ens9
> driver: 8139cp
> ....
> [root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1
>
> I see no message in dmesg, so I guess the failover module did not
> enslave this netdev. The mac change is not monitored. I would expect
> that it is and whenever a device changes mac to the failover one, it
> should be enslaved and whenever it changes mac back to something else,
> it should be released - the primary one ofcourse.
>
>
>
> [...]
>
>>+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>>+                                    size_t len)
>>+{
>>+      struct virtnet_info *vi = netdev_priv(dev);
>>+      int ret;
>>+
>>+      if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>>+              return -EOPNOTSUPP;
>>+
>>+      ret = snprintf(buf, len, "_sby");
>
> please avoid the "_".
>
> [...]
Jiri Pirko April 29, 2018, 1:45 p.m. UTC | #4
Sun, Apr 29, 2018 at 10:56:30AM CEST, loseweigh@gmail.com wrote:
>On Sat, Apr 28, 2018 at 2:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>>This patch enables virtio_net to switch over to a VF datapath when a VF
>>>netdev is present with the same MAC address. It allows live migration
>>>of a VM with a direct attached VF without the need to setup a bond/team
>>>between a VF and virtio net device in the guest.
>>>
>>>The hypervisor needs to enable only one datapath at any time so that
>>>packets don't get looped back to the VM over the other datapath. When a VF
>>>is plugged, the virtio datapath link state can be marked as down. The
>>>hypervisor needs to unplug the VF device from the guest on the source host
>>>and reset the MAC filter of the VF to initiate failover of datapath to
>>>virtio before starting the migration. After the migration is completed,
>>>the destination hypervisor sets the MAC filter on the VF and plugs it back
>>>to the guest to switch over to VF datapath.
>>>
>>>It uses the generic failover framework that provides 2 functions to create
>>>and destroy a master failover netdev. When STANDBY feature is enabled, an
>>>additional netdev(failover netdev) is created that acts as a master device
>>>and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>>>is marked as 'standby' netdev and a passthru device with the same MAC is
>>>registered as 'primary' netdev.
>>>
>>>This patch is based on the discussion initiated by Jesse on this thread.
>>>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>
>>
>> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
>> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>        valid_lft forever preferred_lft forever
>> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>     link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> However, it seems to confuse my initscripts on Fedora:
>> [root@test1 ~]# ifup ens3
>> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
>> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
>> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>>
>You should teach Fedora and all cloud vendors to upgrade their
>initscripts and other userspace tools, no?

I just wanted to point out that the conversion from "nostandby" to
"standby" isn't always that smooth as claimed. The claim was "no change
for the current user" iirc.
Samudrala, Sridhar April 30, 2018, 3 a.m. UTC | #5
On 4/28/2018 1:24 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> This patch enables virtio_net to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration
>> of a VM with a direct attached VF without the need to setup a bond/team
>> between a VF and virtio net device in the guest.
>>
>> The hypervisor needs to enable only one datapath at any time so that
>> packets don't get looped back to the VM over the other datapath. When a VF
> Why? Both datapaths could be enabled at a time. Why the loop on
> hypervisor side would be a problem. This in not an issue for
> bonding/team as well.

Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
sent over the VF datapath don't get looped back to the VM via the virtio-net datapth.
This can happen if both datapaths are enabled at the same time.

I would think this is an issue even with bonding/team as well when virtio-net and
VF are backed by the same PF.


>
>
>> is plugged, the virtio datapath link state can be marked as down. The
>> hypervisor needs to unplug the VF device from the guest on the source host
>> and reset the MAC filter of the VF to initiate failover of datapath to
> "reset the MAC filter of the VF" - you mean "set the VF mac"?

Yes.  the PF should take away the MAC address assigned to the VF so that the PF
starts receiving those packets.

>
>
>> virtio before starting the migration. After the migration is completed,
>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>> to the guest to switch over to VF datapath.
>>
>> It uses the generic failover framework that provides 2 functions to create
>> and destroy a master failover netdev. When STANDBY feature is enabled, an
>> additional netdev(failover netdev) is created that acts as a master device
>> and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> is marked as 'standby' netdev and a passthru device with the same MAC is
>> registered as 'primary' netdev.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
> [...]
>
Jiri Pirko April 30, 2018, 7:12 a.m. UTC | #6
Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>
>On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to setup a bond/team
>> > between a VF and virtio net device in the guest.
>> > 
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> Why? Both datapaths could be enabled at a time. Why the loop on
>> hypervisor side would be a problem. This in not an issue for
>> bonding/team as well.
>
>Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>sent over the VF datapath don't get looped back to the VM via the virtio-net datapth.

Why? Please see below.


>This can happen if both datapaths are enabled at the same time.
>
>I would think this is an issue even with bonding/team as well when virtio-net and
>VF are backed by the same PF.
>
>

I believe that the scenario is the same as on an ordinary nic/swich
network:

...................

  host
 
       bond0
      /     \
    eth0   eth1
     |       |
...................
     |       |
     p1      p2

  switch

...................

It is perfectly valid to p1 and p2 be up and "bridged" together. Bond
has to cope with loop-backed frames. "Failover driver" should too,
it's the same scenario.


>> 
>> 
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> "reset the MAC filter of the VF" - you mean "set the VF mac"?
>
>Yes.  the PF should take away the MAC address assigned to the VF so that the PF
>starts receiving those packets.

Okay, got it. Please put this in the description.


>
>> 
>> 
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> > 
>> > It uses the generic failover framework that provides 2 functions to create
>> > and destroy a master failover netdev. When STANDBY feature is enabled, an
>> > additional netdev(failover netdev) is created that acts as a master device
>> > and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> > is marked as 'standby' netdev and a passthru device with the same MAC is
>> > registered as 'primary' netdev.
>> > 
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> [...]
>> 
>
Jiri Pirko April 30, 2018, 7:20 a.m. UTC | #7
Mon, Apr 30, 2018 at 06:16:58AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/28/2018 2:42 AM, Jiri Pirko wrote:
>> Fri, Apr 27, 2018 at 07:06:59PM CEST,sridhar.samudrala@intel.com  wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to setup a bond/team
>> > between a VF and virtio net device in the guest.
>> > 
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> > 
>> > It uses the generic failover framework that provides 2 functions to create
>> > and destroy a master failover netdev. When STANDBY feature is enabled, an
>> > additional netdev(failover netdev) is created that acts as a master device
>> > and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> > is marked as 'standby' netdev and a passthru device with the same MAC is
>> > registered as 'primary' netdev.
>> > 
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> > 
>> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
>> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>      link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>      inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>         valid_lft forever preferred_lft forever
>> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>      link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
>>      inet6 fe80::5054:ff:feb2:a7f1/64 scope link
>>         valid_lft forever preferred_lft forever
>> 
>> However, it seems to confuse my initscripts on Fedora:
>> [root@test1 ~]# ifup ens3
>> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
>> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
>> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>> 
>> Determining IP information for ens3
>> ens3n_sby...Cannot find device "ens3n_sby.pid"
>> Cannot find device "ens3n_sby.lease"
>>   failed.
>> 
>> I tried to change the standby device mac:
>> ip link set ens3n_sby addr 52:54:00:b2:a7:f2
>> [root@test1 ~]# ifup ens3
>> 
>> Determining IP information for ens3... done.
>> [root@test1 ~]#
>> 
>> But now the network does not work. I think that the mac change on
>> standby device should be probably refused, no?
>
>Yes. we should block changing standby device mac.
>
>> When I change the mac back, all works fine.
>
>This is strange. So you had to change the standby device mac twice to
>get dhcp working.

Yes. First to some other mac to not to confuse initscripts, second back
to the "failover mac" in order to make frames go through (I'm guessing
that virtio_net also has mac filter for incoming frames).


>
>I do see NetworkManager trying to get dhcp address on standby device, but

Not using NM. Just good old Fedora initscripts.


>i don't see any issue with connectivity.  To be totally transparent, we
>need to only expose one netdev.

Yep.


>
>> 
>> Now I try to change mac of the failover master:
>> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> RTNETLINK answers: Operation not supported
>> 
>> That I did expect to work. I would expect this would change the mac of
>> the master and both standby and primary slaves.
>
>If a VF is untrusted, a VM will not able to change its MAC and moreover

Note that at this point, I have no VF. So I'm not sure why you mention
that.


>in this mode we are assuming that the hypervisor has assigned the MAC and
>guest is not expected to change the MAC.

Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
mac and all works fine. How is this different? Change mac on "failover
instance" should work and should propagate the mac down to its slaves.


>
>For the initial implementation, i would propose not allowing the guest to
>change the MAC of failover or standby dev.

I see no reason for such restriction.


>
>
>> 
>> Now I tried to add a primary pci device. I don't have any fancy VF on my
>> test setup, but I expected the good old 8139cp to work:
>> [root@test1 ~]# ethtool -i ens9
>> driver: 8139cp
>> ....
>> [root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1
>> 
>> I see no message in dmesg, so I guess the failover module did not
>> enslave this netdev. The mac change is not monitored. I would expect
>> that it is and whenever a device changes mac to the failover one, it
>> should be enslaved and whenever it changes mac back to something else,
>> it should be released - the primary one ofcourse.
>
>Sure. that may be the best way to handle the guest changing the primary
>netdev's mac.

Yep.


>
>> 
>> 
>> 
>> [...]
>> 
>> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> > +				      size_t len)
>> > +{
>> > +	struct virtnet_info *vi = netdev_priv(dev);
>> > +	int ret;
>> > +
>> > +	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>> > +		return -EOPNOTSUPP;
>> > +
>> > +	ret = snprintf(buf, len, "_sby");
>> please avoid the "_".
>> 
>> [...]
>
Samudrala, Sridhar April 30, 2018, 7:26 p.m. UTC | #8
On 4/30/2018 12:12 AM, Jiri Pirko wrote:
> Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>>> netdev is present with the same MAC address. It allows live migration
>>>> of a VM with a direct attached VF without the need to setup a bond/team
>>>> between a VF and virtio net device in the guest.
>>>>
>>>> The hypervisor needs to enable only one datapath at any time so that
>>>> packets don't get looped back to the VM over the other datapath. When a VF
>>> Why? Both datapaths could be enabled at a time. Why the loop on
>>> hypervisor side would be a problem. This in not an issue for
>>> bonding/team as well.
>> Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>> sent over the VF datapath don't get looped back to the VM via the virtio-net datapth.
> Why? Please see below.
>
>
>> This can happen if both datapaths are enabled at the same time.
>>
>> I would think this is an issue even with bonding/team as well when virtio-net and
>> VF are backed by the same PF.
>>
>>
> I believe that the scenario is the same as on an ordinary nic/swich
> network:
>
> ...................
>
>    host
>   
>         bond0
>        /     \
>      eth0   eth1
>       |       |
> ...................
>       |       |
>       p1      p2
>
>    switch
>
> ...................
>
> It is perfectly valid to p1 and p2 be up and "bridged" together. Bond
> has to cope with loop-backed frames. "Failover driver" should too,
> it's the same scenario.

OK. So looks like we should be able to handle this by returning RX_HANDLER_EXACT
for frames received on standby device when primary is present.
Jiri Pirko May 1, 2018, 7:33 a.m. UTC | #9
Mon, Apr 30, 2018 at 09:26:34PM CEST, sridhar.samudrala@intel.com wrote:
>On 4/30/2018 12:12 AM, Jiri Pirko wrote:
>> Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>> > On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>> > > Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> > > > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > > > netdev is present with the same MAC address. It allows live migration
>> > > > of a VM with a direct attached VF without the need to setup a bond/team
>> > > > between a VF and virtio net device in the guest.
>> > > > 
>> > > > The hypervisor needs to enable only one datapath at any time so that
>> > > > packets don't get looped back to the VM over the other datapath. When a VF
>> > > Why? Both datapaths could be enabled at a time. Why the loop on
>> > > hypervisor side would be a problem. This in not an issue for
>> > > bonding/team as well.
>> > Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>> > sent over the VF datapath don't get looped back to the VM via the virtio-net datapth.
>> Why? Please see below.
>> 
>> 
>> > This can happen if both datapaths are enabled at the same time.
>> > 
>> > I would think this is an issue even with bonding/team as well when virtio-net and
>> > VF are backed by the same PF.
>> > 
>> > 
>> I believe that the scenario is the same as on an ordinary nic/swich
>> network:
>> 
>> ...................
>> 
>>    host
>>         bond0
>>        /     \
>>      eth0   eth1
>>       |       |
>> ...................
>>       |       |
>>       p1      p2
>> 
>>    switch
>> 
>> ...................
>> 
>> It is perfectly valid to p1 and p2 be up and "bridged" together. Bond
>> has to cope with loop-backed frames. "Failover driver" should too,
>> it's the same scenario.
>
>OK. So looks like we should be able to handle this by returning RX_HANDLER_EXACT
>for frames received on standby device when primary is present.

Yep.
Samudrala, Sridhar May 2, 2018, 12:20 a.m. UTC | #10
On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>
>>> Now I try to change mac of the failover master:
>>> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>>> RTNETLINK answers: Operation not supported
>>>
>>> That I did expect to work. I would expect this would change the mac of
>>> the master and both standby and primary slaves.
>> If a VF is untrusted, a VM will not able to change its MAC and moreover
> Note that at this point, I have no VF. So I'm not sure why you mention
> that.
>
>
>> in this mode we are assuming that the hypervisor has assigned the MAC and
>> guest is not expected to change the MAC.
> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
> mac and all works fine. How is this different? Change mac on "failover
> instance" should work and should propagate the mac down to its slaves.
>
>
>> For the initial implementation, i would propose not allowing the guest to
>> change the MAC of failover or standby dev.
> I see no reason for such restriction.
>

It is true that a VM user can change mac address of a normal virtio-net interface,
however when it is in STANDBY mode i think we should not allow this change specifically
because we are creating a failover instance based on a MAC that is assigned by the
hypervisor.

Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
the VF and it cannot be changed by the guest.

So for the initial implementation, do you see any issues with having this restriction
in STANDBY mode.
Jiri Pirko May 2, 2018, 7:50 a.m. UTC | #11
Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>> 
>> > > Now I try to change mac of the failover master:
>> > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> > > RTNETLINK answers: Operation not supported
>> > > 
>> > > That I did expect to work. I would expect this would change the mac of
>> > > the master and both standby and primary slaves.
>> > If a VF is untrusted, a VM will not able to change its MAC and moreover
>> Note that at this point, I have no VF. So I'm not sure why you mention
>> that.
>> 
>> 
>> > in this mode we are assuming that the hypervisor has assigned the MAC and
>> > guest is not expected to change the MAC.
>> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>> mac and all works fine. How is this different? Change mac on "failover
>> instance" should work and should propagate the mac down to its slaves.
>> 
>> 
>> > For the initial implementation, i would propose not allowing the guest to
>> > change the MAC of failover or standby dev.
>> I see no reason for such restriction.
>> 
>
>It is true that a VM user can change mac address of a normal virtio-net interface,
>however when it is in STANDBY mode i think we should not allow this change specifically
>because we are creating a failover instance based on a MAC that is assigned by the
>hypervisor.
>
>Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
>the VF and it cannot be changed by the guest.

So that is easy. You allow the change of the mac and in the "failover"
mac change implementation you propagate the change down to slaves. If
one slave does not support the change, you bail out. And since VF does
not allow it as you say, once it will be enslaved, the mac change could
not be done. Seems like a correct behavior to me and is in-sync with how
bond/team behaves.


>
>So for the initial implementation, do you see any issues with having this restriction
>in STANDBY mode.
>
>
Samudrala, Sridhar May 2, 2018, 3:34 p.m. UTC | #12
On 5/2/2018 12:50 AM, Jiri Pirko wrote:
> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>>>>> Now I try to change mac of the failover master:
>>>>> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>>>>> RTNETLINK answers: Operation not supported
>>>>>
>>>>> That I did expect to work. I would expect this would change the mac of
>>>>> the master and both standby and primary slaves.
>>>> If a VF is untrusted, a VM will not able to change its MAC and moreover
>>> Note that at this point, I have no VF. So I'm not sure why you mention
>>> that.
>>>
>>>
>>>> in this mode we are assuming that the hypervisor has assigned the MAC and
>>>> guest is not expected to change the MAC.
>>> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>>> mac and all works fine. How is this different? Change mac on "failover
>>> instance" should work and should propagate the mac down to its slaves.
>>>
>>>
>>>> For the initial implementation, i would propose not allowing the guest to
>>>> change the MAC of failover or standby dev.
>>> I see no reason for such restriction.
>>>
>> It is true that a VM user can change mac address of a normal virtio-net interface,
>> however when it is in STANDBY mode i think we should not allow this change specifically
>> because we are creating a failover instance based on a MAC that is assigned by the
>> hypervisor.
>>
>> Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
>> the VF and it cannot be changed by the guest.
> So that is easy. You allow the change of the mac and in the "failover"
> mac change implementation you propagate the change down to slaves. If
> one slave does not support the change, you bail out. And since VF does
> not allow it as you say, once it will be enslaved, the mac change could
> not be done. Seems like a correct behavior to me and is in-sync with how
> bond/team behaves.

If we allow the mac to be changed when the VF is not yet plugged in
and the guest changes the mac, then VF cannot join the failover when
it is hot plugged with the original mac assigned by the hypervisor.
Specifically there could be timing window during the live migration where
VF would be unplugged at the source and if we allow the guest to change the
failover mac, then VF would not be able to register with the failover after
the VM is migrated to the destination.

>
>> So for the initial implementation, do you see any issues with having this restriction
>> in STANDBY mode.
>>
>>
Michael S. Tsirkin May 2, 2018, 3:47 p.m. UTC | #13
On Wed, May 02, 2018 at 09:50:21AM +0200, Jiri Pirko wrote:
> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
> >On 4/30/2018 12:20 AM, Jiri Pirko wrote:
> >> 
> >> > > Now I try to change mac of the failover master:
> >> > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
> >> > > RTNETLINK answers: Operation not supported
> >> > > 
> >> > > That I did expect to work. I would expect this would change the mac of
> >> > > the master and both standby and primary slaves.
> >> > If a VF is untrusted, a VM will not able to change its MAC and moreover
> >> Note that at this point, I have no VF. So I'm not sure why you mention
> >> that.
> >> 
> >> 
> >> > in this mode we are assuming that the hypervisor has assigned the MAC and
> >> > guest is not expected to change the MAC.
> >> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
> >> mac and all works fine. How is this different? Change mac on "failover
> >> instance" should work and should propagate the mac down to its slaves.
> >> 
> >> 
> >> > For the initial implementation, i would propose not allowing the guest to
> >> > change the MAC of failover or standby dev.
> >> I see no reason for such restriction.
> >> 
> >
> >It is true that a VM user can change mac address of a normal virtio-net interface,
> >however when it is in STANDBY mode i think we should not allow this change specifically
> >because we are creating a failover instance based on a MAC that is assigned by the
> >hypervisor.
> >
> >Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
> >the VF and it cannot be changed by the guest.
> 
> So that is easy. You allow the change of the mac and in the "failover"
> mac change implementation you propagate the change down to slaves. If
> one slave does not support the change, you bail out. And since VF does

I wish people would say primary/standby and not "VF" :)

> not allow it as you say, once it will be enslaved, the mac change could
> not be done. Seems like a correct behavior to me


what if primary does not allow mac changes and is attached after
mac is changed on standy?


> and is in-sync with how
> bond/team behaves.

I think in the end virtio can just block MAC changes for simplicity.

I personally don't see softmac as a must have feature in v1,
we can add it later.

What's the situation with init scripts and whether it's
possible to make them work well would be a better question.

> 
> >
> >So for the initial implementation, do you see any issues with having this restriction
> >in STANDBY mode.
> >
> >
Jiri Pirko May 2, 2018, 4:04 p.m. UTC | #14
Wed, May 02, 2018 at 05:47:27PM CEST, mst@redhat.com wrote:
>On Wed, May 02, 2018 at 09:50:21AM +0200, Jiri Pirko wrote:
>> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>> >On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>> >> 
>> >> > > Now I try to change mac of the failover master:
>> >> > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> >> > > RTNETLINK answers: Operation not supported
>> >> > > 
>> >> > > That I did expect to work. I would expect this would change the mac of
>> >> > > the master and both standby and primary slaves.
>> >> > If a VF is untrusted, a VM will not able to change its MAC and moreover
>> >> Note that at this point, I have no VF. So I'm not sure why you mention
>> >> that.
>> >> 
>> >> 
>> >> > in this mode we are assuming that the hypervisor has assigned the MAC and
>> >> > guest is not expected to change the MAC.
>> >> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>> >> mac and all works fine. How is this different? Change mac on "failover
>> >> instance" should work and should propagate the mac down to its slaves.
>> >> 
>> >> 
>> >> > For the initial implementation, i would propose not allowing the guest to
>> >> > change the MAC of failover or standby dev.
>> >> I see no reason for such restriction.
>> >> 
>> >
>> >It is true that a VM user can change mac address of a normal virtio-net interface,
>> >however when it is in STANDBY mode i think we should not allow this change specifically
>> >because we are creating a failover instance based on a MAC that is assigned by the
>> >hypervisor.
>> >
>> >Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
>> >the VF and it cannot be changed by the guest.
>> 
>> So that is easy. You allow the change of the mac and in the "failover"
>> mac change implementation you propagate the change down to slaves. If
>> one slave does not support the change, you bail out. And since VF does
>
>I wish people would say primary/standby and not "VF" :)

Sure, sorry.


>
>> not allow it as you say, once it will be enslaved, the mac change could
>> not be done. Seems like a correct behavior to me
>
>
>what if primary does not allow mac changes and is attached after
>mac is changed on standy?

Mac should be changed on failover. In that case, the primary would have
a different mac and therefore it won't get enslaved.

>
>
>> and is in-sync with how
>> bond/team behaves.
>
>I think in the end virtio can just block MAC changes for simplicity.
>
>I personally don't see softmac as a must have feature in v1,
>we can add it later.

Okay.


>
>What's the situation with init scripts and whether it's
>possible to make them work well would be a better question.
>
>> 
>> >
>> >So for the initial implementation, do you see any issues with having this restriction
>> >in STANDBY mode.
>> >
>> >
Jiri Pirko May 2, 2018, 4:05 p.m. UTC | #15
Wed, May 02, 2018 at 05:34:44PM CEST, sridhar.samudrala@intel.com wrote:
>On 5/2/2018 12:50 AM, Jiri Pirko wrote:
>> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>> > On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>> > > > > Now I try to change mac of the failover master:
>> > > > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> > > > > RTNETLINK answers: Operation not supported
>> > > > > 
>> > > > > That I did expect to work. I would expect this would change the mac of
>> > > > > the master and both standby and primary slaves.
>> > > > If a VF is untrusted, a VM will not able to change its MAC and moreover
>> > > Note that at this point, I have no VF. So I'm not sure why you mention
>> > > that.
>> > > 
>> > > 
>> > > > in this mode we are assuming that the hypervisor has assigned the MAC and
>> > > > guest is not expected to change the MAC.
>> > > Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>> > > mac and all works fine. How is this different? Change mac on "failover
>> > > instance" should work and should propagate the mac down to its slaves.
>> > > 
>> > > 
>> > > > For the initial implementation, i would propose not allowing the guest to
>> > > > change the MAC of failover or standby dev.
>> > > I see no reason for such restriction.
>> > > 
>> > It is true that a VM user can change mac address of a normal virtio-net interface,
>> > however when it is in STANDBY mode i think we should not allow this change specifically
>> > because we are creating a failover instance based on a MAC that is assigned by the
>> > hypervisor.
>> > 
>> > Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
>> > the VF and it cannot be changed by the guest.
>> So that is easy. You allow the change of the mac and in the "failover"
>> mac change implementation you propagate the change down to slaves. If
>> one slave does not support the change, you bail out. And since VF does
>> not allow it as you say, once it will be enslaved, the mac change could
>> not be done. Seems like a correct behavior to me and is in-sync with how
>> bond/team behaves.
>
>If we allow the mac to be changed when the VF is not yet plugged in
>and the guest changes the mac, then VF cannot join the failover when
>it is hot plugged with the original mac assigned by the hypervisor.
>Specifically there could be timing window during the live migration where
>VF would be unplugged at the source and if we allow the guest to change the
>failover mac, then VF would not be able to register with the failover after
>the VM is migrated to the destination.

Okay. So as suggested by mst, just forbid the mac changes all together.


>
>> 
>> > So for the initial implementation, do you see any issues with having this restriction
>> > in STANDBY mode.
>> > 
>> > 
>
diff mbox series

Patch

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 891846655000..c4995625d9b1 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -331,6 +331,7 @@  config VETH
 config VIRTIO_NET
 	tristate "Virtio network driver"
 	depends on VIRTIO
+	select NET_FAILOVER
 	---help---
 	  This is the virtual network driver for virtio.  It can be used with
 	  QEMU based VMMs (like KVM or Xen).  Say Y or M.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 51a085b1a242..c326ee5344c0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -30,8 +30,11 @@ 
 #include <linux/cpu.h>
 #include <linux/average.h>
 #include <linux/filter.h>
+#include <linux/netdevice.h>
+#include <linux/pci.h>
 #include <net/route.h>
 #include <net/xdp.h>
+#include <net/net_failover.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -210,6 +213,9 @@  struct virtnet_info {
 	u32 speed;
 
 	unsigned long guest_offloads;
+
+	/* failover when STANDBY feature enabled */
+	struct net_failover *failover;
 };
 
 struct padded_vnet_hdr {
@@ -2306,6 +2312,22 @@  static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
+				      size_t len)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int ret;
+
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
+		return -EOPNOTSUPP;
+
+	ret = snprintf(buf, len, "_sby");
+	if (ret >= len)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -2323,6 +2345,7 @@  static const struct net_device_ops virtnet_netdev = {
 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
 	.ndo_xdp_flush		= virtnet_xdp_flush,
 	.ndo_features_check	= passthru_features_check,
+	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
@@ -2876,10 +2899,16 @@  static int virtnet_probe(struct virtio_device *vdev)
 
 	virtnet_init_settings(dev);
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
+		err = net_failover_create(vi->dev, &vi->failover);
+		if (err)
+			goto free_vqs;
+	}
+
 	err = register_netdev(dev);
 	if (err) {
 		pr_debug("virtio_net: registering device failed\n");
-		goto free_vqs;
+		goto free_failover;
 	}
 
 	virtio_device_ready(vdev);
@@ -2916,6 +2945,8 @@  static int virtnet_probe(struct virtio_device *vdev)
 	vi->vdev->config->reset(vdev);
 
 	unregister_netdev(dev);
+free_failover:
+	net_failover_destroy(vi->failover);
 free_vqs:
 	cancel_delayed_work_sync(&vi->refill);
 	free_receive_page_frags(vi);
@@ -2950,6 +2981,8 @@  static void virtnet_remove(struct virtio_device *vdev)
 
 	unregister_netdev(vi->dev);
 
+	net_failover_destroy(vi->failover);
+
 	remove_vq_common(vi);
 
 	free_netdev(vi->dev);