[net-next,0/2] Enable virtio to act as a master for a passthru device

Message ID 1514939738-22423-1-git-send-email-sridhar.samudrala@intel.com

Message

Samudrala, Sridhar Jan. 3, 2018, 12:35 a.m. UTC
This patch series enables virtio to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration of a VM
with a directly attached VF without the need to set up a bond/team between
the VF and virtio net device in the guest.

The hypervisor needs to unplug the VF device from the guest on the source
host and reset the MAC filter of the VF to initiate failover of the datapath
to virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
into the guest to switch over to the VF datapath.

It is based on the netvsc implementation, and it may be possible to make this
code generic and move it to a common location that can be shared by netvsc
and virtio.

This patch series is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

Sridhar Samudrala (2):
  virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit
  virtio_net: Extend virtio to use VF datapath when available

 drivers/net/virtio_net.c        | 308 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/virtio_net.h |   1 +
 2 files changed, 306 insertions(+), 3 deletions(-)
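
For context, the guest-side mechanism described above amounts to the virtio
netdev noticing a VF netdev with the same MAC address and steering traffic to
it while it is present. A minimal sketch of that idea, assuming a hypothetical
vf_netdev pointer in virtio_net's private state (the struct layout and helper
names below are illustrative, not the code in the actual patches), might look
like this:

/* Illustrative sketch only -- the struct layout and helper names below are
 * assumptions for this example, not the code in the actual patches.
 */
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff.h>

struct virtnet_info {
	struct net_device *dev;			/* the virtio netdev */
	struct net_device __rcu *vf_netdev;	/* passthru VF, if present */
	/* ... the rest of the real driver state ... */
};

/* Called under RTNL from a netdevice notifier for REGISTER/UNREGISTER. */
static int virtnet_vf_event(struct virtnet_info *vi,
			    struct net_device *event_dev, unsigned long event)
{
	ASSERT_RTNL();

	switch (event) {
	case NETDEV_REGISTER:
		/* Adopt a VF whose MAC matches ours as the fast datapath. */
		if (event_dev == vi->dev ||
		    !ether_addr_equal(event_dev->dev_addr, vi->dev->dev_addr))
			break;
		if (rtnl_dereference(vi->vf_netdev))
			break;			/* already have one */
		dev_hold(event_dev);
		rcu_assign_pointer(vi->vf_netdev, event_dev);
		break;
	case NETDEV_UNREGISTER:
		/* VF hot-unplugged (e.g. before live migration): fall back. */
		if (rtnl_dereference(vi->vf_netdev) != event_dev)
			break;
		RCU_INIT_POINTER(vi->vf_netdev, NULL);
		dev_put(event_dev);
		break;
	}
	return NOTIFY_OK;
}

/* In ndo_start_xmit: hand the skb to the VF datapath when one is attached. */
static bool virtnet_xmit_via_vf(struct virtnet_info *vi, struct sk_buff *skb)
{
	struct net_device *vf_netdev;
	bool handed_off = false;

	rcu_read_lock();
	vf_netdev = rcu_dereference(vi->vf_netdev);
	if (vf_netdev && netif_running(vf_netdev)) {
		skb->dev = vf_netdev;
		dev_queue_xmit(skb);	/* skb is consumed either way */
		handed_off = true;
	}
	rcu_read_unlock();
	return handed_off;
}

The actual patches cover considerably more (the VIRTIO_NET_F_MASTER feature
negotiation and the rest of the datapath handling); the sketch only
illustrates the MAC-based pairing and the transmit redirect.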

Comments

Jakub Kicinski Jan. 3, 2018, 2:16 a.m. UTC | #1
On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
> This patch series enables virtio to switch over to a VF datapath when a VF
> netdev is present with the same MAC address. It allows live migration of a VM
> with a direct attached VF without the need to setup a bond/team between a
> VF and virtio net device in the guest.
> 
> The hypervisor needs to unplug the VF device from the guest on the source
> host and reset the MAC filter of the VF to initiate failover of datapath to
> virtio before starting the migration. After the migration is completed, the
> destination hypervisor sets the MAC filter on the VF and plugs it back to
> the guest to switch over to VF datapath.
> 
> It is based on netvsc implementation and it may be possible to make this code 
> generic and move it to a common location that can be shared by netvsc and virtio.
> 
> This patch series is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

How does the notion of a device which is both a bond and a leg of a
bond fit with Alex's recent discussions about feature propagation?
Which propagation rules will apply to VirtIO master?  Meaning of the
flags on a software upper device may be different.  Why muddy the
architecture like this and not introduce a synthetic bond device?
Alexander Duyck Jan. 3, 2018, 4:59 p.m. UTC | #2
On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>> This patch series enables virtio to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration of a VM
>> with a direct attached VF without the need to setup a bond/team between a
>> VF and virtio net device in the guest.
>>
>> The hypervisor needs to unplug the VF device from the guest on the source
>> host and reset the MAC filter of the VF to initiate failover of datapath to
>> virtio before starting the migration. After the migration is completed, the
>> destination hypervisor sets the MAC filter on the VF and plugs it back to
>> the guest to switch over to VF datapath.
>>
>> It is based on netvsc implementation and it may be possible to make this code
>> generic and move it to a common location that can be shared by netvsc and virtio.
>>
>> This patch series is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>
> How does the notion of a device which is both a bond and a leg of a
> bond fit with Alex's recent discussions about feature propagation?
> Which propagation rules will apply to VirtIO master?  Meaning of the
> flags on a software upper device may be different.  Why muddy the
> architecture like this and not introduce a synthetic bond device?

It doesn't really fit with the notion I had. I think there may have
been a bit of a disconnect as I have been out for the last week or so
for the holidays.

My thought on this was that the feature bit should be spawning a new
para-virtual bond device and that bond should have the virtio and the
VF as slaves. Also I thought there was some discussion about trying to
reuse as much of the netvsc code as possible for this so that we could
avoid duplication of effort and have the two drivers use the same
approach. It seems like it should be pretty straightforward since you
would have the feature bit in the case of virtio, and netvsc just does
this sort of thing by default if I am not mistaken.

- Alex
Samudrala, Sridhar Jan. 3, 2018, 6:14 p.m. UTC | #3
On 1/3/2018 8:59 AM, Alexander Duyck wrote:
> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>> This patch series enables virtio to switch over to a VF datapath when a VF
>>> netdev is present with the same MAC address. It allows live migration of a VM
>>> with a direct attached VF without the need to setup a bond/team between a
>>> VF and virtio net device in the guest.
>>>
>>> The hypervisor needs to unplug the VF device from the guest on the source
>>> host and reset the MAC filter of the VF to initiate failover of datapath to
>>> virtio before starting the migration. After the migration is completed, the
>>> destination hypervisor sets the MAC filter on the VF and plugs it back to
>>> the guest to switch over to VF datapath.
>>>
>>> It is based on netvsc implementation and it may be possible to make this code
>>> generic and move it to a common location that can be shared by netvsc and virtio.
>>>
>>> This patch series is based on the discussion initiated by Jesse on this thread.
>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> How does the notion of a device which is both a bond and a leg of a
>> bond fit with Alex's recent discussions about feature propagation?
>> Which propagation rules will apply to VirtIO master?  Meaning of the
>> flags on a software upper device may be different.  Why muddy the
>> architecture like this and not introduce a synthetic bond device?
> It doesn't really fit with the notion I had. I think there may have
> been a bit of a disconnect as I have been out for the last week or so
> for the holidays.
>
> My thought on this was that the feature bit should be spawning a new
> para-virtual bond device and that bond should have the virto and the
> VF as slaves. Also I thought there was some discussion about trying to
> reuse as much of the netvsc code as possible for this so that we could
> avoid duplication of effort and have the two drivers use the same
> approach. It seems like it should be pretty straight forward since you
> would have the feature bit in the case of virto, and netvsc just does
> this sort of thing by default if I am not mistaken.
This patch is mostly based on the netvsc implementation. The only change is
avoiding the explicit dev_open() call of the VF netdev after a delay. I am
assuming that the guest userspace will bring up the VF netdev and the
hypervisor will update the MAC filters to switch to the right data path.

We could commonize the code and make it shared between netvsc and virtio.
Do we want to do this right away or later? If so, what would be a good
location for these shared functions? Is it net/core/dev.c?

Also, if we want to go with a solution that creates a bond device, do we
want the virtio_net/netvsc drivers to create an upper device? Such a
solution is already possible via config scripts that can create a bond with
virtio and a VF net device as slaves. netvsc and this patch series are
trying to make it as simple as possible for the VM to use directly attached
devices and support live migration by switching to the virtio datapath as a
backup during the migration process when the VF device is unplugged.

Thanks
Sridhar
Alexander Duyck Jan. 3, 2018, 6:28 p.m. UTC | #4
On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
>
> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>
>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>
>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>
>>>> This patch series enables virtio to switch over to a VF datapath when a
>>>> VF
>>>> netdev is present with the same MAC address. It allows live migration of
>>>> a VM
>>>> with a direct attached VF without the need to setup a bond/team between
>>>> a
>>>> VF and virtio net device in the guest.
>>>>
>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>> source
>>>> host and reset the MAC filter of the VF to initiate failover of datapath
>>>> to
>>>> virtio before starting the migration. After the migration is completed,
>>>> the
>>>> destination hypervisor sets the MAC filter on the VF and plugs it back
>>>> to
>>>> the guest to switch over to VF datapath.
>>>>
>>>> It is based on netvsc implementation and it may be possible to make this
>>>> code
>>>> generic and move it to a common location that can be shared by netvsc
>>>> and virtio.
>>>>
>>>> This patch series is based on the discussion initiated by Jesse on this
>>>> thread.
>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>
>>> How does the notion of a device which is both a bond and a leg of a
>>> bond fit with Alex's recent discussions about feature propagation?
>>> Which propagation rules will apply to VirtIO master?  Meaning of the
>>> flags on a software upper device may be different.  Why muddy the
>>> architecture like this and not introduce a synthetic bond device?
>>
>> It doesn't really fit with the notion I had. I think there may have
>> been a bit of a disconnect as I have been out for the last week or so
>> for the holidays.
>>
>> My thought on this was that the feature bit should be spawning a new
>> para-virtual bond device and that bond should have the virto and the
>> VF as slaves. Also I thought there was some discussion about trying to
>> reuse as much of the netvsc code as possible for this so that we could
>> avoid duplication of effort and have the two drivers use the same
>> approach. It seems like it should be pretty straight forward since you
>> would have the feature bit in the case of virto, and netvsc just does
>> this sort of thing by default if I am not mistaken.
>
> This patch is mostly based on netvsc implementation. The only change is
> avoiding the
> explicit dev_open() call of the VF netdev after a delay. I am assuming that
> the guest userspace
> will bring up the VF netdev and the hypervisor will update the MAC filters
> to switch to
> the right data path.
> We could commonize the code and make it shared between netvsc and virtio. Do
> we want
> to do this right away or later? If so, what would be a good location for
> these shared functions?
> Is it net/core/dev.c?

No, I would think about starting a new driver file in "/drivers/net/".
The idea is this driver would be utilized to create a bond
automatically and set the appropriate registration hooks. If nothing
else you could probably just call it something generic like virt-bond
or vbond or whatever.
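
As a purely hypothetical illustration of the kind of registration hooks being
suggested here, the skeleton of such a generic driver might do little more
than register a netdevice notifier and apply some pairing policy. None of the
names below exist anywhere; they are invented for the sketch:

/* Hypothetical skeleton of a generic "virt-bond" style driver: it watches
 * netdev registration events and pairs a paravirtual device (virtio_net,
 * netvsc, ...) with a VF that carries the same MAC address.
 */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

static int vbond_netdev_event(struct notifier_block *nb,
			      unsigned long event, void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);

	/* Only consider devices with Ethernet-sized addresses. */
	if (dev->addr_len != ETH_ALEN)
		return NOTIFY_DONE;

	switch (event) {
	case NETDEV_REGISTER:
		/* Pairing policy (made up for this sketch): if this netdev's
		 * MAC matches a paravirtual device that advertised a backup
		 * capability, create (or reuse) an upper device and enslave
		 * both legs here.
		 */
		break;
	case NETDEV_UNREGISTER:
		/* Release the slave; traffic falls back to the other leg. */
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block vbond_notifier = {
	.notifier_call = vbond_netdev_event,
};

static int __init vbond_init(void)
{
	return register_netdevice_notifier(&vbond_notifier);
}

static void __exit vbond_exit(void)
{
	unregister_netdevice_notifier(&vbond_notifier);
}

module_init(vbond_init);
module_exit(vbond_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch of a generic paravirtual failover bond driver");

What the pairing policy keys off (a MAC match, a feature bit, something
configured from the host) is exactly the open design question in this thread.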

> Also, if we want to go with a solution that creates a bond device, do we
> want virtio_net/netvsc
> drivers to create a upper device?  Such a solution is already possible via
> config scripts that can
> create a bond with virtio and a VF net device as slaves.  netvsc and this
> patch series is trying to
> make it as simple as possible for the VM to use directly attached devices
> and support live migration
> by switching to virtio datapath as a backup during the migration process
> when the VF device
> is unplugged.

We all understand that. But you are making the solution very
virtio-specific. We want to see this be usable for other interfaces such as
netvsc and whatever other virtual interfaces are floating around out
there.

Also, I haven't seen us address how we will handle this on the host. My
thought was we should have a paired
interface. Something like veth, but made up of a bond on each end. So
in the host we should have one bond that has a tap/vhost interface and
a VF port representor, and on the other we would be looking at the
virtio interface and the VF. Attaching the tap/vhost to the bond could
be a way of triggering the feature bit to be set in the virtio. That
way communication between the guest and the host won't get too
confusing as you will see all traffic from the bonded MAC address
always show up on the host side bond instead of potentially showing up
on two unrelated interfaces. It would also make for a good way to
resolve the east/west traffic problem on hosts since you could just
send the broadcast/multicast traffic via the tap/vhost/virtio channel
instead of having to send it back through the port representor and eat
up all that PCIe bus traffic.
Samudrala, Sridhar Jan. 4, 2018, 12:22 a.m. UTC | #5
On 1/3/2018 10:28 AM, Alexander Duyck wrote:
> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>>
>> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>> This patch series enables virtio to switch over to a VF datapath when a
>>>>> VF
>>>>> netdev is present with the same MAC address. It allows live migration of
>>>>> a VM
>>>>> with a direct attached VF without the need to setup a bond/team between
>>>>> a
>>>>> VF and virtio net device in the guest.
>>>>>
>>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>>> source
>>>>> host and reset the MAC filter of the VF to initiate failover of datapath
>>>>> to
>>>>> virtio before starting the migration. After the migration is completed,
>>>>> the
>>>>> destination hypervisor sets the MAC filter on the VF and plugs it back
>>>>> to
>>>>> the guest to switch over to VF datapath.
>>>>>
>>>>> It is based on netvsc implementation and it may be possible to make this
>>>>> code
>>>>> generic and move it to a common location that can be shared by netvsc
>>>>> and virtio.
>>>>>
>>>>> This patch series is based on the discussion initiated by Jesse on this
>>>>> thread.
>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>> How does the notion of a device which is both a bond and a leg of a
>>>> bond fit with Alex's recent discussions about feature propagation?
>>>> Which propagation rules will apply to VirtIO master?  Meaning of the
>>>> flags on a software upper device may be different.  Why muddy the
>>>> architecture like this and not introduce a synthetic bond device?
>>> It doesn't really fit with the notion I had. I think there may have
>>> been a bit of a disconnect as I have been out for the last week or so
>>> for the holidays.
>>>
>>> My thought on this was that the feature bit should be spawning a new
>>> para-virtual bond device and that bond should have the virto and the
>>> VF as slaves. Also I thought there was some discussion about trying to
>>> reuse as much of the netvsc code as possible for this so that we could
>>> avoid duplication of effort and have the two drivers use the same
>>> approach. It seems like it should be pretty straight forward since you
>>> would have the feature bit in the case of virto, and netvsc just does
>>> this sort of thing by default if I am not mistaken.
>> This patch is mostly based on netvsc implementation. The only change is
>> avoiding the
>> explicit dev_open() call of the VF netdev after a delay. I am assuming that
>> the guest userspace
>> will bring up the VF netdev and the hypervisor will update the MAC filters
>> to switch to
>> the right data path.
>> We could commonize the code and make it shared between netvsc and virtio. Do
>> we want
>> to do this right away or later? If so, what would be a good location for
>> these shared functions?
>> Is it net/core/dev.c?
> No, I would think about starting a new driver file in "/drivers/net/".
> The idea is this driver would be utilized to create a bond
> automatically and set the appropriate registration hooks. If nothing
> else you could probably just call it something generic like virt-bond
> or vbond or whatever.

We are trying to avoid creating another driver or a device. Can we look
into consolidating the two implementations (virtio & netvsc) as a later
patch?
>
>> Also, if we want to go with a solution that creates a bond device, do we
>> want virtio_net/netvsc
>> drivers to create a upper device?  Such a solution is already possible via
>> config scripts that can
>> create a bond with virtio and a VF net device as slaves.  netvsc and this
>> patch series is trying to
>> make it as simple as possible for the VM to use directly attached devices
>> and support live migration
>> by switching to virtio datapath as a backup during the migration process
>> when the VF device
>> is unplugged.
> We all understand that. But you are making the solution very virtio
> specific. We want to see this be usable for other interfaces such as
> netsc and whatever other virtual interfaces are floating around out
> there.
>
> Also I haven't seen us address what happens as far as how we will
> handle this on the host. My thought was we should have a paired
> interface. Something like veth, but made up of a bond on each end. So
> in the host we should have one bond that has a tap/vhost interface and
> a VF port representor, and on the other we would be looking at the
> virtio interface and the VF. Attaching the tap/vhost to the bond could
> be a way of triggering the feature bit to be set in the virtio. That
> way communication between the guest and the host won't get too
> confusing as you will see all traffic from the bonded MAC address
> always show up on the host side bond instead of potentially showing up
> on two unrelated interfaces. It would also make for a good way to
> resolve the east/west traffic problem on hosts since you could just
> send the broadcast/multicast traffic via the tap/vhost/virtio channel
> instead of having to send it back through the port representor and eat
> up all that PCIe bus traffic.
From the host point of view, here is a simple script that needs to be run
to do the live migration. We don't need any bond configuration on the host.

virsh detach-interface $DOMAIN hostdev --mac $MAC
ip link set $PF vf $VF_NUM mac $ZERO_MAC

virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system

ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV --mac $MAC
Siwei Liu Jan. 22, 2018, 7 p.m. UTC | #6
Apologies, I didn't notice that the discussion was mistakenly taken
offline. Posting it back.

-Siwei

On Sat, Jan 13, 2018 at 7:25 AM, Siwei Liu <loseweigh@gmail.com> wrote:
> On Thu, Jan 11, 2018 at 12:32 PM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>> On 1/8/2018 9:22 AM, Siwei Liu wrote:
>>>
>>> On Sat, Jan 6, 2018 at 2:33 AM, Samudrala, Sridhar
>>> <sridhar.samudrala@intel.com> wrote:
>>>>
>>>> On 1/5/2018 9:07 AM, Siwei Liu wrote:
>>>>>
>>>>> On Thu, Jan 4, 2018 at 8:22 AM, Samudrala, Sridhar
>>>>> <sridhar.samudrala@intel.com> wrote:
>>>>>>
>>>>>> On 1/3/2018 10:28 AM, Alexander Duyck wrote:
>>>>>>>
>>>>>>> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
>>>>>>> <sridhar.samudrala@intel.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>>>>>>>>
>>>>>>>>>>> This patch series enables virtio to switch over to a VF datapath
>>>>>>>>>>> when
>>>>>>>>>>> a
>>>>>>>>>>> VF
>>>>>>>>>>> netdev is present with the same MAC address. It allows live
>>>>>>>>>>> migration
>>>>>>>>>>> of
>>>>>>>>>>> a VM
>>>>>>>>>>> with a direct attached VF without the need to setup a bond/team
>>>>>>>>>>> between
>>>>>>>>>>> a
>>>>>>>>>>> VF and virtio net device in the guest.
>>>>>>>>>>>
>>>>>>>>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>>>>>>>>> source
>>>>>>>>>>> host and reset the MAC filter of the VF to initiate failover of
>>>>>>>>>>> datapath
>>>>>>>>>>> to
>>>>>>>>>>> virtio before starting the migration. After the migration is
>>>>>>>>>>> completed,
>>>>>>>>>>> the
>>>>>>>>>>> destination hypervisor sets the MAC filter on the VF and plugs it
>>>>>>>>>>> back
>>>>>>>>>>> to
>>>>>>>>>>> the guest to switch over to VF datapath.
>>>>>>>>>>>
>>>>>>>>>>> It is based on netvsc implementation and it may be possible to
>>>>>>>>>>> make
>>>>>>>>>>> this
>>>>>>>>>>> code
>>>>>>>>>>> generic and move it to a common location that can be shared by
>>>>>>>>>>> netvsc
>>>>>>>>>>> and virtio.
>>>>>>>>>>>
>>>>>>>>>>> This patch series is based on the discussion initiated by Jesse on
>>>>>>>>>>> this
>>>>>>>>>>> thread.
>>>>>>>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>>>>>>>>
>>>>>>>>>> How does the notion of a device which is both a bond and a leg of a
>>>>>>>>>> bond fit with Alex's recent discussions about feature propagation?
>>>>>>>>>> Which propagation rules will apply to VirtIO master?  Meaning of
>>>>>>>>>> the
>>>>>>>>>> flags on a software upper device may be different.  Why muddy the
>>>>>>>>>> architecture like this and not introduce a synthetic bond device?
>>>>>>>>>
>>>>>>>>> It doesn't really fit with the notion I had. I think there may have
>>>>>>>>> been a bit of a disconnect as I have been out for the last week or
>>>>>>>>> so
>>>>>>>>> for the holidays.
>>>>>>>>>
>>>>>>>>> My thought on this was that the feature bit should be spawning a new
>>>>>>>>> para-virtual bond device and that bond should have the virto and the
>>>>>>>>> VF as slaves. Also I thought there was some discussion about trying
>>>>>>>>> to
>>>>>>>>> reuse as much of the netvsc code as possible for this so that we
>>>>>>>>> could
>>>>>>>>> avoid duplication of effort and have the two drivers use the same
>>>>>>>>> approach. It seems like it should be pretty straight forward since
>>>>>>>>> you
>>>>>>>>> would have the feature bit in the case of virto, and netvsc just
>>>>>>>>> does
>>>>>>>>> this sort of thing by default if I am not mistaken.
>>>>>>>>
>>>>>>>> This patch is mostly based on netvsc implementation. The only change
>>>>>>>> is
>>>>>>>> avoiding the
>>>>>>>> explicit dev_open() call of the VF netdev after a delay. I am
>>>>>>>> assuming
>>>>>>>> that
>>>>>>>> the guest userspace
>>>>>>>> will bring up the VF netdev and the hypervisor will update the MAC
>>>>>>>> filters
>>>>>>>> to switch to
>>>>>>>> the right data path.
>>>>>>>> We could commonize the code and make it shared between netvsc and
>>>>>>>> virtio.
>>>>>>>> Do
>>>>>>>> we want
>>>>>>>> to do this right away or later? If so, what would be a good location
>>>>>>>> for
>>>>>>>> these shared functions?
>>>>>>>> Is it net/core/dev.c?
>>>>>>>
>>>>>>> No, I would think about starting a new driver file in "/drivers/net/".
>>>>>>> The idea is this driver would be utilized to create a bond
>>>>>>> automatically and set the appropriate registration hooks. If nothing
>>>>>>> else you could probably just call it something generic like virt-bond
>>>>>>> or vbond or whatever.
>>>>>>
>>>>>>
>>>>>> We are trying to avoid creating another driver or a device.  Can we
>>>>>> look
>>>>>> into
>>>>>> consolidation of the 2 implementations(virtio & netvsc) as a later
>>>>>> patch?
>>>>>>
>>>>>>>> Also, if we want to go with a solution that creates a bond device, do
>>>>>>>> we
>>>>>>>> want virtio_net/netvsc
>>>>>>>> drivers to create a upper device?  Such a solution is already
>>>>>>>> possible
>>>>>>>> via
>>>>>>>> config scripts that can
>>>>>>>> create a bond with virtio and a VF net device as slaves.  netvsc and
>>>>>>>> this
>>>>>>>> patch series is trying to
>>>>>>>> make it as simple as possible for the VM to use directly attached
>>>>>>>> devices
>>>>>>>> and support live migration
>>>>>>>> by switching to virtio datapath as a backup during the migration
>>>>>>>> process
>>>>>>>> when the VF device
>>>>>>>> is unplugged.
>>>>>>>
>>>>>>> We all understand that. But you are making the solution very virtio
>>>>>>> specific. We want to see this be usable for other interfaces such as
>>>>>>> netsc and whatever other virtual interfaces are floating around out
>>>>>>> there.
>>>>>>>
>>>>>>> Also I haven't seen us address what happens as far as how we will
>>>>>>> handle this on the host. My thought was we should have a paired
>>>>>>> interface. Something like veth, but made up of a bond on each end. So
>>>>>>> in the host we should have one bond that has a tap/vhost interface and
>>>>>>> a VF port representor, and on the other we would be looking at the
>>>>>>> virtio interface and the VF. Attaching the tap/vhost to the bond could
>>>>>>> be a way of triggering the feature bit to be set in the virtio. That
>>>>>>> way communication between the guest and the host won't get too
>>>>>>> confusing as you will see all traffic from the bonded MAC address
>>>>>>> always show up on the host side bond instead of potentially showing up
>>>>>>> on two unrelated interfaces. It would also make for a good way to
>>>>>>> resolve the east/west traffic problem on hosts since you could just
>>>>>>> send the broadcast/multicast traffic via the tap/vhost/virtio channel
>>>>>>> instead of having to send it back through the port representor and eat
>>>>>>> up all that PCIe bus traffic.
>>>>>>
>>>>>>   From the host point of view, here is a simple script that needs to be
>>>>>> run to
>>>>>> do the
>>>>>> live migration. We don't need any bond configuration on the host.
>>>>>>
>>>>>> virsh detach-interface $DOMAIN hostdev --mac $MAC
>>>>>> ip link set $PF vf $VF_NUM mac $ZERO_MAC
>>>>>
>>>>> I'm not sure I understand how this script may work with regard to
>>>>> "live" migration.
>>>>>
>>>>> I'm confused, this script seems to require virtio-net to be configured
>>>>> on top of a different PF than where the migrating VF is seated. Or
>>>>> else, how does identical MAC address filter get programmed to one PF
>>>>> with two (or more) child virtual interfaces (e.g. one macvtap for
>>>>> virtio-net plus one VF)? The coincidence of it being able to work on
>>>>> the NIC of one/some vendor(s) does not apply to the others AFAIK.
>>>>>
>>>>> If you're planning to use a different PF, I don't see how gratuitous
>>>>> ARP announcements are generated to make this a "live" migration.
>>>>
>>>>
>>>> I am not using a different PF.  virtio is backed by a tap/bridge with PF
>>>> attached
>>>> to that bridge.  When we reset VF MAC after it is unplugged, all the
>>>> packets
>>>> for
>>>> the guest MAC will go to PF and reach virtio via the bridge.
>>>>
>>> That is the limitation of this scheme: it only works for virtio backed
>>> by tap/bridge, rather than backed by macvtap on top of the
>>> corresponding *PF*. Nowadays more datacenter users prefer macvtap as
>>> opposed to bridge, simply because of better isolation and performance
>>> (e.g. host stack consumption on NIC promiscuity processing are not
>>> scalable for bridges). Additionally, the ongoing virtio receive
>>> zero-copy work will be tightly integrated with macvtap, the
>>> performance optimization of which is apparently difficult (if
>>> technically possible at all) to be done on bridge. Why do we limit the
>>> host backend support to only bridge at this point?
>>
>>
>> No. This should work with virtio backed by macvtap over PF too.
>>
>>>
>>>> If we want to use virtio backed by macvtap on top of another VF as the
>>>> backup
>>>> channel, and we could set the guest MAC to that VF after unplugging the
>>>> directly
>>>> attached VF.
>>>
>>> I meant macvtap on the regarding PF instead of another VF. You know,
>>> users shouldn't have to change guest MAC back and forth. Live
>>> migration shouldn't involve any form of user intervention IMHO.
>>
>> Yes. macvtap on top of PF should work too. Hypervisor doesn't need to change
>> the guest MAC.  The PF driver needs to program the HW MAC filters so that
>> the
>> frames reach PF when VF is unplugged.
>
> So the HW MAC filter is deferred to get programmed for virtio only
> until VF is unplugged, correct? This is not the regular plumbing order
> for macvtap. Unless I miss something obvious, how does this get
> reflected in the script below?
>
>  virsh detach-interface $DOMAIN hostdev --mac $MAC
>  ip link set $PF vf $VF_NUM mac $ZERO_MAC
>
> i.e. commands above won't automatically trigger the programming of MAC
> filters for virtio.
>
> If you program two identical MAC address filters for both the VF and
> virtio at the same point, I'm sure it won't work at all. It is not
> clear to me how you propose to make it work if you don't plan to
> change the plumbing order.
>
>>
>>
>>>
>>>>>> virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system
>>>>>>
>>>>>> ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
>>>>>> ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV
>>>>>> --mac $MAC
>>>>>
>>>>> How do you keep guest side VF configurations e.g. MTU and VLAN filters
>>>>> around across the migration? More broadly, how do you make sure the
>>>>> new VF still as performant as previously done such that all hardware
>>>>> ring tunings and offload settings can be kept as much as it can be?
>>>>> I'm afraid this simple script won't work for those real-world
>>>>> scenarios.
>>>>
>>>>
>>>>> I would agree with Alex that we'll soon need a host-side stub/entity
>>>>> with cached guest configurations that may make VF switching
>>>>> straightforward and transparent.
>>>>
>>>> The script is only adding MAC filter to the VF on the destination. If the
>>>> source host has
>>>> done any additional tunings on the VF they need to be done on the
>>>> destination host too.
>>>
>>> I was mainly saying the VF's run-time configuration in the guest more
>>> than those to be configured from the host side. Let's say guest admin
>>> had changed the VF's MTU value, the default of which is 1500, to 9000
>>> before the migration. How do you save and restore the old running
>>> config for the VF across the migration?
>>
>> Such optimizations should be possible on top of this patch.  We need to sync
>> up
>> any changes/updates to VF configuration/features with virtio.
>
> This is possible but not the ideal way to build it. Virtio perhaps
> would not be the best place to stack this (VF specifics for live
> migration) up further. We need a new driver and should do it right
> from the very beginning.
>
> Thanks,
> -Siwei
>
>>
>>>
>>>> It is also possible that the VF on the destination is based on a totally
>>>> different NIC which
>>>> may be more or less performant. Or the destination may not even support a
>>>> VF
>>>> datapath too.
>>>
>>> This argument is rather weak. In almost all real-world live migration
>>> scenarios, the hardware configurations on both source and destination
>>> are (required to be) identical. Being able to support heterogenous
>>> live migration doesn't mean we can do nothing but throw all running
>>> configs or driver tunings away when it's done. Specifically, I don't
>>> find a reason not to apply the guest network configs including NIC
>>> offload settings if those are commonly supported on both ends, even on
>>> virtio-net. While for some of the configs it might be noticeable for
>>> user to respond to the loss or change, complaints would still arise
>>> when issues are painful to troubleshoot and/or difficult to get them
>>> detected and restored. This is why I say real-world scenarios are more
>>> complex than just switch and go.
>>>
>>
>> Sure. These patches by themselves don't enable live migration automatically.
>> Hypervisor
>> needs to do some additional setup before and after the migration.