[RFC,11/12] vdpa: use SVQ to stall dataplane while NIC state is being restored

Message ID 20230720181459.607008-12-eperezma@redhat.com
State New
Series Prefer to use SVQ to stall dataplane at NIC state restore through CVQ

Commit Message

Eugenio Perez Martin July 20, 2023, 6:14 p.m. UTC
Some dynamic state of a virtio-net vDPA device is restored from CVQ in
the event of a live migration.  However, the dataplane needs to be
disabled so the NIC does not receive buffers in the invalid ring.

As the default method to achieve it, let's offer a shadow vring with 0
avail idx.  As a fallback method, we will enable the dataplane vqs
later, as proposed previously.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 49 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

Comments

Si-Wei Liu July 21, 2023, 10:58 p.m. UTC | #1
On 7/20/2023 11:14 AM, Eugenio Pérez wrote:
> Some dynamic state of a virtio-net vDPA devices is restored from CVQ in
> the event of a live migration.  However, dataplane needs to be disabled
> so the NIC does not receive buffers in the invalid ring.
>
> As a default method to achieve it, let's offer a shadow vring with 0
> avail idx.  As a fallback method, we will enable dataplane vqs later, as
> proposed previously.
Let's not jump to conclusions too early about what will be the default
vs. the fallback [1] - as this is on a latency-sensitive path, I'm not
fully convinced ring reset could perform better than, or on par with,
the deferred dataplane enablement approach on hardware. At this stage I
think RING_RESET has no adoption in vendor devices, while it is
definitely easier, with lower hardware overhead, for a vendor to
implement deferred dataplane enabling. If at some point a vendor's
device has to support RING_RESET for use cases other than live
migration (MTU change propagation, for example, a prerequisite for GRO
HW), defaulting to RING_RESET on this SVQ path has no real benefit but
needlessly adds complications to the vendor's device.

[1] 
https://lore.kernel.org/virtualization/bf2164a9-1dfd-14d9-be2a-8bb7620a0619@oracle.com/T/#m15caca6fbb00ca9c00e2b33391297a2d8282ff89

Thanks,
-Siwei

Eugenio Perez Martin July 24, 2023, 7:59 p.m. UTC | #2
On Sat, Jul 22, 2023 at 12:59 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 7/20/2023 11:14 AM, Eugenio Pérez wrote:
> > Some dynamic state of a virtio-net vDPA devices is restored from CVQ in
> > the event of a live migration.  However, dataplane needs to be disabled
> > so the NIC does not receive buffers in the invalid ring.
> >
> > As a default method to achieve it, let's offer a shadow vring with 0
> > avail idx.  As a fallback method, we will enable dataplane vqs later, as
> > proposed previously.
> Let's not jump to conclusion too early what will be the default v.s.
> fallback [1] - as this is on a latency sensitive path, I'm not fully
> convinced ring reset could perform better than or equally same as the
> deferred dataplane enablement approach on hardware. At this stage I
> think ring_reset has no adoption on vendors device, while it's
> definitely easier with lower hardware overhead for vendor to implement
> deferred dataplane enabling. If at some point vendor's device has to
> support RING_RESET for other use cases (MTU change propagation for ex.,
> a prerequisite for GRO HW) than live migration, defaulting to RING_RESET
> on this SVQ path has no real benefit but adds complications needlessly
> to vendor's device.
>

I agree with that. Let's say "*This series* uses RING_RESET as the
default method, and late vq enablement as the fallback".

Michael, given the current HW support, would it work to start merging
the late enable for vDPA after the feature freeze, and then add the
use of RING_RESET on top later?

Thanks!

Jason Wang July 25, 2023, 3:48 a.m. UTC | #3
On Tue, Jul 25, 2023 at 3:59 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Sat, Jul 22, 2023 at 12:59 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >
> >
> >
> > On 7/20/2023 11:14 AM, Eugenio Pérez wrote:
> > > Some dynamic state of a virtio-net vDPA devices is restored from CVQ in
> > > the event of a live migration.  However, dataplane needs to be disabled
> > > so the NIC does not receive buffers in the invalid ring.
> > >
> > > As a default method to achieve it, let's offer a shadow vring with 0
> > > avail idx.  As a fallback method, we will enable dataplane vqs later, as
> > > proposed previously.
> > Let's not jump to conclusion too early what will be the default v.s.
> > fallback [1] - as this is on a latency sensitive path, I'm not fully
> > convinced ring reset could perform better than or equally same as the
> > deferred dataplane enablement approach on hardware. At this stage I
> > think ring_reset has no adoption on vendors device, while it's
> > definitely easier with lower hardware overhead for vendor to implement
> > deferred dataplane enabling.

That's my feeling as well.

> > If at some point vendor's device has to
> > support RING_RESET for other use cases (MTU change propagation for ex.,
> > a prerequisite for GRO HW) than live migration,

Currently, it is used for changing the ring size, and it is planned for
AF_XDP. But I think neither of them requires low latency.

Thanks

> > defaulting to RING_RESET
> > on this SVQ path has no real benefit but adds complications needlessly
> > to vendor's device.
> >
>
> I agree with that. Let's say "*This series* uses RING_RESET as the
> default method, and late vq enablement as fallback".
>
> Michael, given the current HW support, would it work to start merging
> the late enable for vDPA after the feature freeze, and then add the
> use of RING_RESET on top later?
>
> Thanks!
>
> > [1]
> > https://lore.kernel.org/virtualization/bf2164a9-1dfd-14d9-be2a-8bb7620a0619@oracle.com/T/#m15caca6fbb00ca9c00e2b33391297a2d8282ff89
> >
> > Thanks,
> > -Siwei
> >
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   net/vhost-vdpa.c | 49 +++++++++++++++++++++++++++++++++++++++++++-----
> > >   1 file changed, 44 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index af83de92f8..e14ae48f23 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -338,10 +338,25 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
> > >   {
> > >       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > >       struct vhost_vdpa *v = &s->vhost_vdpa;
> > > +    bool has_cvq = v->dev->vq_index_end % 2;
> > >
> > >       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >
> > > -    if (s->always_svq ||
> > > +    if (has_cvq && (v->dev->features & VIRTIO_F_RING_RESET)) {
> > > +        /*
> > > +         * Offer a fake vring to the device while the state is restored
> > > +         * through CVQ.  That way, the guest will not see packets in unexpected
> > > +         * queues.
> > > +         *
> > > +         * This will be undone after loading all state through CVQ, at
> > > +         * vhost_vdpa_net_load.
> > > +         *
> > > +         * TODO: Future optimizations may skip some SVQ setup and teardown,
> > > +         * like set the right kick and call fd or doorbell maps directly, and
> > > +         * the iova tree.
> > > +         */
> > > +        v->shadow_vqs_enabled = true;
> > > +    } else if (s->always_svq ||
> > >           migration_is_setup_or_active(migrate_get_current()->state)) {
> > >           v->shadow_vqs_enabled = true;
> > >           v->shadow_data = true;
> > > @@ -738,10 +753,34 @@ static int vhost_vdpa_net_load(NetClientState *nc)
> > >           return r;
> > >       }
> > >
> > > -    for (int i = 0; i < v->dev->vq_index; ++i) {
> > > -        r = vhost_vdpa_set_vring_ready(v, i);
> > > -        if (unlikely(r)) {
> > > -            return r;
> > > +    if (v->dev->features & VIRTIO_F_RING_RESET && !s->always_svq &&
> > > +        !migration_is_setup_or_active(migrate_get_current()->state)) {
> > > +        NICState *nic = qemu_get_nic(s->nc.peer);
> > > +        int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > > +
> > > +        for (int i = 0; i < queue_pairs; ++i) {
> > > +            NetClientState *ncs = qemu_get_peer(nic->ncs, i);
> > > +            VhostVDPAState *s_i = DO_UPCAST(VhostVDPAState, nc, ncs);
> > > +
> > > +            for (int j = 0; j < 2; ++j) {
> > > +                vhost_net_virtqueue_reset(v->dev->vdev, ncs->peer, j);
> > > +            }
> > > +
> > > +            s_i->vhost_vdpa.shadow_vqs_enabled = false;
> > > +
> > > +            for (int j = 0; j < 2; ++j) {
> > > +                r = vhost_net_virtqueue_restart(v->dev->vdev, ncs->peer, j);
> > > +                if (unlikely(r < 0)) {
> > > +                    return r;
> > > +                }
> > > +            }
> > > +        }
> > > +    } else {
> > > +        for (int i = 0; i < v->dev->vq_index; ++i) {
> > > +            r = vhost_vdpa_set_vring_ready(v, i);
> > > +            if (unlikely(r)) {
> > > +                return r;
> > > +            }
> > >           }
> > >       }
> > >
> >
>

Patch

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index af83de92f8..e14ae48f23 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -338,10 +338,25 @@  static int vhost_vdpa_net_data_start(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
     struct vhost_vdpa *v = &s->vhost_vdpa;
+    bool has_cvq = v->dev->vq_index_end % 2;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
-    if (s->always_svq ||
+    if (has_cvq && (v->dev->features & VIRTIO_F_RING_RESET)) {
+        /*
+         * Offer a fake vring to the device while the state is restored
+         * through CVQ.  That way, the guest will not see packets in unexpected
+         * queues.
+         *
+         * This will be undone after loading all state through CVQ, at
+         * vhost_vdpa_net_load.
+         *
+         * TODO: Future optimizations may skip some SVQ setup and teardown,
+         * like set the right kick and call fd or doorbell maps directly, and
+         * the iova tree.
+         */
+        v->shadow_vqs_enabled = true;
+    } else if (s->always_svq ||
         migration_is_setup_or_active(migrate_get_current()->state)) {
         v->shadow_vqs_enabled = true;
         v->shadow_data = true;
@@ -738,10 +753,34 @@  static int vhost_vdpa_net_load(NetClientState *nc)
         return r;
     }
 
-    for (int i = 0; i < v->dev->vq_index; ++i) {
-        r = vhost_vdpa_set_vring_ready(v, i);
-        if (unlikely(r)) {
-            return r;
+    if (v->dev->features & VIRTIO_F_RING_RESET && !s->always_svq &&
+        !migration_is_setup_or_active(migrate_get_current()->state)) {
+        NICState *nic = qemu_get_nic(s->nc.peer);
+        int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+
+        for (int i = 0; i < queue_pairs; ++i) {
+            NetClientState *ncs = qemu_get_peer(nic->ncs, i);
+            VhostVDPAState *s_i = DO_UPCAST(VhostVDPAState, nc, ncs);
+
+            for (int j = 0; j < 2; ++j) {
+                vhost_net_virtqueue_reset(v->dev->vdev, ncs->peer, j);
+            }
+
+            s_i->vhost_vdpa.shadow_vqs_enabled = false;
+
+            for (int j = 0; j < 2; ++j) {
+                r = vhost_net_virtqueue_restart(v->dev->vdev, ncs->peer, j);
+                if (unlikely(r < 0)) {
+                    return r;
+                }
+            }
+        }
+    } else {
+        for (int i = 0; i < v->dev->vq_index; ++i) {
+            r = vhost_vdpa_set_vring_ready(v, i);
+            if (unlikely(r)) {
+                return r;
+            }
         }
     }