diff mbox series

[v2,1/3] vhost-user: do not send RESET_OWNER on device reset

Message ID 20231004014532.1228637-2-stefanha@redhat.com
State New
Headers show
Series [v2,1/3] vhost-user: do not send RESET_OWNER on device reset | expand

Commit Message

Stefan Hajnoczi Oct. 4, 2023, 1:45 a.m. UTC
The VHOST_USER_RESET_OWNER message is deprecated in the spec:

   This is no longer used. Used to be sent to request disabling all
   rings, but some back-ends interpreted it to also discard connection
   state (this interpretation would lead to bugs).  It is recommended
   that back-ends either ignore this message, or use it to disable all
   rings.

The only caller of vhost_user_reset_device() is vhost_user_scsi_reset().
It checks that F_RESET_DEVICE was negotiated before calling it:

  static void vhost_user_scsi_reset(VirtIODevice *vdev)
  {
      VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
      struct vhost_dev *dev = &vsc->dev;

      /*
       * Historically, reset was not implemented so only reset devices
       * that are expecting it.
       */
      if (!virtio_has_feature(dev->protocol_features,
                              VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
          return;
      }

      if (dev->vhost_ops->vhost_reset_device) {
          dev->vhost_ops->vhost_reset_device(dev);
      }
  }

Therefore VHOST_USER_RESET_OWNER is actually never sent by
vhost_user_reset_device(). Remove the dead code. This effectively moves
the vhost-user protocol specific code from vhost-user-scsi.c into
vhost-user.c where it belongs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/vhost-user-scsi.c |  9 ---------
 hw/virtio/vhost-user.c    | 13 +++++++++----
 2 files changed, 9 insertions(+), 13 deletions(-)

Comments

Hanna Czenczek Oct. 4, 2023, 10:44 a.m. UTC | #1
On 04.10.23 03:45, Stefan Hajnoczi wrote:
> The VHOST_USER_RESET_OWNER message is deprecated in the spec:
>
>     This is no longer used. Used to be sent to request disabling all
>     rings, but some back-ends interpreted it to also discard connection
>     state (this interpretation would lead to bugs).  It is recommended
>     that back-ends either ignore this message, or use it to disable all
>     rings.

According to the spec, it is then indeed better to not call it in 
vhost_user_reset_device, because it seems like it would be interpreted 
as something completely different.

However, between the three back-end implementations of vhost-user I know 
of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none 
implement RESET_DEVICE.  libvhost-user and DPDK do implement 
RESET_OWNER, though, and they both do it by resetting the device, not by 
disabling any vring.  The vhost crate also implements RESET_OWNER, but 
it doesn’t do anything but forward it as such to the actual device 
implementation (virtiofsd doesn’t implement this function, so ignores 
it).  It does document that it would disable all vrings, but does so in 
the past and has marked it deprecated (ever since the method was 
introduced in the fourth commit to the repository, making it extremely 
unlikely that anyone would implement it).

So I would like to know why the spec says that it would disable all 
vrings, when none of the implementations (qemu, libvhost-user, DPDK) 
agree on that.  Let me look it up:

Before commit c61f09ed855, it did say “stopping” instead of 
“disabling”.  The commit doesn’t explain why it changed this. Commit 
a586e65bbd0 (just a week prior) deprecated the command, changing it from 
“connection is about to be closed, [front-end] will no longer own this 
connection” to “deprecated, used to be sent to request stopping all 
vrings”.  To me, the front-end closing the connection sounds like a good 
point to reset, which would indeed stop all vrings, but not just that.  
Notably, qemu agrees, because RESET_OWNER is used only in the 
vhost_user_reset_device() function. a586e65bbd0^ removed that function’s 
use, though, specifically because it would cause a reset, when the 
intention was just to stop.

So it sounds to me like “used to be sent to request stopping all vrings” 
is rather what vhost_net wanted, but specifically not what the message 
did, which was anything between nothing and a reset, I presume (because 
it never specified what the back-end was supposed to do, though 
apparently libvhost-user and DPDK both took it to mean reset).  Why it 
was then changed to “disabling”, I absolutely cannot say.

Now, the code change here is indeed effectively a no-op, as you deduce 
below, but in the context of the whole series the situation is a bit 
different: As far as I understand, the point is to have guest-initiated 
resets be forwarded to back-ends.  But by removing the RESET_OWNER 
fallback, no back-end will actually do a reset still.

I understand that as per the specification, using RESET_OWNER for 
resetting is wrong.  But all implementations that implemented it before 
it was deprecated do interpret it as a reset, so I don’t think using it 
as a fallback is actually wrong.

Hanna

> The only caller of vhost_user_reset_device() is vhost_user_scsi_reset().
> It checks that F_RESET_DEVICE was negotiated before calling it:
>
>    static void vhost_user_scsi_reset(VirtIODevice *vdev)
>    {
>        VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
>        struct vhost_dev *dev = &vsc->dev;
>
>        /*
>         * Historically, reset was not implemented so only reset devices
>         * that are expecting it.
>         */
>        if (!virtio_has_feature(dev->protocol_features,
>                                VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
>            return;
>        }
>
>        if (dev->vhost_ops->vhost_reset_device) {
>            dev->vhost_ops->vhost_reset_device(dev);
>        }
>    }
>
> Therefore VHOST_USER_RESET_OWNER is actually never sent by
> vhost_user_reset_device(). Remove the dead code. This effectively moves
> the vhost-user protocol specific code from vhost-user-scsi.c into
> vhost-user.c where it belongs.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   hw/scsi/vhost-user-scsi.c |  9 ---------
>   hw/virtio/vhost-user.c    | 13 +++++++++----
>   2 files changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..8582b2e8ab 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -71,15 +71,6 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>       VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
>       struct vhost_dev *dev = &vsc->dev;
>   
> -    /*
> -     * Historically, reset was not implemented so only reset devices
> -     * that are expecting it.
> -     */
> -    if (!virtio_has_feature(dev->protocol_features,
> -                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> -        return;
> -    }
> -
>       if (dev->vhost_ops->vhost_reset_device) {
>           dev->vhost_ops->vhost_reset_device(dev);
>       }
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 8dcf049d42..7bed9ad7d5 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1492,12 +1492,17 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
>   {
>       VhostUserMsg msg = {
>           .hdr.flags = VHOST_USER_VERSION,
> +        .hdr.request = VHOST_USER_RESET_DEVICE,
>       };
>   
> -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
> -                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
> -        ? VHOST_USER_RESET_DEVICE
> -        : VHOST_USER_RESET_OWNER;
> +    /*
> +     * Historically, reset was not implemented so only reset devices
> +     * that are expecting it.
> +     */
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> +        return -ENOSYS;
> +    }
>   
>       return vhost_user_write(dev, &msg, NULL, 0);
>   }
Stefan Hajnoczi Oct. 4, 2023, 11:15 a.m. UTC | #2
On Wed, 4 Oct 2023 at 06:44, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 04.10.23 03:45, Stefan Hajnoczi wrote:
> > The VHOST_USER_RESET_OWNER message is deprecated in the spec:
> >
> >     This is no longer used. Used to be sent to request disabling all
> >     rings, but some back-ends interpreted it to also discard connection
> >     state (this interpretation would lead to bugs).  It is recommended
> >     that back-ends either ignore this message, or use it to disable all
> >     rings.
>
> According to the spec, it is then indeed better to not call it in
> vhost_user_reset_device, because it seems like it would be interpreted
> as something completely different.
>
> However, between the three back-end implementations of vhost-user I know
> of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none
> implement RESET_DEVICE.  libvhost-user and DPDK do implement
> RESET_OWNER, though, and they both do it by resetting the device, not by
> disabling any vring.  The vhost crate also implements RESET_OWNER, but
> it doesn’t do anything but forward it as such to the actual device
> implementation (virtiofsd doesn’t implement this function, so ignores
> it).  It does document that it would disable all vrings, but does so in
> the past and has marked it deprecated (ever since the method was
> introduced in the fourth commit to the repository, making it extremely
> unlikely that anyone would implement it).

Hi Hanna,
vhost-user-backend still seems to reset all vhost-user protocol state,
making RESET_OWNER unusable:
https://github.com/rust-vmm/vhost/blob/main/crates/vhost-user-backend/src/handler.rs#L230

Have I missed something?

Stefan

>
> So I would like to know why the spec says that it would disable all
> vrings, when none of the implementations (qemu, libvhost-user, DPDK)
> agree on that.  Let me look it up:
>
> Before commit c61f09ed855, it did say “stopping” instead of
> “disabling”.  The commit doesn’t explain why it changed this. Commit
> a586e65bbd0 (just a week prior) deprecated the command, changing it from
> “connection is about to be closed, [front-end] will no longer own this
> connection” to “deprecated, used to be sent to request stopping all
> vrings”.  To me, the front-end closing the connection sounds like a good
> point to reset, which would indeed stop all vrings, but not just that.
> Notably, qemu agrees, because RESET_OWNER is used only in the
> vhost_user_reset_device() function. a586e65bbd0^ removed that function’s
> use, though, specifically because it would cause a reset, when the
> intention was just to stop.
>
> So it sounds to me like “used to be sent to request stopping all vrings”
> is rather what vhost_net wanted, but specifically not what the message
> did, which was anything between nothing and a reset, I presume (because
> it never specified what the back-end was supposed to do, though
> apparently libvhost-user and DPDK both took it to mean reset).  Why it
> was then changed to “disabling”, I absolutely cannot say.
>
> Now, the code change here is indeed effectively a no-op, as you deduce
> below, but in the context of the whole series the situation is a bit
> different: As far as I understand, the point is to have guest-initiated
> resets be forwarded to back-ends.  But by removing the RESET_OWNER
> fallback, no back-end will actually do a reset still.
>
> I understand that as per the specification, using RESET_OWNER for
> resetting is wrong.  But all implementations that implemented it before
> it was deprecated do interpret it as a reset, so I don’t think using it
> as a fallback is actually wrong.
>
> Hanna
>
> > The only caller of vhost_user_reset_device() is vhost_user_scsi_reset().
> > It checks that F_RESET_DEVICE was negotiated before calling it:
> >
> >    static void vhost_user_scsi_reset(VirtIODevice *vdev)
> >    {
> >        VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
> >        struct vhost_dev *dev = &vsc->dev;
> >
> >        /*
> >         * Historically, reset was not implemented so only reset devices
> >         * that are expecting it.
> >         */
> >        if (!virtio_has_feature(dev->protocol_features,
> >                                VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> >            return;
> >        }
> >
> >        if (dev->vhost_ops->vhost_reset_device) {
> >            dev->vhost_ops->vhost_reset_device(dev);
> >        }
> >    }
> >
> > Therefore VHOST_USER_RESET_OWNER is actually never sent by
> > vhost_user_reset_device(). Remove the dead code. This effectively moves
> > the vhost-user protocol specific code from vhost-user-scsi.c into
> > vhost-user.c where it belongs.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   hw/scsi/vhost-user-scsi.c |  9 ---------
> >   hw/virtio/vhost-user.c    | 13 +++++++++----
> >   2 files changed, 9 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> > index ee99b19e7a..8582b2e8ab 100644
> > --- a/hw/scsi/vhost-user-scsi.c
> > +++ b/hw/scsi/vhost-user-scsi.c
> > @@ -71,15 +71,6 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
> >       VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
> >       struct vhost_dev *dev = &vsc->dev;
> >
> > -    /*
> > -     * Historically, reset was not implemented so only reset devices
> > -     * that are expecting it.
> > -     */
> > -    if (!virtio_has_feature(dev->protocol_features,
> > -                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> > -        return;
> > -    }
> > -
> >       if (dev->vhost_ops->vhost_reset_device) {
> >           dev->vhost_ops->vhost_reset_device(dev);
> >       }
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 8dcf049d42..7bed9ad7d5 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -1492,12 +1492,17 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
> >   {
> >       VhostUserMsg msg = {
> >           .hdr.flags = VHOST_USER_VERSION,
> > +        .hdr.request = VHOST_USER_RESET_DEVICE,
> >       };
> >
> > -    msg.hdr.request = virtio_has_feature(dev->protocol_features,
> > -                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
> > -        ? VHOST_USER_RESET_DEVICE
> > -        : VHOST_USER_RESET_OWNER;
> > +    /*
> > +     * Historically, reset was not implemented so only reset devices
> > +     * that are expecting it.
> > +     */
> > +    if (!virtio_has_feature(dev->protocol_features,
> > +                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
> > +        return -ENOSYS;
> > +    }
> >
> >       return vhost_user_write(dev, &msg, NULL, 0);
> >   }
>
>
Hanna Czenczek Oct. 4, 2023, 12:23 p.m. UTC | #3
On 04.10.23 13:15, Stefan Hajnoczi wrote:
> On Wed, 4 Oct 2023 at 06:44, Hanna Czenczek <hreitz@redhat.com> wrote:
>> On 04.10.23 03:45, Stefan Hajnoczi wrote:
>>> The VHOST_USER_RESET_OWNER message is deprecated in the spec:
>>>
>>>      This is no longer used. Used to be sent to request disabling all
>>>      rings, but some back-ends interpreted it to also discard connection
>>>      state (this interpretation would lead to bugs).  It is recommended
>>>      that back-ends either ignore this message, or use it to disable all
>>>      rings.
>> According to the spec, it is then indeed better to not call it in
>> vhost_user_reset_device, because it seems like it would be interpreted
>> as something completely different.
>>
>> However, between the three back-end implementations of vhost-user I know
>> of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none
>> implement RESET_DEVICE.  libvhost-user and DPDK do implement
>> RESET_OWNER, though, and they both do it by resetting the device, not by
>> disabling any vring.  The vhost crate also implements RESET_OWNER, but
>> it doesn’t do anything but forward it as such to the actual device
>> implementation (virtiofsd doesn’t implement this function, so ignores
>> it).  It does document that it would disable all vrings, but does so in
>> the past and has marked it deprecated (ever since the method was
>> introduced in the fourth commit to the repository, making it extremely
>> unlikely that anyone would implement it).
> Hi Hanna,
> vhost-user-backend still seems to reset all vhost-user protocol state,
> making RESET_OWNER unusable:
> https://github.com/rust-vmm/vhost/blob/main/crates/vhost-user-backend/src/handler.rs#L230
>
> Have I missed something?

No, I missed that.  I overlooked that when grepping for reset_owner.

This implementation kind of does follow the original pre-2015 
description of RESET_OWNER, but I can’t believe this code originated 
from pre-2015, which makes it really weird.  It’s strange that in a 
commit from April 2019 the vhost crate marked the function as (“no 
longer used”), and then it was implemented in September of 2019, notably 
as something completely different than “Used to be sent to request 
disabling all rings”.

Another thing I noticed is that while libvhost-user does call the 
function vu_reset_device_exec(), what it does is indeed disable all 
vrings instead of resetting anything, i.e. what the spec says (and what 
I didn’t think anyone did).

DPDK interestingly does do a reset, and this includes clearing 
protocol_features (in reset_device()).

So two things: First, I’d prefer (slightly) for the commit message to 
mention that while RESET_OWNER would not be usable for a reset if 
everything were according to spec, the main problem to me seems that 
RESET_OWNER was never clearly defined before being deprecated, so 
different implementations do different things regardless of what the 
spec says now, which means it’s basically burned and no front-end may 
ever issue it at all lest it wants to get the back-end into an 
implementation-defined state.

Second, I wonder what exactly you mean by saying that RESET_OWNER 
resetting all vhost-user protocol state were to make the command 
unusable for resetting.  The vhost-user-backend implementation doesn’t 
clear all state, but resets only three things: The @owned field (which I 
don’t think is really used outside of 
vhost/src/vhost_user/dummy_slave.rs (this name is begging for a 
conscious language overhaul...), which appears not really important?), 
the virtio features (this I would expect any device reset to do), and 
the vhost-user protocol features.  Now, yes, clearing the vhost-user 
protocol features doesn’t seem like something that should be done just 
because of a device reset. However, note that DPDK’s reset_device() does 
this, too.  What I’m getting at is that we don’t have a clear definition 
of what RESET_DEVICE is supposed to do either, i.e. it doesn’t 
explicitly say that protocol features are not to be reset.  Sure, that 
may be obvious, but we should clarify this so that if e.g. DPDK is to 
implement RESET_DEVICE, they will take care not use its reset_device() 
implementation, which does reset protocol_features.

(Also, now that I look at RESET_DEVICE – why does it say that it 
disables all vrings?  (Has done so since its introduction.)  Is this 
something that qemu expects and will handle, i.e. that after a 
guest-initiated reset, the rings are enabled when the guest wants to use 
the device again?  Also, it doesn’t say the rings are stopped, so 
basically, as it is *right now* (where we only have three ring states, 
stopped, started+disabled, started+enabled), disabling vrings implicitly 
means they must still be started, because they can’t be disabled when 
stopped.  I’m going to change that, but as it reads right now, 
RESET_DEVICE does not seem like the reset we want. We should really get 
to fix that, too, before back-ends start to actually implement it.)

Hanna
diff mbox series

Patch

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index ee99b19e7a..8582b2e8ab 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -71,15 +71,6 @@  static void vhost_user_scsi_reset(VirtIODevice *vdev)
     VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
     struct vhost_dev *dev = &vsc->dev;
 
-    /*
-     * Historically, reset was not implemented so only reset devices
-     * that are expecting it.
-     */
-    if (!virtio_has_feature(dev->protocol_features,
-                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
-        return;
-    }
-
     if (dev->vhost_ops->vhost_reset_device) {
         dev->vhost_ops->vhost_reset_device(dev);
     }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..7bed9ad7d5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1492,12 +1492,17 @@  static int vhost_user_reset_device(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {
         .hdr.flags = VHOST_USER_VERSION,
+        .hdr.request = VHOST_USER_RESET_DEVICE,
     };
 
-    msg.hdr.request = virtio_has_feature(dev->protocol_features,
-                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
-        ? VHOST_USER_RESET_DEVICE
-        : VHOST_USER_RESET_OWNER;
+    /*
+     * Historically, reset was not implemented so only reset devices
+     * that are expecting it.
+     */
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
+        return -ENOSYS;
+    }
 
     return vhost_user_write(dev, &msg, NULL, 0);
 }