diff mbox series

[2/4] vhost-user: Convert slave channel to QIOChannelSocket

Message ID 20210308123141.26444-3-groug@kaod.org
State New
Headers show
Series virtiofsd: Avoid potential deadlocks | expand

Commit Message

Greg Kurz March 8, 2021, 12:31 p.m. UTC
The slave channel is implemented with socketpair() : QEMU creates
the pair, passes one of the socket to virtiofsd and monitors the
other one with the main event loop using qemu_set_fd_handler().

In order to fix a potential deadlock between QEMU and a vhost-user
external process (e.g. virtiofsd with DAX), we want the nested
event loop in vhost_user_read() to monitor the slave channel as
well.

Prepare ground for this by converting the slave channel to be a
QIOChannelSocket. This will make monitoring of the slave channel
as simple as calling qio_channel_add_watch_source() from
vhost_user_read() instead of open-coding it.

This also allows to get rid of the ancillary data parsing since
QIOChannelSocket can do this for us. Note that the MSG_CTRUNC
check is dropped on the way because QIOChannelSocket ignores this
case. This isn't a problem since slave_read() provisions space for
8 file descriptors, but affected vhost-user slave protocol messages
generally only convey one. If for some reason a buggy implementation
passes more file descriptors, no need to break the connection, just
like we don't break it if some other type of ancillary data is
received : this isn't explicitely violating the protocol per-se so
it seems better to ignore it.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/virtio/vhost-user.c | 99 ++++++++++++++++++------------------------
 1 file changed, 42 insertions(+), 57 deletions(-)

Comments

Stefan Hajnoczi March 9, 2021, 3:17 p.m. UTC | #1
On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> +    g_autofree int *fd = NULL;
> +    size_t fdsize = 0;
> +    int i;
>  
>      /* Read header */
>      iov.iov_base = &hdr;
>      iov.iov_len = VHOST_USER_HDR_SIZE;
>  
>      do {
> -        size = recvmsg(u->slave_fd, &msgh, 0);
> -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> +    } while (size == QIO_CHANNEL_ERR_BLOCK);

Is it possible to leak file descriptors and fd[] memory if we receive a
short message and then loop around? qio_channel_readv_full() will
overwrite &fd and that's how the leak occurs.

On the other hand, it looks like ioc is in blocking mode. I'm not sure
QIO_CHANNEL_ERR_BLOCK can occur?

> @@ -1500,8 +1479,8 @@ static void slave_read(void *opaque)
>  
>      /* Read payload */
>      do {
> -        size = read(u->slave_fd, &payload, hdr.size);
> -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> +        size = qio_channel_read(ioc, (char *) &payload, hdr.size, NULL);
> +    } while (size == QIO_CHANNEL_ERR_BLOCK);

Same question here.
Greg Kurz March 9, 2021, 8:23 p.m. UTC | #2
On Tue, 9 Mar 2021 15:17:21 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> > +    g_autofree int *fd = NULL;
> > +    size_t fdsize = 0;
> > +    int i;
> >  
> >      /* Read header */
> >      iov.iov_base = &hdr;
> >      iov.iov_len = VHOST_USER_HDR_SIZE;
> >  
> >      do {
> > -        size = recvmsg(u->slave_fd, &msgh, 0);
> > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> 
> Is it possible to leak file descriptors and fd[] memory if we receive a
> short message and then loop around? qio_channel_readv_full() will
> overwrite &fd and that's how the leak occurs.
> 

qio_channel_readv_full() only returns QIO_CHANNEL_ERR_BLOCK when the
channel is non-blocking mode and no data is available. Under the hood,
this translates to this code in qio_channel_socket_readv():

 retry:
    ret = recvmsg(sioc->fd, &msg, sflags);
    if (ret < 0) {
        if (errno == EAGAIN) {
            return QIO_CHANNEL_ERR_BLOCK;
        }
        if (errno == EINTR) {
            goto retry;
        }

        error_setg_errno(errp, errno,
                         "Unable to read from socket");
        return -1;
    }

This is strictly equivalent to what we currently have. This cannot
leak file descriptors because we would only loop until the first
byte of real data is available and ancillary data cannot fly without
real data, i.e. no file descriptors when recvmsg() returns EAGAIN.

> On the other hand, it looks like ioc is in blocking mode. I'm not sure
> QIO_CHANNEL_ERR_BLOCK can occur?
> 

You're right but the existing code is ready to handle the non-blocking
case, so I just kept this behavior.

> > @@ -1500,8 +1479,8 @@ static void slave_read(void *opaque)
> >  
> >      /* Read payload */
> >      do {
> > -        size = read(u->slave_fd, &payload, hdr.size);
> > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > +        size = qio_channel_read(ioc, (char *) &payload, hdr.size, NULL);
> > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> 
> Same question here.

This also ends up in qio_channel_socket_readv().
Stefan Hajnoczi March 10, 2021, 11:27 a.m. UTC | #3
On Tue, Mar 09, 2021 at 09:23:22PM +0100, Greg Kurz wrote:
> On Tue, 9 Mar 2021 15:17:21 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> > > +    g_autofree int *fd = NULL;
> > > +    size_t fdsize = 0;
> > > +    int i;
> > >  
> > >      /* Read header */
> > >      iov.iov_base = &hdr;
> > >      iov.iov_len = VHOST_USER_HDR_SIZE;
> > >  
> > >      do {
> > > -        size = recvmsg(u->slave_fd, &msgh, 0);
> > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > 
> > Is it possible to leak file descriptors and fd[] memory if we receive a
> > short message and then loop around? qio_channel_readv_full() will
> > overwrite &fd and that's how the leak occurs.
> > 
> 
> qio_channel_readv_full() only returns QIO_CHANNEL_ERR_BLOCK when the
> channel is non-blocking mode and no data is available. Under the hood,
> this translates to this code in qio_channel_socket_readv():
> 
>  retry:
>     ret = recvmsg(sioc->fd, &msg, sflags);
>     if (ret < 0) {
>         if (errno == EAGAIN) {
>             return QIO_CHANNEL_ERR_BLOCK;
>         }
>         if (errno == EINTR) {
>             goto retry;
>         }
> 
>         error_setg_errno(errp, errno,
>                          "Unable to read from socket");
>         return -1;
>     }
> 
> This is strictly equivalent to what we currently have. This cannot
> leak file descriptors because we would only loop until the first
> byte of real data is available and ancillary data cannot fly without
> real data, i.e. no file descriptors when recvmsg() returns EAGAIN.
> 
> > On the other hand, it looks like ioc is in blocking mode. I'm not sure
> > QIO_CHANNEL_ERR_BLOCK can occur?
> > 
> 
> You're right but the existing code is ready to handle the non-blocking
> case, so I just kept this behavior.

I'm confused by this tentative non-blocking support because if we set
the fd to non-blocking, then qio_channel_readv_full() can return less
than size bytes (a short read) and that will cause a failure:

  if (size != VHOST_USER_HDR_SIZE) {
      error_report("Failed to read from slave.");
      goto err;
  }

So although the while QIO_CHANNEL_ERR_BLOCK loop suggests the code
supports non-blocking, it doesn't really support it :).

I think it would be clearer to remove the while QIO_CHANNEL_ERR_BLOCK
loop. However, this is not directly related to the bug that this series
fixes, so if you prefer to keep it, feel free.
Daniel P. Berrangé March 10, 2021, 11:43 a.m. UTC | #4
On Tue, Mar 09, 2021 at 09:23:22PM +0100, Greg Kurz wrote:
> On Tue, 9 Mar 2021 15:17:21 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> > > +    g_autofree int *fd = NULL;
> > > +    size_t fdsize = 0;
> > > +    int i;
> > >  
> > >      /* Read header */
> > >      iov.iov_base = &hdr;
> > >      iov.iov_len = VHOST_USER_HDR_SIZE;
> > >  
> > >      do {
> > > -        size = recvmsg(u->slave_fd, &msgh, 0);
> > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > 
> > Is it possible to leak file descriptors and fd[] memory if we receive a
> > short message and then loop around? qio_channel_readv_full() will
> > overwrite &fd and that's how the leak occurs.
> > 
> 
> qio_channel_readv_full() only returns QIO_CHANNEL_ERR_BLOCK when the
> channel is non-blocking mode and no data is available. Under the hood,
> this translates to this code in qio_channel_socket_readv():
> 
>  retry:
>     ret = recvmsg(sioc->fd, &msg, sflags);
>     if (ret < 0) {
>         if (errno == EAGAIN) {
>             return QIO_CHANNEL_ERR_BLOCK;
>         }
>         if (errno == EINTR) {
>             goto retry;
>         }
> 
>         error_setg_errno(errp, errno,
>                          "Unable to read from socket");
>         return -1;
>     }
> 
> This is strictly equivalent to what we currently have. This cannot
> leak file descriptors because we would only loop until the first
> byte of real data is available and ancillary data cannot fly without
> real data, i.e. no file descriptors when recvmsg() returns EAGAIN.

Yep, EAGAIN will only happen if you have no data AND no FDs.

> 
> > On the other hand, it looks like ioc is in blocking mode. I'm not sure
> > QIO_CHANNEL_ERR_BLOCK can occur?
> > 
> 
> You're right but the existing code is ready to handle the non-blocking
> case, so I just kept this behavior.

The existing code is *not* handling the non-blocking case in any
useful way IMHO. It will block execution of this QEMU thread, and
sit in a tight loop burning CPU in the EAGAIN case.

Handling non-blocking means using an I/O watch with the event loop
to be notified when something becomes ready.

I notice the code that follows is also doing


    if (size != VHOST_USER_HDR_SIZE) {
        error_report("Failed to read from slave.");
        goto err;
    }

which is also dubious because it assumes the previous recvmsg is
guaranteed to return VHOST_USER_HDR_SIZE bytes of data in a single
call.

It does at least look like the code is intentionally blocking
execution fo the QEMU thread until the expected amount of I/O
is receieved.

Given this, you should remove the while loop entirely and
just call

   qio_channel_readv_full_all()

this will block execution until *all* the requestd data bytes
are read. It will re-try the recvmsg in the event of a partial
data read, and will intelligently wait in poll() on EAGAIN.

> 
> > > @@ -1500,8 +1479,8 @@ static void slave_read(void *opaque)
> > >  
> > >      /* Read payload */
> > >      do {
> > > -        size = read(u->slave_fd, &payload, hdr.size);
> > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > +        size = qio_channel_read(ioc, (char *) &payload, hdr.size, NULL);
> > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > 
> > Same question here.
> 
> This also ends up in qio_channel_socket_readv().



Regards,
Daniel
Greg Kurz March 10, 2021, 1:08 p.m. UTC | #5
On Wed, 10 Mar 2021 11:27:16 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Tue, Mar 09, 2021 at 09:23:22PM +0100, Greg Kurz wrote:
> > On Tue, 9 Mar 2021 15:17:21 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > > On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> > > > +    g_autofree int *fd = NULL;
> > > > +    size_t fdsize = 0;
> > > > +    int i;
> > > >  
> > > >      /* Read header */
> > > >      iov.iov_base = &hdr;
> > > >      iov.iov_len = VHOST_USER_HDR_SIZE;
> > > >  
> > > >      do {
> > > > -        size = recvmsg(u->slave_fd, &msgh, 0);
> > > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > > +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> > > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > > 
> > > Is it possible to leak file descriptors and fd[] memory if we receive a
> > > short message and then loop around? qio_channel_readv_full() will
> > > overwrite &fd and that's how the leak occurs.
> > > 
> > 
> > qio_channel_readv_full() only returns QIO_CHANNEL_ERR_BLOCK when the
> > channel is non-blocking mode and no data is available. Under the hood,
> > this translates to this code in qio_channel_socket_readv():
> > 
> >  retry:
> >     ret = recvmsg(sioc->fd, &msg, sflags);
> >     if (ret < 0) {
> >         if (errno == EAGAIN) {
> >             return QIO_CHANNEL_ERR_BLOCK;
> >         }
> >         if (errno == EINTR) {
> >             goto retry;
> >         }
> > 
> >         error_setg_errno(errp, errno,
> >                          "Unable to read from socket");
> >         return -1;
> >     }
> > 
> > This is strictly equivalent to what we currently have. This cannot
> > leak file descriptors because we would only loop until the first
> > byte of real data is available and ancillary data cannot fly without
> > real data, i.e. no file descriptors when recvmsg() returns EAGAIN.
> > 
> > > On the other hand, it looks like ioc is in blocking mode. I'm not sure
> > > QIO_CHANNEL_ERR_BLOCK can occur?
> > > 
> > 
> > You're right but the existing code is ready to handle the non-blocking
> > case, so I just kept this behavior.
> 
> I'm confused by this tentative non-blocking support because if we set
> the fd to non-blocking, then qio_channel_readv_full() can return less
> than size bytes (a short read) and that will cause a failure:
> 
>   if (size != VHOST_USER_HDR_SIZE) {
>       error_report("Failed to read from slave.");
>       goto err;
>   }
> 
> So although the while QIO_CHANNEL_ERR_BLOCK loop suggests the code
> supports non-blocking, it doesn't really support it :).
> 

Yeah I agree. These EAGAIN checks that we have today are really
misleading. I'll get rid of them in a preliminary patch to
reduce the noise on the slave channel conversion.

> I think it would be clearer to remove the while QIO_CHANNEL_ERR_BLOCK
> loop. However, this is not directly related to the bug that this series
> fixes, so if you prefer to keep it, feel free.
Greg Kurz March 10, 2021, 1:45 p.m. UTC | #6
On Wed, 10 Mar 2021 11:43:58 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Tue, Mar 09, 2021 at 09:23:22PM +0100, Greg Kurz wrote:
> > On Tue, 9 Mar 2021 15:17:21 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > > On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> > > > +    g_autofree int *fd = NULL;
> > > > +    size_t fdsize = 0;
> > > > +    int i;
> > > >  
> > > >      /* Read header */
> > > >      iov.iov_base = &hdr;
> > > >      iov.iov_len = VHOST_USER_HDR_SIZE;
> > > >  
> > > >      do {
> > > > -        size = recvmsg(u->slave_fd, &msgh, 0);
> > > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > > +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> > > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > > 
> > > Is it possible to leak file descriptors and fd[] memory if we receive a
> > > short message and then loop around? qio_channel_readv_full() will
> > > overwrite &fd and that's how the leak occurs.
> > > 
> > 
> > qio_channel_readv_full() only returns QIO_CHANNEL_ERR_BLOCK when the
> > channel is non-blocking mode and no data is available. Under the hood,
> > this translates to this code in qio_channel_socket_readv():
> > 
> >  retry:
> >     ret = recvmsg(sioc->fd, &msg, sflags);
> >     if (ret < 0) {
> >         if (errno == EAGAIN) {
> >             return QIO_CHANNEL_ERR_BLOCK;
> >         }
> >         if (errno == EINTR) {
> >             goto retry;
> >         }
> > 
> >         error_setg_errno(errp, errno,
> >                          "Unable to read from socket");
> >         return -1;
> >     }
> > 
> > This is strictly equivalent to what we currently have. This cannot
> > leak file descriptors because we would only loop until the first
> > byte of real data is available and ancillary data cannot fly without
> > real data, i.e. no file descriptors when recvmsg() returns EAGAIN.
> 
> Yep, EAGAIN will only happen if you have no data AND no FDs.
> 
> > 
> > > On the other hand, it looks like ioc is in blocking mode. I'm not sure
> > > QIO_CHANNEL_ERR_BLOCK can occur?
> > > 
> > 
> > You're right but the existing code is ready to handle the non-blocking
> > case, so I just kept this behavior.
> 
> The existing code is *not* handling the non-blocking case in any
> useful way IMHO. It will block execution of this QEMU thread, and
> sit in a tight loop burning CPU in the EAGAIN case.
> 
> Handling non-blocking means using an I/O watch with the event loop
> to be notified when something becomes ready.
> 

Hmm... slave_read() is a handler registered with qemu_set_fd_handler().
Isn't it already the case then ? Can the first call to recvmsg() even
return EAGAIN actually ?

> I notice the code that follows is also doing
> 
> 
>     if (size != VHOST_USER_HDR_SIZE) {
>         error_report("Failed to read from slave.");
>         goto err;
>     }
> 
> which is also dubious because it assumes the previous recvmsg is
> guaranteed to return VHOST_USER_HDR_SIZE bytes of data in a single
> call.
> 

Yeah this is broken since recvmsg() doesn't guarantee full reads.

> It does at least look like the code is intentionally blocking
> execution fo the QEMU thread until the expected amount of I/O
> is receieved.
> 
> Given this, you should remove the while loop entirely and
> just call
> 
>    qio_channel_readv_full_all()
> 
> this will block execution until *all* the requestd data bytes
> are read. It will re-try the recvmsg in the event of a partial
> data read, and will intelligently wait in poll() on EAGAIN.
> 

Thanks for the suggestion !

> > 
> > > > @@ -1500,8 +1479,8 @@ static void slave_read(void *opaque)
> > > >  
> > > >      /* Read payload */
> > > >      do {
> > > > -        size = read(u->slave_fd, &payload, hdr.size);
> > > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > > +        size = qio_channel_read(ioc, (char *) &payload, hdr.size, NULL);
> > > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > > 
> > > Same question here.
> > 
> > This also ends up in qio_channel_socket_readv().
> 
> 
> 
> Regards,
> Daniel
Daniel P. Berrangé March 10, 2021, 1:48 p.m. UTC | #7
On Wed, Mar 10, 2021 at 02:45:25PM +0100, Greg Kurz wrote:
> On Wed, 10 Mar 2021 11:43:58 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Tue, Mar 09, 2021 at 09:23:22PM +0100, Greg Kurz wrote:
> > > On Tue, 9 Mar 2021 15:17:21 +0000
> > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > 
> > > > On Mon, Mar 08, 2021 at 01:31:39PM +0100, Greg Kurz wrote:
> > > > > +    g_autofree int *fd = NULL;
> > > > > +    size_t fdsize = 0;
> > > > > +    int i;
> > > > >  
> > > > >      /* Read header */
> > > > >      iov.iov_base = &hdr;
> > > > >      iov.iov_len = VHOST_USER_HDR_SIZE;
> > > > >  
> > > > >      do {
> > > > > -        size = recvmsg(u->slave_fd, &msgh, 0);
> > > > > -    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
> > > > > +        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
> > > > > +    } while (size == QIO_CHANNEL_ERR_BLOCK);
> > > > 
> > > > Is it possible to leak file descriptors and fd[] memory if we receive a
> > > > short message and then loop around? qio_channel_readv_full() will
> > > > overwrite &fd and that's how the leak occurs.
> > > > 
> > > 
> > > qio_channel_readv_full() only returns QIO_CHANNEL_ERR_BLOCK when the
> > > channel is non-blocking mode and no data is available. Under the hood,
> > > this translates to this code in qio_channel_socket_readv():
> > > 
> > >  retry:
> > >     ret = recvmsg(sioc->fd, &msg, sflags);
> > >     if (ret < 0) {
> > >         if (errno == EAGAIN) {
> > >             return QIO_CHANNEL_ERR_BLOCK;
> > >         }
> > >         if (errno == EINTR) {
> > >             goto retry;
> > >         }
> > > 
> > >         error_setg_errno(errp, errno,
> > >                          "Unable to read from socket");
> > >         return -1;
> > >     }
> > > 
> > > This is strictly equivalent to what we currently have. This cannot
> > > leak file descriptors because we would only loop until the first
> > > byte of real data is available and ancillary data cannot fly without
> > > real data, i.e. no file descriptors when recvmsg() returns EAGAIN.
> > 
> > Yep, EAGAIN will only happen if you have no data AND no FDs.
> > 
> > > 
> > > > On the other hand, it looks like ioc is in blocking mode. I'm not sure
> > > > QIO_CHANNEL_ERR_BLOCK can occur?
> > > > 
> > > 
> > > You're right but the existing code is ready to handle the non-blocking
> > > case, so I just kept this behavior.
> > 
> > The existing code is *not* handling the non-blocking case in any
> > useful way IMHO. It will block execution of this QEMU thread, and
> > sit in a tight loop burning CPU in the EAGAIN case.
> > 
> > Handling non-blocking means using an I/O watch with the event loop
> > to be notified when something becomes ready.
> > 
> 
> Hmm... slave_read() is a handler registered with qemu_set_fd_handler().
> Isn't it already the case then ? Can the first call to recvmsg() even
> return EAGAIN actually ?

IIUC it shouldn't be able to return EAGAIN in that scenario for stream
sockets (TCP, UNIX), only a datagram socket (UDP) would be at risk of
EAGAIN



Regards,
Daniel
diff mbox series

Patch

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8a0574d5f959..e395d0d1fd81 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -16,6 +16,7 @@ 
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
+#include "io/channel-socket.h"
 #include "sysemu/kvm.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -237,7 +238,8 @@  struct vhost_user {
     struct vhost_dev *dev;
     /* Shared between vhost devs of the same virtio device */
     VhostUserState *user;
-    int slave_fd;
+    QIOChannel *slave_ioc;
+    GSource *slave_src;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
     uint64_t           postcopy_client_bases[VHOST_USER_MAX_RAM_SLOTS];
@@ -1441,56 +1443,33 @@  static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
     return 0;
 }
 
-static void slave_read(void *opaque)
+static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
+                           gpointer opaque)
 {
     struct vhost_dev *dev = opaque;
     struct vhost_user *u = dev->opaque;
     VhostUserHeader hdr = { 0, };
     VhostUserPayload payload = { 0, };
-    int size, ret = 0;
+    ssize_t size;
+    int ret = 0;
     struct iovec iov;
-    struct msghdr msgh;
-    int fd[VHOST_USER_SLAVE_MAX_FDS];
-    char control[CMSG_SPACE(sizeof(fd))];
-    struct cmsghdr *cmsg;
-    int i, fdsize = 0;
-
-    memset(&msgh, 0, sizeof(msgh));
-    msgh.msg_iov = &iov;
-    msgh.msg_iovlen = 1;
-    msgh.msg_control = control;
-    msgh.msg_controllen = sizeof(control);
-
-    memset(fd, -1, sizeof(fd));
+    g_autofree int *fd = NULL;
+    size_t fdsize = 0;
+    int i;
 
     /* Read header */
     iov.iov_base = &hdr;
     iov.iov_len = VHOST_USER_HDR_SIZE;
 
     do {
-        size = recvmsg(u->slave_fd, &msgh, 0);
-    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
+        size = qio_channel_readv_full(ioc, &iov, 1, &fd, &fdsize, NULL);
+    } while (size == QIO_CHANNEL_ERR_BLOCK);
 
     if (size != VHOST_USER_HDR_SIZE) {
         error_report("Failed to read from slave.");
         goto err;
     }
 
-    if (msgh.msg_flags & MSG_CTRUNC) {
-        error_report("Truncated message.");
-        goto err;
-    }
-
-    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
-         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
-            if (cmsg->cmsg_level == SOL_SOCKET &&
-                cmsg->cmsg_type == SCM_RIGHTS) {
-                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
-                    memcpy(fd, CMSG_DATA(cmsg), fdsize);
-                    break;
-            }
-    }
-
     if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
         error_report("Failed to read msg header."
                 " Size %d exceeds the maximum %zu.", hdr.size,
@@ -1500,8 +1479,8 @@  static void slave_read(void *opaque)
 
     /* Read payload */
     do {
-        size = read(u->slave_fd, &payload, hdr.size);
-    } while (size < 0 && (errno == EINTR || errno == EAGAIN));
+        size = qio_channel_read(ioc, (char *) &payload, hdr.size, NULL);
+    } while (size == QIO_CHANNEL_ERR_BLOCK);
 
     if (size != hdr.size) {
         error_report("Failed to read payload from slave.");
@@ -1517,7 +1496,7 @@  static void slave_read(void *opaque)
         break;
     case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
         ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
-                                                          fd[0]);
+                                                          fd ? fd[0] : -1);
         break;
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
@@ -1525,8 +1504,8 @@  static void slave_read(void *opaque)
     }
 
     /* Close the remaining file descriptors. */
-    for (i = 0; i < fdsize; i++) {
-        if (fd[i] != -1) {
+    if (fd) {
+        for (i = 0; i < fdsize; i++) {
             close(fd[i]);
         }
     }
@@ -1551,8 +1530,8 @@  static void slave_read(void *opaque)
         iovec[1].iov_len = hdr.size;
 
         do {
-            size = writev(u->slave_fd, iovec, ARRAY_SIZE(iovec));
-        } while (size < 0 && (errno == EINTR || errno == EAGAIN));
+            size = qio_channel_writev(ioc, iovec, ARRAY_SIZE(iovec), NULL);
+        } while (size == QIO_CHANNEL_ERR_BLOCK);
 
         if (size != VHOST_USER_HDR_SIZE + hdr.size) {
             error_report("Failed to send msg reply to slave.");
@@ -1560,18 +1539,20 @@  static void slave_read(void *opaque)
         }
     }
 
-    return;
+    return G_SOURCE_CONTINUE;
 
 err:
-    qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-    close(u->slave_fd);
-    u->slave_fd = -1;
-    for (i = 0; i < fdsize; i++) {
-        if (fd[i] != -1) {
+    g_source_destroy(u->slave_src);
+    g_source_unref(u->slave_src);
+    u->slave_src = NULL;
+    object_unref(OBJECT(ioc));
+    u->slave_ioc = NULL;
+    if (fd) {
+        for (i = 0; i < fdsize; i++) {
             close(fd[i]);
         }
     }
-    return;
+    return G_SOURCE_REMOVE;
 }
 
 static int vhost_setup_slave_channel(struct vhost_dev *dev)
@@ -1595,8 +1576,9 @@  static int vhost_setup_slave_channel(struct vhost_dev *dev)
         return -1;
     }
 
-    u->slave_fd = sv[0];
-    qemu_set_fd_handler(u->slave_fd, slave_read, NULL, dev);
+    u->slave_ioc = QIO_CHANNEL(qio_channel_socket_new_fd(sv[0], &error_abort));
+    u->slave_src = qio_channel_add_watch_source(u->slave_ioc, G_IO_IN,
+                                                slave_read, dev, NULL, NULL);
 
     if (reply_supported) {
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
@@ -1614,9 +1596,11 @@  static int vhost_setup_slave_channel(struct vhost_dev *dev)
 out:
     close(sv[1]);
     if (ret) {
-        qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-        close(u->slave_fd);
-        u->slave_fd = -1;
+        g_source_destroy(u->slave_src);
+        g_source_unref(u->slave_src);
+        u->slave_src = NULL;
+        object_unref(OBJECT(u->slave_ioc));
+        u->slave_ioc = NULL;
     }
 
     return ret;
@@ -1853,7 +1837,6 @@  static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
 
     u = g_new0(struct vhost_user, 1);
     u->user = opaque;
-    u->slave_fd = -1;
     u->dev = dev;
     dev->opaque = u;
 
@@ -1968,10 +1951,12 @@  static int vhost_user_backend_cleanup(struct vhost_dev *dev)
         close(u->postcopy_fd.fd);
         u->postcopy_fd.handler = NULL;
     }
-    if (u->slave_fd >= 0) {
-        qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
-        close(u->slave_fd);
-        u->slave_fd = -1;
+    if (u->slave_ioc) {
+        g_source_destroy(u->slave_src);
+        g_source_unref(u->slave_src);
+        u->slave_src = NULL;
+        object_unref(OBJECT(u->slave_ioc));
+        u->slave_ioc = NULL;
     }
     g_free(u->region_rb);
     u->region_rb = NULL;