Message ID: 1486041504-23627-1-git-send-email-den@openvz.org
State: New
On 02/02/2017 04:18 PM, Denis V. Lunev wrote:
> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
>
> Socket backend read handler should normally perform a disconnect, however
> the read handler may not get a chance to run if the frontend is not ready
> (qemu_chr_be_can_write() == 0).
>
> This means that in the virtio-serial frontend case, if
> - the host has disconnected (giving EPIPE on socket write)
> - and the guest has disconnected (-> frontend not ready -> backend
>   will not read)
> - and there is still data (frontend->backend) to flush (has to be really
>   tricky timing, but nevertheless we have observed the case in production)
>
> this results in virtio-serial trying to flush this data continuously,
> forming a busy loop.
>
> Solution: react on write error in the socket write handler.
> errno is not reliable after qio_channel_writev_full(), so we may not get
> the exact EPIPE; disconnect on any error but QIO_CHANNEL_ERR_BLOCK, which
> io_channel_send_full() converts to errno EAGAIN.
> We must not disconnect right away though; there may still be data to read
> (see commit 4bf1cb0).
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Daniel P. Berrange <berrange@redhat.com>
> CC: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> Changes from v1:
> - we do not rely on EPIPE anymore; the socket should be closed on all
>   errors except EAGAIN
>
>  chardev/char-socket.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 4068dc5..e2137aa 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>                                 GIOCondition cond,
>                                 void *opaque);
>
> +static int tcp_chr_read_poll(void *opaque);
> +static void tcp_chr_disconnect(CharDriverState *chr);
> +
>  /* Called with chr_write_lock held.  */
>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  {
> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>          s->write_msgfds_num = 0;
>      }
>
> +    if (ret < 0 && errno != EAGAIN) {
> +        if (tcp_chr_read_poll(chr) <= 0) {
> +            tcp_chr_disconnect(chr);
> +            return len;
> +        } /* else let the read handler finish it properly */
> +    }
> +
>      return ret;
>  } else {
>      /* XXX: indicate an error ? */

pls disregard, I am very sorry. This patch was made against an old version
and will not compile.

Den
Hi

On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <den@openvz.org> wrote:
> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
>
> Socket backend read handler should normally perform a disconnect, however
> the read handler may not get a chance to run if the frontend is not ready
> (qemu_chr_be_can_write() == 0).
>
> This means that in the virtio-serial frontend case, if
> - the host has disconnected (giving EPIPE on socket write)
> - and the guest has disconnected (-> frontend not ready -> backend
>   will not read)
> - and there is still data (frontend->backend) to flush (has to be really
>   tricky timing, but nevertheless we have observed the case in production)
>
> this results in virtio-serial trying to flush this data continuously,
> forming a busy loop.
>
> Solution: react on write error in the socket write handler.
> errno is not reliable after qio_channel_writev_full(), so we may not get
> the exact EPIPE; disconnect on any error but QIO_CHANNEL_ERR_BLOCK, which
> io_channel_send_full() converts to errno EAGAIN.
> We must not disconnect right away though; there may still be data to read
> (see commit 4bf1cb0).
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Daniel P. Berrange <berrange@redhat.com>
> CC: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> Changes from v1:
> - we do not rely on EPIPE anymore; the socket should be closed on all
>   errors except EAGAIN
>
>  chardev/char-socket.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 4068dc5..e2137aa 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>                                 GIOCondition cond,
>                                 void *opaque);
>
> +static int tcp_chr_read_poll(void *opaque);
> +static void tcp_chr_disconnect(CharDriverState *chr);
> +
>  /* Called with chr_write_lock held.  */
>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  {
> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>          s->write_msgfds_num = 0;
>      }
>
> +    if (ret < 0 && errno != EAGAIN) {
> +        if (tcp_chr_read_poll(chr) <= 0) {
> +            tcp_chr_disconnect(chr);
> +            return len;

This change breaks a number of assumptions in vhost-user code. Until now,
a vhost-user function assumed that dev->vhost_ops would remain valid as
long as the function is running, so it may call vhost_ops callbacks
several times, which may eventually fail to do io, but no crash would
occur. The disconnection would be handled later with the HUP handler.
Now, vhost_ops may be cleared during a write (chr_disconnect ->
CHR_EVENT_CLOSED in net_vhost_user_event). This can be randomly reproduced
with vhost-user-test -p /x86_64/vhost-user/flags-mismatch/subprocess
(broken pipe, qemu crashes).

I am trying to fix this, but it isn't simple (races, many code paths etc),
so I just wanted to inform you about it.

> +        } /* else let the read handler finish it properly */
> +    }
> +
>      return ret;
>  } else {
>      /* XXX: indicate an error ? */
> --
> 2.7.4

-- 
Marc-André Lureau
Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <den@openvz.org> wrote:
>
>> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>
>> Socket backend read handler should normally perform a disconnect, however
>> the read handler may not get a chance to run if the frontend is not ready
>> (qemu_chr_be_can_write() == 0).
>>
>> This means that in the virtio-serial frontend case, if
>> - the host has disconnected (giving EPIPE on socket write)
>> - and the guest has disconnected (-> frontend not ready -> backend
>>   will not read)
>> - and there is still data (frontend->backend) to flush (has to be really
>>   tricky timing, but nevertheless we have observed the case in production)
>>
>> this results in virtio-serial trying to flush this data continuously,
>> forming a busy loop.
>>
>> Solution: react on write error in the socket write handler.
>> errno is not reliable after qio_channel_writev_full(), so we may not get
>> the exact EPIPE; disconnect on any error but QIO_CHANNEL_ERR_BLOCK, which
>> io_channel_send_full() converts to errno EAGAIN.
>> We must not disconnect right away though; there may still be data to read
>> (see commit 4bf1cb0).
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> CC: Daniel P. Berrange <berrange@redhat.com>
>> CC: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>> Changes from v1:
>> - we do not rely on EPIPE anymore; the socket should be closed on all
>>   errors except EAGAIN
>>
>>  chardev/char-socket.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
>> index 4068dc5..e2137aa 100644
>> --- a/chardev/char-socket.c
>> +++ b/chardev/char-socket.c
>> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>>                                 GIOCondition cond,
>>                                 void *opaque);
>>
>> +static int tcp_chr_read_poll(void *opaque);
>> +static void tcp_chr_disconnect(CharDriverState *chr);
>> +
>>  /* Called with chr_write_lock held.  */
>>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>>  {
>> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>>          s->write_msgfds_num = 0;
>>      }
>>
>> +    if (ret < 0 && errno != EAGAIN) {
>> +        if (tcp_chr_read_poll(chr) <= 0) {
>> +            tcp_chr_disconnect(chr);
>> +            return len;
>
> This change breaks a number of assumptions in vhost-user code. Until now, a
> vhost-user function assumed that dev->vhost_ops would remain valid as long
> as the function is running, so it may call vhost_ops callbacks several
> times, which may eventually fail to do io, but no crash would occur. The
> disconnection would be handled later with the HUP handler. Now, vhost_ops
> may be cleared during a write (chr_disconnect -> CHR_EVENT_CLOSED in
> net_vhost_user_event). This can be randomly reproduced with vhost-user-test
> -p /x86_64/vhost-user/flags-mismatch/subprocess (broken pipe, qemu crashes)

It hangs for me with annoying frequency. I'm glad you found the culprit!

> I am trying to fix this, but it isn't simple (races, many code paths etc),
> so I just wanted to inform you about it.

Can we revert this patch until we have a fix?
Hi

On Fri, Feb 24, 2017 at 5:42 PM Markus Armbruster <armbru@redhat.com> wrote:
> Marc-André Lureau <marcandre.lureau@gmail.com> writes:
>
> > Hi
> >
> > On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <den@openvz.org> wrote:
> >
> >> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>
> >> Socket backend read handler should normally perform a disconnect, however
> >> the read handler may not get a chance to run if the frontend is not ready
> >> (qemu_chr_be_can_write() == 0).
> >>
> >> This means that in the virtio-serial frontend case, if
> >> - the host has disconnected (giving EPIPE on socket write)
> >> - and the guest has disconnected (-> frontend not ready -> backend
> >>   will not read)
> >> - and there is still data (frontend->backend) to flush (has to be really
> >>   tricky timing, but nevertheless we have observed the case in production)
> >>
> >> this results in virtio-serial trying to flush this data continuously,
> >> forming a busy loop.
> >>
> >> Solution: react on write error in the socket write handler.
> >> errno is not reliable after qio_channel_writev_full(), so we may not get
> >> the exact EPIPE; disconnect on any error but QIO_CHANNEL_ERR_BLOCK, which
> >> io_channel_send_full() converts to errno EAGAIN.
> >> We must not disconnect right away though; there may still be data to read
> >> (see commit 4bf1cb0).
> >>
> >> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >> CC: Paolo Bonzini <pbonzini@redhat.com>
> >> CC: Daniel P. Berrange <berrange@redhat.com>
> >> CC: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> ---
> >> Changes from v1:
> >> - we do not rely on EPIPE anymore; the socket should be closed on all
> >>   errors except EAGAIN
> >>
> >>  chardev/char-socket.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> >> index 4068dc5..e2137aa 100644
> >> --- a/chardev/char-socket.c
> >> +++ b/chardev/char-socket.c
> >> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
> >>                                 GIOCondition cond,
> >>                                 void *opaque);
> >>
> >> +static int tcp_chr_read_poll(void *opaque);
> >> +static void tcp_chr_disconnect(CharDriverState *chr);
> >> +
> >>  /* Called with chr_write_lock held.  */
> >>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
> >>  {
> >> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
> >>          s->write_msgfds_num = 0;
> >>      }
> >>
> >> +    if (ret < 0 && errno != EAGAIN) {
> >> +        if (tcp_chr_read_poll(chr) <= 0) {
> >> +            tcp_chr_disconnect(chr);
> >> +            return len;
> >
> > This change breaks a number of assumptions in vhost-user code. Until now,
> > a vhost-user function assumed that dev->vhost_ops would remain valid as
> > long as the function is running, so it may call vhost_ops callbacks
> > several times, which may eventually fail to do io, but no crash would
> > occur. The disconnection would be handled later with the HUP handler.
> > Now, vhost_ops may be cleared during a write (chr_disconnect ->
> > CHR_EVENT_CLOSED in net_vhost_user_event). This can be randomly
> > reproduced with vhost-user-test -p
> > /x86_64/vhost-user/flags-mismatch/subprocess (broken pipe, qemu crashes)
>
> It hangs for me with annoying frequency. I'm glad you found the culprit!

Yeah, when reverted, it seems it never hangs here. But the test exercises
the code in interesting ways, with some unpredictable situations (device
initialization while the backend disconnects/reconnects), so I wouldn't be
surprised if we find more hidden issues. (vhost-user disconnect handling
still isn't close to being safe)

> > I am trying to fix this, but it isn't simple (races, many code paths
> > etc), so I just wanted to inform you about it.
>
> Can we revert this patch until we have a fix?

That or disable the test for now. But I wonder if this patch is a good
solution to the problem. And I don't like the socket backend behaving
differently than other backends.
On 24/02/2017 15:46, Marc-André Lureau wrote:
>>> +    if (ret < 0 && errno != EAGAIN) {
>>> +        if (tcp_chr_read_poll(chr) <= 0) {
>>> +            tcp_chr_disconnect(chr);
>>> +            return len;
>>
>> This change breaks a number of assumptions in vhost-user code. Until now,
>> a vhost-user function assumed that dev->vhost_ops would remain valid as
>> long as the function is running, so it may call vhost_ops callbacks
>> several times, which may eventually fail to do io, but no crash would
>> occur. The disconnection would be handled later with the HUP handler.
>> Now, vhost_ops may be cleared during a write (chr_disconnect ->
>> CHR_EVENT_CLOSED in net_vhost_user_event). This can be randomly
>> reproduced with vhost-user-test -p
>> /x86_64/vhost-user/flags-mismatch/subprocess

Would it work to call tcp_chr_disconnect from an idle source (and destroy
the source on the next connect)?

Paolo
On 02/24/2017 04:31 PM, Paolo Bonzini wrote:
> On 24/02/2017 15:46, Marc-André Lureau wrote:
>>>>> +    if (ret < 0 && errno != EAGAIN) {
>>>>> +        if (tcp_chr_read_poll(chr) <= 0) {
>>>>> +            tcp_chr_disconnect(chr);
>>>>> +            return len;
>>>>
>>>> This change breaks a number of assumptions in vhost-user code. Until
>>>> now, a vhost-user function assumed that dev->vhost_ops would remain
>>>> valid as long as the function is running, so it may call vhost_ops
>>>> callbacks several times, which may eventually fail to do io, but no
>>>> crash would occur. The disconnection would be handled later with the
>>>> HUP handler. Now, vhost_ops may be cleared during a write
>>>> (chr_disconnect -> CHR_EVENT_CLOSED in net_vhost_user_event). This can
>>>> be randomly reproduced with vhost-user-test -p
>>>> /x86_64/vhost-user/flags-mismatch/subprocess
>
> Would it work to call tcp_chr_disconnect from an idle source (and
> destroy the source on the next connect)?
>
> Paolo

I think no, but will think more on Monday. Unfortunately today is an
official holiday in Russia :(

Right now the code loops forever. Maybe we should notify the guest that
the problem really happens.

Den
Hi

On Fri, Feb 24, 2017 at 7:31 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 24/02/2017 15:46, Marc-André Lureau wrote:
>>>> +    if (ret < 0 && errno != EAGAIN) {
>>>> +        if (tcp_chr_read_poll(chr) <= 0) {
>>>> +            tcp_chr_disconnect(chr);
>>>> +            return len;
>>>
>>> This change breaks a number of assumptions in vhost-user code. Until
>>> now, a vhost-user function assumed that dev->vhost_ops would remain
>>> valid as long as the function is running, so it may call vhost_ops
>>> callbacks several times, which may eventually fail to do io, but no
>>> crash would occur. The disconnection would be handled later with the
>>> HUP handler. Now, vhost_ops may be cleared during a write
>>> (chr_disconnect -> CHR_EVENT_CLOSED in net_vhost_user_event). This can
>>> be randomly reproduced with vhost-user-test -p
>>> /x86_64/vhost-user/flags-mismatch/subprocess
>
> Would it work to call tcp_chr_disconnect from an idle source (and
> destroy the source on the next connect)?

Yes, I have a solution in this direction: https://da.gd/NUZf7

Note that vhost code could reach disconnect on read and segfault in
similar ways even if this commit were reverted. Delaying the cleanup would
solve that case too. Otherwise, significant work needs to be done to make
vhost more robust on disconnect...
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 4068dc5..e2137aa 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
                                GIOCondition cond,
                                void *opaque);
 
+static int tcp_chr_read_poll(void *opaque);
+static void tcp_chr_disconnect(CharDriverState *chr);
+
 /* Called with chr_write_lock held.  */
 static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 {
@@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
         s->write_msgfds_num = 0;
     }
 
+    if (ret < 0 && errno != EAGAIN) {
+        if (tcp_chr_read_poll(chr) <= 0) {
+            tcp_chr_disconnect(chr);
+            return len;
+        } /* else let the read handler finish it properly */
+    }
+
     return ret;
 } else {
     /* XXX: indicate an error ? */