diff mbox series

[RFC,3/3] aio-posix: call ->poll_end() when removing AioHandler

Message ID 20231213211544.1601971-4-stefanha@redhat.com
State New
Headers show
Series aio-posix: call ->poll_end() when removing AioHandler | expand

Commit Message

Stefan Hajnoczi Dec. 13, 2023, 9:15 p.m. UTC
Hanna Czenczek found the root cause of the --device
virtio-scsi-pci,iothread= hangs across hotplug/unplug. When AioContext
polling has started on the virtqueue host notifier in the IOThread and
the main loop thread calls aio_set_fd_handler(), then the io_poll_end()
callback is not invoked on the virtqueue host notifier. The virtqueue is
left with polling disabled and never detects guest activity once
attached again (because virtqueue kicks are not enabled and AioContext
only starts polling after the fd becomes active for the first time). The
result is that the virtio-scsi device stops responding to its virtqueue.

Previous patches made aio_set_fd_handler() run in the AioContext's home
thread so it's now possible to call ->io_poll_end() to solve this bug.

Buglink: https://issues.redhat.com/browse/RHEL-3934
Cc: Hanna Czenczek <hreitz@redhat.com>
Cc: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/aio-posix.c | 53 ++++++++++++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 20 deletions(-)

Comments

Paolo Bonzini Dec. 13, 2023, 9:52 p.m. UTC | #1
On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> -    /* If a read is in progress, just mark the node as deleted */
> -    if (ctx->walking_handlers > 0) {
> -        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
> -        return false;
> +    /* If polling was started on the node then end it now */
> +    if (ctx->poll_started && node->io_poll_end) {
> +        node->io_poll_end(node->opaque);
> +
> +        /* Poll one last time in case ->io_poll_end() raced with the event */
> +        if (node->io_poll(node->opaque)) {
> +            poll_ready = true;
> +        }

Do you need this at all? If the caller is removing the handlers, they
should have put themselves in a state where they don't care about the
file descriptor becoming readable.

You still have to be careful because aio_bh_schedule_oneshot() can
trigger nested event loops in remote threads (which can cause deadlocks
and, what I fear most, can take some time and invalidate the expectation
that you don't need to drop the BQL). But it would simplify this patch
quite a bit.

Paolo
Stefan Hajnoczi Dec. 14, 2023, 8:12 p.m. UTC | #2
On Wed, Dec 13, 2023 at 10:52:58PM +0100, Paolo Bonzini wrote:
> On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > -    /* If a read is in progress, just mark the node as deleted */
> > -    if (ctx->walking_handlers > 0) {
> > -        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
> > -        return false;
> > +    /* If polling was started on the node then end it now */
> > +    if (ctx->poll_started && node->io_poll_end) {
> > +        node->io_poll_end(node->opaque);
> > +
> > +        /* Poll one last time in case ->io_poll_end() raced with the event */
> > +        if (node->io_poll(node->opaque)) {
> > +            poll_ready = true;
> > +        }
> 
> Do you need this at all? If the caller is removing the handlers, they
> should have put themselves in a state where they don't care about the
> file descriptor becoming readable.

The caller no longer wishes to monitor the fd. This may be temporary
though. The fd must stay readable so that the caller can monitor it
again in the future and handle pending events.

->io_poll_begin() and ->io_poll_end() are used to bypass the fd while in
polling mode (e.g. disabling virtqueue kicks). This is a performance
optimization that avoids wasting cycles servicing fds when we're polling
anyway.

When the caller does:
1. aio_set_fd_handler(ctx, fd, NULL, ...)    // stop monitoring
2. aio_set_fd_handler(ctx, fd, read_fd, ...) // start monitoring

Then read_fd() should be called if the fd became readable between 1 and 2.

Since the fd may be bypassed until ->io_poll_end() returns, we must poll
one last time to check if an event snuck in right at the end without
making the fd readable. If polling detected an event, then we must do
something. We cannot drop the event.

(An alternative is to poll once before monitoring the fd again. That way
pending events are detected even if the fd wasn't readable. That is
currently not the way aio-posix.c works though.)

Stefan
Paolo Bonzini Dec. 14, 2023, 8:39 p.m. UTC | #3
Il gio 14 dic 2023, 21:12 Stefan Hajnoczi <stefanha@redhat.com> ha scritto:

> Since the fd may be bypassed until ->io_poll_end() returns, we must poll
> one last time to check if an event snuck in right at the end without
> making the fd readable. If polling detected an event, then we must do
> something. We cannot drop the event.


I agree that in general we cannot. I wonder, however, whether in the
(already racy) case of a concurrent aio_set_fd_handler(ctx, fd, NULL, ...)
you really need to call poll_ready here.

>
> (An alternative is to poll once before monitoring the fd again. That way
> pending events are detected even if the fd wasn't readable. That is
> currently not the way aio-posix.c works though.)
>

Yes, that would be a change. If I understood Hanna's suggestions in the
issue correctly, she also mentioned doing a manual virtqueue notification
before monitoring restarts. So basically my idea boils down to implementing
that, and then cleaning up everything on top.

Paolo

> Stefan
>
Stefan Hajnoczi Dec. 18, 2023, 2:27 p.m. UTC | #4
On Thu, Dec 14, 2023 at 09:39:02PM +0100, Paolo Bonzini wrote:
> Il gio 14 dic 2023, 21:12 Stefan Hajnoczi <stefanha@redhat.com> ha scritto:
> 
> > Since the fd may be bypassed until ->io_poll_end() returns, we must poll
> > one last time to check if an event snuck in right at the end without
> > making the fd readable. If polling detected an event, then we must do
> > something. We cannot drop the event
> 
> 
> I agree that in general we cannot. I wonder however if, in the (already
> racy) case of a concurrent aio_set_fd_handler(ctx, fd, NULL, ...), you
> really need to call poll_ready here.

It doesn't need to happen here, but it needs to be recorded so that the
handler (either poll_ready or the fd read handler) runs later. In the
case of eventfds it's easy to write(fd, ...) so that activity will be
picked up again when file descriptor monitoring resumes.

> >
> > (An alternative is to poll once before monitoring the fd again. That way
> > pending events are detected even if the fd wasn't readable. That is
> > currently not the way aio-posix.c works though.)
> >
> 
> Yes, that would be a change. If I understood correctly Hanna's suggestions
> in the issue, she also mentioned doing a manual virtqueue notification
> before monitoring restarts. So basically my idea boils down to implementing
> that, and then cleaning up everything on top.

Okay.

Stefan

Patch

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 86e232e9d3..2245a9659b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -64,8 +64,11 @@  static AioHandler *find_aio_handler(AioContext *ctx, int fd)
     return NULL;
 }
 
-static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
+static void aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 {
+    bool poll_ready;
+    bool delete_node = false;
+
     /* If the GSource is in the process of being destroyed then
      * g_source_remove_poll() causes an assertion failure.  Skip
      * removal in that case, because glib cleans up its state during
@@ -76,25 +79,40 @@  static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
     }
 
     node->pfd.revents = 0;
+    poll_ready = node->poll_ready;
     node->poll_ready = false;
 
     /* If the fd monitor has already marked it deleted, leave it alone */
-    if (QLIST_IS_INSERTED(node, node_deleted)) {
-        return false;
+    if (!QLIST_IS_INSERTED(node, node_deleted)) {
+        /* If a read is in progress, just mark the node as deleted */
+        if (ctx->walking_handlers > 0) {
+            QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+        } else {
+            /* Otherwise, delete it for real.  We can't just mark it as
+             * deleted because deleted nodes are only cleaned up while
+             * no one is walking the handlers list.
+             */
+            QLIST_SAFE_REMOVE(node, node_poll);
+            QLIST_REMOVE(node, node);
+            delete_node = true;
+        }
     }
 
-    /* If a read is in progress, just mark the node as deleted */
-    if (ctx->walking_handlers > 0) {
-        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
-        return false;
+    /* If polling was started on the node then end it now */
+    if (ctx->poll_started && node->io_poll_end) {
+        node->io_poll_end(node->opaque);
+
+        /* Poll one last time in case ->io_poll_end() raced with the event */
+        if (node->io_poll(node->opaque)) {
+            poll_ready = true;
+        }
+    }
+    if (poll_ready) {
+        node->io_poll_ready(node->opaque);
+    }
+    if (delete_node) {
+        g_free(node);
     }
-    /* Otherwise, delete it for real.  We can't just mark it as
-     * deleted because deleted nodes are only cleaned up while
-     * no one is walking the handlers list.
-     */
-    QLIST_SAFE_REMOVE(node, node_poll);
-    QLIST_REMOVE(node, node);
-    return true;
 }
 
 /* Perform aio_set_fd_handler() in this thread's AioContext */
@@ -109,7 +127,6 @@  static void aio_set_fd_handler_local(AioContext *ctx,
     AioHandler *node;
     AioHandler *new_node = NULL;
     bool is_new = false;
-    bool deleted = false;
     int poll_disable_change;
 
     if (io_poll && !io_poll_ready) {
@@ -166,13 +183,9 @@  static void aio_set_fd_handler_local(AioContext *ctx,
 
     ctx->fdmon_ops->update(ctx, node, new_node);
     if (node) {
-        deleted = aio_remove_fd_handler(ctx, node);
+        aio_remove_fd_handler(ctx, node);
     }
     aio_notify(ctx);
-
-    if (deleted) {
-        g_free(node);
-    }
 }
 
 typedef struct {