Message ID | 20230905145002.46391-1-kwolf@redhat.com |
---|---|
Series | virtio: Drop out of coroutine context in virtio_load() |
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
>
> Kevin Wolf (2):
>   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
>   virtio: Drop out of coroutine context in virtio_load()
>
>  include/migration/vmstate.h | 8 ++++---
>  hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
>  2 files changed, 45 insertions(+), 8 deletions(-)

This looks like a bandaid for a specific instance of this problem rather
than a solution that takes care of the root cause.

Is it possible to make VMStateInfo.get/put() consistently coroutine_fn?

Stefan
On 07.09.2023 at 20:42, Stefan Hajnoczi wrote:
> On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> > This fixes a recently introduced assertion failure that was reported to
> > happen when migrating virtio-net with a failover. The latent bug that
> > we're executing code in coroutine context that was never supposed to run
> > there has existed for a long time. However, the new assertion that
> > callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> > makes it very visible because it's now always a crash.
> >
> > Kevin Wolf (2):
> >   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
> >   virtio: Drop out of coroutine context in virtio_load()
> >
> >  include/migration/vmstate.h | 8 ++++---
> >  hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
> >  2 files changed, 45 insertions(+), 8 deletions(-)
>
> This looks like a bandaid for a specific instance of this problem
> rather than a solution that takes care of the root cause.
>
> Is it possible to make VMStateInfo.get/put() consistently coroutine_fn?

I think it is. Note that this doesn't solve the problem, though:
virtio_load() calls functions that must run _outside_ coroutine context.
So once the migration code is cleaned up to consistently run in coroutine
context, you can remove the check and the one line for the
!qemu_in_coroutine() case from this series. The rest stays as it is.

It is not a solution that takes care of the root cause, but I also can't
think of one. The problem is that VMState callbacks both read/write the
migration stream (which should be done in coroutine context) and set the
device state (which can involve functions that must not run in coroutine
context). Untangling this, if possible at all, is not easy and certainly
not something for stable releases.

Kevin
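For readers following the thread, the dispatch Kevin describes (check the context, and if in a coroutine, hand the work to the main loop and wait for the result) can be modelled roughly as below. This is a toy sketch, not the code from the series: `in_coroutine`, `pending_bh`, `main_loop_run_pending()`, `load_nocoroutine()` and `load_mixed()` are all hypothetical stand-ins, and the model runs the "main loop" inline where real code would presumably yield the coroutine and be woken once the deferred call has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; this is not QEMU code. */
static bool in_coroutine;              /* models the answer of qemu_in_coroutine() */

typedef void BHFunc(void *opaque);
static BHFunc *pending_bh;             /* one-slot model of a bottom-half queue */
static void *pending_opaque;

typedef struct LoadData {
    int ret;                           /* result handed back to the caller */
    bool ran_in_coroutine;             /* records where the load actually ran */
} LoadData;

/* Sets device state; in this model it fails if called in coroutine context. */
static int load_nocoroutine(LoadData *d)
{
    d->ran_in_coroutine = in_coroutine;
    return in_coroutine ? -1 : 0;
}

/* Deferred callback: runs the real load and stores its result. */
static void load_bh(void *opaque)
{
    LoadData *d = opaque;
    d->ret = load_nocoroutine(d);
}

/* Models the main loop running a queued callback outside any coroutine. */
static void main_loop_run_pending(void)
{
    bool saved = in_coroutine;

    in_coroutine = false;
    assert(!in_coroutine);
    if (pending_bh) {
        BHFunc *fn = pending_bh;
        pending_bh = NULL;
        fn(pending_opaque);
    }
    in_coroutine = saved;
}

/* Entry point that may be called from either context. */
static int load_mixed(LoadData *d)
{
    if (in_coroutine) {
        /* Queue the work for the main loop instead of running it here. */
        pending_bh = load_bh;
        pending_opaque = d;
        /* Real code would yield here; the model runs the loop inline. */
        main_loop_run_pending();
        return d->ret;
    }
    /* The single line that handles the non-coroutine case. */
    d->ret = load_nocoroutine(d);
    return d->ret;
}
```

In this model, once every caller is known to be in coroutine context, the `if` check and the direct-call line at the end of `load_mixed()` are the only parts that would go away, which matches Kevin's remark that the rest stays as it is.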
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
>
> Kevin Wolf (2):
>   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
>   virtio: Drop out of coroutine context in virtio_load()
>
>  include/migration/vmstate.h | 8 ++++---
>  hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++++++++-----
>  2 files changed, 45 insertions(+), 8 deletions(-)
>
> --
> 2.41.0
>

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>