Message ID | 20231024163933.516546-1-peterx@redhat.com |
---|---|
State | New |
Series | [v3] migration: Stop migration immediately in RDMA error paths |
Peter Xu <peterx@redhat.com> wrote:
> In multiple places, RDMA errors are handled in a strange way, where it only
> sets qemu_file_set_error() but not stop the migration immediately.
> [...]

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.
Peter Xu <peterx@redhat.com> writes:
> In multiple places, RDMA errors are handled in a strange way, where it only
> sets qemu_file_set_error() but not stop the migration immediately.
> [...]

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Peter Xu <peterx@redhat.com> writes:
> In multiple places, RDMA errors are handled in a strange way, where it only
> sets qemu_file_set_error() but not stop the migration immediately.
> [...]

Good move! Since this already has competent review, I'll leave it there.
```diff
diff --git a/migration/ram.c b/migration/ram.c
index 212add4481..6cb8b5cd2f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3034,11 +3034,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+        return ret;
     }
 
     ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+        return ret;
     }
 
     migration_ops = g_malloc0(sizeof(MigrationOps));
@@ -3104,6 +3106,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
             ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
             if (ret < 0) {
                 qemu_file_set_error(f, ret);
+                goto out;
             }
 
             t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -3208,8 +3211,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     rs->last_stage = !migration_in_colo_state();
 
     WITH_RCU_READ_LOCK_GUARD() {
-        int rdma_reg_ret;
-
         if (!migration_in_postcopy()) {
             migration_bitmap_sync_precopy(rs, true);
         }
@@ -3217,6 +3218,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
+            return ret;
         }
 
         /* try transferring iterative blocks of memory */
@@ -3232,24 +3234,21 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
                 break;
             }
             if (pages < 0) {
-                ret = pages;
-                break;
+                qemu_mutex_unlock(&rs->bitmap_mutex);
+                return pages;
             }
         }
         qemu_mutex_unlock(&rs->bitmap_mutex);
 
         ram_flush_compressed_data(rs);
 
-        rdma_reg_ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
-        if (rdma_reg_ret < 0) {
-            qemu_file_set_error(f, rdma_reg_ret);
+        ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return ret;
         }
     }
 
-    if (ret < 0) {
-        return ret;
-    }
-
     ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
     if (ret < 0) {
         return ret;
```
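In the `ram_save_complete()` hunk, the `pages < 0` path now returns from inside the loop that runs with `rs->bitmap_mutex` held, which is why the patch adds `qemu_mutex_unlock()` before `return pages;`. The sketch below models only that invariant. `FakeMutex`, `fake_lock()`, `fake_unlock()`, `fake_find_and_save_block()` and `save_remaining_blocks()` are hypothetical stand-ins, not QEMU APIs; the lock is a plain counter rather than a real `QemuMutex` so a leaked lock is easy to assert on.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a non-recursive mutex: counts how often it is
 * held, so a lock leaked on an error path is easy to detect. */
typedef struct {
    int depth;
} FakeMutex;

static void fake_lock(FakeMutex *m)
{
    assert(m->depth == 0);      /* taking it twice would deadlock */
    m->depth++;
}

static void fake_unlock(FakeMutex *m)
{
    assert(m->depth == 1);
    m->depth--;
}

/* Hypothetical stand-in for ram_find_and_save_block(): sends one page per
 * call, returns 0 when no blocks remain, or -EIO when *remaining hits
 * fail_at. */
static int fake_find_and_save_block(int *remaining, int fail_at)
{
    if (*remaining == fail_at) {
        return -EIO;
    }
    if (*remaining == 0) {
        return 0;
    }
    (*remaining)--;
    return 1;
}

/* Same shape as the patched loop in ram_save_complete(). */
static int save_remaining_blocks(FakeMutex *bitmap_mutex, int blocks,
                                 int fail_at)
{
    fake_lock(bitmap_mutex);
    while (1) {
        int pages = fake_find_and_save_block(&blocks, fail_at);

        if (pages == 0) {
            break;                      /* nothing left to send */
        }
        if (pages < 0) {
            fake_unlock(bitmap_mutex);  /* release before leaving */
            return pages;               /* stop immediately */
        }
    }
    fake_unlock(bitmap_mutex);
    return 0;
}
```

Calling `save_remaining_blocks(&m, 3, 1)` returns `-EIO` and leaves `m.depth` at 0. Dropping the `fake_unlock()` in the error branch would leave it at 1, and the assertion in `fake_lock()` would fire on the next call.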
```text
In multiple places, RDMA errors are handled in a strange way: the code only
calls qemu_file_set_error() but does not stop the migration immediately.

It's not obvious what will happen later if there is already an error. Make
all such failures stop migration immediately.

Cc: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v3:
- in ram_save_complete() return directly with retval, drop the "ret<0"
  check after the loop [Juan]

This patch is based on Thomas's patch:

  [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
  https://lore.kernel.org/r/20231024092220.55305-1-thuth@redhat.com

The above patch should already have been queued by both Markus and Juan.
---
 migration/ram.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
```
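The commit message contrasts two error-handling shapes: recording the error on the file and carrying on, versus recording it and returning at once. The sketch below is a deliberately simplified model of both. `FakeFile`, `fake_file_set_error()`, `fake_registration()`, `setup_before()` and `setup_after()` are hypothetical; treating the file error as "first error wins" is an assumption made for the sketch, not a statement of the exact semantics of `qemu_file_set_error()`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for QEMUFile: only the sticky error field. */
typedef struct {
    int last_error;
} FakeFile;

/* Assumed behavior for this sketch: remember the first error only. */
static void fake_file_set_error(FakeFile *f, int ret)
{
    if (f->last_error == 0 && ret < 0) {
        f->last_error = ret;
    }
}

/* Hypothetical registration step that fails with -EIO on request. */
static int fake_registration(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Pre-patch shape: the error is recorded, but the function keeps going
 * and its caller sees 0. `work_done` counts steps run after the check. */
static int setup_before(FakeFile *f, int fail, int *work_done)
{
    int ret = fake_registration(fail);

    if (ret < 0) {
        fake_file_set_error(f, ret);
    }
    (*work_done)++;            /* later setup runs despite the error */
    return 0;
}

/* Post-patch shape: record the error and stop right away. */
static int setup_after(FakeFile *f, int fail, int *work_done)
{
    int ret = fake_registration(fail);

    if (ret < 0) {
        fake_file_set_error(f, ret);
        return ret;            /* nothing after this point runs */
    }
    (*work_done)++;
    return 0;
}
```

With `fail` set, `setup_before()` records `-EIO` but still runs the later step and returns 0, so the failure only surfaces when some later code happens to check the file's error. `setup_after()` returns `-EIO` and skips the rest, which is the shape the patch applies to each RDMA registration call.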