Message ID | 20231024154008.512222-1-peterx@redhat.com |
---|---|
State | New |
Series | [v2] migration: Stop migration immediately in RDMA error paths |
Peter Xu <peterx@redhat.com> wrote:
> In multiple places, RDMA errors are handled in a strange way, where it only
> sets qemu_file_set_error() but does not stop the migration immediately.
>
> It's not obvious what will happen later if there is already an error. Make
> all such failures stop migration immediately.
>
> Cc: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>
> v2:
> - One more line squashed in to fix the build error... Please ignore v1,
>   sorry for the noise.
>
> This patch is based on Thomas's patch:
>
> [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
> https://lore.kernel.org/r/20231024092220.55305-1-thuth@redhat.com
>
> Above patch should have been queued by both Markus and Juan.
> ---
>  migration/ram.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 212add4481..1473bb593a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3034,11 +3034,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
> +        return ret;

I agree

>      }
>
>      ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
> +        return ret;

I agree

>      }
>
>      migration_ops = g_malloc0(sizeof(MigrationOps));
> @@ -3104,6 +3106,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
>          if (ret < 0) {
>              qemu_file_set_error(f, ret);
> +            goto out;

Seems sensible

>          }
>
>          t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> @@ -3208,8 +3211,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>      rs->last_stage = !migration_in_colo_state();
>
>      WITH_RCU_READ_LOCK_GUARD() {
> -        int rdma_reg_ret;
> -
>          if (!migration_in_postcopy()) {
>              migration_bitmap_sync_precopy(rs, true);
>          }
> @@ -3217,6 +3218,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
>          if (ret < 0) {
>              qemu_file_set_error(f, ret);
> +            break;

Please
             return ret;

We can do exactly the same with pages < 0.

>          }
>
>          /* try transferring iterative blocks of memory */
> @@ -3240,9 +3242,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>
>          ram_flush_compressed_data(rs);
>
> -        rdma_reg_ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
> -        if (rdma_reg_ret < 0) {
> -            qemu_file_set_error(f, rdma_reg_ret);
> +        ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
> +        if (ret < 0) {
> +            qemu_file_set_error(f, ret);
> +            break;
>          }
>      }

And if we return here, we can just drop the:

    if (ret < 0) {
        return ret;
    }

At the exit of the loop.

Later, Juan.
On Tue, Oct 24, 2023 at 06:16:27PM +0200, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > In multiple places, RDMA errors are handled in a strange way, where it only
> > sets qemu_file_set_error() but does not stop the migration immediately.
> >
> > It's not obvious what will happen later if there is already an error. Make
> > all such failures stop migration immediately.
> >
> > Cc: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Cc: Juan Quintela <quintela@redhat.com>
> > Cc: Fabiano Rosas <farosas@suse.de>
> > Reported-by: Thomas Huth <thuth@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >
> > v2:
> > - One more line squashed in to fix the build error... Please ignore v1,
> >   sorry for the noise.
> >
> > This patch is based on Thomas's patch:
> >
> > [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
> > https://lore.kernel.org/r/20231024092220.55305-1-thuth@redhat.com
> >
> > Above patch should have been queued by both Markus and Juan.
> > ---
> >  migration/ram.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 212add4481..1473bb593a 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3034,11 +3034,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >      ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
> >      if (ret < 0) {
> >          qemu_file_set_error(f, ret);
> > +        return ret;
>
> I agree
>
> >      }
> >
> >      ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
> >      if (ret < 0) {
> >          qemu_file_set_error(f, ret);
> > +        return ret;
>
> I agree
>
> >      }
> >
> >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > @@ -3104,6 +3106,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >          ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> > +            goto out;
>
> Seems sensible
>
> >          }
> >
> >          t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> > @@ -3208,8 +3211,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >      rs->last_stage = !migration_in_colo_state();
> >
> >      WITH_RCU_READ_LOCK_GUARD() {
> > -        int rdma_reg_ret;
> > -
> >          if (!migration_in_postcopy()) {
> >              migration_bitmap_sync_precopy(rs, true);
> >          }
> > @@ -3217,6 +3218,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >          ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> > +            break;
>
> Please
>              return ret;
>
>
> We can do exactly the same with pages < 0.
>
> >          }
> >
> >          /* try transferring iterative blocks of memory */
> > @@ -3240,9 +3242,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >
> >          ram_flush_compressed_data(rs);
> >
> > -        rdma_reg_ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
> > -        if (rdma_reg_ret < 0) {
> > -            qemu_file_set_error(f, rdma_reg_ret);
> > +        ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
> > +        if (ret < 0) {
> > +            qemu_file_set_error(f, ret);
> > +            break;
> >          }
> >      }
>
> And if we return here, we can just drop the:
>
>     if (ret < 0) {
>         return ret;
>     }
>
>
> At the exit of the loop.

IIUC that'll be the same as this patch, but sure thing I'll prepare a v3.

Thanks,
Peter Xu <peterx@redhat.com> wrote:
> On Tue, Oct 24, 2023 at 06:16:27PM +0200, Juan Quintela wrote:
>> Peter Xu <peterx@redhat.com> wrote:
>> > In multiple places, RDMA errors are handled in a strange way, where it only
>> > sets qemu_file_set_error() but does not stop the migration immediately.
>> >
>> > It's not obvious what will happen later if there is already an error. Make
>> > all such failures stop migration immediately.
>> >
>> > Cc: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
>> > Cc: Markus Armbruster <armbru@redhat.com>
>> > Cc: Juan Quintela <quintela@redhat.com>
>> > Cc: Fabiano Rosas <farosas@suse.de>
>> > Reported-by: Thomas Huth <thuth@redhat.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >
>> > v2:
>> > - One more line squashed in to fix the build error... Please ignore v1,
>> >   sorry for the noise.
>> >
>> > This patch is based on Thomas's patch:
>> >
>> > [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
>> > https://lore.kernel.org/r/20231024092220.55305-1-thuth@redhat.com
>> >
>> > Above patch should have been queued by both Markus and Juan.
>> > ---
>> >  migration/ram.c | 13 ++++++++-----
>> >  1 file changed, 8 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/migration/ram.c b/migration/ram.c
>> > index 212add4481..1473bb593a 100644
>> > --- a/migration/ram.c
>> > +++ b/migration/ram.c
>> > @@ -3034,11 +3034,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>> >      ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
>> >      if (ret < 0) {
>> >          qemu_file_set_error(f, ret);
>> > +        return ret;
>>
>> I agree
>>
>> >      }
>> >
>> >      ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
>> >      if (ret < 0) {
>> >          qemu_file_set_error(f, ret);
>> > +        return ret;
>>
>> I agree
>>
>> >      }
>> >
>> >      migration_ops = g_malloc0(sizeof(MigrationOps));
>> > @@ -3104,6 +3106,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>> >          ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
>> >          if (ret < 0) {
>> >              qemu_file_set_error(f, ret);
>> > +            goto out;
>>
>> Seems sensible
>>
>> >          }
>> >
>> >          t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> > @@ -3208,8 +3211,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>> >      rs->last_stage = !migration_in_colo_state();
>> >
>> >      WITH_RCU_READ_LOCK_GUARD() {
>> > -        int rdma_reg_ret;
>> > -
>> >          if (!migration_in_postcopy()) {
>> >              migration_bitmap_sync_precopy(rs, true);
>> >          }
>> > @@ -3217,6 +3218,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>> >          ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
>> >          if (ret < 0) {
>> >              qemu_file_set_error(f, ret);
>> > +            break;
>>
>> Please
>>              return ret;
>>
>>
>> We can do exactly the same with pages < 0.
>>
>> >          }
>> >
>> >          /* try transferring iterative blocks of memory */
>> > @@ -3240,9 +3242,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>> >
>> >          ram_flush_compressed_data(rs);
>> >
>> > -        rdma_reg_ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
>> > -        if (rdma_reg_ret < 0) {
>> > -            qemu_file_set_error(f, rdma_reg_ret);
>> > +        ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
>> > +        if (ret < 0) {
>> > +            qemu_file_set_error(f, ret);
>> > +            break;
>> >          }
>> >      }
>>
>> And if we return here, we can just drop the:
>>
>>     if (ret < 0) {
>>         return ret;
>>     }
>>
>>
>> At the exit of the loop.
>
> IIUC that'll be the same as this patch,

No. If you see a break, you need to search for the loop, go to the end, and
see what happens.

    return ret;

It is completely obvious what we do in case of error.

> but sure thing I'll prepare a v3.

Thanks.
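The exchange above is about readability rather than behaviour, which a small C sketch can make concrete. Everything named here (`WITH_GUARD`, `guard_release`, `guard_depth`, `complete_with_break`, `complete_with_return`) is hypothetical and invented for illustration; the macro is only an assumed stand-in for a loop-shaped scope guard, not QEMU's definition of WITH_RCU_READ_LOCK_GUARD(), and it relies on the GCC/Clang `cleanup` attribute.

```c
#include <assert.h>

/* Hypothetical counter: greater than 0 while a guard is held. */
static int guard_depth;

/* Release the guard once; safe to call again after it was released. */
static void guard_release(int *held)
{
    assert(held != 0);
    if (*held) {
        guard_depth--;
        *held = 0;
    }
}

/*
 * Hypothetical loop-shaped scope guard (assumption, not QEMU's macro).
 * The body runs once; the cleanup attribute releases the guard when the
 * scope is left by normal completion, by break, or by return.
 */
#define WITH_GUARD() \
    for (int held_ __attribute__((cleanup(guard_release))) = \
             (guard_depth++, 1); \
         held_; guard_release(&held_))

/* break style: the reader must find the end of the loop to see the outcome. */
static int complete_with_break(int step_result)
{
    int ret = 0;

    WITH_GUARD() {
        ret = step_result;
        if (ret < 0) {
            break;
        }
        /* further work would go here */
    }

    if (ret < 0) {
        return ret;
    }
    return 0;
}

/* return style: the error path is visible at the point of failure. */
static int complete_with_return(int step_result)
{
    WITH_GUARD() {
        int ret = step_result;
        if (ret < 0) {
            return ret;
        }
        /* further work would go here */
    }
    return 0;
}
```

In this sketch both variants return the same error code and release the guard, so the choice comes down to how far a reader has to look to find out what happens after the failure.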
diff --git a/migration/ram.c b/migration/ram.c
index 212add4481..1473bb593a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3034,11 +3034,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+        return ret;
     }

     ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+        return ret;
     }

     migration_ops = g_malloc0(sizeof(MigrationOps));
@@ -3104,6 +3106,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
+            goto out;
         }

         t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -3208,8 +3211,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     rs->last_stage = !migration_in_colo_state();

     WITH_RCU_READ_LOCK_GUARD() {
-        int rdma_reg_ret;
-
         if (!migration_in_postcopy()) {
             migration_bitmap_sync_precopy(rs, true);
         }
@@ -3217,6 +3218,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
+            break;
         }

         /* try transferring iterative blocks of memory */
@@ -3240,9 +3242,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)

         ram_flush_compressed_data(rs);

-        rdma_reg_ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
-        if (rdma_reg_ret < 0) {
-            qemu_file_set_error(f, rdma_reg_ret);
+        ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            break;
         }
     }
In multiple places, RDMA errors are handled in a strange way, where it only
sets qemu_file_set_error() but does not stop the migration immediately.

It's not obvious what will happen later if there is already an error. Make
all such failures stop migration immediately.

Cc: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---

v2:
- One more line squashed in to fix the build error... Please ignore v1,
  sorry for the noise.

This patch is based on Thomas's patch:

[PATCH v2] migration/ram: Fix compilation with -Wshadow=local
https://lore.kernel.org/r/20231024092220.55305-1-thuth@redhat.com

Above patch should have been queued by both Markus and Juan.
---
 migration/ram.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
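To illustrate the difference the commit message describes, here is a minimal, self-contained C sketch of the two error-handling patterns. All names in it (`FakeFile`, `fake_file_set_error`, `fake_registration_start`, `setup_record_only`, `setup_stop_immediately`) are hypothetical stand-ins invented for this example; they are not QEMU's types or functions, and the sticky first-error behaviour is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a migration stream; not QEMU's QEMUFile. */
typedef struct {
    int last_error;  /* 0 means no error has been recorded */
    int steps_run;   /* number of later steps that executed */
} FakeFile;

/* Record an error on the stream; this sketch keeps only the first one. */
static void fake_file_set_error(FakeFile *f, int err)
{
    assert(f != NULL);
    if (f->last_error == 0) {
        f->last_error = err;
    }
}

/* Hypothetical registration step: returns the code it is told to return. */
static int fake_registration_start(int result)
{
    return result;
}

/* Record-only pattern: the error is stored, but later steps still run. */
static int setup_record_only(FakeFile *f, int result)
{
    int ret = fake_registration_start(result);
    if (ret < 0) {
        fake_file_set_error(f, ret);
    }
    f->steps_run++;  /* executes even though an error was recorded */
    return 0;
}

/* Stop-immediately pattern: record the error and return it at once. */
static int setup_stop_immediately(FakeFile *f, int result)
{
    int ret = fake_registration_start(result);
    if (ret < 0) {
        fake_file_set_error(f, ret);
        return ret;
    }
    f->steps_run++;
    return 0;
}
```

With a failing step, the record-only variant in this sketch still runs the following work and reports success to its caller, while the stop-immediately variant returns the error code before anything else happens.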