Message ID | 20230829214235.69309-3-peterx@redhat.com |
---|---|
State | New |
Series | migration: Better error handling in rp thread, allow failures in recover |
Peter Xu <peterx@redhat.com> writes:

> migrate_set_error() used one error_copy() so it always copies the error.
> However that's not the major use case - the major use case is one would
> like to pass the error to migrate_set_error() without further touching the
> error.
>
> It can be proved if we see most of the callers are freeing the error
> explicitly right afterwards. There are a few outliers (only when the
> caller still needs the error afterwards) where we can use error_copy()
> explicitly.
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.h    |  4 ++--
>  migration/channel.c      |  1 -
>  migration/migration.c    | 22 ++++++++++++++++------
>  migration/multifd.c      | 10 ++++------
>  migration/postcopy-ram.c |  1 -
>  migration/ram.c          |  1 -
>  6 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 6eea18db36..76e35a5ecf 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -465,7 +465,7 @@ bool migration_has_all_channels(void);
>
>  uint64_t migrate_max_downtime(void);
>
> -void migrate_set_error(MigrationState *s, const Error *error);
> +void migrate_set_error(MigrationState *s, Error *error);
>
>  void migrate_fd_connect(MigrationState *s, Error *error_in);
>
> @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
>  void migration_make_urgent_request(void);
>  void migration_consume_urgent_request(void);
>  bool migration_rate_limit(void);
> -void migration_cancel(const Error *error);
> +void migration_cancel(Error *error);
>
>  void populate_vfio_info(MigrationInfo *info);
>  void reset_vfio_bytes_transferred(void);
> diff --git a/migration/channel.c b/migration/channel.c
> index ca3319a309..48b3f6abd6 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
>          }
>      }
>      migrate_fd_connect(s, error);
> -    error_free(error);
>  }
>
>
> diff --git a/migration/migration.c b/migration/migration.c
> index c60064d48e..0f3ca168ed 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -162,7 +162,7 @@ void migration_object_init(void)
>      dirty_bitmap_mig_init();
>  }
>
> -void migration_cancel(const Error *error)
> +void migration_cancel(Error *error)
>  {
>      if (error) {
>          migrate_set_error(current_migration, error);
> @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
>      object_unref(OBJECT(s));
>  }
>
> -void migrate_set_error(MigrationState *s, const Error *error)
> +/*
> + * Set error for current migration state. The `error' ownership will be
> + * moved from the caller to MigrationState, so the caller doesn't need to
> + * free the error.
> + *
> + * If the caller still needs to reference the `error' passed in, one should
> + * use error_copy() explicitly.
> + */
> +void migrate_set_error(MigrationState *s, Error *error)
>  {
>      QEMU_LOCK_GUARD(&s->error_mutex);
>      if (!s->error) {
> -        s->error = error_copy(error);
> +        /* Record the first error triggered */
> +        s->error = error;
> +    } else {
> +        error_free(error);

This will conflict logically with 908927db28 ("migration: Update error
description whenever migration fails") which does:

+            migrate_set_error(s, local_err);
+            error_report_err(local_err);

Both functions may now try to free the error, since migrate_set_error()
takes ownership of it and error_report_err() frees its argument.

I'm working on top of this series to try to get rid of all of the
qemu_file_set_error() calls we have. I'm trying to use migrate_set_error()
whenever possible and only set f->last_error in the very bottom IO
functions.
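For readers following the ownership question, here is a minimal sketch of one way to keep both calls without freeing the error twice: hand migrate_set_error() a copy and let error_report_err() consume the original. The `Error` and `MigrationState` types, `error_new()`, `live_errors` and `set_and_report()` below are simplified stand-ins invented for this illustration, not QEMU's real definitions, and locking is omitted. Only the ownership rules are modelled, as an assumption based on the patch above: migrate_set_error() takes ownership, and error_report_err() frees what it reports.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's types, reduced to what the sketch needs. */
typedef struct Error { char *msg; } Error;
typedef struct MigrationState { Error *error; } MigrationState;

static int live_errors; /* Error objects currently allocated (illustration only) */

static Error *error_new(const char *msg)
{
    Error *e = malloc(sizeof(*e));
    assert(e);
    e->msg = malloc(strlen(msg) + 1);
    assert(e->msg);
    strcpy(e->msg, msg);
    live_errors++;
    return e;
}

static Error *error_copy(const Error *e)
{
    return error_new(e->msg);
}

static void error_free(Error *e)
{
    if (e) {
        free(e->msg);
        free(e);
        live_errors--;
    }
}

/* Models the patched behaviour: takes ownership, keeps only the first error. */
static void migrate_set_error(MigrationState *s, Error *error)
{
    if (!s->error) {
        s->error = error;
    } else {
        error_free(error);
    }
}

/* Models error_report_err(): report the error, then free it. */
static void error_report_err(Error *err)
{
    fprintf(stderr, "%s\n", err->msg);
    error_free(err);
}

/* Each callee gets its own object, so nothing is freed twice. */
static int set_and_report(MigrationState *s, const char *msg)
{
    Error *local_err = error_new(msg);

    migrate_set_error(s, error_copy(local_err)); /* the copy is consumed here */
    error_report_err(local_err);                 /* the original is consumed here */
    return live_errors;
}
```

Passing `local_err` itself to both calls would, under these ownership rules, free the same object twice whenever `s->error` was already set, and leave `s->error` dangling otherwise.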
On Tue, Sep 12, 2023 at 04:40:14PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > migrate_set_error() used one error_copy() so it always copies the error.
> > However that's not the major use case - the major use case is one would
> > like to pass the error to migrate_set_error() without further touching the
> > error.
> >
> > It can be proved if we see most of the callers are freeing the error
> > explicitly right afterwards. There are a few outliers (only when the
> > caller still needs the error afterwards) where we can use error_copy()
> > explicitly.
> >
> > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.h    |  4 ++--
> >  migration/channel.c      |  1 -
> >  migration/migration.c    | 22 ++++++++++++++++------
> >  migration/multifd.c      | 10 ++++------
> >  migration/postcopy-ram.c |  1 -
> >  migration/ram.c          |  1 -
> >  6 files changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 6eea18db36..76e35a5ecf 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -465,7 +465,7 @@ bool migration_has_all_channels(void);
> >
> >  uint64_t migrate_max_downtime(void);
> >
> > -void migrate_set_error(MigrationState *s, const Error *error);
> > +void migrate_set_error(MigrationState *s, Error *error);
> >
> >  void migrate_fd_connect(MigrationState *s, Error *error_in);
> >
> > @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> >  void migration_make_urgent_request(void);
> >  void migration_consume_urgent_request(void);
> >  bool migration_rate_limit(void);
> > -void migration_cancel(const Error *error);
> > +void migration_cancel(Error *error);
> >
> >  void populate_vfio_info(MigrationInfo *info);
> >  void reset_vfio_bytes_transferred(void);
> > diff --git a/migration/channel.c b/migration/channel.c
> > index ca3319a309..48b3f6abd6 100644
> > --- a/migration/channel.c
> > +++ b/migration/channel.c
> > @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
> >          }
> >      }
> >      migrate_fd_connect(s, error);
> > -    error_free(error);
> >  }
> >
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c60064d48e..0f3ca168ed 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -162,7 +162,7 @@ void migration_object_init(void)
> >      dirty_bitmap_mig_init();
> >  }
> >
> > -void migration_cancel(const Error *error)
> > +void migration_cancel(Error *error)
> >  {
> >      if (error) {
> >          migrate_set_error(current_migration, error);
> > @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
> >      object_unref(OBJECT(s));
> >  }
> >
> > -void migrate_set_error(MigrationState *s, const Error *error)
> > +/*
> > + * Set error for current migration state. The `error' ownership will be
> > + * moved from the caller to MigrationState, so the caller doesn't need to
> > + * free the error.
> > + *
> > + * If the caller still needs to reference the `error' passed in, one should
> > + * use error_copy() explicitly.
> > + */
> > +void migrate_set_error(MigrationState *s, Error *error)
> >  {
> >      QEMU_LOCK_GUARD(&s->error_mutex);
> >      if (!s->error) {
> > -        s->error = error_copy(error);
> > +        /* Record the first error triggered */
> > +        s->error = error;
> > +    } else {
> > +        error_free(error);
>
> This will conflict logically with 908927db28 ("migration: Update error
> description whenever migration fails") which does:
>
> +            migrate_set_error(s, local_err);
> +            error_report_err(local_err);
>
> Both functions may now try to free the error, since migrate_set_error()
> takes ownership of it and error_report_err() frees its argument.

Indeed, thanks for spotting this. Perhaps I should just drop the
error_report_err() call, since we've already set the error anyway.

> I'm working on top of this series to try to get rid of all of the
> qemu_file_set_error() calls we have. I'm trying to use migrate_set_error()
> whenever possible and only set f->last_error in the very bottom IO
> functions.

I'll read it when it comes.
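The alternative suggested in the reply above, dropping the error_report_err() call, can be sketched as follows. This is only an illustration under stated assumptions: `Error`, `MigrationState`, `error_new()`, `fail_migration()` and `migration_error_message()` are hypothetical stand-ins made up for this example, not QEMU's real types or helpers, and locking is omitted. The idea shown is that the caller hands the error over and never touches it again, while any later reporting reads the error stored in the migration state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's types, reduced to what the sketch needs. */
typedef struct Error { char *msg; } Error;
typedef struct MigrationState { Error *error; } MigrationState;

static Error *error_new(const char *msg)
{
    Error *e = malloc(sizeof(*e));
    assert(e);
    e->msg = malloc(strlen(msg) + 1);
    assert(e->msg);
    strcpy(e->msg, msg);
    return e;
}

static void error_free(Error *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Models the patched behaviour: takes ownership, keeps only the first error. */
static void migrate_set_error(MigrationState *s, Error *error)
{
    if (!s->error) {
        s->error = error;
    } else {
        error_free(error);
    }
}

/* The caller transfers ownership and does not report or free afterwards. */
static void fail_migration(MigrationState *s, const char *msg)
{
    Error *local_err = error_new(msg);

    migrate_set_error(s, local_err);
    /* No error_report_err(local_err) here: local_err is no longer ours. */
}

/* Hypothetical helper: the message a later reporting step would read. */
static const char *migration_error_message(const MigrationState *s)
{
    return s->error ? s->error->msg : "";
}
```

One consequence worth noting in this sketch: because only the first error is recorded, a second failure is freed silently and never becomes visible through the stored state.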
diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..76e35a5ecf 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -465,7 +465,7 @@ bool migration_has_all_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
-void migrate_set_error(MigrationState *s, const Error *error);
+void migrate_set_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
@@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(const Error *error);
+void migration_cancel(Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
diff --git a/migration/channel.c b/migration/channel.c
index ca3319a309..48b3f6abd6 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
         }
     }
     migrate_fd_connect(s, error);
-    error_free(error);
 }
 
 
diff --git a/migration/migration.c b/migration/migration.c
index c60064d48e..0f3ca168ed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -162,7 +162,7 @@ void migration_object_init(void)
     dirty_bitmap_mig_init();
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel(Error *error)
 {
     if (error) {
         migrate_set_error(current_migration, error);
@@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
     object_unref(OBJECT(s));
 }
 
-void migrate_set_error(MigrationState *s, const Error *error)
+/*
+ * Set error for current migration state. The `error' ownership will be
+ * moved from the caller to MigrationState, so the caller doesn't need to
+ * free the error.
+ *
+ * If the caller still needs to reference the `error' passed in, one should
+ * use error_copy() explicitly.
+ */
+void migrate_set_error(MigrationState *s, Error *error)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
     if (!s->error) {
-        s->error = error_copy(error);
+        /* Record the first error triggered */
+        s->error = error;
+    } else {
+        error_free(error);
     }
 }
 
@@ -1235,7 +1246,7 @@ static void migrate_error_free(MigrationState *s)
     }
 }
 
-static void migrate_fd_error(MigrationState *s, const Error *error)
+static void migrate_fd_error(MigrationState *s, Error *error)
 {
     trace_migrate_fd_error(error_get_pretty(error));
     assert(s->to_dst_file == NULL);
@@ -1703,7 +1714,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         if (!resume_requested) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         }
-        migrate_fd_error(s, local_err);
+        migrate_fd_error(s, error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
@@ -2626,7 +2637,6 @@ static MigThrError migration_detect_error(MigrationState *s)
 
     if (local_error) {
         migrate_set_error(s, local_error);
-        error_free(local_error);
     }
 
     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..69d56104fb 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -551,7 +551,6 @@ void multifd_save_cleanup(void)
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
-            error_free(local_err);
         }
     }
     qemu_sem_destroy(&multifd_send_state->channels_ready);
@@ -750,7 +749,6 @@ out:
     if (local_err) {
         trace_multifd_send_error(p->id);
         multifd_send_terminate_threads(local_err);
-        error_free(local_err);
     }
 
     /*
@@ -883,7 +881,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
      */
     p->quit = true;
     object_unref(OBJECT(ioc));
-    error_free(err);
 }
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
@@ -1148,7 +1145,6 @@ static void *multifd_recv_thread(void *opaque)
 
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
-        error_free(local_err);
     }
     qemu_mutex_lock(&p->mutex);
     p->running = false;
@@ -1240,7 +1236,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate_prepend(errp, local_err,
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
@@ -1253,7 +1250,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..8a93b5504d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1594,7 +1594,6 @@ postcopy_preempt_send_channel_done(MigrationState *s,
 {
     if (local_err) {
         migrate_set_error(s, local_err);
-        error_free(local_err);
     } else {
         migration_ioc_register_yank(ioc);
         s->postcopy_qemufile_src = qemu_file_new_output(ioc);
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..fc7fe0e6e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4308,7 +4308,6 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
         migration_cancel(err);
-        error_free(err);
     }
 
     switch (ps) {