[2/9] migration: Let migrate_set_error() take ownership

Message ID 20230829214235.69309-3-peterx@redhat.com
State New
Series migration: Better error handling in rp thread, allow failures in recover

Commit Message

Peter Xu Aug. 29, 2023, 9:42 p.m. UTC
migrate_set_error() calls error_copy(), so it always copies the error.
However, that's not the major use case: most callers want to hand the
error to migrate_set_error() and never touch it again.

This is evident from the fact that most callers free the error explicitly
right afterwards.  There are a few outliers (only if when the caller),
and those can call error_copy() explicitly.
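
The ownership rule described above can be illustrated outside of QEMU. What follows is only a minimal sketch under stated assumptions: `Error`, `MigrationState`, `error_new()`, `error_free()` and `error_copy()` here are simplified stand-ins written for this example, not the real QEMU definitions, and the `error_mutex` locking done by the real function is left out. `live_errors` and `demo_take_ownership()` are hypothetical helpers used to check that nothing leaks or is freed twice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for illustration only, not QEMU's real types. */
typedef struct Error {
    char *msg;
} Error;

typedef struct MigrationState {
    Error *error;           /* first error recorded; owned by the state */
} MigrationState;

static int live_errors;     /* hypothetical allocation counter */

static Error *error_new(const char *msg)
{
    size_t n = strlen(msg) + 1;
    Error *e = malloc(sizeof(*e));

    assert(e);
    e->msg = malloc(n);
    assert(e->msg);
    memcpy(e->msg, msg, n);
    live_errors++;
    return e;
}

static void error_free(Error *e)
{
    if (e) {
        free(e->msg);
        free(e);
        live_errors--;
    }
}

static Error *error_copy(const Error *e)
{
    return error_new(e->msg);
}

/* Takes ownership of err: it is either stored or freed, never left behind. */
static void migrate_set_error(MigrationState *s, Error *err)
{
    if (!s->error) {
        s->error = err;     /* record the first error */
    } else {
        error_free(err);    /* a later error is dropped */
    }
}

/* Returns the number of Error objects still allocated at the end. */
static int demo_take_ownership(void)
{
    MigrationState s = { .error = NULL };
    Error *keep = error_new("still needed by caller");

    migrate_set_error(&s, error_new("first"));    /* stored in s */
    migrate_set_error(&s, error_new("second"));   /* freed by the callee */
    migrate_set_error(&s, error_copy(keep));      /* caller keeps 'keep' */

    assert(strcmp(s.error->msg, "first") == 0);
    error_free(keep);       /* the caller frees only what it still owns */
    error_free(s.error);
    return live_errors;
}
```

In this sketch the caller never frees an error it has passed in, and the one caller that still needs its error passes a copy instead.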

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  4 ++--
 migration/channel.c      |  1 -
 migration/migration.c    | 22 ++++++++++++++++------
 migration/multifd.c      | 10 ++++------
 migration/postcopy-ram.c |  1 -
 migration/ram.c          |  1 -
 6 files changed, 22 insertions(+), 17 deletions(-)

Comments

Fabiano Rosas Sept. 12, 2023, 7:40 p.m. UTC | #1
Peter Xu <peterx@redhat.com> writes:

> migrate_set_error() used one error_copy() so it always copy an error.
> However that's not the major use case - the major use case is one would
> like to pass the error to migrate_set_error() without further touching the
> error.
>
> It can be proved if we see most of the callers are freeing the error
> explicitly right afterwards.  There're a few outliers (only if when the
> caller) where we can use error_copy() explicitly there.
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.h    |  4 ++--
>  migration/channel.c      |  1 -
>  migration/migration.c    | 22 ++++++++++++++++------
>  migration/multifd.c      | 10 ++++------
>  migration/postcopy-ram.c |  1 -
>  migration/ram.c          |  1 -
>  6 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 6eea18db36..76e35a5ecf 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
>  
>  uint64_t migrate_max_downtime(void);
>  
> -void migrate_set_error(MigrationState *s, const Error *error);
> +void migrate_set_error(MigrationState *s, Error *error);
>  
>  void migrate_fd_connect(MigrationState *s, Error *error_in);
>  
> @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
>  void migration_make_urgent_request(void);
>  void migration_consume_urgent_request(void);
>  bool migration_rate_limit(void);
> -void migration_cancel(const Error *error);
> +void migration_cancel(Error *error);
>  
>  void populate_vfio_info(MigrationInfo *info);
>  void reset_vfio_bytes_transferred(void);
> diff --git a/migration/channel.c b/migration/channel.c
> index ca3319a309..48b3f6abd6 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
>          }
>      }
>      migrate_fd_connect(s, error);
> -    error_free(error);
>  }
>  
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index c60064d48e..0f3ca168ed 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -162,7 +162,7 @@ void migration_object_init(void)
>      dirty_bitmap_mig_init();
>  }
>  
> -void migration_cancel(const Error *error)
> +void migration_cancel(Error *error)
>  {
>      if (error) {
>          migrate_set_error(current_migration, error);
> @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
>      object_unref(OBJECT(s));
>  }
>  
> -void migrate_set_error(MigrationState *s, const Error *error)
> +/*
> + * Set error for current migration state.  The `error' ownership will be
> + * moved from the caller to MigrationState, so the caller doesn't need to
> + * free the error.
> + *
> + * If the caller still needs to reference the `error' passed in, one should
> + * use error_copy() explicitly.
> + */
> +void migrate_set_error(MigrationState *s, Error *error)
>  {
>      QEMU_LOCK_GUARD(&s->error_mutex);
>      if (!s->error) {
> -        s->error = error_copy(error);
> +        /* Record the first error triggered */
> +        s->error = error;
> +    } else {
> +        error_free(error);

This will conflict logically with 908927db28 ("migration: Update error
description whenever migration fails") which does:

+            migrate_set_error(s, local_err);
+            error_report_err(local_err);

both functions may now try to free the error.
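
To make the conflict concrete, here is a small self-contained sketch. It rests on assumptions: the types and helpers below are simplified stand-ins written for this example rather than QEMU's real implementations, locking is omitted, and `set_and_report()`, `demo_set_and_report()` and `live_errors` are hypothetical names. It shows one possible way to keep both calls without a double free, by handing a copy to the migration state. It is not necessarily how the series resolves the conflict.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for illustration only, not QEMU's real types. */
typedef struct Error { char *msg; } Error;
typedef struct MigrationState { Error *error; } MigrationState;

static int live_errors;     /* hypothetical allocation counter */

static Error *error_new(const char *msg)
{
    size_t n = strlen(msg) + 1;
    Error *e = malloc(sizeof(*e));

    assert(e);
    e->msg = malloc(n);
    assert(e->msg);
    memcpy(e->msg, msg, n);
    live_errors++;
    return e;
}

static void error_free(Error *e)
{
    if (e) {
        free(e->msg);
        free(e);
        live_errors--;
    }
}

static Error *error_copy(const Error *e)
{
    return error_new(e->msg);
}

/* Stand-in: prints the message, then frees the error. */
static void error_report_err(Error *e)
{
    fprintf(stderr, "%s\n", e->msg);
    error_free(e);
}

/* Takes ownership of err, as in the patch under review. */
static void migrate_set_error(MigrationState *s, Error *err)
{
    if (!s->error) {
        s->error = err;
    } else {
        error_free(err);
    }
}

/*
 * Unsafe once migrate_set_error() takes ownership:
 *     migrate_set_error(s, local_err);    the state now owns or has freed it
 *     error_report_err(local_err);        second free, or use after free
 * Handing over a copy keeps both calls valid.
 */
static void set_and_report(MigrationState *s, Error *local_err)
{
    migrate_set_error(s, error_copy(local_err));
    error_report_err(local_err);
}

/* Returns the number of Error objects still allocated at the end. */
static int demo_set_and_report(void)
{
    MigrationState s = { .error = NULL };

    set_and_report(&s, error_new("channel failed"));
    set_and_report(&s, error_new("later failure"));

    assert(strcmp(s.error->msg, "channel failed") == 0);
    error_free(s.error);
    s.error = NULL;
    return live_errors;
}
```

The other option is to drop the reporting call at that site and let whoever consumes the stored error do the reporting.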


I'm working on top of this series to try to get rid of all of those
qemu_file_set_error() calls we have. I'm trying to use migrate_set_error()
whenever possible and only set f->last_error in the bottom-most IO
functions.

Peter Xu Sept. 12, 2023, 8:14 p.m. UTC | #2
On Tue, Sep 12, 2023 at 04:40:14PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > migrate_set_error() used one error_copy() so it always copy an error.
> > However that's not the major use case - the major use case is one would
> > like to pass the error to migrate_set_error() without further touching the
> > error.
> >
> > It can be proved if we see most of the callers are freeing the error
> > explicitly right afterwards.  There're a few outliers (only if when the
> > caller) where we can use error_copy() explicitly there.
> >
> > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.h    |  4 ++--
> >  migration/channel.c      |  1 -
> >  migration/migration.c    | 22 ++++++++++++++++------
> >  migration/multifd.c      | 10 ++++------
> >  migration/postcopy-ram.c |  1 -
> >  migration/ram.c          |  1 -
> >  6 files changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 6eea18db36..76e35a5ecf 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
> >  
> >  uint64_t migrate_max_downtime(void);
> >  
> > -void migrate_set_error(MigrationState *s, const Error *error);
> > +void migrate_set_error(MigrationState *s, Error *error);
> >  
> >  void migrate_fd_connect(MigrationState *s, Error *error_in);
> >  
> > @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> >  void migration_make_urgent_request(void);
> >  void migration_consume_urgent_request(void);
> >  bool migration_rate_limit(void);
> > -void migration_cancel(const Error *error);
> > +void migration_cancel(Error *error);
> >  
> >  void populate_vfio_info(MigrationInfo *info);
> >  void reset_vfio_bytes_transferred(void);
> > diff --git a/migration/channel.c b/migration/channel.c
> > index ca3319a309..48b3f6abd6 100644
> > --- a/migration/channel.c
> > +++ b/migration/channel.c
> > @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
> >          }
> >      }
> >      migrate_fd_connect(s, error);
> > -    error_free(error);
> >  }
> >  
> >  
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c60064d48e..0f3ca168ed 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -162,7 +162,7 @@ void migration_object_init(void)
> >      dirty_bitmap_mig_init();
> >  }
> >  
> > -void migration_cancel(const Error *error)
> > +void migration_cancel(Error *error)
> >  {
> >      if (error) {
> >          migrate_set_error(current_migration, error);
> > @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
> >      object_unref(OBJECT(s));
> >  }
> >  
> > -void migrate_set_error(MigrationState *s, const Error *error)
> > +/*
> > + * Set error for current migration state.  The `error' ownership will be
> > + * moved from the caller to MigrationState, so the caller doesn't need to
> > + * free the error.
> > + *
> > + * If the caller still needs to reference the `error' passed in, one should
> > + * use error_copy() explicitly.
> > + */
> > +void migrate_set_error(MigrationState *s, Error *error)
> >  {
> >      QEMU_LOCK_GUARD(&s->error_mutex);
> >      if (!s->error) {
> > -        s->error = error_copy(error);
> > +        /* Record the first error triggered */
> > +        s->error = error;
> > +    } else {
> > +        error_free(error);
> 
> This will conflict logically with 908927db28 ("migration: Update error
> description whenever migration fails") which does:
> 
> +            migrate_set_error(s, local_err);
> +            error_report_err(local_err);
> 
> both functions may now try to free the error.

Indeed, thanks for spotting this.  Perhaps I should just drop the
error_report_err() if we've set the error already anyway.

> 
> 
> I'm working on top of this series to try to get rid of all of those
> qemu_file_set_error() we have. I'm trying to use migrate_set_error()
> whenever possible and only set f->last_error at the very bottom IO
> functions.

I'll read it when it comes.

Patch

diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..76e35a5ecf 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -465,7 +465,7 @@  bool  migration_has_all_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
-void migrate_set_error(MigrationState *s, const Error *error);
+void migrate_set_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
@@ -510,7 +510,7 @@  int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(const Error *error);
+void migration_cancel(Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
diff --git a/migration/channel.c b/migration/channel.c
index ca3319a309..48b3f6abd6 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -90,7 +90,6 @@  void migration_channel_connect(MigrationState *s,
         }
     }
     migrate_fd_connect(s, error);
-    error_free(error);
 }
 
 
diff --git a/migration/migration.c b/migration/migration.c
index c60064d48e..0f3ca168ed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -162,7 +162,7 @@  void migration_object_init(void)
     dirty_bitmap_mig_init();
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel(Error *error)
 {
     if (error) {
         migrate_set_error(current_migration, error);
@@ -1218,11 +1218,22 @@  static void migrate_fd_cleanup_bh(void *opaque)
     object_unref(OBJECT(s));
 }
 
-void migrate_set_error(MigrationState *s, const Error *error)
+/*
+ * Set error for current migration state.  The `error' ownership will be
+ * moved from the caller to MigrationState, so the caller doesn't need to
+ * free the error.
+ *
+ * If the caller still needs to reference the `error' passed in, one should
+ * use error_copy() explicitly.
+ */
+void migrate_set_error(MigrationState *s, Error *error)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
     if (!s->error) {
-        s->error = error_copy(error);
+        /* Record the first error triggered */
+        s->error = error;
+    } else {
+        error_free(error);
     }
 }
 
@@ -1235,7 +1246,7 @@  static void migrate_error_free(MigrationState *s)
     }
 }
 
-static void migrate_fd_error(MigrationState *s, const Error *error)
+static void migrate_fd_error(MigrationState *s, Error *error)
 {
     trace_migrate_fd_error(error_get_pretty(error));
     assert(s->to_dst_file == NULL);
@@ -1703,7 +1714,7 @@  void qmp_migrate(const char *uri, bool has_blk, bool blk,
         if (!resume_requested) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         }
-        migrate_fd_error(s, local_err);
+        migrate_fd_error(s, error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
@@ -2626,7 +2637,6 @@  static MigThrError migration_detect_error(MigrationState *s)
 
     if (local_error) {
         migrate_set_error(s, local_error);
-        error_free(local_error);
     }
 
     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..69d56104fb 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -551,7 +551,6 @@  void multifd_save_cleanup(void)
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
-            error_free(local_err);
         }
     }
     qemu_sem_destroy(&multifd_send_state->channels_ready);
@@ -750,7 +749,6 @@  out:
     if (local_err) {
         trace_multifd_send_error(p->id);
         multifd_send_terminate_threads(local_err);
-        error_free(local_err);
     }
 
     /*
@@ -883,7 +881,6 @@  static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
       */
      p->quit = true;
      object_unref(OBJECT(ioc));
-     error_free(err);
 }
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
@@ -1148,7 +1145,6 @@  static void *multifd_recv_thread(void *opaque)
 
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
-        error_free(local_err);
     }
     qemu_mutex_lock(&p->mutex);
     p->running = false;
@@ -1240,7 +1236,8 @@  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate_prepend(errp, local_err,
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
@@ -1253,7 +1250,8 @@  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..8a93b5504d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1594,7 +1594,6 @@  postcopy_preempt_send_channel_done(MigrationState *s,
 {
     if (local_err) {
         migrate_set_error(s, local_err);
-        error_free(local_err);
     } else {
         migration_ioc_register_yank(ioc);
         s->postcopy_qemufile_src = qemu_file_new_output(ioc);
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..fc7fe0e6e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4308,7 +4308,6 @@  static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
         migration_cancel(err);
-        error_free(err);
     }
 
     switch (ps) {