diff mbox series

[for-8.0,2/3] migration: Fix potential race on postcopy_qemufile_src

Message ID 20230326172540.2571110-3-peterx@redhat.com
State New
Headers show
Series migration: Preempt mode fixes for 8.0 | expand

Commit Message

Peter Xu March 26, 2023, 5:25 p.m. UTC
postcopy_qemufile_src object should be owned by one thread, either the main
thread (e.g. when at the beginning, or at the end of migration), or by the
return path thread (when during a preempt enabled postcopy migration).  If
that's not the case the access to the object might be racy.

postcopy_preempt_shutdown_file() can be potentially racy, because it's
called at the end phase of migration on the main thread, however during
which the return path thread hasn't yet been recycled; the recycle happens
in await_return_path_close_on_source() which is after this point.

It means, logically it's posslbe the main thread and the return path thread
are both operating on the same qemufile.  While I don't think qemufile is
thread safe at all.

postcopy_preempt_shutdown_file() used to be needed because that's where we
send EOS to dest so that dest can safely shutdown the preempt thread.

To avoid the possible race, remove this only place that a race can happen.
Instead we figure out another way to safely close the preempt thread on
dest.

The core idea during postcopy on deciding "when to stop" is that dest will
send a postcopy SHUT message to src, telling src that all data is there.
Hence to shut the dest preempt thread maybe better to do it directly on
dest node.

This patch proposed such a way that we change postcopy_prio_thread_created
into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
by a sequence of:

  mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
  qemu_file_shutdown(mis->postcopy_qemufile_dst);

While here shutdown() is probably so far the easiest way to kick preempt
thread from a blocked qemu_get_be64().  Then it reads preempt_thread_status
to make sure it's not a network failure but a willingness to quit the
thread.

We could have avoided that extra status but just rely on migration status.
The problem is postcopy_ram_incoming_cleanup() is just called early enough
so we're still during POSTCOPY_ACTIVE no matter what.. So just make it
simple to have the status introduced.

One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of
postcopy preempt.

Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/core/machine.c        |  1 +
 migration/migration.c    | 10 ++++++++--
 migration/migration.h    | 34 +++++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c | 20 +++++++++++++++-----
 4 files changed, 57 insertions(+), 8 deletions(-)

Comments

Juan Quintela April 12, 2023, 7:18 p.m. UTC | #1
Peter Xu <peterx@redhat.com> wrote:
> postcopy_qemufile_src object should be owned by one thread, either the main
> thread (e.g. when at the beginning, or at the end of migration), or by the
> return path thread (when during a preempt enabled postcopy migration).  If
> that's not the case the access to the object might be racy.
>
> postcopy_preempt_shutdown_file() can be potentially racy, because it's
> called at the end phase of migration on the main thread, however during
> which the return path thread hasn't yet been recycled; the recycle happens
> in await_return_path_close_on_source() which is after this point.
>
> It means, logically it's posslbe the main thread and the return path thread
> are both operating on the same qemufile.  While I don't think qemufile is
> thread safe at all.
>
> postcopy_preempt_shutdown_file() used to be needed because that's where we
> send EOS to dest so that dest can safely shutdown the preempt thread.
>
> To avoid the possible race, remove this only place that a race can happen.
> Instead we figure out another way to safely close the preempt thread on
> dest.
>
> The core idea during postcopy on deciding "when to stop" is that dest will
> send a postcopy SHUT message to src, telling src that all data is there.
> Hence to shut the dest preempt thread maybe better to do it directly on
> dest node.
>
> This patch proposed such a way that we change postcopy_prio_thread_created
> into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
> by a sequence of:
>
>   mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
>   qemu_file_shutdown(mis->postcopy_qemufile_dst);
>
> While here shutdown() is probably so far the easiest way to kick preempt
> thread from a blocked qemu_get_be64().  Then it reads preempt_thread_status
> to make sure it's not a network failure but a willingness to quit the
> thread.
>
> We could have avoided that extra status but just rely on migration status.
> The problem is postcopy_ram_incoming_cleanup() is just called early enough
> so we're still during POSTCOPY_ACTIVE no matter what.. So just make it
> simple to have the status introduced.
>
> One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of
> postcopy preempt.
>
> Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/core/machine.c        |  1 +
>  migration/migration.c    | 10 ++++++++--
>  migration/migration.h    | 34 +++++++++++++++++++++++++++++++++-
>  migration/postcopy-ram.c | 20 +++++++++++++++-----
>  4 files changed, 57 insertions(+), 8 deletions(-)
>

As preempt is pretty new I will ....

Reviewed-by: Juan Quintela <quintela@redhat.com>

But code is quite subtle.
queued.
diff mbox series

Patch

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 45e3d24fdc..cd13b8b0a3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -42,6 +42,7 @@ 
 GlobalProperty hw_compat_7_2[] = {
     { "e1000e", "migrate-timadj", "off" },
     { "virtio-mem", "x-early-migration", "false" },
+    { "migration", "x-preempt-pre-7-2", "true" },
 };
 const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
diff --git a/migration/migration.c b/migration/migration.c
index ae2025d9d8..37fc4fb3e2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3464,8 +3464,12 @@  static void migration_completion(MigrationState *s)
         qemu_savevm_state_complete_postcopy(s->to_dst_file);
         qemu_mutex_unlock_iothread();
 
-        /* Shutdown the postcopy fast path thread */
-        if (migrate_postcopy_preempt()) {
+        /*
+         * Shutdown the postcopy fast path thread.  This is only needed
+         * when dest QEMU binary is old (7.1/7.2).  QEMU 8.0+ doesn't need
+         * this.
+         */
+        if (migrate_postcopy_preempt() && s->preempt_pre_7_2) {
             postcopy_preempt_shutdown_file(s);
         }
 
@@ -4443,6 +4447,8 @@  static Property migration_properties[] = {
                       decompress_error_check, true),
     DEFINE_PROP_UINT8("x-clear-bitmap-shift", MigrationState,
                       clear_bitmap_shift, CLEAR_BITMAP_SHIFT_DEFAULT),
+    DEFINE_PROP_BOOL("x-preempt-pre-7-2", MigrationState,
+                     preempt_pre_7_2, false),
 
     /* Migration parameters */
     DEFINE_PROP_UINT8("x-compress-level", MigrationState,
diff --git a/migration/migration.h b/migration/migration.h
index 2da2f8a164..67baba2184 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -65,6 +65,12 @@  typedef struct {
     bool all_zero;
 } PostcopyTmpPage;
 
+typedef enum {
+    PREEMPT_THREAD_NONE = 0,
+    PREEMPT_THREAD_CREATED,
+    PREEMPT_THREAD_QUIT,
+} PreemptThreadStatus;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -124,7 +130,12 @@  struct MigrationIncomingState {
     QemuSemaphore postcopy_qemufile_dst_done;
     /* Postcopy priority thread is used to receive postcopy requested pages */
     QemuThread postcopy_prio_thread;
-    bool postcopy_prio_thread_created;
+    /*
+     * Always set by the main vm load thread only, but can be read by the
+     * postcopy preempt thread.  "volatile" makes sure all reads will be
+     * uptodate across cores.
+     */
+    volatile PreemptThreadStatus preempt_thread_status;
     /*
      * Used to sync between the ram load main thread and the fast ram load
      * thread.  It protects postcopy_qemufile_dst, which is the postcopy
@@ -364,6 +375,27 @@  struct MigrationState {
      * do not trigger spurious decompression errors.
      */
     bool decompress_error_check;
+    /*
+     * This variable only affects behavior when postcopy preempt mode is
+     * enabled.
+     *
+     * When set:
+     *
+     * - postcopy preempt src QEMU instance will generate an EOS message at
+     *   the end of migration to shut the preempt channel on dest side.
+     *
+     * When clear:
+     *
+     * - postcopy preempt src QEMU instance will _not_ generate an EOS
+     *   message at the end of migration, the dest qemu will shutdown the
+     *   channel itself.
+     *
+     * NOTE: See message-id <ZBoShWArKDPpX/D7@work-vm> on qemu-devel
+     * mailing list for more information on the possible race.  Everyone
+     * should probably just keep this value untouched after set by the
+     * machine type (or the default).
+     */
+    bool preempt_pre_7_2;
 
     /*
      * This decides the size of guest memory chunk that will be used
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 41c0713650..263bab75ec 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -568,9 +568,14 @@  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
     trace_postcopy_ram_incoming_cleanup_entry();
 
-    if (mis->postcopy_prio_thread_created) {
+    if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
+        /* Notify the fast load thread to quit */
+        mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
+        if (mis->postcopy_qemufile_dst) {
+            qemu_file_shutdown(mis->postcopy_qemufile_dst);
+        }
         qemu_thread_join(&mis->postcopy_prio_thread);
-        mis->postcopy_prio_thread_created = false;
+        mis->preempt_thread_status = PREEMPT_THREAD_NONE;
     }
 
     if (mis->have_fault_thread) {
@@ -1203,7 +1208,7 @@  int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
          */
         postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
                                postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
-        mis->postcopy_prio_thread_created = true;
+        mis->preempt_thread_status = PREEMPT_THREAD_CREATED;
     }
 
     trace_postcopy_ram_enable_notify();
@@ -1652,6 +1657,11 @@  static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
     trace_postcopy_pause_fast_load_continued();
 }
 
+static bool preempt_thread_should_run(MigrationIncomingState *mis)
+{
+    return mis->preempt_thread_status != PREEMPT_THREAD_QUIT;
+}
+
 void *postcopy_preempt_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -1671,11 +1681,11 @@  void *postcopy_preempt_thread(void *opaque)
 
     /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
     qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
-    while (1) {
+    while (preempt_thread_should_run(mis)) {
         ret = ram_load_postcopy(mis->postcopy_qemufile_dst,
                                 RAM_CHANNEL_POSTCOPY);
         /* If error happened, go into recovery routine */
-        if (ret) {
+        if (ret && preempt_thread_should_run(mis)) {
             postcopy_pause_ram_fast_load(mis);
         } else {
             /* We're done */