Message ID: 20230829214235.69309-10-peterx@redhat.com
State: New
Series: migration: Better error handling in rp thread, allow failures in recover
Peter Xu <peterx@redhat.com> writes:

Hi, sorry it took me so long to get to this.

> Normally the postcopy recover phase should only exist for a super short
> period, that's the duration when QEMU is trying to recover from an
> interrupted postcopy migration, during which a handshake will be carried
> out to continue the procedure with state changes from PAUSED -> RECOVER ->
> POSTCOPY_ACTIVE again.
>
> Here the RECOVER phase should be super small; it happens right after the
> admin specified a new but working network link for QEMU to reconnect to
> the dest QEMU.
>
> However there can still be cases where the channel is broken in this small
> RECOVER window.
>
> If it happens, with the current code there's no way the src QEMU can get
> kicked out of the RECOVER stage. Nor is there a way to retry the recovery
> on another channel once established.
>
> This patch allows the RECOVER phase to fail itself too - we're mostly
> ready, just some small things missing, e.g. properly kicking the main
> migration thread out when sleeping on rp_sem when we find that we're at
> the RECOVER stage. When this happens, it fails the RECOVER itself, and
> rolls back to the PAUSED stage. Then the user can retry another round of
> recovery.
>
> To make it even stronger, teach the QMP command migrate-pause to
> explicitly kick src/dst QEMU out when needed, so even if for some reason
> the migration thread didn't get kicked out already by a failing
> return-path thread, the admin can also kick it out.
>
> This will be a super, super corner case, but still try to cover it.

It would be nice to have a test for this. Being such a corner case, it
will be hard to keep this scenario working.

I wrote two tests[1] that do the recovery, each using a different URI:
1) fd: using a freshly opened file,
2) fd: using a socketpair that simply has nothing on the other end.
These might be too far from the original bug, but they seem to exercise
some of the same paths:

Scenario 1:
/x86_64/migration/postcopy/recovery/fail-twice

the stacks are:

Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
qemu_sem_wait
ram_dirty_bitmap_sync_all
ram_resume_prepare
qemu_savevm_state_resume_prepare
postcopy_do_resume
postcopy_pause
migration_detect_error
migration_thread

Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
qemu_sem_wait
postcopy_pause_return_path_thread
source_return_path_thread

This patch seems to fix it, although we cannot call qmp_migrate_recover
a second time because the mis state is now in RECOVER:

"Migrate recover can only be run when postcopy is paused."

Do we maybe need to return the state to PAUSED, or allow
qmp_migrate_recover to run in RECOVER, like you did on the src side?

Scenario 2:
/x86_64/migration/postcopy/recovery/fail-twice/rp

Thread 8 (Thread 0x7fffd5ffe700 (LWP 30456) "live_migration"):
qemu_sem_wait
ram_dirty_bitmap_sync_all
ram_resume_prepare
qemu_savevm_state_resume_prepare
postcopy_do_resume
postcopy_pause
migration_detect_error
migration_thread

Thread 7 (Thread 0x7fffd67ff700 (LWP 30455) "return path"):
recvmsg
qio_channel_socket_readv
qio_channel_readv_full
qio_channel_read
qemu_fill_buffer
qemu_peek_byte
qemu_get_byte
qemu_get_be16
source_return_path_thread

Here, with this patch the migration gets stuck unless we call
migrate_pause() one more time. After another round of migrate_pause +
recover, it finishes properly.
1- hacked-together test:

-->8--
From a34685c8795799350665a880cd2ddddbf53c5812 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@suse.de>
Date: Mon, 11 Sep 2023 20:45:33 -0300
Subject: [PATCH] test patch

---
 tests/qtest/migration-test.c | 87 ++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 1b43df5ca7..4d9d2209c1 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -695,6 +695,7 @@ typedef struct {
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
+    int postcopy_recovery_method;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1357,6 +1358,61 @@ static void test_postcopy_preempt_tls_psk(void)
 }
 #endif
 
+static void postcopy_recover_fail(QTestState *from, QTestState *to, int method)
+{
+    int src, dst;
+
+    if (method == 1) {
+        /* give it some random fd to recover */
+        g_autofree char *uri = g_strdup_printf("%s/noop", tmpfs);
+        src = dst = open(uri, O_CREAT|O_RDWR);
+    } else if (method == 2) {
+        int ret, pair1[2], pair2[2];
+
+        /* create two unrelated socketpairs */
+        ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
+        g_assert_cmpint(ret, ==, 0);
+
+        ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
+        g_assert_cmpint(ret, ==, 0);
+
+        /* give the guests unpaired ends of the sockets */
+        src = pair1[0];
+        dst = pair2[0];
+    }
+
+    qtest_qmp_fds_assert_success(to, &src, 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
+
+    qtest_qmp_fds_assert_success(from, &dst, 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
+
+    migrate_recover(to, "fd:fd-mig");
+
+    wait_for_migration_status(from, "postcopy-paused",
+                              (const char * []) { "failed", "active",
+                                                  "completed", NULL });
+    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
+
+    printf("WAIT\n");
+    if (method == 2) {
+        /* This would be issued by the admin upon noticing the hang */
+        migrate_pause(from);
+    }
+
+    wait_for_migration_status(from, "postcopy-paused",
+                              (const char * []) { "failed", "active",
+                                                  "completed", NULL });
+    printf("PAUSED\n");
+
+    close(src);
+    if (method == 2) {
+        close(dst);
+    }
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
     QTestState *from, *to;
@@ -1396,6 +1452,13 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
                               (const char * []) { "failed", "active",
                                                   "completed", NULL });
 
+    if (args->postcopy_recovery_method) {
+        /* fail the recovery */
+        postcopy_recover_fail(from, to, args->postcopy_recovery_method);
+
+        /* continue with a good recovery */
+    }
+
     /*
      * Create a new socket to emulate a new channel that is different
      * from the broken migration channel; tell the destination to
@@ -1435,6 +1498,24 @@ static void test_postcopy_recovery_compress(void)
     test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_recovery_fail(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_method = 1,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_fail_rp(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_method = 2,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
 {
@@ -2825,6 +2906,12 @@ int main(int argc, char **argv)
         qtest_add_func("/migration/postcopy/recovery/compress/plain",
                        test_postcopy_recovery_compress);
     }
+    qtest_add_func("/migration/postcopy/recovery/fail-twice",
+                   test_postcopy_recovery_fail);
+
+    qtest_add_func("/migration/postcopy/recovery/fail-twice/rp",
+                   test_postcopy_recovery_fail_rp);
+
     }
 
     qtest_add_func("/migration/bad_dest", test_baddest);
On Mon, Sep 11, 2023 at 09:31:51PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> Hi, sorry it took me so long to get to this.

Not a problem.

> > Normally the postcopy recover phase should only exist for a super short
> > period, that's the duration when QEMU is trying to recover from an
> > interrupted postcopy migration, during which a handshake will be carried
> > out to continue the procedure with state changes from PAUSED -> RECOVER ->
> > POSTCOPY_ACTIVE again.
> >
> > Here the RECOVER phase should be super small; it happens right after the
> > admin specified a new but working network link for QEMU to reconnect to
> > the dest QEMU.
> >
> > However there can still be cases where the channel is broken in this small
> > RECOVER window.
> >
> > If it happens, with the current code there's no way the src QEMU can get
> > kicked out of the RECOVER stage. Nor is there a way to retry the recovery
> > on another channel once established.
> >
> > This patch allows the RECOVER phase to fail itself too - we're mostly
> > ready, just some small things missing, e.g. properly kicking the main
> > migration thread out when sleeping on rp_sem when we find that we're at
> > the RECOVER stage. When this happens, it fails the RECOVER itself, and
> > rolls back to the PAUSED stage. Then the user can retry another round of
> > recovery.
> >
> > To make it even stronger, teach the QMP command migrate-pause to
> > explicitly kick src/dst QEMU out when needed, so even if for some reason
> > the migration thread didn't get kicked out already by a failing
> > return-path thread, the admin can also kick it out.
> >
> > This will be a super, super corner case, but still try to cover it.
>
> It would be nice to have a test for this. Being such a corner case, it
> will be hard to keep this scenario working.

Yes, makes sense.
>
> I wrote two tests[1] that do the recovery, each using a different URI:
> 1) fd: using a freshly opened file,
> 2) fd: using a socketpair that simply has nothing on the other end.
>
> These might be too far from the original bug, but they seem to exercise
> some of the same paths:
>
> Scenario 1:
> /x86_64/migration/postcopy/recovery/fail-twice
>
> the stacks are:
>
> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
> qemu_sem_wait
> ram_dirty_bitmap_sync_all
> ram_resume_prepare
> qemu_savevm_state_resume_prepare
> postcopy_do_resume
> postcopy_pause
> migration_detect_error
> migration_thread
>
> Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
> qemu_sem_wait
> postcopy_pause_return_path_thread
> source_return_path_thread

I guess this is because the below path triggers:

    if (len > 0) {
        f->buf_size += len;
        f->total_transferred += len;
    } else if (len == 0) {
        qemu_file_set_error_obj(f, -EIO, local_error);  <-----------
    } else {
        qemu_file_set_error_obj(f, len, local_error);
    }

So the src can always write anything into the tmp file, but any read will
return 0 immediately because the file offset is always pointing to the
file size.

> This patch seems to fix it, although we cannot call qmp_migrate_recover
> a second time because the mis state is now in RECOVER:
>
> "Migrate recover can only be run when postcopy is paused."
>
> Do we maybe need to return the state to PAUSED, or allow
> qmp_migrate_recover to run in RECOVER, like you did on the src side?

Ouch, I just noticed that my patch was wrong.

I probably need this:

===8<===
From 8c2fb7b4c7488002283c7fb6a5e2aae81b21e04b Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 12 Sep 2023 15:49:54 -0400
Subject: [PATCH] fixup!
migration/postcopy: Allow network to fail even during recovery

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h | 2 +-
 migration/migration.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index e7f48e736e..7e61e2ece7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,7 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
-bool migration_postcopy_is_alive(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
diff --git a/migration/migration.c b/migration/migration.c
index de2146c6fc..a9d381886c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,7 +1349,7 @@ bool migration_in_postcopy(void)
     }
 }
 
-bool migration_postcopy_is_alive(void)
+bool migration_postcopy_is_alive(int state)
 {
     MigrationState *s = migrate_get_current();
 
@@ -1569,7 +1569,7 @@ void qmp_migrate_pause(Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int ret;
 
-    if (migration_postcopy_is_alive()) {
+    if (migration_postcopy_is_alive(ms->state)) {
         /* Source side, during postcopy */
         Error *error = NULL;
 
@@ -1593,7 +1593,7 @@ void qmp_migrate_pause(Error **errp)
         return;
     }
 
-    if (migration_postcopy_is_alive()) {
+    if (migration_postcopy_is_alive(mis->state)) {
         ret = qemu_file_shutdown(mis->from_src_file);
         if (ret) {
             error_setg(errp, "Failed to pause destination migration");
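[Editorial note: the fd:-file failure mode discussed above comes down to plain POSIX file semantics, not anything migration-specific. A read() on a file descriptor whose offset sits at the file size returns 0 (EOF), which QEMU's qemu_fill_buffer() maps to -EIO. A minimal standalone sketch, with a hypothetical helper name (not QEMU code):]

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Mimic the fd: recovery test situation: the writer can always append
 * to the file, but a reader sharing the same fd sits at the file size,
 * so its read() returns 0 immediately (the len == 0 branch above).
 */
static ssize_t read_at_eof(void)
{
    char tmpl[] = "/tmp/noop-XXXXXX";
    int fd = mkstemp(tmpl);
    char buf[16];
    ssize_t ret;

    assert(fd >= 0);
    unlink(tmpl);

    /* writes always succeed and advance the shared offset... */
    assert(write(fd, "hello", 5) == 5);

    /* ...so a subsequent read from the same fd starts at EOF */
    ret = read(fd, buf, sizeof(buf));

    close(fd);
    return ret;   /* 0 means EOF, which QEMU turns into -EIO */
}
```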
On Tue, Sep 12, 2023 at 04:05:27PM -0400, Peter Xu wrote:
> Thanks for contributing the test case!
>
> Do you want me to pick this patch up (with modifications) and repost
> together with this series? It'll also work if you want to send a separate
> test patch. Let me know!

It turns out I found more bugs when I was reworking that test case based
on yours. E.g., currently we'll crash dest qemu if we really fail during
recovery, because we miss:

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */

So in the double failure case we'll not set RECOVER to PAUSED, and we'll
crash right afterwards, as we'll skip the semaphore:

    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {  <--- not true, continue
        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
    }

Now within the new test case I am 100% sure I can kick both sides into
RECOVER state (one trick is still needed along the way; the test patch
will tell soon), then kick them back, then proceed with a successful
migration.

Let me just repost everything with the new test case.

Thanks,
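[Editorial note: the double-failure crash above reduces to the loop condition alone. If the incoming side is left in RECOVER rather than PAUSED, the wait loop is skipped entirely and the code falls through as if a recovery had been requested. A simplified model with hypothetical names (not the QEMU code):]

```c
#include <assert.h>
#include <stdbool.h>

enum mig_state { MIG_POSTCOPY_ACTIVE, MIG_POSTCOPY_RECOVER, MIG_POSTCOPY_PAUSED };

/*
 * Model of the wait in postcopy_pause_incoming(): the thread only
 * blocks on the semaphore while the state is PAUSED; any other state
 * lets it fall straight through.
 */
static bool would_block_in_pause_loop(enum mig_state state)
{
    /* while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) sem_wait(...); */
    return state == MIG_POSTCOPY_PAUSED;
}

/*
 * The fix: transition from whatever the current state is (ACTIVE on a
 * first failure, RECOVER on a failure during recovery) down to PAUSED,
 * instead of assuming the old state was always ACTIVE.
 */
static enum mig_state pause_incoming(enum mig_state cur)
{
    /* migrate_set_state(&mis->state, mis->state, ...POSTCOPY_PAUSED); */
    (void)cur;
    return MIG_POSTCOPY_PAUSED;
}
```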
Peter Xu <peterx@redhat.com> writes:

>> Scenario 1:
>> /x86_64/migration/postcopy/recovery/fail-twice
>>
>> the stacks are:
>>
>> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
>> qemu_sem_wait
>> ram_dirty_bitmap_sync_all
>> ram_resume_prepare
>> qemu_savevm_state_resume_prepare
>> postcopy_do_resume
>> postcopy_pause
>> migration_detect_error
>> migration_thread
>>
>> Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
>> qemu_sem_wait
>> postcopy_pause_return_path_thread
>> source_return_path_thread
>
> I guess this is because the below path triggers:
>
>     if (len > 0) {
>         f->buf_size += len;
>         f->total_transferred += len;
>     } else if (len == 0) {
>         qemu_file_set_error_obj(f, -EIO, local_error);  <-----------
>     } else {
>         qemu_file_set_error_obj(f, len, local_error);
>     }
>
> So the src can always write anything into the tmp file, but any read will
> return 0 immediately because the file offset is always pointing to the
> file size.

Yes, a 0 return would mean EOF indeed.

>> This patch seems to fix it, although we cannot call qmp_migrate_recover
>> a second time because the mis state is now in RECOVER:
>>
>> "Migrate recover can only be run when postcopy is paused."
>>
>> Do we maybe need to return the state to PAUSED, or allow
>> qmp_migrate_recover to run in RECOVER, like you did on the src side?

I figured out what is going on here (test #1). At postcopy_pause_incoming()
the state transition is ACTIVE -> PAUSED, but when the first recovery
fails on the incoming side, the transition would have to be RECOVER ->
PAUSED.

Could you add that change to this patch?

> Ouch, I just noticed that my patch was wrong.
>
> I probably need this:
>
> ===8<===
> From 8c2fb7b4c7488002283c7fb6a5e2aae81b21e04b Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 12 Sep 2023 15:49:54 -0400
> Subject: [PATCH] fixup!
> migration/postcopy: Allow network to fail even during recovery
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.h | 2 +-
>  migration/migration.c | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index e7f48e736e..7e61e2ece7 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -482,7 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
>  bool migration_is_blocked(Error **errp);
>  /* True if outgoing migration has entered postcopy phase */
>  bool migration_in_postcopy(void);
> -bool migration_postcopy_is_alive(void);
> +bool migration_postcopy_is_alive(int state);
>  MigrationState *migrate_get_current(void);
>
>  uint64_t ram_get_total_transferred_pages(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index de2146c6fc..a9d381886c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1349,7 +1349,7 @@ bool migration_in_postcopy(void)
>      }
>  }
>
> -bool migration_postcopy_is_alive(void)
> +bool migration_postcopy_is_alive(int state)
>  {
>      MigrationState *s = migrate_get_current();
>

Note there's a missing hunk here to actually use the 'state'.
> @@ -1569,7 +1569,7 @@ void qmp_migrate_pause(Error **errp)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      int ret;
>
> -    if (migration_postcopy_is_alive()) {
> +    if (migration_postcopy_is_alive(ms->state)) {
>          /* Source side, during postcopy */
>          Error *error = NULL;
>
> @@ -1593,7 +1593,7 @@ void qmp_migrate_pause(Error **errp)
>          return;
>      }
>
> -    if (migration_postcopy_is_alive()) {
> +    if (migration_postcopy_is_alive(mis->state)) {
>          ret = qemu_file_shutdown(mis->from_src_file);
>          if (ret) {
>              error_setg(errp, "Failed to pause destination migration");
> -- 
> 2.41.0
> ===8<===

>> Scenario 2:
>> /x86_64/migration/postcopy/recovery/fail-twice/rp
>>
>> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30456) "live_migration"):
>> qemu_sem_wait
>> ram_dirty_bitmap_sync_all
>> ram_resume_prepare
>> qemu_savevm_state_resume_prepare
>> postcopy_do_resume
>> postcopy_pause
>> migration_detect_error
>> migration_thread
>>
>> Thread 7 (Thread 0x7fffd67ff700 (LWP 30455) "return path"):
>> recvmsg
>> qio_channel_socket_readv
>> qio_channel_readv_full
>> qio_channel_read
>> qemu_fill_buffer
>> qemu_peek_byte
>> qemu_get_byte
>> qemu_get_be16
>> source_return_path_thread
>>
>> Here, with this patch the migration gets stuck unless we call
>> migrate_pause() one more time. After another round of migrate_pause +
>> recover, it finishes properly.

Here (test #2), the issue is that the sockets are unpaired, so there's no
G_IO_IN to trigger the qio_channel watch callback. The incoming side never
calls fd_accept_incoming_migration() and the RP hangs waiting for
something. I think there's no other way to unblock aside from the explicit
qmp_migrate_pause().
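[Editorial note: the missing G_IO_IN is easy to reproduce outside QEMU: a socket end with nothing queued never polls readable, so a watch on it can never fire; a single queued byte is enough to change that, which is what the one-byte trick discussed later in the thread exploits. A standalone sketch using poll() in place of the GLib watch:]

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Poll one end of a socketpair; return 1 iff it reports POLLIN
 * (the condition GLib surfaces as G_IO_IN to a channel watch).
 */
static int end_is_readable(int write_something)
{
    int pair[2];
    struct pollfd pfd;
    int ret;

    assert(socketpair(AF_UNIX, SOCK_STREAM, 0, pair) == 0);

    if (write_something) {
        /* one queued byte is enough for the readiness event to fire */
        assert(send(pair[1], "\x08", 1, 0) == 1);
    }

    memset(&pfd, 0, sizeof(pfd));
    pfd.fd = pair[0];
    pfd.events = POLLIN;
    ret = poll(&pfd, 1, 0);   /* zero timeout: just sample readiness */

    close(pair[0]);
    close(pair[1]);
    return ret > 0 && (pfd.revents & POLLIN) ? 1 : 0;
}
```

With nothing written, the watch side would wait forever, matching the stuck RP thread seen in scenario 2.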
>>
>> 1- hacked-together test:
>> -->8--
>> From a34685c8795799350665a880cd2ddddbf53c5812 Mon Sep 17 00:00:00 2001
>> From: Fabiano Rosas <farosas@suse.de>
>> Date: Mon, 11 Sep 2023 20:45:33 -0300
>> Subject: [PATCH] test patch
>>
>> ---
>>  tests/qtest/migration-test.c | 87 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 87 insertions(+)
>>
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 1b43df5ca7..4d9d2209c1 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -695,6 +695,7 @@ typedef struct {
>>      /* Postcopy specific fields */
>>      void *postcopy_data;
>>      bool postcopy_preempt;
>> +    int postcopy_recovery_method;
>>  } MigrateCommon;
>>
>>  static int test_migrate_start(QTestState **from, QTestState **to,
>> @@ -1357,6 +1358,61 @@ static void test_postcopy_preempt_tls_psk(void)
>>  }
>>  #endif
>>
>> +static void postcopy_recover_fail(QTestState *from, QTestState *to, int method)
>> +{
>> +    int src, dst;
>> +
>> +    if (method == 1) {
>> +        /* give it some random fd to recover */
>> +        g_autofree char *uri = g_strdup_printf("%s/noop", tmpfs);
>> +        src = dst = open(uri, O_CREAT|O_RDWR);
>
> This is slightly weird.

Yeah, I was just trying to give some sort of bad input that would still be
accepted.

> We opened a file, making it RW and pass it over to both QEMUs.
>
> I think the result can be unpredictable for the reader, that if the reader
> reads before any prior writes it'll just quickly fail, or I think it can
> also happen that the read is slower than the write so it can read
> something.. until it reads to the EOF and fail that fd at some point.
>
> Not sure whether it'll cause some behavior difference and uncertainty on
> the test case.

I see qmp_migrate_recover() failing consistently and since the src is only
resumed later, it sees an unresponsive dst and the RP goes into the retry
path. We could give them both separate files and the result would be more
predictable.
>
> Maybe we drop this method but only keep the below one?

Yeah, maybe this is just too weird. I'm fine either way.

[snip]

>>     qtest_add_func("/migration/bad_dest", test_baddest);
>
> Thanks for contributing the test case!
>
> Do you want me to pick this patch up (with modifications) and repost
> together with this series? It'll also work if you want to send a separate
> test patch. Let me know!

You can take it. Or drop it if it ends up being too artificial.

I'll be focusing on implementing some of the qemu-file.c improvements we
discussed in the past on top of this series.
On Tue, Sep 12, 2023 at 07:49:37PM -0300, Fabiano Rosas wrote:
> I figured out what is going on here (test #1). At postcopy_pause_incoming()
> the state transition is ACTIVE -> PAUSED, but when the first recovery
> fails on the incoming side, the transition would have to be RECOVER ->
> PAUSED.
>
> Could you add that change to this patch?

Yes, and actually, see:

https://lore.kernel.org/qemu-devel/20230912222145.731099-11-peterx@redhat.com/

> > -bool migration_postcopy_is_alive(void)
> > +bool migration_postcopy_is_alive(int state)
> > {
> >     MigrationState *s = migrate_get_current();
> >
>
> Note there's a missing hunk here to actually use the 'state'.

Yes.. I fixed it in the version I just posted, here:

https://lore.kernel.org/qemu-devel/20230912222145.731099-10-peterx@redhat.com/

+bool migration_postcopy_is_alive(int state)
+{
+    switch (state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
+}

[...]

> >> Here, with this patch the migration gets stuck unless we call
> >> migrate_pause() one more time. After another round of migrate_pause +
> >> recover, it finishes properly.
>
> Here (test #2), the issue is that the sockets are unpaired, so there's
> no G_IO_IN to trigger the qio_channel watch callback. The incoming side
> never calls fd_accept_incoming_migration() and the RP hangs waiting for
> something. I think there's no other way to unblock aside from the
> explicit qmp_migrate_pause().

Exactly, that's the "trick" I mentioned. :) Sorry, when replying just now
I seem to have jumped over some sections.

See:

https://lore.kernel.org/qemu-devel/20230912222145.731099-12-peterx@redhat.com/

I put a rich comment for that:

+    /*
+     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
+     * emulate the 1st byte of a real recovery, but stops from there to
+     * keep dest QEMU in RECOVER.  This is needed so that we can kick off
+     * the recover process on dest QEMU (by triggering the G_IO_IN event).
+     *
+     * NOTE: this trick is not needed on src QEMUs, because src doesn't
+     * rely on a pre-existing G_IO_IN event, so it will always trigger the
+     * upcoming recovery anyway even if it can read nothing.
+     */
+#define QEMU_VM_COMMAND 0x08
+    c = QEMU_VM_COMMAND;
+    ret = send(pair2[1], &c, 1, 0);
+    g_assert_cmpint(ret, ==, 1);

> We could give them both separate files and the result would be more
> predictable.

Please have a look at the test patch I posted (note! it's still under your
name but I changed it quite a lot with my sign-off). I used your 2nd
method to create socket pairs, and hopefully that provides a very reliable
way to put both src/dst sides into the RECOVER state, then kick them out
of it using qmp migrate-pause on both sides.

> You can take it. Or drop it if it ends up being too artificial.

I like your suggestion on having the test case, and I hope the new version
in the link above isn't so artificial; the only part I don't like about it
was the "write 1 byte" trick for dest qemu, but that seems still okay.

Feel free to go and have a look.

Thanks a lot,
diff --git a/migration/migration.h b/migration/migration.h
index b6de78dbdd..e86d9d098a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,6 +482,7 @@ void migrate_init(MigrationState *s);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
+bool migration_postcopy_is_alive(void);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
@@ -522,8 +523,11 @@ void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
-/* Migration thread waiting for return path thread. */
-void migration_rp_wait(MigrationState *s);
+/*
+ * Migration thread waiting for return path thread. Return non-zero if an
+ * error is detected.
+ */
+int migration_rp_wait(MigrationState *s);
 /*
  * Kick the migration thread waiting for return path messages. NOTE: the
  * name can be slightly confusing (when read as "kick the rp thread"), just
diff --git a/migration/migration.c b/migration/migration.c
index 3a5f324781..85462ff1d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,6 +1349,19 @@ bool migration_in_postcopy(void)
     }
 }
 
+bool migration_postcopy_is_alive(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
+}
+
 bool migration_in_postcopy_after_devices(MigrationState *s)
 {
     return migration_in_postcopy() && s->postcopy_after_devices;
@@ -1540,18 +1553,31 @@ void qmp_migrate_pause(Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int ret;
 
-    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive()) {
         /* Source side, during postcopy */
+        Error *error = NULL;
+
+        /* Tell the core migration that we're pausing */
+        error_setg(&error, "Postcopy migration is paused by the user");
+        migrate_set_error(ms, error);
+
         qemu_mutex_lock(&ms->qemu_file_lock);
         ret = qemu_file_shutdown(ms->to_dst_file);
         qemu_mutex_unlock(&ms->qemu_file_lock);
         if (ret) {
             error_setg(errp, "Failed to pause source migration");
         }
+
+        /*
+         * Kick the migration thread out of any waiting windows (on behalf
+         * of the rp thread).
+         */
+        migration_rp_kick(ms);
+
         return;
     }
 
-    if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive()) {
         ret = qemu_file_shutdown(mis->from_src_file);
         if (ret) {
             error_setg(errp, "Failed to pause destination migration");
@@ -1560,7 +1586,7 @@ void qmp_migrate_pause(Error **errp)
     }
 
     error_setg(errp, "migrate-pause is currently only supported "
-               "during postcopy-active state");
+               "during postcopy-active or postcopy-recover state");
 }
 
 bool migration_is_blocked(Error **errp)
@@ -1742,9 +1768,21 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-void migration_rp_wait(MigrationState *s)
+int migration_rp_wait(MigrationState *s)
 {
+    /* If migration has failure already, ignore the wait */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
     qemu_sem_wait(&s->rp_state.rp_sem);
+
+    /* After wait, double check that there's no failure */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
+    return 0;
 }
 
 void migration_rp_kick(MigrationState *s)
@@ -1798,6 +1836,20 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
 {
     trace_postcopy_pause_return_path();
 
+    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        /*
+         * this will be extremely unlikely: that we got yet another network
+         * issue during recovering of the 1st network failure.. during this
+         * period the main migration thread can be waiting on rp_sem for
+         * this thread to sync with the other side.
+         *
+         * When this happens, explicitly kick the migration thread out of
+         * RECOVER stage and back to PAUSED, so the admin can try
+         * everything again.
+         */
+        migration_rp_kick(s);
+    }
+
     qemu_sem_wait(&s->postcopy_pause_rp_sem);
 
     trace_postcopy_pause_return_path_continued();
@@ -2503,7 +2555,9 @@ static int postcopy_resume_handshake(MigrationState *s)
     qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
     while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index b5f6d65d84..199fd3e117 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4157,7 +4157,9 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 
     /* Wait until all the ramblocks' dirty bitmap synced */
     while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     trace_ram_dirty_bitmap_sync_complete();
Normally the postcopy recover phase should only exist for a super short
period: the duration when QEMU is trying to recover from an interrupted
postcopy migration, during which a handshake is carried out to continue
the procedure, with state changes from PAUSED -> RECOVER ->
POSTCOPY_ACTIVE again.

The RECOVER phase should be super small here; it happens right after the
admin has specified a new but working network link for QEMU to reconnect
to the dest QEMU.

However, there can still be cases where the channel is broken within this
small RECOVER window.

If that happens, with the current code there's no way the src QEMU can be
kicked out of the RECOVER stage, nor is there a way to retry the recovery
on another channel once established.

This patch allows the RECOVER phase to fail itself too - we're mostly
ready, just some small things missing, e.g. properly kicking the main
migration thread out when it is sleeping on rp_sem and we find that we're
at the RECOVER stage. When this happens, it fails the RECOVER itself and
rolls back to the PAUSED stage. Then the user can retry another round of
recovery.

To make it even stronger, teach the QMP command migrate-pause to
explicitly kick the src/dst QEMU out when needed, so even if for some
reason the migration thread didn't get kicked out already by a failing
return-path thread, the admin can still kick it out.

This is a super, super corner case, but still try to cover it.
One can try to test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

      (a)              (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then to make QEMU hang at the RECOVER stage, one can do the following:

  (1) stop the postcopy using the QMP command postcopy-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into the RECOVER stage but won't be able to
      continue from there, because the channel is actually broken at (b)

Before this patch, step (4) will make src QEMU stuck in the RECOVER
stage, without a way to kick the QEMU out or continue the postcopy again.
After this patch, (4) will quickly fail QEMU and bounce it back to the
PAUSED stage. The admin can also kick QEMU from (4) into PAUSED using
migrate-pause when needed. After bouncing back to the PAUSED stage, one
can recover again.

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  8 ++++--
 migration/migration.c | 64 +++++++++++++++++++++++++++++++++++++++----
 migration/ram.c       |  4 ++-
 3 files changed, 68 insertions(+), 8 deletions(-)
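[Editorial note: the state flow described in the commit message, including the RECOVER -> PAUSED rollback this patch introduces, can be summarized as a small transition table. This is an illustrative sketch with hypothetical names, not QEMU's actual migrate_set_state() logic:]

```c
#include <assert.h>
#include <stdbool.h>

enum st { ST_PAUSED, ST_RECOVER, ST_POSTCOPY_ACTIVE };

/* Transitions around postcopy recovery as described in the commit message. */
static bool can_transition(enum st from, enum st to)
{
    switch (from) {
    case ST_PAUSED:
        return to == ST_RECOVER;            /* admin starts a recovery */
    case ST_RECOVER:
        return to == ST_POSTCOPY_ACTIVE     /* handshake succeeded */
            || to == ST_PAUSED;             /* new: recovery itself failed */
    case ST_POSTCOPY_ACTIVE:
        return to == ST_PAUSED;             /* channel broke, pause again */
    }
    return false;
}
```

The RECOVER -> PAUSED edge is the one this patch adds; before it, a failure inside RECOVER left the source with no legal way back.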