Message ID: 20230531132400.1129576-10-berrange@redhat.com
State: New
Series: tests/qtest: make migration-test massively faster
On 31/05/2023 15.24, Daniel P. Berrangé wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
>
> Historically we've queried the migration status, looking for the
> 'dirty-sync-count' value to increment to track iterations. This was
> not entirely reliable because often all the data would get transferred
> quickly enough that the migration would finish before we wanted it
> to. So we massively dropped the bandwidth and max downtime to
> guarantee non-convergance. This had the unfortunate side effect

convergence

> that every migration took at least 30 seconds to run (100 MB of
> dirty pages / 3 MB/sec).
>
> This optimization takes a different approach to ensuring that a
> mimimum of two iterations. Rather than waiting for dirty-sync-count

minimum

> to increment, directly look for an indication that the source VM
> has dirtied RAM that has already been transferred.
>
> On the source VM a magic marker is written just after the 3 MB
> offset. The destination VM is now montiored to detect when the

monitored

...

> @@ -445,6 +459,91 @@ static void migrate_ensure_converge(QTestState *who)
>      migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
>  }
>
> +/*
> + * Our goal is to ensure that we run a single full migration
> + * iteration, and also dirty memory, ensuring that at least
> + * one further iteration is required.
> + *
> + * We can't directly synchronize with the start of a migration
> + * so we have to apply some tricks monitoring memory that is
> + * transferred.
> + *
> + * Initially we set the migration bandwidth to an insanely
> + * low value, with tiny max downtime too. This basically
> + * guarantees migration will never complete.
> + *
> + * This will result in a test that is unacceptably slow though,
> + * so we can't let the entire migration pass run at this speed.
> + * Our intent is to let it run just long enough that we can
> + * prove data prior to the marker has been transferred *AND*
> + * also prove this transferred data is dirty again.
> + *
> + * Before migration starts, we write a 64-bit magic marker
> + * into a fixed location in the src VM RAM.
> + *
> + * Then watch dst memory until the marker appears. This is
> + * proof that start_address -> MAGIC_OFFSET_BASE has been
> + * transferred.
> + *
> + * Finally we go back to the source and read a byte just
> + * before the marker untill we see it flip in value. This

until

It's indeed much faster now, thank you very much for tackling this!

Tested-by: Thomas Huth <thuth@redhat.com>
On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
>
> [...]
>
> On a test machine this further reduces the migration test time
> from 8 minutes to 1 minute 40.

The outcome is definitely nice, but it does look slightly hacky to me and
makes the test slightly more complicated.

If it's all about making sure we finish the 1st iteration, can we simply
add a src qemu parameter "switchover-hold"? If it's set, src never
switches over to dst but keeps iterating.

Then migrate_ensure_non_converge() will be as simple as setting
switchover-hold to true.

I am even thinking whether there could be a real-life use case for that,
e.g., where a user might want to pre-heat a migration of some VM and
trigger it immediately when the admin really wants to (the pre-heat having
moved most of the pages, and continuing to do so).

It'll also be similar to what Avihai proposed here with switchover-ack,
just an ack mechanism on the src side:

https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com
On Thu, Jun 01, 2023 at 11:46:01AM -0400, Peter Xu wrote:
> On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> > [...]
>
> The outcome is definitely nice, but it does look slightly hacky to me and
> makes the test slightly more complicated.
>
> If it's all about making sure we finish the 1st iteration, can we simply
> add a src qemu parameter "switchover-hold"? If it's set, src never
> switches over to dst but keeps iterating.

For *most* of the tests, we want to ensure there are a minimum
of 2 iterations. For the XBZRLE test we want to ensure there are
a minimum of 3 iterations, so the cache gets a workout.

> Then migrate_ensure_non_converge() will be as simple as setting
> switchover-hold to true.
>
> I am even thinking whether there could be a real-life use case for that,
> e.g., where a user might want to pre-heat a migration of some VM and
> trigger it immediately when the admin really wants to (the pre-heat having
> moved most of the pages, and continuing to do so).
>
> It'll also be similar to what Avihai proposed here with switchover-ack,
> just an ack mechanism on the src side:
>
> https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com

In general I strongly wanted to avoid adding special logic to the
migration code that makes it directly synchronize with the test
suite, because once we do that I don't think the test suite is
providing coverage of the real world usage scenario.

IOW, if we add a switchover-ack feature, we should certainly have
*a* test that uses it, but we should not modify other tests because
we want to exercise the logic as it would run with apps that don't
rely on switchover-ack.

Also, this slow migration test is incredibly painful for people right
now, so I'd like to see us get a speed up committed to git quickly.
I don't want to block it on a feature proposal that might yet take
months to merge.

With regards,
Daniel
On Thu, Jun 01, 2023 at 05:05:23PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 11:46:01AM -0400, Peter Xu wrote:
> > [...]
>
> For *most* of the tests, we want to ensure there are a minimum
> of 2 iterations. For the XBZRLE test we want to ensure there are
> a minimum of 3 iterations, so the cache gets a workout.
>
> [...]
>
> In general I strongly wanted to avoid adding special logic to the
> migration code that makes it directly synchronize with the test
> suite, because once we do that I don't think the test suite is
> providing coverage of the real world usage scenario.

The problem is that non-live is already not real-world usage in most
cases. It seems we all agreed that it's the code paths we want to cover
in the tests, not real-world usages, or maybe not?

> IOW, if we add a switchover-ack feature, we should certainly have
> *a* test that uses it, but we should not modify other tests because
> we want to exercise the logic as it would run with apps that don't
> rely on switchover-ack.
>
> Also, this slow migration test is incredibly painful for people right
> now, so I'd like to see us get a speed up committed to git quickly.
> I don't want to block it on a feature proposal that might yet take
> months to merge.

That'll be trivial, afaict.

I just worry that this patch will bring complexity to the test cases;
that's another burden we need to carry besides QEMU itself.

If you want, I can try to prepare such a patch before this weekend, and
if it's complicated enough and takes more than next week to review, feel
free to go ahead with this one.

I understand the migration test issue has been there for a long time.
But it still matters to me which approach is cleaner for the long term.
On Thu, Jun 01, 2023 at 12:22:36PM -0400, Peter Xu wrote:
> On Thu, Jun 01, 2023 at 05:05:23PM +0100, Daniel P. Berrangé wrote:
> > [...]
>
> The problem is that non-live is already not real-world usage in most
> cases. It seems we all agreed that it's the code paths we want to cover
> in the tests, not real-world usages, or maybe not?

The cases that I made run non-live are exercising code paths that
are only relevant during migration setup, so the non-live vs live
status is irrelevant to the code coverage attained for those tests.
The other remaining live tests give sufficient coverage for the live
scenario.

> [...]

With regards,
Daniel
On Thu, Jun 01, 2023 at 05:36:42PM +0100, Daniel P. Berrangé wrote:
> > The problem is that non-live is already not real-world usage in most
> > cases. It seems we all agreed that it's the code paths we want to cover
> > in the tests, not real-world usages, or maybe not?
>
> The cases that I made run non-live are exercising code paths that
> are only relevant during migration setup, so the non-live vs live
> status is irrelevant to the code coverage attained for those tests.
> The other remaining live tests give sufficient coverage for the live
> scenario.

IMHO it's actually good that we're discussing what role qtest is meant
to play here, given that we're hitting the issue of over-used CI and
tests running too long.

For the migration tests (I'm not sure how Juan sees this) I'd say it's
good enough to make them "just cover the code paths as much as
possible". We're never covering real use cases anyway, IMHO, if we
start with a 128M VM running only a sequential dirty workload. We still
need to rely on real migration tests carried out elsewhere, I believe.

So I'd say we're fine to leverage a new parameter to help us achieve
that, especially if the parameter is simple enough (for the logical
change I only expect a few LOCs with switchover-hold).

Let's keep that on top anyway, no matter what. Thanks.
Peter Xu <peterx@redhat.com> wrote:
> The outcome is definitely nice, but it does look slightly hacky to me and
> makes the test slightly more complicated.
>
> If it's all about making sure we finish the 1st iteration, can we simply
> add a src qemu parameter "switchover-hold"? If it's set, src never
> switches over to dst but keeps iterating.
>
> [...]

That was basically my idea, and that is why I am holding the last two
patches to see if I can come up with something in the next couple of days.

Later, Juan.
On Fri, Jun 02, 2023 at 01:00:10AM +0200, Juan Quintela wrote: > That was basically my idea and that is why I am holding the last two > patches and see if I can came with something in the next couple of days. Ah! ... If you haven't started, please hold off for one day. I'll see whether I can post mine tomorrow.
On Thu, Jun 01, 2023 at 07:43:43PM -0400, Peter Xu wrote:
> On Fri, Jun 02, 2023 at 01:00:10AM +0200, Juan Quintela wrote:
> > That was basically my idea, and that is why I am holding the last two
> > patches to see if I can come up with something in the next couple of
> > days.
>
> Ah! ...
>
> If you haven't started, please hold off for one day. I'll see whether I
> can post mine tomorrow.

5 weeks later, we've still not got a fix merged, and it is soft freeze
tomorrow.

Since this patch works, could we please consider merging it now? It is a
simple revert later, once Peter's desired better solution is finally
ready, unless Peter's series can be submitted for the soft freeze merge
today?

With regards,
Daniel
On 10/07/2023 11.35, Daniel P. Berrangé wrote:
> 5 weeks later, we've still not got a fix merged, and it is soft freeze
> tomorrow.
>
> Since this patch works, could we please consider merging it now? It is
> a simple revert later, once Peter's desired better solution is finally
> ready, unless Peter's series can be submitted for the soft freeze merge
> today?

I agree. I'll queue this qtest patch for my PR-before-softfreeze now. If
Peter's work is ready to get merged via the migration tree, this patch
here can easily be reverted again.

 Thomas
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e033d9f0c1..6ea9011423 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -46,6 +46,20 @@ static bool uffd_feature_thread_id;
 static bool got_stop;
 static bool got_resume;
 
+/*
+ * An initial 3 MB offset is used as that corresponds
+ * to ~1 sec of data transfer with our bandwidth setting.
+ */
+#define MAGIC_OFFSET_BASE (3 * 1024 * 1024)
+/*
+ * A further 1k is added to ensure we're not a multiple
+ * of TEST_MEM_PAGE_SIZE, thus avoid clash with writes
+ * from the migration guest workload.
+ */
+#define MAGIC_OFFSET_SHUFFLE 1024
+#define MAGIC_OFFSET (MAGIC_OFFSET_BASE + MAGIC_OFFSET_SHUFFLE)
+#define MAGIC_MARKER 0xFEED12345678CAFEULL
+
 /*
  * Dirtylimit stop working if dirty page rate error
  * value less than DIRTYLIMIT_TOLERANCE_RANGE
@@ -445,6 +459,91 @@ static void migrate_ensure_converge(QTestState *who)
     migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
 }
 
+/*
+ * Our goal is to ensure that we run a single full migration
+ * iteration, and also dirty memory, ensuring that at least
+ * one further iteration is required.
+ *
+ * We can't directly synchronize with the start of a migration
+ * so we have to apply some tricks monitoring memory that is
+ * transferred.
+ *
+ * Initially we set the migration bandwidth to an insanely
+ * low value, with tiny max downtime too. This basically
+ * guarantees migration will never complete.
+ *
+ * This will result in a test that is unacceptably slow though,
+ * so we can't let the entire migration pass run at this speed.
+ * Our intent is to let it run just long enough that we can
+ * prove data prior to the marker has been transferred *AND*
+ * also prove this transferred data is dirty again.
+ *
+ * Before migration starts, we write a 64-bit magic marker
+ * into a fixed location in the src VM RAM.
+ *
+ * Then watch dst memory until the marker appears. This is
+ * proof that start_address -> MAGIC_OFFSET_BASE has been
+ * transferred.
+ *
+ * Finally we go back to the source and read a byte just
+ * before the marker untill we see it flip in value. This
+ * is proof that start_address -> MAGIC_OFFSET_BASE
+ * is now dirty again.
+ *
+ * IOW, we're guaranteed at least a 2nd migration pass
+ * at this point.
+ *
+ * We can now let migration run at full speed to finish
+ * the test
+ */
+static void migrate_prepare_for_dirty_mem(QTestState *from)
+{
+    /*
+     * The guest workflow iterates from start_address to
+     * end_address, writing 1 byte every TEST_MEM_PAGE_SIZE
+     * bytes.
+     *
+     * IOW, if we write to mem at a point which is NOT
+     * a multiple of TEST_MEM_PAGE_SIZE, our write won't
+     * conflict with the migration workflow.
+     *
+     * We put in a marker here, that we'll use to determine
+     * when the data has been transferred to the dst.
+     */
+    qtest_writeq(from, start_address + MAGIC_OFFSET, MAGIC_MARKER);
+}
+
+static void migrate_wait_for_dirty_mem(QTestState *from,
+                                       QTestState *to)
+{
+    uint64_t watch_address = start_address + MAGIC_OFFSET_BASE;
+    uint64_t marker_address = start_address + MAGIC_OFFSET;
+    uint8_t watch_byte;
+
+    /*
+     * Wait for the MAGIC_MARKER to get transferred, as an
+     * indicator that a migration pass has made some known
+     * amount of progress.
+     */
+    do {
+        usleep(1000 * 10);
+    } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
+
+    /*
+     * Now ensure that already transferred bytes are
+     * dirty again from the guest workload. Note the
+     * guest byte value will wrap around and by chance
+     * match the original watch_byte. This is harmless
+     * as we'll eventually see a different value if we
+     * keep watching
+     */
+    watch_byte = qtest_readb(from, watch_address);
+    do {
+        usleep(1000 * 10);
+    } while (qtest_readb(from, watch_address) == watch_byte);
+}
+
+
 static void migrate_pause(QTestState *who)
 {
     qtest_qmp_assert_success(who, "{ 'execute': 'migrate-pause' }");
@@ -577,7 +676,10 @@ typedef struct {
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
 
-    /* Optional: set number of migration passes to wait for, if live==true */
+    /*
+     * Optional: set number of migration passes to wait for, if live==true.
+     * If zero, then merely wait for a few MB of dirty data
+     */
     unsigned int iterations;
 
     /* Optional: whether the guest CPUs should be running during migration */
@@ -1158,12 +1260,14 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
     migrate_ensure_non_converge(from);
 
+    migrate_prepare_for_dirty_mem(from);
+
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     *from_ptr = from;
     *to_ptr = to;
@@ -1398,14 +1502,8 @@ static void test_precopy_common(MigrateCommon *args)
     }
 
     if (args->live) {
-        /*
-         * Testing live migration, we want to ensure that some
-         * memory is re-dirtied after being transferred, so that
-         * we exercise logic for dirty page handling. We achieve
-         * this with a ridiculosly low bandwidth that guarantees
-         * non-convergance.
-         */
         migrate_ensure_non_converge(from);
+        migrate_prepare_for_dirty_mem(from);
     } else {
         /*
          * Testing non-live migration, we allow it to run at
@@ -1440,13 +1538,16 @@ static void test_precopy_common(MigrateCommon *args)
         }
     } else {
         if (args->live) {
-            if (args->iterations) {
-                while (args->iterations--) {
-                    wait_for_migration_pass(from);
-                }
-            } else {
+            /*
+             * For initial iteration(s) we must do a full pass,
+             * but for the final iteration, we need only wait
+             * for some dirty mem before switching to converge
+             */
+            while (args->iterations > 1) {
                 wait_for_migration_pass(from);
+                args->iterations--;
             }
+            migrate_wait_for_dirty_mem(from, to);
 
             migrate_ensure_converge(from);
 
@@ -1573,6 +1674,9 @@ static void test_ignore_shared(void)
         return;
     }
 
+    migrate_ensure_non_converge(from);
+    migrate_prepare_for_dirty_mem(from);
+
     migrate_set_capability(from, "x-ignore-shared", true);
     migrate_set_capability(to, "x-ignore-shared", true);
 
@@ -1581,7 +1685,7 @@
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -2273,6 +2377,7 @@ static void test_multifd_tcp_cancel(void)
     }
 
     migrate_ensure_non_converge(from);
+    migrate_prepare_for_dirty_mem(from);
 
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
@@ -2291,7 +2396,7 @@
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     migrate_cancel(from);
 
@@ -2320,11 +2425,13 @@
 
     wait_for_migration_status(from, "cancelled", NULL);
 
-    migrate_ensure_converge(from);
+    migrate_ensure_non_converge(from);
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
+
+    migrate_ensure_converge(from);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
The migration test cases that actually exercise live migration want to
ensure there is a minimum of two iterations of pre-copy, in order to
exercise the dirty tracking code.

Historically we've queried the migration status, looking for the
'dirty-sync-count' value to increment to track iterations. This was
not entirely reliable because often all the data would get transferred
quickly enough that the migration would finish before we wanted it
to. So we massively dropped the bandwidth and max downtime to
guarantee non-convergence. This had the unfortunate side effect
that every migration took at least 30 seconds to run (100 MB of
dirty pages / 3 MB/sec).

This optimization takes a different approach to ensuring a minimum
of two iterations. Rather than waiting for dirty-sync-count to
increment, directly look for an indication that the source VM has
dirtied RAM that has already been transferred.

On the source VM a magic marker is written just after the 3 MB
offset. The destination VM is now monitored to detect when the
magic marker is transferred. This gives a guarantee that the
first 3 MB of memory have been transferred. Now the source VM
memory is monitored at exactly the 3 MB offset until we observe
a flip in its value. This gives us a guarantee that the guest
workload has dirtied a byte that has already been transferred.

Since we're looking at a place that is only 3 MB from the start
of memory, with the 3 MB/sec bandwidth, this test should complete
in 1 second, instead of 30 seconds.

Once we've proved there is some dirty memory, migration can be
set back to full speed for the remainder of the 1st iteration,
and the entirety of the second iteration, at which point migration
should be complete.

On a test machine this further reduces the migration test time
from 8 minutes to 1 minute 40.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-test.c | 143 ++++++++++++++++++++++++++++++-----
 1 file changed, 125 insertions(+), 18 deletions(-)