[v4,00/10] tests/qtest: make migration-test massively faster

Message ID 20230601161347.1803440-1-berrange@redhat.com

Message

Daniel P. Berrangé June 1, 2023, 4:13 p.m. UTC
This makes migration-test faster by observing that most of the pre-copy
tests don't need to be doing a live migration. They get sufficient code
coverage with the guest CPUs paused.

On my machine this cuts the overall execution time of migration-test
from 13 minutes, down to 8 minutes, without sacrificing any noticeable
code coverage.

Of the tests which do still run in live mode, some need to guarantee
a certain number of iterations. This is achieved by running the 1
iteration with an incredibly small bandwidth and max downtime to
prevent convergence, and watching query-migrate for the reported
iteration to increment. This guarantees that all the tests take at
least 30 seconds to run per iteration required.
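
To make the cost of a throttled iteration concrete, here is a rough
back-of-the-envelope sketch. The helper name and the sizes used with it
are illustrative assumptions, not values taken from this series: one
pass over the dirtied RAM takes roughly (bytes to send) / (bandwidth cap).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: seconds needed to send one
 * full pass over the dirtied guest RAM when the transfer rate is capped.
 */
static double pass_seconds(uint64_t dirty_bytes, uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);  /* a zero cap would never finish a pass */
    return (double)dirty_bytes / (double)bytes_per_sec;
}
```

With assumed round numbers, pass_seconds(90000000, 3000000) gives 30.0,
so a test that must observe N iteration flips pays roughly N times that.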

Watching for the iteration counter to flip is inefficient and not
actually needed, except on the final iteration before starting
convergence. On this final iteration we merely need to prove that
some amount of already transferred data has been made dirty again.
This in turn will guarantee that a further iteration is required
beyond the current one. This proof is easy to achieve by monitoring
the values at two distinct addresses in guest RAM, and can cut the
30 second duration down to 1 second for one of the iterations.
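
The idea can be sketched with a toy model. Everything below (struct
toy_vm, the page indexes, the marker value, the tick loop) is a
hypothetical stand-in for the real guest and migration stream, and is
not the code from the patches: one address holds a constant marker that
lies after the watched address in transfer order, so once the marker
shows up on the destination the watched address must already have been
sent, and any later change to it on the source proves it is dirty again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy sizes and positions; all of these are assumptions for the sketch. */
enum { NPAGES = 8, WATCH_PAGE = 0, MARKER_PAGE = 4, MARKER = 0xA5 };

struct toy_vm {
    uint8_t src[NPAGES];  /* one byte per "page" of source guest RAM */
    uint8_t dst[NPAGES];  /* what the destination has received so far */
    size_t guest_pos;     /* next page the guest workload will dirty */
    size_t send_pos;      /* next page the migration pass will copy */
};

static void toy_init(struct toy_vm *vm)
{
    memset(vm, 0, sizeof(*vm));
    vm->src[MARKER_PAGE] = MARKER;  /* constant, never written by the guest */
}

/* One tick: the guest dirties one page, the migration copies one page. */
static void toy_tick(struct toy_vm *vm)
{
    if (vm->guest_pos != MARKER_PAGE) {
        vm->src[vm->guest_pos]++;
    }
    vm->guest_pos = (vm->guest_pos + 1) % NPAGES;
    if (vm->send_pos < NPAGES) {
        vm->dst[vm->send_pos] = vm->src[vm->send_pos];
        vm->send_pos++;
    }
}

/* Returns true once already transferred data is proven dirty again. */
static bool wait_for_redirty(struct toy_vm *vm, unsigned max_ticks)
{
    unsigned ticks = 0;
    uint8_t seen;

    /* The watched page must be sent before the marker page. */
    assert(WATCH_PAGE < MARKER_PAGE);

    /* Step 1: marker on the destination => watched page was transferred. */
    while (vm->dst[MARKER_PAGE] != MARKER) {
        if (ticks++ >= max_ticks) {
            return false;
        }
        toy_tick(vm);
    }

    /* Step 2: a change on the source => the watched page is dirty again. */
    seen = vm->src[WATCH_PAGE];
    while (vm->src[WATCH_PAGE] == seen) {
        if (ticks++ >= max_ticks) {
            return false;
        }
        toy_tick(vm);
    }
    return true;
}
```

In the real test the two reads would go through the qtest memory
accessors against the source and destination VMs rather than arrays,
but the shape of the proof is the same: no need to wait for a whole
throttled pass to complete.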

After this second optimization the runtime is reduced from
8 minutes, down to 1 minute 40 seconds, which is pretty decent given
the amount of coverage we're getting.

Daniel P. Berrangé (10):
  tests/qtest: add various qtest_qmp_assert_success() variants
  tests/qtest: add support for callback to receive QMP events
  tests/qtest: get rid of 'qmp_command' helper in migration test
  tests/qtest: get rid of some 'qtest_qmp' usage in migration test
  tests/qtest: switch to using event callbacks for STOP event
  tests/qtest: replace wait_command() with qtest_qmp_assert_success
  tests/qtest: capture RESUME events during migration
  tests/qtest: distinguish src/dst migration VM stop/resume events
  tests/qtest: make more migration pre-copy scenarios run non-live
  tests/qtest: massively speed up migration-test

 tests/qtest/libqtest.c          | 115 +++++++-
 tests/qtest/libqtest.h          | 158 ++++++++++-
 tests/qtest/migration-helpers.c | 103 ++-----
 tests/qtest/migration-helpers.h |  16 +-
 tests/qtest/migration-test.c    | 472 ++++++++++++++++++++------------
 5 files changed, 586 insertions(+), 278 deletions(-)

Comments

Thomas Huth July 3, 2023, 4:37 p.m. UTC | #1
On 01/06/2023 18.13, Daniel P. Berrangé wrote:
> This makes migration-test faster by observing that most of the pre-copy
> tests don't need to be doing a live migration. They get sufficient code
> coverage with the guest CPUs paused.
> 
> On my machine this cuts the overall execution time of migration-test
> from 13 minutes, down to 8 minutes, without sacrificing any noticeable
> code coverage.
> 
> Of the tests which do still run in live mode, some need to guarantee
> a certain number of iterations. This is achieved by running the 1
> iteration with an incredibly small bandwidth and max downtime to
> prevent convergence, and watching query-migrate for the reported
> iteration to increment. This guarantees that all the tests take at
> least 30 seconds to run per iteration required.
> 
> Watching for the iteration counter to flip is inefficient and not
> actually needed, except on the final iteration before starting
> convergence. On this final iteration we merely need to prove that
> some amount of already transferred data has been made dirty again.
> This in turn will guarantee that a further iteration is required
> beyond the current one. This proof is easy to achieve by monitoring
> the values at two distinct addresses in guest RAM, and can cut the
> 30 second duration down to 1 second for one of the iterations.
> 
> After this second optimization the runtime is reduced from
> 8 minutes, down to 1 minute 40 seconds, which is pretty decent given
> the amount of coverage we're getting.

It's now ~1 week until the soft freeze, and the migration test still runs
for ~8 minutes. This is still quite annoying. Could we please get one of
the solutions merged before the soft freeze, either Daniel's or Peter's?

  Thanks,
   Thomas