Message ID | 20220628105434.295905-2-berrange@redhat.com |
---|---|
State | New |
Headers | show |
Series | tests: improve reliability of migration test | expand |
On 28/06/2022 12:54, Daniel P. Berrangé wrote: > Currently the wait_for_migration_fail and wait_for_migration_complete > functions will spin in an infinite loop checking query-migrate status > to detect a specific change/goal. This is fine when everything goes > to plan, but when the unusual happens, these will hang the test suite > forever. > > Any normally executing migration test case normally takes < 1 second > for a state change, with exception of the autoconverge test which > takes about 5 seconds. Taking into account possibility of people > running tests inside TCG, allowing a factor of x20 slowdown gives > a reasonable worst case of 120 seconds. Anything taking longer than > this is a strong sign that the test has hung, or the test should be > rewritten to be faster. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > tests/qtest/migration-helpers.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c > index a6aa59e4e6..e81e831c85 100644 > --- a/tests/qtest/migration-helpers.c > +++ b/tests/qtest/migration-helpers.c > @@ -15,6 +15,14 @@ > > #include "migration-helpers.h" > > +/* > + * Number of seconds we wait when looking for migration > + * status changes, to avoid test suite hanging forever > + * when things go wrong. Needs to be higher enough to > + * avoid false positives on loaded hosts. > + */ > +#define MIGRATION_STATUS_WAIT_TIMEOUT 120 > + > bool got_stop; > > static void check_stop_event(QTestState *who) > @@ -166,8 +174,11 @@ static bool check_migration_status(QTestState *who, const char *goal, > void wait_for_migration_status(QTestState *who, > const char *goal, const char **ungoals) > { > + g_test_timer_start(); > while (!check_migration_status(who, goal, ungoals)) { > usleep(1000); > + > + g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT); > } > } > > @@ -178,6 +189,7 @@ void wait_for_migration_complete(QTestState *who) > > void wait_for_migration_fail(QTestState *from, bool allow_active) > { > + g_test_timer_start(); > QDict *rsp_return; > char *status; > bool failed; > @@ -193,6 +205,8 @@ void wait_for_migration_fail(QTestState *from, bool allow_active) > g_assert(result); > failed = !strcmp(status, "failed"); > g_free(status); > + > + g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT); > } while (!failed); > > /* Is the machine currently running? */ Reviewed-by: Laurent Vivier <laurent@vivier.eu>
On 28/06/2022 12.54, Daniel P. Berrangé wrote: > Currently the wait_for_migration_fail and wait_for_migration_complete > functions will spin in an infinite loop checking query-migrate status > to detect a specific change/goal. This is fine when everything goes > to plan, but when the unusual happens, these will hang the test suite > forever. > > Any normally executing migration test case normally takes < 1 second > for a state change, with exception of the autoconverge test which > takes about 5 seconds. Taking into account possibility of people > running tests inside TCG, allowing a factor of x20 slowdown gives > a reasonable worst case of 120 seconds. Anything taking longer than > this is a strong sign that the test has hung, or the test should be > rewritten to be faster. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > tests/qtest/migration-helpers.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c > index a6aa59e4e6..e81e831c85 100644 > --- a/tests/qtest/migration-helpers.c > +++ b/tests/qtest/migration-helpers.c > @@ -15,6 +15,14 @@ > > #include "migration-helpers.h" > > +/* > + * Number of seconds we wait when looking for migration > + * status changes, to avoid test suite hanging forever > + * when things go wrong. Needs to be higher enough to > + * avoid false positives on loaded hosts. > + */ > +#define MIGRATION_STATUS_WAIT_TIMEOUT 120 Reviewed-by: Thomas Huth <thuth@redhat.com> ... but I think I'd suggest to use an even longer timeout - sometimes people try to run the tests with TCI, or on heavy loaded CI machines, and then you can get really really long timeouts. Maybe 240 or even 300 seconds?
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index a6aa59e4e6..e81e831c85 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -15,6 +15,14 @@ #include "migration-helpers.h" +/* + * Number of seconds we wait when looking for migration + * status changes, to avoid test suite hanging forever + * when things go wrong. Needs to be higher enough to + * avoid false positives on loaded hosts. + */ +#define MIGRATION_STATUS_WAIT_TIMEOUT 120 + bool got_stop; static void check_stop_event(QTestState *who) @@ -166,8 +174,11 @@ static bool check_migration_status(QTestState *who, const char *goal, void wait_for_migration_status(QTestState *who, const char *goal, const char **ungoals) { + g_test_timer_start(); while (!check_migration_status(who, goal, ungoals)) { usleep(1000); + + g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT); } } @@ -178,6 +189,7 @@ void wait_for_migration_complete(QTestState *who) void wait_for_migration_fail(QTestState *from, bool allow_active) { + g_test_timer_start(); QDict *rsp_return; char *status; bool failed; @@ -193,6 +205,8 @@ void wait_for_migration_fail(QTestState *from, bool allow_active) g_assert(result); failed = !strcmp(status, "failed"); g_free(status); + + g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT); } while (!failed); /* Is the machine currently running? */
Currently the wait_for_migration_fail and wait_for_migration_complete functions will spin in an infinite loop checking query-migrate status to detect a specific change/goal. This is fine when everything goes to plan, but when the unusual happens, these will hang the test suite forever. Any normally executing migration test case normally takes < 1 second for a state change, with exception of the autoconverge test which takes about 5 seconds. Taking into account possibility of people running tests inside TCG, allowing a factor of x20 slowdown gives a reasonable worst case of 120 seconds. Anything taking longer than this is a strong sign that the test has hung, or the test should be rewritten to be faster. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- tests/qtest/migration-helpers.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)