Message ID | 20240525131241.378473-4-npiggin@gmail.com |
---|---|
State | New |
Series | Fix s390x flic migration and add some more qtests |
Nicholas Piggin <npiggin@gmail.com> writes:

> This was said to be broken on aarch64, but if it works on others,
> let's try enable it. It's already starting to bitrot...

Yeah, look at the state of this...

I don't know what the issue was on aarch64, but I'm all for enabling
this test globally and then we deal with the breakage if it ever
comes. I don't think it will.

However, there is an issue here still on all archs - which might very
well have been the original issue - which is the fact that the
containers on the Gitlab CI have limits on shared memory usage.
Unfortunately we cannot enable this test for the CI, so it needs a check
on the GITLAB_CI environment variable.

There's also the cpr-reboot test which got put under "flaky", that has
the same issue. That one should also have been under GITLAB_CI. From
that discussion:

  "We have an issue with this test on CI:

  $ df -h /dev/shm
  Filesystem      Size  Used Avail Use% Mounted on
  shm              64M     0   64M   0% /dev/shm

  These are shared CI runners, so AFAICT there's no way to increase the
  shared memory size.

  Reducing the memory for this single test also wouldn't work because we
  can run migration-test for different archs in parallel + there's the
  ivshmem_test which uses 4M.

  Maybe just leave it out of CI? Laptops will probably have enough shared
  memory to not hit this. If we add a warning comment to the test, might
  be enough."
  -- https://lore.kernel.org/all/87ttq5fvh7.fsf@suse.de

>
> Cc: Yury Kotov <yury-kotov@yandex-team.ru>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  tests/qtest/migration-test.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7987faaded..2bcdc33b7c 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1862,14 +1862,15 @@ static void test_precopy_unix_tls_x509_override_host(void)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>
> -#if 0
> -/* Currently upset on aarch64 TCG */
>  static void test_ignore_shared(void)
>  {
>      g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>      QTestState *from, *to;
> +    MigrateStart args = {
> +        .use_shmem = true,
> +    };
>
> -    if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
> +    if (test_migrate_start(&from, &to, uri, &args)) {
>          return;
>      }
>
> @@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
>
>      test_migrate_end(from, to, true);
>  }
> -#endif
>
>  static void *
>  test_migrate_xbzrle_start(QTestState *from,
> @@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>
> -    /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
> +    if (strcmp(arch, "aarch64") != 0) { /* Currently upset on aarch64 TCG */
> +        migration_test_add("/migration/ignore_shared", test_ignore_shared);
> +    }
> +
>  #ifndef _WIN32
>      migration_test_add("/migration/precopy/fd/tcp",
>                         test_migrate_precopy_fd_socket);
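
The GITLAB_CI check mentioned above could look roughly like the sketch below. It relies only on the fact that GitLab runners export GITLAB_CI for every job; the helper names running_on_gitlab_ci() and should_run_shmem_test() are hypothetical and do not exist in migration-test.c.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Returns true when the GITLAB_CI environment variable is set to a
 * non-empty value, which is how GitLab runners mark a CI job.
 */
static bool running_on_gitlab_ci(void)
{
    const char *v = getenv("GITLAB_CI");

    return v != NULL && v[0] != '\0';
}

/*
 * Hypothetical policy helper (an assumption, not QEMU code):
 * shmem-heavy tests are registered only outside the CI.
 */
static bool should_run_shmem_test(void)
{
    return !running_on_gitlab_ci();
}
```

In migration-test.c a predicate like this would presumably guard the migration_test_add("/migration/ignore_shared", test_ignore_shared) registration, so the test still runs on developer machines.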
On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
> However, there is an issue here still on all archs - which might very
> well have been the original issue - which is the fact that the
> containers on the Gitlab CI have limits on shared memory usage.
> Unfortunately we cannot enable this test for the CI, so it needs a check
> on the GITLAB_CI environment variable.

Another option is we teach migration-test to detect whether memory_size of
shmem is available, skip if not. It can be a sequence of:

  memfd_create()
  fallocate()
  ret = madvise(MADV_POPULATE_WRITE)

To be run at the entry of migration-test, and skip all use_shmem=true tests
if ret != 0, or any step failed above.

Thanks,
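
A minimal sketch of the probe sequence above, assuming Linux with glibc. The helper name shmem_available() is hypothetical, and the mmap() step is added here because madvise() needs a mapping to operate on. MADV_POPULATE_WRITE requires Linux 5.14 or newer; on older kernels the call fails and the probe reports "not available". Whether a memfd is charged against the same limit as the /dev/shm mount inside a given container is not settled in this thread, so treat that as an open assumption.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* Linux 5.14+; missing from older headers */
#endif

/*
 * Hypothetical helper: returns true only if `size` bytes of shared
 * memory can be allocated and populated right now. Any failing step
 * is treated as "skip the use_shmem tests".
 */
static bool shmem_available(size_t size)
{
    bool ok = false;
    void *p;
    int fd = memfd_create("shmem-probe", MFD_CLOEXEC);

    if (fd < 0) {
        return false;
    }
    /* Reserve the backing pages; fails if the space is not there. */
    if (fallocate(fd, 0, 0, (off_t)size) != 0) {
        goto out;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        goto out;
    }
    /* Fault in every page for writing; an error means skip. */
    ok = madvise(p, size, MADV_POPULATE_WRITE) == 0;
    munmap(p, size);
out:
    close(fd); /* closing the last reference releases the pages */
    return ok;
}
```

Because the probe frees the memory before returning, it only answers whether the space was available at that instant.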
Peter Xu <peterx@redhat.com> writes:

> On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
>> However, there is an issue here still on all archs - which might very
>> well have been the original issue - which is the fact that the
>> containers on the Gitlab CI have limits on shared memory usage.
>> Unfortunately we cannot enable this test for the CI, so it needs a check
>> on the GITLAB_CI environment variable.
>
> Another option is we teach migration-test to detect whether memory_size of
> shmem is available, skip if not. It can be a sequence of:
>
>   memfd_create()
>   fallocate()
>   ret = madvise(MADV_POPULATE_WRITE)
>
> To be run at the entry of migration-test, and skip all use_shmem=true tests
> if ret != 0, or any step failed above.

There are actually two issues:

1) Trying to run a test that needs more shmem than available in the
   container. This is covered well by your suggestion.

2) Trying to use some shmem while another test has already consumed all
   shmem. I'm not sure if this can be done reliably as the tests run in
   parallel.
On Mon, May 27, 2024 at 12:11:45PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Mon, May 27, 2024 at 09:42:28AM -0300, Fabiano Rosas wrote:
> >> However, there is an issue here still on all archs - which might very
> >> well have been the original issue - which is the fact that the
> >> containers on the Gitlab CI have limits on shared memory usage.
> >> Unfortunately we cannot enable this test for the CI, so it needs a check
> >> on the GITLAB_CI environment variable.
> >
> > Another option is we teach migration-test to detect whether memory_size of
> > shmem is available, skip if not. It can be a sequence of:
> >
> >   memfd_create()
> >   fallocate()
> >   ret = madvise(MADV_POPULATE_WRITE)
> >
> > To be run at the entry of migration-test, and skip all use_shmem=true tests
> > if ret != 0, or any step failed above.
>
> There are actually two issues:
>
> 1) Trying to run a test that needs more shmem than available in the
>    container. This is covered well by your suggestion.
>
> 2) Trying to use some shmem while another test has already consumed all
>    shmem. I'm not sure if this can be done reliably as the tests run in
>    parallel.

Maybe we can also make that check to be per-test, then when use_shmem=true
the test populates the shmem file before using, skip if population fails.
And if it succeeded, using that file in that test should be reliable.
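
The per-test variant could be sketched as follows. Unlike a one-off probe, it keeps the populated file around so the pages stay reserved for the test that asked for them. The helper name shmem_reserve() and the idea of passing the file path in are assumptions for illustration; the thread does not say how the shmem file would be handed to the test.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* Linux 5.14+; missing from older headers */
#endif

/*
 * Hypothetical per-test helper: create `path` (expected to live on a
 * shared-memory filesystem such as /dev/shm), allocate and populate
 * `size` bytes, and return the open fd. Returns -1 if any step fails,
 * in which case the caller should skip the test. On success the caller
 * owns the fd and must close() it and unlink() the path when done.
 */
static int shmem_reserve(const char *path, size_t size)
{
    void *p;
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0) {
        return -1;
    }
    /* posix_fallocate() returns an error number, not -1/errno. */
    if (posix_fallocate(fd, 0, (off_t)size) != 0) {
        goto fail;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        goto fail;
    }
    if (madvise(p, size, MADV_POPULATE_WRITE) != 0) {
        munmap(p, size);
        goto fail;
    }
    munmap(p, size); /* pages stay allocated while the file exists */
    return fd;
fail:
    close(fd);
    unlink(path); /* only removes the file this call created */
    return -1;
}
```

Since the pages remain allocated until the file is removed, a parallel test that consumes the rest of the shared memory afterwards should not affect the test that already holds its reservation.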
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7987faaded..2bcdc33b7c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1862,14 +1862,15 @@ static void test_precopy_unix_tls_x509_override_host(void)
 #endif /* CONFIG_TASN1 */
 #endif /* CONFIG_GNUTLS */
 
-#if 0
-/* Currently upset on aarch64 TCG */
 static void test_ignore_shared(void)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
     QTestState *from, *to;
+    MigrateStart args = {
+        .use_shmem = true,
+    };
 
-    if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return;
     }
 
@@ -1898,7 +1899,6 @@ static void test_ignore_shared(void)
 
     test_migrate_end(from, to, true);
 }
-#endif
 
 static void *
 test_migrate_xbzrle_start(QTestState *from,
@@ -3537,7 +3537,10 @@ int main(int argc, char **argv)
 #endif /* CONFIG_TASN1 */
 #endif /* CONFIG_GNUTLS */
 
-    /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
+    if (strcmp(arch, "aarch64") != 0) { /* Currently upset on aarch64 TCG */
+        migration_test_add("/migration/ignore_shared", test_ignore_shared);
+    }
+
 #ifndef _WIN32
     migration_test_add("/migration/precopy/fd/tcp",
                        test_migrate_precopy_fd_socket);
This was said to be broken on aarch64, but if it works on others,
let's try enable it. It's already starting to bitrot...

Cc: Yury Kotov <yury-kotov@yandex-team.ru>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/qtest/migration-test.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)