Message ID | 20230606144551.24367-4-farosas@suse.de |
---|---|
State | New |
Headers | show |
Series | migration: Fix multifd cancel test | expand |
Fabiano Rosas <farosas@suse.de> wrote: > We've found the source of flakiness in this test, so re-enable it. > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > --- > tests/qtest/migration-test.c | 10 ++-------- > 1 file changed, 2 insertions(+), 8 deletions(-) > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > index b0c355bbd9..800ad23b75 100644 > --- a/tests/qtest/migration-test.c > +++ b/tests/qtest/migration-test.c > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv) > } > qtest_add_func("/migration/multifd/tcp/plain/none", > test_multifd_tcp_none); > - /* > - * This test is flaky and sometimes fails in CI and otherwise: > - * don't run unless user opts in via environment variable. > - */ > - if (getenv("QEMU_TEST_FLAKY_TESTS")) { > - qtest_add_func("/migration/multifd/tcp/plain/cancel", > - test_multifd_tcp_cancel); > - } > + qtest_add_func("/migration/multifd/tcp/plain/cancel", > + test_multifd_tcp_cancel); > qtest_add_func("/migration/multifd/tcp/plain/zlib", > test_multifd_tcp_zlib); > #ifdef CONFIG_ZSTD Reviewed-by: Juan Quintela <quintela@redhat.com> There was another failure with migration test that I will post during the rest of the day. It needs both to get it right. Later, Juan.
On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote: > Fabiano Rosas <farosas@suse.de> wrote: > > We've found the source of flakiness in this test, so re-enable it. > > > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > > --- > > tests/qtest/migration-test.c | 10 ++-------- > > 1 file changed, 2 insertions(+), 8 deletions(-) > > > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > > index b0c355bbd9..800ad23b75 100644 > > --- a/tests/qtest/migration-test.c > > +++ b/tests/qtest/migration-test.c > > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv) > > } > > qtest_add_func("/migration/multifd/tcp/plain/none", > > test_multifd_tcp_none); > > - /* > > - * This test is flaky and sometimes fails in CI and otherwise: > > - * don't run unless user opts in via environment variable. > > - */ > > - if (getenv("QEMU_TEST_FLAKY_TESTS")) { > > - qtest_add_func("/migration/multifd/tcp/plain/cancel", > > - test_multifd_tcp_cancel); > > - } > > + qtest_add_func("/migration/multifd/tcp/plain/cancel", > > + test_multifd_tcp_cancel); > > qtest_add_func("/migration/multifd/tcp/plain/zlib", > > test_multifd_tcp_zlib); > > #ifdef CONFIG_ZSTD > > Reviewed-by: Juan Quintela <quintela@redhat.com> > > > There was another failure with migration test that I will post during > the rest of the day. It needs both to get it right. This one didn't yet land upstream. I'm not sure, but maybe Juan was saying about this change: commit d2026ee117147893f8d80f060cede6d872ecbd7f Author: Juan Quintela <quintela@trasno.org> Date: Wed Apr 26 12:20:36 2023 +0200 multifd: Fix the number of channels ready Fabiano, did you try to reproduce multifd-cancel with current master? I'm wondering whether this test has already been completely fixed, then maybe we can pick up this patch now.
Peter Xu <peterx@redhat.com> writes: > On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote: >> Fabiano Rosas <farosas@suse.de> wrote: >> > We've found the source of flakiness in this test, so re-enable it. >> > >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> >> > --- >> > tests/qtest/migration-test.c | 10 ++-------- >> > 1 file changed, 2 insertions(+), 8 deletions(-) >> > >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >> > index b0c355bbd9..800ad23b75 100644 >> > --- a/tests/qtest/migration-test.c >> > +++ b/tests/qtest/migration-test.c >> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv) >> > } >> > qtest_add_func("/migration/multifd/tcp/plain/none", >> > test_multifd_tcp_none); >> > - /* >> > - * This test is flaky and sometimes fails in CI and otherwise: >> > - * don't run unless user opts in via environment variable. >> > - */ >> > - if (getenv("QEMU_TEST_FLAKY_TESTS")) { >> > - qtest_add_func("/migration/multifd/tcp/plain/cancel", >> > - test_multifd_tcp_cancel); >> > - } >> > + qtest_add_func("/migration/multifd/tcp/plain/cancel", >> > + test_multifd_tcp_cancel); >> > qtest_add_func("/migration/multifd/tcp/plain/zlib", >> > test_multifd_tcp_zlib); >> > #ifdef CONFIG_ZSTD >> >> Reviewed-by: Juan Quintela <quintela@redhat.com> >> >> >> There was another failure with migration test that I will post during >> the rest of the day. It needs both to get it right. > > This one didn't yet land upstream. I'm not sure, but maybe Juan was saying > about this change: > > commit d2026ee117147893f8d80f060cede6d872ecbd7f > Author: Juan Quintela <quintela@trasno.org> > Date: Wed Apr 26 12:20:36 2023 +0200 > > multifd: Fix the number of channels ready That's not it. It was something in the test itself around the fact that we use two sets of: from/to. There was supposed to be a situation where we'd start 'to2' while 'to' was still running and that would cause issues (possibly with sockets). I think what might have happened is that someone merged a fix through another tree and Juan didn't notice. I think this is the one: commit f2d063e61ee2026700ab44bef967f663e976bec8 Author: Xuzhou Cheng <xuzhou.cheng@windriver.com> Date: Fri Oct 28 12:57:32 2022 +0800 tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled Make sure QEMU process "to" exited before launching another target for migration in the test_multifd_tcp_cancel case. Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com> Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20221028045736.679903-8-bin.meng@windriver.com> Signed-off-by: Thomas Huth <thuth@redhat.com> > Fabiano, did you try to reproduce multifd-cancel with current master? I'm > wondering whether this test has already been completely fixed, then maybe > we can pick up this patch now. Yes, let's merge it. I have kept it enabled during testing of all of the recent race conditions we've debugged and haven't seen it fail. Current master also looks fine.
On Mon, Jan 08, 2024 at 11:26:04AM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote: > >> Fabiano Rosas <farosas@suse.de> wrote: > >> > We've found the source of flakiness in this test, so re-enable it. > >> > > >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> > >> > --- > >> > tests/qtest/migration-test.c | 10 ++-------- > >> > 1 file changed, 2 insertions(+), 8 deletions(-) > >> > > >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > >> > index b0c355bbd9..800ad23b75 100644 > >> > --- a/tests/qtest/migration-test.c > >> > +++ b/tests/qtest/migration-test.c > >> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv) > >> > } > >> > qtest_add_func("/migration/multifd/tcp/plain/none", > >> > test_multifd_tcp_none); > >> > - /* > >> > - * This test is flaky and sometimes fails in CI and otherwise: > >> > - * don't run unless user opts in via environment variable. > >> > - */ > >> > - if (getenv("QEMU_TEST_FLAKY_TESTS")) { > >> > - qtest_add_func("/migration/multifd/tcp/plain/cancel", > >> > - test_multifd_tcp_cancel); > >> > - } > >> > + qtest_add_func("/migration/multifd/tcp/plain/cancel", > >> > + test_multifd_tcp_cancel); > >> > qtest_add_func("/migration/multifd/tcp/plain/zlib", > >> > test_multifd_tcp_zlib); > >> > #ifdef CONFIG_ZSTD > >> > >> Reviewed-by: Juan Quintela <quintela@redhat.com> > >> > >> > >> There was another failure with migration test that I will post during > >> the rest of the day. It needs both to get it right. > > > > This one didn't yet land upstream. I'm not sure, but maybe Juan was saying > > about this change: > > > > commit d2026ee117147893f8d80f060cede6d872ecbd7f > > Author: Juan Quintela <quintela@trasno.org> > > Date: Wed Apr 26 12:20:36 2023 +0200 > > > > multifd: Fix the number of channels ready > > That's not it. It was something in the test itself around the fact that > we use two sets of: from/to. There was supposed to be a situation where > we'd start 'to2' while 'to' was still running and that would cause > issues (possibly with sockets). > > I think what might have happened is that someone merged a fix through > another tree and Juan didn't notice. I think this is the one: > > commit f2d063e61ee2026700ab44bef967f663e976bec8 > Author: Xuzhou Cheng <xuzhou.cheng@windriver.com> > Date: Fri Oct 28 12:57:32 2022 +0800 > > tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled > > Make sure QEMU process "to" exited before launching another target > for migration in the test_multifd_tcp_cancel case. > > Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com> > Signed-off-by: Bin Meng <bin.meng@windriver.com> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> > Message-Id: <20221028045736.679903-8-bin.meng@windriver.com> > Signed-off-by: Thomas Huth <thuth@redhat.com> Hmm, i see. > > > Fabiano, did you try to reproduce multifd-cancel with current master? I'm > > wondering whether this test has already been completely fixed, then maybe > > we can pick up this patch now. > > Yes, let's merge it. I have kept it enabled during testing of all of the > recent race conditions we've debugged and haven't seen it fail. Current > master also looks fine. It needs a trivial touchup, but then I queued it. Thanks,
On 09/01/2024 03.12, Peter Xu wrote: > On Mon, Jan 08, 2024 at 11:26:04AM -0300, Fabiano Rosas wrote: >> Peter Xu <peterx@redhat.com> writes: >> >>> On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote: >>>> Fabiano Rosas <farosas@suse.de> wrote: >>>>> We've found the source of flakiness in this test, so re-enable it. >>>>> >>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de> >>>>> --- >>>>> tests/qtest/migration-test.c | 10 ++-------- >>>>> 1 file changed, 2 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >>>>> index b0c355bbd9..800ad23b75 100644 >>>>> --- a/tests/qtest/migration-test.c >>>>> +++ b/tests/qtest/migration-test.c >>>>> @@ -2778,14 +2778,8 @@ int main(int argc, char **argv) >>>>> } >>>>> qtest_add_func("/migration/multifd/tcp/plain/none", >>>>> test_multifd_tcp_none); >>>>> - /* >>>>> - * This test is flaky and sometimes fails in CI and otherwise: >>>>> - * don't run unless user opts in via environment variable. >>>>> - */ >>>>> - if (getenv("QEMU_TEST_FLAKY_TESTS")) { >>>>> - qtest_add_func("/migration/multifd/tcp/plain/cancel", >>>>> - test_multifd_tcp_cancel); >>>>> - } >>>>> + qtest_add_func("/migration/multifd/tcp/plain/cancel", >>>>> + test_multifd_tcp_cancel); >>>>> qtest_add_func("/migration/multifd/tcp/plain/zlib", >>>>> test_multifd_tcp_zlib); >>>>> #ifdef CONFIG_ZSTD >>>> >>>> Reviewed-by: Juan Quintela <quintela@redhat.com> >>>> >>>> >>>> There was another failure with migration test that I will post during >>>> the rest of the day. It needs both to get it right. >>> >>> This one didn't yet land upstream. I'm not sure, but maybe Juan was saying >>> about this change: >>> >>> commit d2026ee117147893f8d80f060cede6d872ecbd7f >>> Author: Juan Quintela <quintela@trasno.org> >>> Date: Wed Apr 26 12:20:36 2023 +0200 >>> >>> multifd: Fix the number of channels ready >> >> That's not it. It was something in the test itself around the fact that >> we use two sets of: from/to. There was supposed to be a situation where >> we'd start 'to2' while 'to' was still running and that would cause >> issues (possibly with sockets). >> >> I think what might have happened is that someone merged a fix through >> another tree and Juan didn't notice. I think this is the one: >> >> commit f2d063e61ee2026700ab44bef967f663e976bec8 >> Author: Xuzhou Cheng <xuzhou.cheng@windriver.com> >> Date: Fri Oct 28 12:57:32 2022 +0800 >> >> tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled >> >> Make sure QEMU process "to" exited before launching another target >> for migration in the test_multifd_tcp_cancel case. >> >> Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com> >> Signed-off-by: Bin Meng <bin.meng@windriver.com> >> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> >> Message-Id: <20221028045736.679903-8-bin.meng@windriver.com> >> Signed-off-by: Thomas Huth <thuth@redhat.com> > > Hmm, i see. Sorry for that :-( Maybe it's better if we remove the migration-test from the qtest section in MAINTAINERS? Since the migration test is very well maintained already, there's IMHO no need for picking up the patches via the qtest tree, so something like this should prevent these problems: diff --git a/MAINTAINERS b/MAINTAINERS --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3269,6 +3269,7 @@ F: tests/qtest/ F: docs/devel/qgraph.rst F: docs/devel/qtest.rst X: tests/qtest/bios-tables-test* +X: tests/qtest/migration-* Device Fuzzing M: Alexander Bulekov <alxndr@bu.edu> (as you can see, we're doing it in a similar way for the bios tables test already) If you agree, I can send out a proper patch for this later today. Thomas
Hi, Thomas, On Tue, Jan 09, 2024 at 08:21:53AM +0100, Thomas Huth wrote: > Sorry for that :-( Not at all! I actually appreciate more people looking after it. > Maybe it's better if we remove the migration-test from > the qtest section in MAINTAINERS? Since the migration test is very well > maintained already, there's IMHO no need for picking up the patches via the > qtest tree, so something like this should prevent these problems: > > diff --git a/MAINTAINERS b/MAINTAINERS > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -3269,6 +3269,7 @@ F: tests/qtest/ > F: docs/devel/qgraph.rst > F: docs/devel/qtest.rst > X: tests/qtest/bios-tables-test* > +X: tests/qtest/migration-* > > Device Fuzzing > M: Alexander Bulekov <alxndr@bu.edu> > > (as you can see, we're doing it in a similar way for the bios tables test > already) > > If you agree, I can send out a proper patch for this later today. Currently the file is covered by both groups of people, which is the best condition to me: $ ./scripts/get_maintainer.pl -f tests/qtest/migration-test.c Peter Xu <peterx@redhat.com> (maintainer:Migration) Fabiano Rosas <farosas@suse.de> (maintainer:Migration) Thomas Huth <thuth@redhat.com> (maintainer:qtest) Laurent Vivier <lvivier@redhat.com> (maintainer:qtest) Paolo Bonzini <pbonzini@redhat.com> (reviewer:qtest) qemu-devel@nongnu.org (open list:All patches CC here) It makes sense to me e.g. when qtest reworks the framework, and we'd like migration-test.c to be covered in that same reworks series and reviewed/pulled together, for example, then those can go via qtest's tree directly. If patch submitter follows the MAINTAINERS file it means all of us will be in the loop and that's the perfect condition, IMHO. It's just that this patch didn't have any migration people copied, which caused a very slight confusion. It'll be great in that case if qtest maintainers can help submitters to copy us if the submitters forgot to do so. I think we should do the same when there's major changes for qtest framework for a new migration test. Would that work the best for us? Thanks,
On 09/01/2024 08.48, Peter Xu wrote: > Hi, Thomas, > > On Tue, Jan 09, 2024 at 08:21:53AM +0100, Thomas Huth wrote: >> Sorry for that :-( > > Not at all! I actually appreciate more people looking after it. > >> Maybe it's better if we remove the migration-test from >> the qtest section in MAINTAINERS? Since the migration test is very well >> maintained already, there's IMHO no need for picking up the patches via the >> qtest tree, so something like this should prevent these problems: >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -3269,6 +3269,7 @@ F: tests/qtest/ >> F: docs/devel/qgraph.rst >> F: docs/devel/qtest.rst >> X: tests/qtest/bios-tables-test* >> +X: tests/qtest/migration-* >> >> Device Fuzzing >> M: Alexander Bulekov <alxndr@bu.edu> >> >> (as you can see, we're doing it in a similar way for the bios tables test >> already) >> >> If you agree, I can send out a proper patch for this later today. > > Currently the file is covered by both groups of people, which is the best > condition to me: > > $ ./scripts/get_maintainer.pl -f tests/qtest/migration-test.c > Peter Xu <peterx@redhat.com> (maintainer:Migration) > Fabiano Rosas <farosas@suse.de> (maintainer:Migration) > Thomas Huth <thuth@redhat.com> (maintainer:qtest) > Laurent Vivier <lvivier@redhat.com> (maintainer:qtest) > Paolo Bonzini <pbonzini@redhat.com> (reviewer:qtest) > qemu-devel@nongnu.org (open list:All patches CC here) > > It makes sense to me e.g. when qtest reworks the framework, and we'd like > migration-test.c to be covered in that same reworks series and > reviewed/pulled together, for example, then those can go via qtest's tree > directly. > > If patch submitter follows the MAINTAINERS file it means all of us will be > in the loop and that's the perfect condition, IMHO. It's just that this > patch didn't have any migration people copied, which caused a very slight > confusion. > > It'll be great in that case if qtest maintainers can help submitters to > copy us if the submitters forgot to do so. I think we should do the same > when there's major changes for qtest framework for a new migration test. > Would that work the best for us? Ok, makes sense, let's try it that way! Thomas
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index b0c355bbd9..800ad23b75 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -2778,14 +2778,8 @@ int main(int argc, char **argv) } qtest_add_func("/migration/multifd/tcp/plain/none", test_multifd_tcp_none); - /* - * This test is flaky and sometimes fails in CI and otherwise: - * don't run unless user opts in via environment variable. - */ - if (getenv("QEMU_TEST_FLAKY_TESTS")) { - qtest_add_func("/migration/multifd/tcp/plain/cancel", - test_multifd_tcp_cancel); - } + qtest_add_func("/migration/multifd/tcp/plain/cancel", + test_multifd_tcp_cancel); qtest_add_func("/migration/multifd/tcp/plain/zlib", test_multifd_tcp_zlib); #ifdef CONFIG_ZSTD
We've found the source of flakiness in this test, so re-enable it. Signed-off-by: Fabiano Rosas <farosas@suse.de> --- tests/qtest/migration-test.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)