Message ID | 20200828085852.1861153-1-stli@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | Loosen the limits of time/tst-cpuclock1. | expand |
* Stefan Liebler via Libc-alpha: > Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f "Fix > time/tst-cpuclock1 intermitent failures" (2020-07-11), this test fails > quite often on s390x/s390 with one/multiple of those: "before - after" > / "nanosleep time" / "dead - after" ourside reasonable range. How much value do these cpuclock tests actually have? Maybe we should just remove them, given that their test objective is so difficult to model in current execution environments. Maybe we could measure clock discontinuities on the CPU-burning thread and detect CPU stealing this way. But I'm not sure if this would be worth it. Thanks, Florian
Quoting Florian Weimer via Libc-alpha (2020-08-28 09:29:45) > * Stefan Liebler via Libc-alpha: > > > Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f "Fix > > time/tst-cpuclock1 intermitent failures" (2020-07-11), this test fails > > quite often on s390x/s390 with one/multiple of those: "before - after" > > / "nanosleep time" / "dead - after" ourside reasonable range. > > How much value do these cpuclock tests actually have? Maybe we should > just remove them, given that their test objective is so difficult to > model in current execution environments. AFAICS this test was moved from rt/ to time/ when the clock_* functions were migrated to libc.so. IMHO this test lost it's purpose during this migration. I can see why it was needed on rt/, with strict time measures, but on time/ it's just testing clock_* API. I this case I don't see a downside to flex the time requirements. A good thing to add for this test is a comment on the beginning stating the purpose of it. :) > > Maybe we could measure clock discontinuities on the CPU-burning thread > and detect CPU stealing this way. But I'm not sure if this would be > worth it. What we want to achieve with this? --- Lucas A. M. Magalhães
* Lucas A. M. Magalhaes: > Quoting Florian Weimer via Libc-alpha (2020-08-28 09:29:45) >> * Stefan Liebler via Libc-alpha: >> >> > Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f "Fix >> > time/tst-cpuclock1 intermitent failures" (2020-07-11), this test fails >> > quite often on s390x/s390 with one/multiple of those: "before - after" >> > / "nanosleep time" / "dead - after" ourside reasonable range. >> >> How much value do these cpuclock tests actually have? Maybe we should >> just remove them, given that their test objective is so difficult to >> model in current execution environments. > > AFAICS this test was moved from rt/ to time/ when the clock_* functions > were migrated to libc.so. IMHO this test lost it's purpose during this > migration. I can see why it was needed on rt/, with strict time > measures, but on time/ it's just testing clock_* API. I this case I don't > see a downside to flex the time requirements. The location in the tree does not impact the test objectives. The move was done to reflect that for quite a few releases now, clock_gettime has been provided by libc, not librt. >> Maybe we could measure clock discontinuities on the CPU-burning thread >> and detect CPU stealing this way. But I'm not sure if this would be >> worth it. > > What we want to achieve with this? I think the goal is to ensure that glibc and the kernel agree about the definition of the clocks, i.e. that the clock is indeed a CPU time clock and does not measure wall-clock time. Thanks, Florian
On 8/31/20 2:59 PM, Florian Weimer wrote: > * Lucas A. M. Magalhaes: > >> Quoting Florian Weimer via Libc-alpha (2020-08-28 09:29:45) >>> * Stefan Liebler via Libc-alpha: >>> >>>> Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f "Fix >>>> time/tst-cpuclock1 intermitent failures" (2020-07-11), this test fails >>>> quite often on s390x/s390 with one/multiple of those: "before - after" >>>> / "nanosleep time" / "dead - after" ourside reasonable range. >>> >>> How much value do these cpuclock tests actually have? Maybe we should >>> just remove them, given that their test objective is so difficult to >>> model in current execution environments. >> >> AFAICS this test was moved from rt/ to time/ when the clock_* functions >> were migrated to libc.so. IMHO this test lost it's purpose during this >> migration. I can see why it was needed on rt/, with strict time >> measures, but on time/ it's just testing clock_* API. I this case I don't >> see a downside to flex the time requirements. > > The location in the tree does not impact the test objectives. The move > was done to reflect that for quite a few releases now, clock_gettime > has been provided by libc, not librt. > >>> Maybe we could measure clock discontinuities on the CPU-burning thread >>> and detect CPU stealing this way. But I'm not sure if this would be >>> worth it. >> >> What we want to achieve with this? > > I think the goal is to ensure that glibc and the kernel agree about the > definition of the clocks, i.e. that the clock is indeed a CPU time clock > and does not measure wall-clock time. > > Thanks, > Florian > Thanks for your comments. How do we want to proceed here: - Shall we just loosen the limits? - Shall we remove the whole test? - Shall we remove only the first check which compares nanosleep vs clock_gettime (child_clock, before|after)? Bye. Stefan
* Stefan Liebler: > How do we want to proceed here: > - Shall we just loosen the limits? > - Shall we remove the whole test? > - Shall we remove only the first check which compares nanosleep vs > clock_gettime (child_clock, before|after)? I lean towards removing both time/tst-cpuclock1 and time/tst-cpuclock2. Thanks, Florian
Quoting Florian Weimer (2020-09-21 08:28:31) > * Stefan Liebler: > > > How do we want to proceed here: > > - Shall we just loosen the limits? > > - Shall we remove the whole test? > > - Shall we remove only the first check which compares nanosleep vs > > clock_gettime (child_clock, before|after)? > > I lean towards removing both time/tst-cpuclock1 and time/tst-cpuclock2. > I don't oppose against removing them, also. --- Lucas A. M. Magalhães
On 29/09/2020 10:53, Lucas A. M. Magalhaes via Libc-alpha wrote: > Quoting Florian Weimer (2020-09-21 08:28:31) >> * Stefan Liebler: >> >>> How do we want to proceed here: >>> - Shall we just loosen the limits? >>> - Shall we remove the whole test? >>> - Shall we remove only the first check which compares nanosleep vs >>> clock_gettime (child_clock, before|after)? >> >> I lean towards removing both time/tst-cpuclock1 and time/tst-cpuclock2. >> > > I don't oppose against removing them, also. > I also lean to remove these tests. If we need to keep adjusting the time limits depending of the underlying architecture the tests might loose their intention to check the interface and/or not indicate a possible regression.
On 9/29/20 10:01 AM, Adhemerval Zanella via Libc-alpha wrote: > > > On 29/09/2020 10:53, Lucas A. M. Magalhaes via Libc-alpha wrote: >> Quoting Florian Weimer (2020-09-21 08:28:31) >>> * Stefan Liebler: >>> >>>> How do we want to proceed here: >>>> - Shall we just loosen the limits? >>>> - Shall we remove the whole test? >>>> - Shall we remove only the first check which compares nanosleep vs >>>> clock_gettime (child_clock, before|after)? >>> >>> I lean towards removing both time/tst-cpuclock1 and time/tst-cpuclock2. >>> >> >> I don't oppose against removing them, also. >> > > I also lean to remove these tests. If we need to keep adjusting the time > limits depending of the underlying architecture the tests might loose > their intention to check the interface and/or not indicate a possible > regression. The tests should be removed because they contain *non-timing* related regression tests for: * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM * clock_getcpuclockid vs. valid child * clock_gettime of dead child where clock is no longer valid I don't see any other tests that test for that. If we want we can just strip out the time-dependent parts of the tests?
On 29/09/2020 14:22, Carlos O'Donell wrote: > On 9/29/20 10:01 AM, Adhemerval Zanella via Libc-alpha wrote: >> >> >> On 29/09/2020 10:53, Lucas A. M. Magalhaes via Libc-alpha wrote: >>> Quoting Florian Weimer (2020-09-21 08:28:31) >>>> * Stefan Liebler: >>>> >>>>> How do we want to proceed here: >>>>> - Shall we just loosen the limits? >>>>> - Shall we remove the whole test? >>>>> - Shall we remove only the first check which compares nanosleep vs >>>>> clock_gettime (child_clock, before|after)? >>>> >>>> I lean towards removing both time/tst-cpuclock1 and time/tst-cpuclock2. >>>> >>> >>> I don't oppose against removing them, also. >>> >> >> I also lean to remove these tests. If we need to keep adjusting the time >> limits depending of the underlying architecture the tests might loose >> their intention to check the interface and/or not indicate a possible >> regression. > > The tests should be removed because they contain *non-timing* related > regression tests for: I think you meant 'should *not* be remove* based on the points below. > > * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM > * clock_getcpuclockid vs. valid child > * clock_gettime of dead child where clock is no longer valid > > I don't see any other tests that test for that. > > If we want we can just strip out the time-dependent parts of the tests? > This is better idea indeed.
On 9/30/20 1:48 PM, Adhemerval Zanella via Libc-alpha wrote: > > > On 29/09/2020 14:22, Carlos O'Donell wrote: >> On 9/29/20 10:01 AM, Adhemerval Zanella via Libc-alpha wrote: >>> >>> >>> On 29/09/2020 10:53, Lucas A. M. Magalhaes via Libc-alpha wrote: >>>> Quoting Florian Weimer (2020-09-21 08:28:31) >>>>> * Stefan Liebler: >>>>> >>>>>> How do we want to proceed here: >>>>>> - Shall we just loosen the limits? >>>>>> - Shall we remove the whole test? >>>>>> - Shall we remove only the first check which compares nanosleep vs >>>>>> clock_gettime (child_clock, before|after)? >>>>> >>>>> I lean towards removing both time/tst-cpuclock1 and time/tst-cpuclock2. >>>>> >>>> >>>> I don't oppose against removing them, also. >>>> >>> >>> I also lean to remove these tests. If we need to keep adjusting the time >>> limits depending of the underlying architecture the tests might loose >>> their intention to check the interface and/or not indicate a possible >>> regression. >> >> The tests should be removed because they contain *non-timing* related >> regression tests for: > > I think you meant 'should *not* be remove* based on the points below. > >> >> * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM >> * clock_getcpuclockid vs. valid child >> * clock_gettime of dead child where clock is no longer valid >> >> I don't see any other tests that test for that. >> >> If we want we can just strip out the time-dependent parts of the tests? >> > > This is better idea indeed. > Hi, Sorry for the long delay. According to all the feedback, I've kept the test itself and removed two of the time-dependent checks. Please have a look at v2 of the patch: "[PATCH v2] Loosen the limits of time/tst-cpuclock1." https://sourceware.org/pipermail/libc-alpha/2020-October/118779.html Thanks, Stefan
diff = after - before" With much workload on the machine, the child won't make much progess and it can fall much beyond the minimum limit. Thus I've set this limit to 0.45%. Even with this limit, the test fails for 11..90 times. If set to 0.30% it fails only 1..8 times. Does this check makes sense at all? Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s): diff = afterns - after The test currently also allows 0.9 * 0.1s. Shouldn't we test the hard limit of 1.0 * 0.1s as minimum limit? Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s. Therefore I've set the upper limit to 1.35. For "dead - after", the parent process kills the child process and waits long enough to let the child finish dying. Then it gets the time of the child: diff = dead - after Note that diff also contains the time for the previous clock_nanosleep. Thus you'll often see both fails at the same time. I've set the upper limit to 1.4. --- time/tst-cpuclock1.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c index 1ac611a92b..52a3485145 100644 --- a/time/tst-cpuclock1.c +++ b/time/tst-cpuclock1.c @@ -161,7 +161,7 @@ do_test (void) percentile of values and use those values for our testing allowed range. */ struct timespec diff = timespec_sub (support_timespec_normalize (after), support_timespec_normalize (before)); - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.1)) + if (!support_timespec_check_in_range (sleeptime, diff, .45, 1.1)) { printf ("before - after %ju.%.9ju outside reasonable range\n", (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); @@ -197,7 +197,7 @@ do_test (void) allowed range. */ diff = timespec_sub (support_timespec_normalize (afterns), support_timespec_normalize (after)); - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2)) + if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 1.35)) { printf ("nanosleep time %ju.%.9ju outside reasonable range\n", (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); @@ -240,7 +240,7 @@ do_test (void) diff = timespec_sub (support_timespec_normalize (dead), support_timespec_normalize (after)); sleeptime.tv_nsec = 100000000; - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2)) + if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 1.4)) { printf ("dead - after %ju.%.9ju outside reasonable range\n", (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);