Message ID: 20240612112311.10334-1-chrubis@suse.cz
State: Changes Requested
Series: [v2] sched: starvation: Autocallibrate the timeout
Hi Cyril,

> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.

> So we attempt to measure how long does a bussy loop take and base the
s/bussy/busy/

> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.

This looks much better: ppc64le has Timeout per run is 0h 00m 30s (1m 10s
runtime), others (x86_64, s390x, aarch64) have ~ 1 min or less.

Tested-by: Petr Vorel <pvorel@suse.cz>

...

> +static int callibrate(void)
> +{
> +        int i;
> +        struct timespec start, stop;
> +        long long diff;
> +
> +        for (i = 0; i < CALLIBRATE_LOOPS; i++) {
> +                __asm__ __volatile__ ("" : "+g" (i) : :);
> +        }
> +
> +        SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &start);
> +
> +        for (i = 0; i < CALLIBRATE_LOOPS; i++) {
> +                __asm__ __volatile__ ("" : "+g" (i) : :);
> +        }
> +
> +        SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &stop);
> +
> +        diff = tst_timespec_diff_us(stop, start);
> +
> +        tst_res(TINFO, "CPU did %i loops in %llius", CALLIBRATE_LOOPS, diff);
> +
> +        return diff;
> +}

> static int wait_for_pid(pid_t pid)
> {

> @@ -59,8 +88,11 @@ static void setup(void)
>         if (tst_parse_long(str_loop, &loop, 1, LONG_MAX))
>                 tst_brk(TBROK, "Invalid number of loop number '%s'", str_loop);

> -        if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX))
> +        if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX)) {

nit: {} brackets are not needed.

Reviewed-by: Petr Vorel <pvorel@suse.cz>

Kind regards,
Petr

>                 tst_brk(TBROK, "Invalid number of timeout '%s'", str_timeout);
> +        } else {
> +                timeout = callibrate() / 1000;
> +        }

>         tst_set_max_runtime(timeout);
> }
Hi!

> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the timeout accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.

> > So we attempt to measure how long does a bussy loop take and base the

> s/bussy/busy/

> > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > double of the value of the actual test runtime and works fine, but we
> > need to make sure that the coeficient we divide the result from
> > callibrate works for small boards too. So please run the test on as many
> > machines as you can and report if we need to make the dividor smaller or
> > not.

> This looks much better: ppc64le has Timeout per run is 0h 00m 30s (1m 10s
> runtime), others (x86_64, s390x, aarch64) have ~ 1 min or less.

> Tested-by: Petr Vorel <pvorel@suse.cz>

And I've realized that we also have to check if we had exhausted our
runtime in the testcase, because if someone overrides the timeout the
test will still pass even when it shouldn't. I will send a separate patch
for that.
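[Editorial note] The runtime check Cyril mentions here is easy to picture. The sketch below is not the actual starvation.c code nor his follow-up patch; it only illustrates where a check based on the LTP tst_remaining_runtime() helper could sit. The loop body, the max_runtime value and the TFAIL message are placeholder assumptions:

        /* Hypothetical sketch only -- not part of the submitted patch. */
        #include "tst_test.h"

        static void run(void)
        {
                for (;;) {
                        /* ... one unit of the real test's work would go here,
                         * with a break once the work is done ... */

                        /*
                         * If someone overrides the timeout with a huge -t value,
                         * do not let a starved parent look like a pass: give up
                         * once the configured runtime is used up.
                         */
                        if (!tst_remaining_runtime()) {
                                tst_res(TFAIL, "Runtime exhausted");
                                return;
                        }
                }
        }

        static struct tst_test test = {
                .test_all = run,
                .max_runtime = 60,
        };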
On Wed, 12 Jun 2024 at 13:23, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.
>
> So we attempt to measure how long does a bussy loop take and base the
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.
>
> CC: Steve <xufeifei1@oppo.com>
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>

I've tested this patch on 1 arm HW and 2 arm64 HWs, see the results below.

arm HW:

===== starvation =====
command: starvation
[   43.278533] /usr/local/bin/kirk[356]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 89172us
tst_test.c:1626: TINFO: Updating max runtime to 0h 01m 29s
tst_test.c:1618: TINFO: Timeout per run is 0h 06m 53s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed

Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0

Duration: 1m 2s

arm64 HW1:

===== starvation =====
command: starvation
[   49.213159] /usr/local/bin/kirk[362]: starting test starvation (starvation)
[   49.213159] /usr/local/bin/kirk[362]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 141688us
tst_test.c:1626: TINFO: Updating max runtime to 0h 02m 21s
tst_test.c:1618: TINFO: Timeout per run is 0h 07m 45s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed

Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0

Duration: 1m 32s

arm64 HW2:

===== starvation =====
command: starvation
[   25.384826] /usr/local/bin/kirk[775]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 113584us
tst_test.c:1626: TINFO: Updating max runtime to 0h 01m 53s
tst_test.c:1618: TINFO: Timeout per run is 0h 07m 17s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed

Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0

Duration: 1m 15s

Tested-by: Anders Roxell <anders.roxell@linaro.org>

Cheers,
Anders
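[Editorial note] Anders' arm numbers above also make the timeout derivation easy to follow. With timeout = callibrate() / 1000 from the patch (callibrate() returns the loop time in microseconds), the arm board works out to:

        calibration:  100000000 loops -> 89172 us
        timeout     = 89172 / 1000 = 89 s   ("Updating max runtime to 0h 01m 29s")
        actual run  = 1m 2s total, including the calibration itself

So the computed limit leaves a comfortable margin over the whole run on that board, consistent with the commit message's observation that on x86_64 the timeout ends up roughly double the actual test runtime.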
Hi Cyril,

> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.

> So we attempt to measure how long does a bussy loop take and base the
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.

While applying this new patch, the test result of 32bit kernel is still failed.
Please refers following log:

external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
Test timeouted, sending SIGKILL!

After using "time ./starvation" to check the timecost, we can see the
timecost is about 26 mins and 48 seconds
Hi!

> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the timeout accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.

> > So we attempt to measure how long does a bussy loop take and base the
> > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > double of the value of the actual test runtime and works fine, but we
> > need to make sure that the coeficient we divide the result from
> > callibrate works for small boards too. So please run the test on as many
> > machines as you can and report if we need to make the dividor smaller or
> > not.

> While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
>
> external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
> external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
> external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
> external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
> Test timeouted, sending SIGKILL!
>
> After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds

This is really strange. I tested the code on RPi Zero with 32bit BCM2835
where the CPU is even slower and runtime limit is set to about 5
minutes, yet the test finished in a minute. That's with resonably recent
kernel 6.1 though.

What kernel version do you use?

Do you run LTP with background tasks to keep the CPU bussy?

Do you have any custom patches for kernel scheduller applied?

It really looks like something unexpected is happening at your end.
> Hi!

> > > Instead of hardcoding the values we attempt to measure the CPU speed and
> > > set the timeout accordingly. Given that the difference in the duration
> > > of the test when the kernel is buggy is about 30x we do not have to have
> > > a precise callibration, just very rough estimate if we are running on a
> > > server or small ARM board would suffice.

> > > So we attempt to measure how long does a bussy loop take and base the
> > > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > > double of the value of the actual test runtime and works fine, but we
> > > need to make sure that the coeficient we divide the result from
> > > callibrate works for small boards too. So please run the test on as many
> > > machines as you can and report if we need to make the dividor smaller or
> > > not.

> > While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
> >
> > external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
> > external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
> > external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
> > external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
> > Test timeouted, sending SIGKILL!
> >
> > After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds

> This is really strange. I tested the code on RPi Zero with 32bit BCM2835
> where the CPU is even slower and runtime limit is set to about 5
> minutes, yet the test finished in a minute. That's with resonably recent
> kernel 6.1 though.

> What kernel version do you use?

We are now using kernel 5.15

> Do you run LTP with background tasks to keep the CPU bussy?

Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.

> Do you have any custom patches for kernel scheduller applied?

I think there is no any patch applied for kernel scheduller.

> It really looks like something unexpected is happening at your end.

Actually, the same kernel version using 64bit kernel can pass the ltp test.

1. for 32bit kernel, we have this log

32bit kernel
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start

main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)

4524 root  20  0  14M  472K  0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

2. for 64bit kernel, we have this log

64bit kernel
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 9754
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 9754
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 9753, current pid is 9754, go to child() start

main pid is 9753, child pid is 9754, and we see both main and child pid are working (checking by top)

9754 root  20  0  16M  472K  0 R 51.4 0.0 0:05.51 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
9753 root  20  0  16M  464K  0 R 45.7 0.0 0:05.70 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
Hi!

> > What kernel version do you use?

> We are now using kernel 5.15

> > Do you run LTP with background tasks to keep the CPU bussy?

> Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.

> > Do you have any custom patches for kernel scheduller applied?

> I think there is no any patch applied for kernel scheduller.

> > It really looks like something unexpected is happening at your end.

> Actually, the same kernel version using 64bit kernel can pass the ltp test.

> 1. for 32bit kernel, we have this log

> 32bit kernel
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start

> main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)

> 4524 root  20  0  14M  472K  0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

So this looks like we managed to reproduce the issue the test was trying
to catch. The child is hogging the CPU and the parent process cannot
proceed. And I suppose that this happens only on 32bit because 32bit is
less tested these days.
Hi Cyril, all,

> Hi!

> > > What kernel version do you use?

> > We are now using kernel 5.15

> > > Do you run LTP with background tasks to keep the CPU bussy?

> > Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.

> > > Do you have any custom patches for kernel scheduller applied?

> > I think there is no any patch applied for kernel scheduller.

> > > It really looks like something unexpected is happening at your end.

> > Actually, the same kernel version using 64bit kernel can pass the ltp test.

> > 1. for 32bit kernel, we have this log

> > 32bit kernel
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start

> > main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)

> > 4524 root  20  0  14M  472K  0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

> So this looks like we managed to reproduce the issue the test was trying
> to catch. The child is hogging the CPU and the parent process cannot
> proceed. And I suppose that this happens only on 32bit because 32bit is
> less tested these days.

I guess we can merge this, right?
Or we still not sure whether this is really kernel bug not related to the
change?

Kind regards,
Petr
Hi!

> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start

> > > main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)

> > > 4524 root  20  0  14M  472K  0 R 85.7 0.0 0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

> > So this looks like we managed to reproduce the issue the test was trying
> > to catch. The child is hogging the CPU and the parent process cannot
> > proceed. And I suppose that this happens only on 32bit because 32bit is
> > less tested these days.

> I guess we can merge this, right?
> Or we still not sure whether this is really kernel bug not related to the
> change?

It still needs a runtime check at the end of the loop in case the timeout
is multiplied. I will have to send another version, thanks for reminding
me.
diff --git a/testcases/kernel/sched/cfs-scheduler/starvation.c b/testcases/kernel/sched/cfs-scheduler/starvation.c
index eb9fd6ff5..0bb820f2f 100644
--- a/testcases/kernel/sched/cfs-scheduler/starvation.c
+++ b/testcases/kernel/sched/cfs-scheduler/starvation.c
@@ -21,11 +21,40 @@
 #include <sched.h>
 
 #include "tst_test.h"
+#include "tst_safe_clocks.h"
+#include "tst_timer.h"
 
 static char *str_loop;
-static long loop = 2000000;
+static long loop = 1000000;
 static char *str_timeout;
-static int timeout = 240;
+static int timeout;
+
+#define CALLIBRATE_LOOPS 100000000
+
+static int callibrate(void)
+{
+        int i;
+        struct timespec start, stop;
+        long long diff;
+
+        for (i = 0; i < CALLIBRATE_LOOPS; i++) {
+                __asm__ __volatile__ ("" : "+g" (i) : :);
+        }
+
+        SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &start);
+
+        for (i = 0; i < CALLIBRATE_LOOPS; i++) {
+                __asm__ __volatile__ ("" : "+g" (i) : :);
+        }
+
+        SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &stop);
+
+        diff = tst_timespec_diff_us(stop, start);
+
+        tst_res(TINFO, "CPU did %i loops in %llius", CALLIBRATE_LOOPS, diff);
+
+        return diff;
+}
 
 static int wait_for_pid(pid_t pid)
 {
@@ -59,8 +88,11 @@ static void setup(void)
         if (tst_parse_long(str_loop, &loop, 1, LONG_MAX))
                 tst_brk(TBROK, "Invalid number of loop number '%s'", str_loop);
 
-        if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX))
+        if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX)) {
                 tst_brk(TBROK, "Invalid number of timeout '%s'", str_timeout);
+        } else {
+                timeout = callibrate() / 1000;
+        }
 
         tst_set_max_runtime(timeout);
 }
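[Editorial note] One detail of the diff worth spelling out is the empty __asm__ __volatile__ ("" : "+g" (i) : :) statement: it makes the loop counter an input/output operand of an (empty) asm block, so the compiler has to assume i may be read and modified there and cannot optimize the busy loop away or collapse it into a single addition. The first, untimed loop presumably acts as a warm-up before the measured one. The standalone program below only illustrates that idiom on Linux; it is not LTP code and not part of the patch:

        /* Illustration of the calibration idiom, not part of the patch. */
        #include <stdio.h>
        #include <time.h>

        #define LOOPS 100000000

        int main(void)
        {
                struct timespec start, stop;
                long long diff;
                int i;

                clock_gettime(CLOCK_MONOTONIC_RAW, &start);

                for (i = 0; i < LOOPS; i++) {
                        /* Empty asm with "+g"(i) keeps the loop from being optimized out. */
                        __asm__ __volatile__ ("" : "+g" (i) : :);
                }

                clock_gettime(CLOCK_MONOTONIC_RAW, &stop);

                diff = (long long)(stop.tv_sec - start.tv_sec) * 1000000 +
                       (stop.tv_nsec - start.tv_nsec) / 1000;

                printf("%d loops took %lld us\n", LOOPS, diff);
                return 0;
        }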
Instead of hardcoding the values we attempt to measure the CPU speed and
set the timeout accordingly. Given that the difference in the duration
of the test when the kernel is buggy is about 30x we do not have to have
a precise callibration, just very rough estimate if we are running on a
server or small ARM board would suffice.

So we attempt to measure how long does a bussy loop take and base the
default timeout on that. On x86_64 CPUs the resulting timeout is about
double of the value of the actual test runtime and works fine, but we
need to make sure that the coeficient we divide the result from
callibrate works for small boards too. So please run the test on as many
machines as you can and report if we need to make the dividor smaller or
not.

CC: Steve <xufeifei1@oppo.com>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
---
Changes in v2:
- use different style of the bussy loop

 .../kernel/sched/cfs-scheduler/starvation.c | 38 +++++++++++++++++--
 1 file changed, 35 insertions(+), 3 deletions(-)
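[Editorial note] For anyone running the requested measurements by hand: the test can be invoked standalone, and the -t and -l options seen in the VTS logs earlier in the thread map to str_timeout and str_loop in the diff, so the autocalibrated timeout is only used when -t is not given. A rough usage sketch (option letters taken from the logs above, not re-checked against the test's option table):

        # let the test calibrate the CPU and pick its own max runtime
        ./starvation

        # override the calibrated timeout, e.g. force a 300s max runtime
        ./starvation -t 300

        # override the number of loop iterations as well
        ./starvation -l 2000000 -t 300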