[v2] sched: starvation: Autocallibrate the timeout

Message ID	20240612112311.10334-1-chrubis@suse.cz
State	Changes Requested
Headers	show Return-Path: <ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it> From: Cyril Hrubis <chrubis@suse.cz> To: ltp@lists.linux.it Date: Wed, 12 Jun 2024 13:23:11 +0200 Message-ID: <20240612112311.10334-1-chrubis@suse.cz> MIME-Version: 1.0 Subject: [LTP] [PATCH v2] sched: starvation: Autocallibrate the timeout Precedence: list Cc: Steve <xufeifei1@oppo.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it Sender: "ltp" <ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it>
Series	[v2] sched: starvation: Autocallibrate the timeout \| expand [v2] sched: starvation: Autocallibrate the timeout

Cyril Hrubis June 12, 2024, 11:23 a.m. UTC

Instead of hardcoding the values we attempt to measure the CPU speed and
set the timeout accordingly. Given that the difference in the duration
of the test when the kernel is buggy is about 30x we do not have to have
a precise callibration, just very rough estimate if we are running on a
server or small ARM board would suffice.

So we attempt to measure how long does a bussy loop take and base the
default timeout on that. On x86_64 CPUs the resulting timeout is about
double of the value of the actual test runtime and works fine, but we
need to make sure that the coeficient we divide the result from
callibrate works for small boards too. So please run the test on as many
machines as you can and report if we need to make the dividor smaller or
not.

CC: Steve <xufeifei1@oppo.com>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
---

Changes in v2:

- use different style of the bussy loop

 .../kernel/sched/cfs-scheduler/starvation.c   | 38 +++++++++++++++++--
 1 file changed, 35 insertions(+), 3 deletions(-)

Petr Vorel June 12, 2024, 12:21 p.m. UTC | #1

Hi Cyril,

> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.

> So we attempt to measure how long does a bussy loop take and base the
s/bussy/busy/

> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.

This looks much better: ppc64le has Timeout per run is 0h 00m 30s (1m 10s
runtime), others (x68_64, s390x, aarch64) have ~ 1 min or less.

Tested-by: Petr Vorel <pvorel@suse.cz>

...
> +static int callibrate(void)
> +{
> +	int i;
> +	struct timespec start, stop;
> +	long long diff;
> +
> +	for (i = 0; i < CALLIBRATE_LOOPS; i++) {
> +		__asm__ __volatile__ ("" : "+g" (i) : :);
> +	}
> +
> +	SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &start);
> +
> +	for (i = 0; i < CALLIBRATE_LOOPS; i++) {
> +		__asm__ __volatile__ ("" : "+g" (i) : :);
> +	}
> +
> +	SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &stop);
> +
> +	diff = tst_timespec_diff_us(stop, start);
> +
> +	tst_res(TINFO, "CPU did %i loops in %llius", CALLIBRATE_LOOPS, diff);
> +
> +	return diff;
> +}

>  static int wait_for_pid(pid_t pid)
>  {
> @@ -59,8 +88,11 @@ static void setup(void)
>  	if (tst_parse_long(str_loop, &loop, 1, LONG_MAX))
>  		tst_brk(TBROK, "Invalid number of loop number '%s'", str_loop);

> -	if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX))
> +	if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX)) {
nit: {} brackets are not needed.

Reviewed-by: Petr Vorel <pvorel@suse.cz>

Kind regards,
Petr

>  		tst_brk(TBROK, "Invalid number of timeout '%s'", str_timeout);
> +	} else {
> +		timeout = callibrate() / 1000;
> +	}

>  	tst_set_max_runtime(timeout);
>  }

Cyril Hrubis June 12, 2024, 6:22 p.m. UTC | #2

Hi!
> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the timeout accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.
> 
> > So we attempt to measure how long does a bussy loop take and base the
> s/bussy/busy/
> 
> > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > double of the value of the actual test runtime and works fine, but we
> > need to make sure that the coeficient we divide the result from
> > callibrate works for small boards too. So please run the test on as many
> > machines as you can and report if we need to make the dividor smaller or
> > not.
> 
> This looks much better: ppc64le has Timeout per run is 0h 00m 30s (1m 10s
> runtime), others (x68_64, s390x, aarch64) have ~ 1 min or less.
> 
> Tested-by: Petr Vorel <pvorel@suse.cz>

And I've realized that we also have to check if we had exhausted our
runtime in the testcase, because if someone overrides the timeout the
test will stil pass even when it shouldn't. I will send a separate patch
for that.

Anders Roxell June 13, 2024, 11:22 a.m. UTC | #3

On Wed, 12 Jun 2024 at 13:23, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.
>
> So we attempt to measure how long does a bussy loop take and base the
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.
>
> CC: Steve <xufeifei1@oppo.com>
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>

I've tested this patch on 1 arm HW and 2 arm64 HW's. see the results below.

arm HW:
===== [1;36mstarvation[0m =====
command: starvation
[   43.278533] /usr/local/bin/kirk[356]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 89172us
tst_test.c:1626: TINFO: Updating max runtime to 0h 01m 29s
tst_test.c:1618: TINFO: Timeout per run is 0h 06m 53s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed
Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0
Duration: 1m 2s

arm64 HW1:
===== [1;36mstarvation[0m =====
command: starvation
[   49.213159] /usr/local/bin/kirk[362]: starting test starvation (starvation)
[   49.213159] /usr/local/bin/kirk[362]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 141688us
tst_test.c:1626: TINFO: Updating max runtime to 0h 02m 21s
tst_test.c:1618: TINFO: Timeout per run is 0h 07m 45s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed
Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0
Duration: 1m 32s

arm64 HW2:
===== [1;36mstarvation[0m =====
command: starvation
[   25.384826] /usr/local/bin/kirk[775]: starting test starvation (starvation)
tst_test.c:1734: TINFO: LTP version: 20180118-5676-g4696467fb
tst_test.c:1618: TINFO: Timeout per run is 0h 05m 24s
starvation.c:86: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
starvation.c:54: TINFO: CPU did 100000000 loops in 113584us
tst_test.c:1626: TINFO: Updating max runtime to 0h 01m 53s
tst_test.c:1618: TINFO: Timeout per run is 0h 07m 17s
starvation.c:132: TPASS: wait_for_pid(child_pid) passed
Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0
Duration: 1m 15s

Tested-by: Anders Roxell <anders.roxell@linaro.org>

Cheers,
Anders

joe.liu June 27, 2024, 9:28 a.m. UTC | #4

Hi Cyril,

> Instead of hardcoding the values we attempt to measure the CPU speed and
> set the timeout accordingly. Given that the difference in the duration
> of the test when the kernel is buggy is about 30x we do not have to have
> a precise callibration, just very rough estimate if we are running on a
> server or small ARM board would suffice.

> So we attempt to measure how long does a bussy loop take and base the
> default timeout on that. On x86_64 CPUs the resulting timeout is about
> double of the value of the actual test runtime and works fine, but we
> need to make sure that the coeficient we divide the result from
> callibrate works for small boards too. So please run the test on as many
> machines as you can and report if we need to make the dividor smaller or
> not.

While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:

external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
Test timeouted, sending SIGKILL!

After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds

Cyril Hrubis June 28, 2024, 12:49 p.m. UTC | #5

Hi!
> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the timeout accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.
> 
> > So we attempt to measure how long does a bussy loop take and base the
> > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > double of the value of the actual test runtime and works fine, but we
> > need to make sure that the coeficient we divide the result from
> > callibrate works for small boards too. So please run the test on as many
> > machines as you can and report if we need to make the dividor smaller or
> > not.
> 
> While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
> 
> external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
> external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
> external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
> external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
> Test timeouted, sending SIGKILL!
> 
> After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds

This is really strange. I tested the code on RPi Zero with 32bit BCM2835
where the CPU is even slower and runtime limit is set to about 5
minutes, yet the test finished in a minute. That's with resonably recent
kernel 6.1 though.

What kernel version do you use?

Do you run LTP with background tasks to keep the CPU bussy?

Do you have any custom patches for kernel scheduller applied?

It really looks like something unexpected is happening at your end.

joe.liu July 1, 2024, 9:50 a.m. UTC | #6

> Hi!
> > > Instead of hardcoding the values we attempt to measure the CPU speed and
> > > set the timeout accordingly. Given that the difference in the duration
> > > of the test when the kernel is buggy is about 30x we do not have to have
> > > a precise callibration, just very rough estimate if we are running on a
> > > server or small ARM board would suffice.

> > > So we attempt to measure how long does a bussy loop take and base the
> > > default timeout on that. On x86_64 CPUs the resulting timeout is about
> > > double of the value of the actual test runtime and works fine, but we
> > > need to make sure that the coeficient we divide the result from
> > > callibrate works for small boards too. So please run the test on as many
> > > machines as you can and report if we need to make the dividor smaller or
> > > not.
 
> > While applying this new patch, the test result of 32bit kernel is still failed. Please refers following log:
> > 
> > external/ltp/lib/tst_test.c:1690: TINFO: LTP version: 20230929
> > external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 00m 30s
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:90: TPASS: sched_setaffinity(0, sizeof(mask), &mask) returned 0
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:58: TINFO: CPU did 100000000 loops in 198306us
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:99: TINFO: cal timeout: 198
> > external/ltp/lib/tst_test.c:1583: TINFO: Updating max runtime to 0h 03m 18s
> > external/ltp/lib/tst_test.c:1576: TINFO: Timeout per run is 0h 03m 48s
> > Test timeouted, sending SIGKILL!
> > 
> > After using "time ./starvation" to check the timecost, we can see the timecost is about 26 mins and 48 seconds

> This is really strange. I tested the code on RPi Zero with 32bit BCM2835
> where the CPU is even slower and runtime limit is set to about 5
> minutes, yet the test finished in a minute. That's with resonably recent
> kernel 6.1 though.

> What kernel version do you use?
We are now using kernel 5.15

> Do you run LTP with background tasks to keep the CPU bussy?
Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.

> Do you have any custom patches for kernel scheduller applied?
I think there is no any patch applied for kernel scheduller.

> It really looks like something unexpected is happening at your end.

Actually, the same kernel version using 64bit kernel can pass the ltp test.



1. for 32bit kernel, we have this log

32bit kernel
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start

main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)

4524 root         20   0  14M 472K    0 R 85.7   0.0   0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top




2. for 64bit kernel, we have this log

64bit kernel
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 9754
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 9754
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 9753
external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 9753, current pid is 9754, go to child() start

main pid is 9753, child pid is 9754, and we see both main and child pid are working (checking by top)

9754 root         20   0  16M 472K    0 R 51.4   0.0   0:05.51 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
9753 root         20   0  16M 464K    0 R 45.7   0.0   0:05.70 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

Cyril Hrubis July 2, 2024, 2:47 p.m. UTC | #7

Hi!
> > What kernel version do you use?
> We are now using kernel 5.15
> 
> > Do you run LTP with background tasks to keep the CPU bussy?
> Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.
> 
> > Do you have any custom patches for kernel scheduller applied?
> I think there is no any patch applied for kernel scheduller.
> 
> > It really looks like something unexpected is happening at your end.
> 
> Actually, the same kernel version using 64bit kernel can pass the ltp test.
> 
> 
> 
> 1. for 32bit kernel, we have this log
> 
> 32bit kernel
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start
> 
> main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)
> 
> 4524 root         20   0  14M 472K    0 R 85.7   0.0   0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

So this looks like we managed to reproduce the issue the test was trying
to catch. The child is hogging the CPU and the parent process cannot
proceed. And I suppose that this happens only on 32bit because 32bit is
less tested these days.

Petr Vorel July 10, 2024, 3:55 p.m. UTC | #8

Hi Cyril,
all,

> Hi!
> > > What kernel version do you use?
> > We are now using kernel 5.15

> > > Do you run LTP with background tasks to keep the CPU bussy?
> > Yes, we are testing LTP with Android VTS. And the test environment is under Android Launcher.

> > > Do you have any custom patches for kernel scheduller applied?
> > I think there is no any patch applied for kernel scheduller.

> > > It really looks like something unexpected is happening at your end.

> > Actually, the same kernel version using 64bit kernel can pass the ltp test.



> > 1. for 32bit kernel, we have this log

> > 32bit kernel
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start

> > main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)

> > 4524 root         20   0  14M 472K    0 R 85.7   0.0   0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top

> So this looks like we managed to reproduce the issue the test was trying
> to catch. The child is hogging the CPU and the parent process cannot
> proceed. And I suppose that this happens only on 32bit because 32bit is
> less tested these days.

I guess we can merge this, right?
Or we still not sure whether this is really kernel bug not related to the
change?

Kind regards,
Petr

Cyril Hrubis July 10, 2024, 6:12 p.m. UTC | #9

Hi!
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:161: TINFO: do_test by pid 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 4524
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:166: TINFO: main pid is 4524
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:167: TINFO: child pid is 0
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:176: TINFO: go loop, current pid is 4523
> > > external/ltp/testcases/kernel/sched/cfs-scheduler/starvation.c:145: TINFO: current ppid is 4523, current pid is 4524, go to child() start
> 
> > > main pid is 4523, child pid is 4524, and we only see child pid is working (checking by top)
> 
> > > 4524 root         20   0  14M 472K    0 R 85.7   0.0   0:14.93 starvation_v4 -t 1000000 -l 1000000 <-- cpu_load by top
> 
> > So this looks like we managed to reproduce the issue the test was trying
> > to catch. The child is hogging the CPU and the parent process cannot
> > proceed. And I suppose that this happens only on 32bit because 32bit is
> > less tested these days.
> 
> I guess we can merge this, right?
> Or we still not sure whether this is really kernel bug not related to the
> change?

It stil needs a runtime check at the end of the loop in a case that
timeout is multiplied. I will have to send another version, thanks for
reminding me.

[v2] sched: starvation: Autocallibrate the timeout

Commit Message

Comments

Patch