From patchwork Fri Oct 23 13:07:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Liebler X-Patchwork-Id: 1386738 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces@sourceware.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=sourceware.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=y3+1mUmc; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CHkzW3QLFz9sSs for ; Sat, 24 Oct 2020 00:07:35 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9489B386F433; Fri, 23 Oct 2020 13:07:32 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9489B386F433 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1603458452; bh=66gGjihx5E1fSBUoG1Fbzeu6EmDzC7r0uQiuy5QUTqE=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=y3+1mUmcu3Qh6ZNTJHofKpTuP6ssseD1ZebLsQtnNcIAnRja018C5Mls5ZnF2xRhr 9ZU5C+Ge4EGEVGU3d5A1LNHYVBC0S7rhjkWDUwBAzxCSfeIKV9eUIaFKGjEk6v4Qz4 oWF1miBx2pBDPJjvaJZGvy1CEa5pkdw8iJnX2Bn8= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 821D63858022 for ; Fri, 23 Oct 2020 13:07:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 821D63858022 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 09ND2HUs172294 for ; Fri, 23 Oct 2020 09:07:30 -0400 Received: from ppma03fra.de.ibm.com (6b.4a.5195.ip4.static.sl-reverse.com [149.81.74.107]) by mx0b-001b2d01.pphosted.com with ESMTP id 34btftj99q-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 23 Oct 2020 09:07:29 -0400 Received: from pps.filterd (ppma03fra.de.ibm.com [127.0.0.1]) by ppma03fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 09NCw3IL024496 for ; Fri, 23 Oct 2020 13:07:27 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma03fra.de.ibm.com with ESMTP id 347r87ueft-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 23 Oct 2020 13:07:26 +0000 Received: from d06av24.portsmouth.uk.ibm.com (mk.ibm.com [9.149.105.60]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 09ND7OaV31129894 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 23 Oct 2020 13:07:24 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 186F642054; Fri, 23 Oct 2020 13:07:24 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F3C654204F; Fri, 23 Oct 2020 13:07:23 +0000 (GMT) Received: from t35lp56.lnxne.boe (unknown [9.152.108.100]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 23 Oct 2020 13:07:23 +0000 (GMT) To: libc-alpha@sourceware.org Subject: [PATCH] Remove timing related checks of time/tst-cpuclock1 Date: Fri, 23 Oct 2020 15:07:06 +0200 Message-Id: <20201023130706.2279203-1-stli@linux.ibm.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.737 definitions=2020-10-23_04:2020-10-23, 2020-10-23 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 priorityscore=1501 mlxlogscore=999 malwarescore=0 suspectscore=1 adultscore=0 spamscore=0 bulkscore=0 phishscore=0 clxscore=1015 mlxscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2010230088 X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Stefan Liebler via Libc-alpha From: Stefan Liebler Reply-To: Stefan Liebler Cc: Stefan Liebler Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11), this test fails quite often on s390x/s390 with one/multiple of those: "before - after" / "nanosleep time" / "dead - after" ourside reasonable range. On a zVM/kvm guest the CPUs are shared between multiple guests. And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars. The defined CPUs for a lpar/zVM-system could also have lower weights compared to other lpars which let the steal time further grow. Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple times, e.g. with different GCCs, on various lpars or zVM guests at the same time. During this time, I've run the test for 13500 times and obvserved the following fails: ~600x "before - after" ~60x "nanosleep time" ~70x "dead - after" I've also observed a lot of "before - after" fails on a intel kvm-guest while building/testing glibc on it. The mentioned commit has tighten the limits of valid tv_nsec ranges: "before - after" (expected: 500000000): - 100000000 ... 600000000 + 450000000 ... 550000000 "nanosleep time" (expected: 100000000): - 100000000 ... 200000000 + 090000000 ... 120000000 "dead - after" (expected: 100000000): - ... 200000000 + 090000000 ... 120000000 The test itself forks a child process which chew_cpu (user- and kernel-space). The parent process sleeps with nanosleep(0.5s) and measures the child_clock time: Reviewed-by: Carlos O'Donell diff = after - before With much workload on the machine, the child won't make much progess and it can fall much beyond the minimum limit. Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s): diff = afterns - after The test currently also allows 0.9 * 0.1s which would be an error. Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s. For "dead - after", the parent process kills the child process and waits long enough to let the child finish dying. Then it gets the time of the child: diff = dead - after Note that diff also contains the time for the previous clock_nanosleep. Thus you'll often see both fails at the same time. After discussion on the mailing list, we've decided to keep the functional checks for the clock* functions and remove the timing related checks as those are prone to false positives. --- time/tst-cpuclock1.c | 65 ++++---------------------------------------- 1 file changed, 6 insertions(+), 59 deletions(-) diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c index 1ac611a92b..f40b590111 100644 --- a/time/tst-cpuclock1.c +++ b/time/tst-cpuclock1.c @@ -26,7 +26,6 @@ #include #include #include -#include /* This function is intended to rack up both user and system time. */ static void @@ -125,7 +124,7 @@ do_test (void) child, (unsigned long int) child_clock, (uintmax_t) res.tv_sec, (uintmax_t) res.tv_nsec); - struct timespec before, after; + struct timespec before; if (clock_gettime (child_clock, &before) < 0) { printf ("clock_gettime on live PID %d clock %lx => %s\n", @@ -137,38 +136,7 @@ do_test (void) printf ("live PID %d before sleep => %ju.%.9ju\n", child, (uintmax_t) before.tv_sec, (uintmax_t) before.tv_nsec); - struct timespec sleeptime = { .tv_nsec = 500000000 }; - if (nanosleep (&sleeptime, NULL) != 0) - { - perror ("nanosleep"); - result = 1; - goto done; - } - - if (clock_gettime (child_clock, &after) < 0) - { - printf ("clock_gettime on live PID %d clock %lx => %s\n", - child, (unsigned long int) child_clock, strerror (errno)); - result = 1; - goto done; - } - /* Should be close to 0.5. */ - printf ("live PID %d after sleep => %ju.%.9ju\n", - child, (uintmax_t) after.tv_sec, (uintmax_t) after.tv_nsec); - - /* The bound values are empirically defined by testing this code over high cpu - usage and different nice values. Of all the values we keep the 90th - percentile of values and use those values for our testing allowed range. */ - struct timespec diff = timespec_sub (support_timespec_normalize (after), - support_timespec_normalize (before)); - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.1)) - { - printf ("before - after %ju.%.9ju outside reasonable range\n", - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); - result = 1; - } - - sleeptime.tv_nsec = 100000000; + struct timespec sleeptime = { .tv_nsec = 100000000 }; e = clock_nanosleep (child_clock, 0, &sleeptime, NULL); if (e == EINVAL || e == ENOTSUP || e == ENOSYS) { @@ -191,18 +159,9 @@ do_test (void) } else { - /* The bound values are empirically defined by testing this code over - high cpu usage and different nice values. Of all the values we keep - the 90th percentile of values and use those values for our testing - allowed range. */ - diff = timespec_sub (support_timespec_normalize (afterns), - support_timespec_normalize (after)); - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2)) - { - printf ("nanosleep time %ju.%.9ju outside reasonable range\n", - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); - result = 1; - } + printf ("live PID %d after sleep => %ju.%.9ju\n", + child, (uintmax_t) afterns.tv_sec, + (uintmax_t) afterns.tv_nsec); } } @@ -231,21 +190,9 @@ do_test (void) result = 1; goto done; } - /* Should be close to 0.6. */ + /* Should be close to 0.1. */ printf ("dead PID %d => %ju.%.9ju\n", child, (uintmax_t) dead.tv_sec, (uintmax_t) dead.tv_nsec); - /* The bound values are empirically defined by testing this code over high cpu - usage and different nice values. Of all the values we keep the 90th - percentile of values and use those values for our testing allowed range. */ - diff = timespec_sub (support_timespec_normalize (dead), - support_timespec_normalize (after)); - sleeptime.tv_nsec = 100000000; - if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2)) - { - printf ("dead - after %ju.%.9ju outside reasonable range\n", - (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec); - result = 1; - } /* Now reap the child and verify that its clock is no longer valid. */ {