From patchwork Mon Feb 3 08:17:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 1232631
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: guojiufu@linux.ibm.com, wschmidt@linux.ibm.com, segher@kernel.crashing.org,
 pthaugen@us.ibm.com, hubicka@ucw.cz
Subject: [PATCH] correct COUNT and PROB for unrolled loop
Date: Mon, 3 Feb 2020 16:17:02 +0800
Message-Id: <1580717822-6073-1-git-send-email-guojiufu@linux.ibm.com>

Hi,

PR68212 reports that the COUNT of an unrolled loop is not correct, and
comments on the PR also note that the loop becomes 'cold'.  The patches
for that PR fixed part of the issue.  With reference to the patch
(https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02368.html) and the comment
(https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02380.html), the patch below
is drafted to fix the remaining part of the issue.

The patch fixes the wrong COUNT/PROB of an unrolled loop.  It also handles
the case where unrolling with an unreliable count can cause a loop to no
longer look hot and therefore not get aligned.

This patch corrects the PROB of the loop exit edge, the PROB/COUNT of the
latch block, and the loop count after the last peeling.  It scales by
profile_probability::likely () if the unrolled count gets unrealistically
small.

Bootstrapped and regtested on powerpc64le with no new regressions.  The
SPEC2017 results are fine: a couple of INT benchmarks showed around 1.7%
improvement, and everything else was within +/- 1%.

Ok for trunk?

Jiufu Guo

2020-02-03  Jiufu Guo
	    Pat Haugen

	PR rtl-optimization/68212
	* cfgloopmanip.c (duplicate_loop_to_header_edge): Correct COUNT/PROB
	for unrolled/peeled blocks.

testsuite/ChangeLog:
2020-02-03  Jiufu Guo
	    Pat Haugen

	PR rtl-optimization/68212
	* gcc.dg/pr68212.c: New test.

---
 gcc/cfgloopmanip.c             | 53 ++++++++++++++++++++++++++++++++++++++++--
 gcc/testsuite/gcc.dg/pr68212.c | 13 +++++++++++
 2 files changed, 64 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr68212.c

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 727e951..ded0046 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "tree-ssa-loop-manip.h"
 #include "dumpfile.h"
+#include "cfgrtl.h"
 
 static void copy_loops_to (class loop **, int,
                            class loop *);
@@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
           /* If original loop is executed COUNT_IN times, the unrolled
              loop will account SCALE_MAIN_DEN times.  */
           scale_main = count_in.probability_in (scale_main_den);
+
+          /* If we are guessing at the number of iterations and count_in
+             becomes unrealistically small, reset probability.  */
+          if (!(count_in.reliable_p () || loop->any_estimate))
+            {
+              profile_count new_count_in = count_in.apply_probability (scale_main);
+              profile_count preheader_count = loop_preheader_edge (loop)->count ();
+              if (new_count_in.apply_scale (1, 10) < preheader_count)
+                scale_main = profile_probability::likely ();
+            }
+
           scale_act = scale_main * prob_pass_main;
         }
       else
         {
+          profile_count new_loop_count;
           profile_count preheader_count = e->count ();
-          for (i = 0; i < ndupl; i++)
-            scale_main = scale_main * scale_step[i];
           scale_act = preheader_count.probability_in (count_in);
+          /* Compute final preheader count after peeling NDUPL copies.  */
+          for (i = 0; i < ndupl; i++)
+            preheader_count = preheader_count.apply_probability (scale_step[i]);
+          /* Subtract out exit(s) from peeled copies.  */
+          new_loop_count = count_in - (e->count () - preheader_count);
+          scale_main = new_loop_count.probability_in (count_in);
         }
     }
 
@@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
           scale_bbs_frequencies (new_bbs, n, scale_act);
           scale_act = scale_act * scale_step[j];
         }
+
+      /* Need to update PROB of exit edge and corresponding COUNT.  */
+      if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
+          && bbs_to_scale)
+        {
+          edge new_exit = new_spec_edges[SE_ORIG];
+          profile_count new_count_in = new_exit->src->count;
+          profile_count preheader_count = loop_preheader_edge (loop)->count ();
+          edge e;
+          edge_iterator ei;
+
+          FOR_EACH_EDGE (e, ei, new_exit->src->succs)
+            if (e != new_exit)
+              break;
+
+          gcc_assert (e && e != new_exit);
+
+          new_exit->probability = preheader_count.probability_in (new_count_in);
+          e->probability = new_exit->probability.invert ();
+
+          profile_count new_latch_count
+            = new_exit->src->count.apply_probability (e->probability);
+          profile_count old_latch_count = e->dest->count;
+
+          EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
+            scale_bbs_frequencies_profile_count (new_bbs + i, 1,
+                                                 new_latch_count,
+                                                 old_latch_count);
+
+          if (current_ir_type () != IR_GIMPLE)
+            update_br_prob_note (e->src);
+        }
     }
   free (new_bbs);
   free (orig_loops);
diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
new file mode 100644
index 0000000..f3b7c22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments -fdump-rtl-loop2_unroll" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+    a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
+/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1 "loop2_unroll"} } */
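
For readers who want to check the arithmetic, the three adjustments in the
cfgloopmanip.c hunks can be modelled in a small standalone sketch.  It is
not GCC code: plain doubles stand in for profile_count and
profile_probability, the function names (unroll_scale_main,
peel_scale_main, update_exit) and the ExitUpdate struct are hypothetical,
and the 0.8 used for profile_probability::likely () is an assumption, since
the patch does not state its value.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed stand-in for profile_probability::likely (); illustration only.
constexpr double kLikely = 0.8;

// Unrolling case.  scale_main starts as count_in / scale_main_den.  When the
// count is only a guess (count_is_trusted is false, modelling
// !(count_in.reliable_p () || loop->any_estimate)) and one tenth of the
// scaled header count falls below the preheader count, fall back to kLikely.
double unroll_scale_main(double count_in, double scale_main_den,
                         double preheader_count, bool count_is_trusted)
{
  assert(count_in > 0 && scale_main_den > 0);
  double scale_main = std::min(1.0, count_in / scale_main_den);
  if (!count_is_trusted) {
    double new_count_in = count_in * scale_main;
    if (new_count_in / 10 < preheader_count)
      scale_main = kLikely;
  }
  return scale_main;
}

// Peeling case.  Apply every step probability to the entry count to get the
// preheader count after peeling, then subtract the executions that left
// through the peeled copies from the original header count.
double peel_scale_main(double count_in, double entry_count,
                       const std::vector<double> &scale_step)
{
  assert(count_in > 0);
  double preheader_count = entry_count;
  for (double step : scale_step)
    preheader_count *= step;
  double new_loop_count = count_in - (entry_count - preheader_count);
  return new_loop_count / count_in;
}

// Exit edge update.  The exit probability is the preheader count relative to
// the count of the exit's source block; the other successor gets the
// inverse, and the latch count is the source count scaled by that inverse.
struct ExitUpdate {
  double exit_prob;
  double stay_prob;
  double latch_count;
};

ExitUpdate update_exit(double preheader_count, double exit_src_count)
{
  assert(exit_src_count > 0);
  double exit_prob = std::min(1.0, preheader_count / exit_src_count);
  double stay_prob = 1.0 - exit_prob;
  return {exit_prob, stay_prob, exit_src_count * stay_prob};
}
```

With a guessed header count of 1000, scale_main_den of 4000 and a preheader
count of 100, the first function takes the fallback path because
250 / 10 < 100; with a trusted count it keeps 0.25.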