From patchwork Thu Apr 29 09:50:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiufu Guo X-Patchwork-Id: 1471626 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=D0KuqlLF; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FW9jz2QWRz9sWY for ; Thu, 29 Apr 2021 19:51:03 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B234F3A3E92A; Thu, 29 Apr 2021 09:51:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B234F3A3E92A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1619689860; bh=TES/dFL6giXJY7DOi5BxmUcHmmhhXk273C/y1x8sZWY=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=D0KuqlLFs3cQkqA2Ym+cUAgdernRS9Pqsk+z6xOtKcYGs/JuxzsDBROYDAU08Ughk FwiAgVQwMGk0Q8zMQkXnAksQA7ageoiQjaWJoI9VNkZh+yRpHilGi7pigd/zXnu7lI kS3jj6vbS8RpvZiylbL5Geidu1nIFyRk+q4NtGTw= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 473E73958C0B for ; Thu, 29 Apr 2021 09:50:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 473E73958C0B Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 13T9XvYq188104; Thu, 29 Apr 2021 05:50:56 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 387sj8sv1v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 05:50:56 -0400 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 13T9ZdTm193237; Thu, 29 Apr 2021 05:50:55 -0400 Received: from ppma03fra.de.ibm.com (6b.4a.5195.ip4.static.sl-reverse.com [149.81.74.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 387sj8sv1d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 05:50:55 -0400 Received: from pps.filterd (ppma03fra.de.ibm.com [127.0.0.1]) by ppma03fra.de.ibm.com (8.16.0.43/8.16.0.43) with SMTP id 13T9WOO4024899; Thu, 29 Apr 2021 09:50:54 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma03fra.de.ibm.com with ESMTP id 384ay8sagh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 29 Apr 2021 09:50:53 +0000 Received: from d06av24.portsmouth.uk.ibm.com (mk.ibm.com [9.149.105.60]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 13T9oo8G53477656 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 29 Apr 2021 09:50:50 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E782A42041; Thu, 29 Apr 2021 09:50:49 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D5D744203F; Thu, 29 Apr 2021 09:50:48 +0000 (GMT) Received: from pike.rch.stglabs.ibm.com (unknown [9.5.12.127]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 29 Apr 2021 09:50:48 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH] split loop for NE condition. Date: Thu, 29 Apr 2021 17:50:48 +0800 Message-Id: <20210429095048.76239-1-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.17.1 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: _BdM-PLdyfqVTgFXUe7BmWQ-C4JDIy76 X-Proofpoint-ORIG-GUID: _MsHrJR3Q02IajoSFVn7e9Rfuzrseyn2 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-04-29_05:2021-04-28, 2021-04-29 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1011 suspectscore=0 mlxlogscore=999 impostorscore=0 spamscore=0 phishscore=0 bulkscore=0 malwarescore=0 adultscore=0 lowpriorityscore=0 priorityscore=1501 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104060000 definitions=main-2104290067 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jiufu Guo via Gcc-patches From: Jiufu Guo Reply-To: Jiufu Guo Cc: rguenther@suse.de, segher@kernel.crashing.org, jlaw@tachyum.com, wschmidt@linux.ibm.com, dje.gcc@gmail.com Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" When there is the possibility that overflow may happen on the loop index, a few optimizations would not happen. For example code: foo (int *a, int *b, unsigned k, unsigned n) { while (++k != n) a[k] = b[k] + 1; } For this code, if "l > n", overflow may happen. if "l < n" at begining, it could be optimized (e.g. vectorization). We can split the loop into two loops: while (++k > n) a[k] = b[k] + 1; while (l++ < n) a[k] = b[k] + 1; then for the second loop, it could be optimized. This patch is spltting this kind of small loop to achieve better performance. Bootstrap and regtest pass on ppc64le. Is this ok for trunk? Thanks! Jiufu Guo. gcc/ChangeLog: 2021-04-29 Jiufu Guo * params.opt (max-insns-ne-cond-split): New. * tree-ssa-loop-split.c (connect_loop_phis): Add new param. (get_ne_cond_branch): New function. (split_ne_loop): New function. (split_loop_on_ne_cond): New function. (tree_ssa_split_loops): Use split_loop_on_ne_cond. gcc/testsuite/ChangeLog: 2021-04-29 Jiufu Guo * gcc.dg/loop-split1.c: New test. --- gcc/params.opt | 4 + gcc/testsuite/gcc.dg/loop-split1.c | 28 ++++ gcc/tree-ssa-loop-split.c | 219 ++++++++++++++++++++++++++++- 3 files changed, 247 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/loop-split1.c diff --git a/gcc/params.opt b/gcc/params.opt index 2e4cbdd7a71..900b59b5136 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -766,6 +766,10 @@ Min. ratio of insns to prefetches to enable prefetching for a loop with an unkno Common Joined UInteger Var(param_min_loop_cond_split_prob) Init(30) IntegerRange(0, 100) Param Optimization The minimum threshold for probability of semi-invariant condition statement to trigger loop split. +-param=max-insns-ne-cond-split= +Common Joined UInteger Var(param_max_insn_ne_cond_split) Init(64) Param Optimization +The maximum threshold for insnstructions number of a loop with ne condition to split. + -param=min-nondebug-insn-uid= Common Joined UInteger Var(param_min_nondebug_insn_uid) Param The minimum UID to be used for a nondebug insn. diff --git a/gcc/testsuite/gcc.dg/loop-split1.c b/gcc/testsuite/gcc.dg/loop-split1.c new file mode 100644 index 00000000000..4c466aa9f54 --- /dev/null +++ b/gcc/testsuite/gcc.dg/loop-split1.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */ + +void +foo (int *a, int *b, unsigned l, unsigned n) +{ + while (++l != n) + a[l] = b[l] + 1; +} + +void +foo1 (int *a, int *b, unsigned l, unsigned n) +{ + while (l++ != n) + a[l] = b[l] + 1; +} + +unsigned +foo2 (char *a, char *b, unsigned l, unsigned n) +{ + while (++l != n) + if (a[l] != b[l]) + break; + + return l; +} + +/* { dg-final { scan-tree-dump-times "Loop split" 3 "lsplit" } } */ diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c index b80b6a75e62..a6d28078e5e 100644 --- a/gcc/tree-ssa-loop-split.c +++ b/gcc/tree-ssa-loop-split.c @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include "cfghooks.h" #include "gimple-fold.h" #include "gimplify-me.h" +#include "tree-ssa-loop-ivopts.h" /* This file implements two kinds of loop splitting. @@ -233,7 +234,8 @@ easy_exit_values (class loop *loop) this. The loops need to fulfill easy_exit_values(). */ static void -connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e) +connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e, + bool use_prev = false) { basic_block rest = loop_preheader_edge (loop2)->src; gcc_assert (new_e->dest == rest); @@ -248,13 +250,14 @@ connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e) !gsi_end_p (psi_first); gsi_next (&psi_first), gsi_next (&psi_second)) { - tree init, next, new_init; + tree init, next, new_init, prev; use_operand_p op; gphi *phi_first = psi_first.phi (); gphi *phi_second = psi_second.phi (); init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste); next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn); + prev = PHI_RESULT (phi_first); op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde); gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op))); @@ -279,7 +282,7 @@ connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e) gphi * newphi = create_phi_node (new_init, rest); add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION); - add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION); + add_phi_arg (newphi, use_prev ? prev : next, new_e, UNKNOWN_LOCATION); SET_USE (op, new_init); } } @@ -1599,6 +1602,213 @@ split_loop_on_cond (struct loop *loop) return do_split; } +/* Check if the LOOP exit branch likes "if (idx != bound)". + if INV is not NULL and the branch is "if (bound != idx)", set *INV to true. + return the branch edge which exit loop. */ + +static edge +get_ne_cond_branch (struct loop *loop, bool *inv) +{ + int i; + edge e; + + auto_vec edges = get_loop_exit_edges (loop); + FOR_EACH_VEC_ELT (edges, i, e) + { + basic_block bb = e->src; + + /* Check gcond. */ + gimple *last = last_stmt (bb); + if (!last || gimple_code (last) != GIMPLE_COND) + continue; + gcond *cond = as_a (last); + enum tree_code code = gimple_cond_code (cond); + if (code != NE_EXPR) + continue; + + /* Make sure idx and bound. */ + tree idx = gimple_cond_lhs (cond); + tree bnd = gimple_cond_rhs (cond); + if (expr_invariant_in_loop_p (loop, idx)) + { + std::swap (idx, bnd); + if (inv) + *inv = true; + } + else if (!expr_invariant_in_loop_p (loop, bnd)) + continue; + + /* Extract conversion. */ + if (TREE_CODE (idx) == SSA_NAME) + { + gimple *stmt = SSA_NAME_DEF_STMT (idx); + if (is_gimple_assign (stmt) + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)) + && flow_bb_inside_loop_p (loop, gimple_bb (stmt))) + idx = gimple_assign_rhs1 (stmt); + } + + /* Make sure idx is iv. */ + class loop *useloop = loop_containing_stmt (cond); + affine_iv iv; + if (!simple_iv (loop, useloop, idx, &iv, false)) + continue; + + /* No need to split loop, if base is know value. + Or check range info. */ + if (TREE_CODE (iv.base) == INTEGER_CST) + continue; + + /* There is type conversion on idx(or rhs of idx's def). + And there is converting shorter to longer type. */ + tree type = TREE_TYPE (idx); + if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME + || !TYPE_UNSIGNED (type) + || TYPE_PRECISION (type) == TYPE_PRECISION (sizetype)) + continue; + + /* Check loop is simple to split. */ + gcc_assert (bb != loop->latch); + + if (single_pred_p (loop->latch) + && single_pred_edge (loop->latch)->src == bb + && empty_block_p (loop->latch)) + return e; + + /* Splitting is cheap for idx increase header. */ + if (bb == loop->header) + { + if (get_virtual_phi (loop->header)) + continue; + + /* In loop header: i++ or ++i. */ + gimple_stmt_iterator gsi = gsi_start_bb (bb); + if (gsi_end_p (gsi)) + return e; + + gimple *s1 = gsi_stmt (gsi); + if (!(is_gimple_assign (s1) + && (idx == gimple_assign_lhs (s1) + || idx == gimple_assign_rhs1 (s1)))) + continue; + + gsi_next (&gsi); + if (!gsi_end_p (gsi) && gsi_stmt (gsi) == cond) + return e; + } + } + + return NULL; +} + +/* Split the LOOP with NE_EXPR into two loops with GT_EXPR and LT_EXPR. */ + +static bool +split_ne_loop (struct loop *loop, edge cond_e) +{ + initialize_original_copy_tables (); + + struct loop *loop2 = loop_version (loop, boolean_true_node, NULL, + profile_probability::always (), + profile_probability::never (), + profile_probability::always (), + profile_probability::always (), true); + + gcc_assert (loop2); + update_ssa (TODO_update_ssa); + + free_original_copy_tables (); + + /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */ + bool inv = false; + edge dup_cond = get_ne_cond_branch (loop2, &inv); + enum tree_code up_code = inv ? LT_EXPR : GT_EXPR; + enum tree_code down_code = inv ? GT_EXPR : LT_EXPR; + + gcond *gc = as_a (last_stmt (cond_e->src)); + gimple_cond_set_code (gc, up_code); + + gcond *dup_gc = as_a (last_stmt (dup_cond->src)); + gimple_cond_set_code (dup_gc, down_code); + + /* Link the exit cond edge to new loop. */ + gcond *break_cond = as_a (gimple_copy (gc)); + edge pred_e = single_pred_edge (loop->latch); + gcc_assert (pred_e); + bool simple_loop = pred_e->src == cond_e->src && empty_block_p (loop->latch); + if (simple_loop) + gimple_cond_set_code (break_cond, down_code); + else + gimple_cond_make_true (break_cond); + + basic_block break_bb = split_edge (cond_e); + gimple_stmt_iterator gsi = gsi_last_bb (break_bb); + gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT); + + edge to_exit = single_succ_edge (break_bb); + edge to_new_loop = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0); + to_new_loop->flags |= EDGE_TRUE_VALUE; + to_exit->flags |= EDGE_FALSE_VALUE; + to_exit->flags &= ~EDGE_FALLTHRU; + to_exit->probability = cond_e->probability; + to_new_loop->probability = to_exit->probability.invert (); + + update_ssa (TODO_update_ssa); + + connect_loop_phis (loop, loop2, to_new_loop, !simple_loop); + + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, ";; Loop split.\n"); + + return true; +} + +/* Checks if LOOP contains a suitable NE_EXPR conditional block to split. +L_H: + if (i!=N) + S; + i++; + goto L_H; + +The "i!=N" is like "i>N || iN) + S; + i++; + goto L_H; +L1_H: + if (inum_nodes; i++) + num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights); + free (bbs); + + if (num > param_max_insn_ne_cond_split) + return false; + + edge branch_edge = get_ne_cond_branch (loop, NULL); + if (branch_edge && split_ne_loop (loop, branch_edge)) + return true; + + return false; +} + /* Main entry point. Perform loop splitting on all suitable loops. */ static unsigned int @@ -1628,7 +1838,8 @@ tree_ssa_split_loops (void) if (optimize_loop_for_size_p (loop)) continue; - if (split_loop (loop) || split_loop_on_cond (loop)) + if (split_loop (loop) || split_loop_on_cond (loop) + || split_loop_on_ne_cond (loop)) { /* Mark our containing loop as having had some split inner loops. */ loop_outer (loop)->aux = loop;