From patchwork Mon Nov 2 09:00:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1391993
Subject: [PATCH v3] pass: Run cleanup passes before SLP [PR96789]
From: "Kewen.Lin"
To: richard.sandiford@arm.com
Cc: Bill Schmidt, "Kewen.Lin via Gcc-patches", Segher Boessenkool
Date: Mon, 2 Nov 2020 17:00:02 +0800
References: <1010143e-ea5a-606a-4259-dd167c9c978d@linux.ibm.com>
 <07d34278-f313-81c7-7799-3b1b1a64964f@linux.ibm.com>

Hi Richard,

Thanks a lot for the review, and sorry for the late reply.

Per the review comments, the changes against v2 are:
  - Fix the tabification issues introduced in passes.def.
  - Add a separate pending TODOs group (with its own bitmask numbering).
  - Move the pending_TODOs unsigned int into struct function.
  - Fix some comments as advised.
  - Update the pending TODOs setting in tree_unroll_loops_completely_1.

[... snipped]

>> +extern unsigned int pending_TODOs;
>
> Would it be better to put this in struct function?  At least that
> would prevent any possibility of the flags being accidentally
> carried over between functions.  (Obviously that would be a bug,
> but there's a danger that it might be a mostly silent bug.)
>

Good point, updated!

>> +	  /* Once we process our father we will have processed
>> +	     the fathers of our children as well, so avoid doing
>> +	     redundant work and clear fathers we've gathered so far.
>> +	     But don't clear it for one outermost loop since its
>> +	     propagation doesn't necessarily run all the time.  */
>> +	  bitmap_clear (father_bbs);
>> +
>> +	  bitmap_set_bit (father_bbs, loop_father->header->index);
>
> Is this a way of telling the caller to set “outermost_unrolled”?
> If so, with the suggestion above, we could just set pending_TODOs
> directly.

Yes, updated.
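Concretely, instead of going through father_bbs for the outermost case,
the unroller now just flags it on cfun directly, as in the
tree-ssa-loop-ivcanon.c hunk at the end of the patch:

      else if (unroll_outer
	       && !(cfun->pending_TODOs
		    & PENDING_TODO_force_next_scalar_cleanup))
	{
	  /* Trigger scalar cleanup once any outermost loop gets unrolled.  */
	  cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
	}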
>
> I had to read the above several times before I (hopefully) understood it.
> It seems that it's really overloading the bitmap for two different things
> (hence the need to explain why the clearing doesn't happen for the outer
> loop).

Sorry, the appended part "But don't clear it for one outermost loop
since ..." still doesn't make that clear.

You are right: if the bit is set for the outermost loop (its father),
it's not guaranteed that we will do the propagation for it (and its
children), so we shouldn't clear father_bbs there, otherwise some
expected propagation probably won't happen.

Bootstrapped/regtested on powerpc64le-linux-gnu P8 again.

Does it look better?  Thanks again!

BR,
Kewen
-----
gcc/ChangeLog:

	PR tree-optimization/96789
	* function.c (allocate_struct_function): Init pending_TODOs.
	* function.h (struct function): New member unsigned pending_TODOs.
	* passes.c (class pass_pre_slp_scalar_cleanup): New class.
	(make_pass_pre_slp_scalar_cleanup): New function.
	(pass_data_pre_slp_scalar_cleanup): New pass data.
	* passes.def (pass_pre_slp_scalar_cleanup): New pass, add
	pass_fre and pass_dse as its children.
	* timevar.def (TV_SCALAR_CLEANUP): New timevar.
	* tree-pass.h (PENDING_TODO_force_next_scalar_cleanup): New
	pending TODO flag.
	(make_pass_pre_slp_scalar_cleanup): New declaration.
	* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Once
	any outermost loop gets unrolled, flag cfun pending_TODOs
	PENDING_TODO_force_next_scalar_cleanup on.

gcc/testsuite/ChangeLog:

	PR tree-optimization/96789
	* gcc.dg/tree-ssa/ssa-dse-28.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
	* gcc.dg/vect/bb-slp-41.c: Likewise.
	* gcc.dg/tree-ssa/pr96789.c: New test.

diff --git a/gcc/function.c b/gcc/function.c
index 2c8fa217f1f..3e92ee9c665 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4841,6 +4841,8 @@ allocate_struct_function (tree fndecl, bool abstract_p)
      binding annotations among them.  */
   cfun->debug_nonbind_markers = lang_hooks.emits_begin_stmt
     && MAY_HAVE_DEBUG_MARKER_STMTS;
+
+  cfun->pending_TODOs = 0;
 }
 
 /* This is like allocate_struct_function, but pushes a new cfun for FNDECL
diff --git a/gcc/function.h b/gcc/function.h
index d55cbddd0b5..ffed6520bf9 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -269,6 +269,13 @@ struct GTY(()) function {
   /* Value histograms attached to particular statements.  */
   htab_t GTY((skip)) value_histograms;
 
+  /* Different from normal TODO_flags which are handled right at the
+     beginning or the end of one pass execution, the pending_TODOs
+     are passed down in the pipeline until one of its consumers can
+     perform the requested action.  Consumers should then clear the
+     flags for the actions that they have taken.  */
+  unsigned int pending_TODOs;
+
   /* For function.c.  */
 
   /* Points to the FUNCTION_DECL of this function.  */
diff --git a/gcc/passes.c b/gcc/passes.c
index 6ff31ec37d7..eb8efc11c90 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -731,7 +731,54 @@ make_pass_late_compilation (gcc::context *ctxt)
   return new pass_late_compilation (ctxt);
 }
 
+/* Pre-SLP scalar cleanup, it has several cleanup passes like FRE, DSE.  */
+namespace {
+
+const pass_data pass_data_pre_slp_scalar_cleanup =
+{
+  GIMPLE_PASS, /* type */
+  "*pre_slp_scalar_cleanup", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_SCALAR_CLEANUP, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_pre_slp_scalar_cleanup : public gimple_opt_pass
+{
+public:
+  pass_pre_slp_scalar_cleanup (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_pre_slp_scalar_cleanup, ctxt)
+  {
+  }
+
+  virtual bool
+  gate (function *fun)
+  {
+    return flag_tree_slp_vectorize
+	   && (fun->pending_TODOs & PENDING_TODO_force_next_scalar_cleanup);
+  }
+
+  virtual unsigned int
+  execute (function *fun)
+  {
+    fun->pending_TODOs &= ~PENDING_TODO_force_next_scalar_cleanup;
+    return 0;
+  }
+
+}; // class pass_pre_slp_scalar_cleanup
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_pre_slp_scalar_cleanup (gcc::context *ctxt)
+{
+  return new pass_pre_slp_scalar_cleanup (ctxt);
+}
 
 /* Set the static pass number of pass PASS to ID and record that in
    the mapping from static pass number to pass.  */
diff --git a/gcc/passes.def b/gcc/passes.def
index c0098d755bf..1f41f36396e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -288,11 +288,16 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
 	  NEXT_PASS (pass_vectorize);
-          PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
+	  PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
 	      NEXT_PASS (pass_dce);
-          POP_INSERT_PASSES ()
-          NEXT_PASS (pass_predcom);
+	  POP_INSERT_PASSES ()
+	  NEXT_PASS (pass_predcom);
 	  NEXT_PASS (pass_complete_unroll);
+	  NEXT_PASS (pass_pre_slp_scalar_cleanup);
+	  PUSH_INSERT_PASSES_WITHIN (pass_pre_slp_scalar_cleanup)
+	      NEXT_PASS (pass_fre, false /* may_iterate */);
+	      NEXT_PASS (pass_dse);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_slp_vectorize);
 	  NEXT_PASS (pass_loop_prefetch);
 	  /* Run IVOPTs after the last pass that uses data-reference analysis
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c b/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
new file mode 100644
index 00000000000..d6139a014d8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funroll-loops -ftree-vectorize -fdump-tree-dse-details" } */
+
+/* Test if the scalar cleanup pass takes effect, mainly checking
+   that its secondary pass DSE can remove dead stores on array
+   tmp.  */
+
+#include "stdint.h"
+
+static inline void
+foo (int16_t *diff, int i_size, uint8_t *val1, int i_val1, uint8_t *val2,
+     int i_val2)
+{
+  for (int y = 0; y < i_size; y++)
+    {
+      for (int x = 0; x < i_size; x++)
+	diff[x + y * i_size] = val1[x] - val2[x];
+      val1 += i_val1;
+      val2 += i_val2;
+    }
+}
+
+void
+bar (int16_t res[16], uint8_t *val1, uint8_t *val2)
+{
+  int16_t d[16];
+  int16_t tmp[16];
+
+  foo (d, 4, val1, 16, val2, 32);
+
+  for (int i = 0; i < 4; i++)
+    {
+      int s03 = d[i * 4 + 0] + d[i * 4 + 3];
+      int s12 = d[i * 4 + 1] + d[i * 4 + 2];
+      int d03 = d[i * 4 + 0] - d[i * 4 + 3];
+      int d12 = d[i * 4 + 1] - d[i * 4 + 2];
+
+      tmp[0 * 4 + i] = s03 + s12;
+      tmp[1 * 4 + i] = 2 * d03 + d12;
+      tmp[2 * 4 + i] = s03 - s12;
+      tmp[3 * 4 + i] = d03 - 2 * d12;
+    }
+
+  for (int i = 0; i < 4; i++)
+    {
+      int s03 = tmp[i * 4 + 0] + tmp[i * 4 + 3];
+      int s12 = tmp[i * 4 + 1] + tmp[i * 4 + 2];
+      int d03 = tmp[i * 4 + 0] - tmp[i * 4 + 3];
+      int d12 = tmp[i * 4 + 1] - tmp[i * 4 + 2];
+
+      res[i * 4 + 0] = s03 + s12;
+      res[i * 4 + 1] = 2 * d03 + d12;
+      res[i * 4 + 2] = s03 - s12;
+      res[i * 4 + 3] = d03 - 2 * d12;
+    }
+}
+
+/* { dg-final { scan-tree-dump {Deleted dead store:.*tmp} "dse3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-28.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-28.c
index d3a1bbc5ce5..b81cabe604a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-28.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-28.c
@@ -17,5 +17,5 @@ int foo (int *p, int b)
 
 /* { dg-final { scan-tree-dump-not "Deleted dead store" "dse1"} } */
 /* { dg-final { scan-tree-dump-not "Deleted dead store" "dse2"} } */
-/* { dg-final { scan-tree-dump-not "Deleted dead store" "dse3"} } */
+/* { dg-final { scan-tree-dump-not "Deleted dead store" "dse4"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c
index 31529e7caed..f4ef89c590c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c
@@ -22,5 +22,5 @@ foo(int cond, struct z *s)
 
 /* { dg-final { scan-tree-dump-times "Deleted dead store" 3 "dse1"} } */
 /* { dg-final { scan-tree-dump-not "Deleted dead store" "dse2"} } */
-/* { dg-final { scan-tree-dump-not "Deleted dead store" "dse3"} } */
+/* { dg-final { scan-tree-dump-not "Deleted dead store" "dse4"} } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-41.c b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c
index 7de5ed1f5be..72245115f30 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-41.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c
@@ -10,7 +10,10 @@ foo (int *a, int *b)
     a[i] = b[0] + b[1] + b[i+1] + b[i+2];
 }
 
-void bar (int *a, int *b)
+/* Disable pre-slp FRE to avoid unexpected SLP on the epilogue
+   of the 1st loop.  */
+void __attribute__((optimize("-fno-tree-fre")))
+bar (int *a, int *b)
 {
   int i;
   for (i = 0; i < (ARR_SIZE - 2); ++i)
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 7dd1e2685a4..7549a331bc2 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -193,6 +193,7 @@ DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
 DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
 DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
+DEFTIMEVAR (TV_SCALAR_CLEANUP        , "scalar cleanup")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
 DEFTIMEVAR (TV_TREE_SLP_VECTORIZATION, "tree slp vectorization")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index f01e811917d..2561963df2e 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -310,6 +310,11 @@ protected:
 
 #define TODO_verify_all TODO_verify_il
 
+/* To-do flags for pending_TODOs.  */
+
+/* Tell the next scalar cleanup pass that there is
+   work for it to do.  */
+#define PENDING_TODO_force_next_scalar_cleanup (1 << 1)
 
 /* Register pass info. */
 
@@ -380,6 +385,7 @@ extern gimple_opt_pass *make_pass_simduid_cleanup (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_pre_slp_scalar_cleanup (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 298ab215530..9a9076cee67 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1411,6 +1411,13 @@ tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
 	  bitmap_clear (father_bbs);
 	  bitmap_set_bit (father_bbs, loop_father->header->index);
 	}
+      else if (unroll_outer
+	       && !(cfun->pending_TODOs
+		    & PENDING_TODO_force_next_scalar_cleanup))
+	{
+	  /* Trigger scalar cleanup once any outermost loop gets unrolled.  */
+	  cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
+	}
 
       return true;
     }