From patchwork Fri May 22 15:42:12 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Lawrence X-Patchwork-Id: 475692 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 9391F140E33 for ; Sat, 23 May 2015 01:42:26 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=MTBmaUB4; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; q=dns; s=default; b=n285ryNNoDVmfJwY4kZo5GgmFkGgj35YXiP8XmbjoJ7 GAqi2ioDzFkhjh4O8n0LQGK+iixNd20jzFLKJJTFfUqB+Siuy5wqbmIayIrs2uG3 oTLrAmnswpUyMDNtjMSQCrUBaVbitpE9YoMu/6/X9U61GFvczxs1cLy8WrLCN9RM = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; s=default; bh=xJZqp5gTDRo67Ykinxp36qqYT9I=; b=MTBmaUB4YbjAXWV16 UojhOhm7j+5RVuI8GcEN6YvRI+yuV+7CdVgrfmEpT0k7C+3cShu3OXHLAHC8jKjB l9uZII0leKbDR7LLG116dYuckwQhFtzYERA1L6NqJmBp/FxwTRbbLmUbEp6jJtiN xOvAYxR9knQnRmNu41Qz5vC+8Y= Received: (qmail 75080 invoked by alias); 22 May 2015 15:42:19 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 75071 invoked by uid 89); 22 May 2015 15:42:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL, BAYES_50, SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 22 May 2015 15:42:16 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by uk-mta-2.uk.mimecast.lan; Fri, 22 May 2015 16:42:13 +0100 Received: from [10.2.207.65] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Fri, 22 May 2015 16:42:13 +0100 Message-ID: <555F4E54.6080600@arm.com> Date: Fri, 22 May 2015 16:42:12 +0100 From: Alan Lawrence User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" CC: Richard Biener Subject: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion X-MC-Unique: H4Skhw-oQ9qD5SYOBdzjQg-1 X-IsSubscribed: yes This example which I wrote to test ifconversion, currently fails to if-convert or vectorize: int foo () { for (int i = 0; i < 32 ; i++) { int m = (a[i] & i) ? 5 : 4; b[i] = a[i] * m; } } ...because jump-threading in dom1 rearranged the loop into a form that neither if-conversion nor vectorization would attempt. Discussion at https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html lead to the suggestion that I should rerun loop-header copying (an earlier attempt to fix ifconversion, https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not enable vectorization.) This patch does so (and makes slightly less conservative, to tackle the example above). I found I had to make this a separate pass, so that the phi nodes were cleaned up at the end of the pass before running tree_if_conversion. Also at this stage in the compiler (inside loop opts) it was not possible to run loop_optimizer_init+finalize, or other loop_optimizer data structures needed later would be deleted; hence, I have two nearly-but-not-quite-identical passes, the new "ch_vect" avoiding the init/finalize. I tried to tackle this with some C++ subclassing, which removes the duplication, but the result feels a little ugly; suggestions for any neater approach welcome. This patch causes failure of the scan-tree-dump of dom2 in gcc.dg/ssa/pr21417.c. This looks for jump-threading to perform an optimization, but no longer finds the expected line in the log - as the loop-header-copying phase has already done an equivalent transformation *before* dom2. The final CFG is thus in the desired form, but I'm not sure how to determine this (scanning the CFG itself is very difficult, well beyond what we can do with regex, requiring looking at multiple lines and basic blocks). Can anyone advise? [The test issue can be worked around by preserving the old do_while_p logic for the first header-copying pass, and using the new logic only for the second, but this is more awkward inside the compiler, which feels wrong.] Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of other examples where this patch enables (undesired!) vectorization. I've dealt with these, but for the record: * gcc.dg/vect/slp-perm-7.c: the initialization loop in main, contained a check that input[i] < 200; this was already optimized out (because input[i] was set to i%256, where iFAIL of a scan-tree-dump test, which I'm unsure how to fix, but no other regressions. gcc/ChangeLog: * tree-pass.h (make_pass_ch_vect): New. * passes.def: Add pass_ch_vect just before pass_if_conversion. * tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops, look for blocks with exit edge and code after them. (pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New. (class pass_ch): Extend pass_ch_vect. (pass_ch::execute): Move all but loop_optimizer_init/finalize to... (pass_ch_vect::execute): ...here. * tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-perm-7.c (zero): New. (main): Test zero rather than input[i], to avoid vectorization. * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x,y,z,w. of unsigned * gcc.dg/vect/vect-ifcvt-11.c: New. diff --git a/gcc/passes.def b/gcc/passes.def index 1d598b2..87cfe2a 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -247,6 +247,7 @@ along with GCC; see the file COPYING3. If not see PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops) NEXT_PASS (pass_expand_omp_ssa); POP_INSERT_PASSES () + NEXT_PASS (pass_ch_vect); NEXT_PASS (pass_if_conversion); /* pass_vectorize must immediately follow pass_if_conversion. Please do not add any other passes in between. */ diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c index 6291096..e0ec55a 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c @@ -20,6 +20,8 @@ #define N 16 +volatile int zero = 0; + /* SLP with load permutation and loop-based vectorization. */ void foo (int *__restrict__ pInput, int *__restrict__ pOutput, int *__restrict__ pInput2, int *__restrict__ pOutput2) @@ -57,7 +59,7 @@ int main (int argc, const char* argv[]) input2[i] = i%256; output[i] = 0; output2[i] = 0; - if (input[i] > 200) + if (zero) /* Avoid vectorization. */ abort (); } diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c new file mode 100644 index 0000000..53b139d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c @@ -0,0 +1,37 @@ +/* { dg-require-effective-target vect_condition } */ +/* { dg-require-effective-target vect_int } */ + +#include "tree-vect.h" + +#define N 16 + +extern void abort(void); + +int A[N] = {36, 39, 42, 45, 43, 32, 21, 12, 23, 34, 45, 56, 67, 78, 81, 11}; +int B[N] = {144,195,210,225,172,128,105,60, 92, 136,225,280,268,390,324,55}; + +__attribute__((noinline)) +void foo () +{ + for (int i = 0; i < N; i++) + { + int m = (A[i] & i) ? 5 : 4; + A[i] = A[i] * m; + } +} + +int main () +{ + + check_vect (); + foo (); + /* check results: */ + for (int i = 0; i < N; i++) + if (A[i] != B[i]) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c index 68114a6..0656fb7 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c @@ -21,7 +21,6 @@ main1 () s *ptr = arr; s res[N]; int i; - unsigned short x, y, z, w; for (i = 0; i < N; i++) { @@ -35,6 +34,7 @@ main1 () for (i = 0; i < N; i++) { + unsigned short x, y, z, w; x = ptr->b - ptr->a; y = ptr->d - ptr->c; res[i].c = x + y; diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index bc8763d..cb0f986 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -379,6 +379,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt); extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt); diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c index c6441b8..0fb7155 100644 --- a/gcc/tree-ssa-loop-ch.c +++ b/gcc/tree-ssa-loop-ch.c @@ -140,6 +140,29 @@ do_while_loop_p (struct loop *loop) && gimple_code (stmt) == GIMPLE_COND) return false; + /* Leave anything with multiple exits, given it satisfies the above. */ + if (!single_exit (loop)) + return true; + + /* If any block in the loop has an exit edge, and code after it, it is + not a do-while loop. */ + basic_block *body = get_loop_body (loop); + for (unsigned i = 0; i < loop->num_nodes; i++) + { + bool block_has_exit = false, block_precedes_code = false; + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, body[i]->succs) + if (loop_exit_edge_p (loop, e)) + block_has_exit = true; + else if (e->dest != loop->header + && e->dest != loop->latch) + block_precedes_code = true; + if (block_has_exit && block_precedes_code) + return false; + } + free (body); + return true; } @@ -162,21 +185,48 @@ const pass_data pass_data_ch = 0, /* todo_flags_finish */ }; -class pass_ch : public gimple_opt_pass +const pass_data pass_data_ch_vect = +{ + GIMPLE_PASS, /* type */ + "ch_vect", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_CH, /* tv_id */ + ( PROP_cfg | PROP_ssa ), /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_ch_vect : public gimple_opt_pass { public: - pass_ch (gcc::context *ctxt) - : gimple_opt_pass (pass_data_ch, ctxt) + pass_ch_vect (gcc::context *ctxt) + : gimple_opt_pass (pass_data_ch_vect, ctxt) {} /* opt_pass methods: */ virtual bool gate (function *) { return flag_tree_ch != 0; } virtual unsigned int execute (function *); +protected: + pass_ch_vect (const pass_data data, gcc::context *ctxt) + : gimple_opt_pass (data, ctxt) + {} +}; // class pass_ch_vect +/* Adds a call to loop_optimizer_init before it executes, + and loop_optimizer_finalize after. */ +class pass_ch : public pass_ch_vect +{ +public: + pass_ch (gcc::context *ctxt) : pass_ch_vect (pass_data_ch, ctxt) + {} + + virtual unsigned int execute (function *); }; // class pass_ch unsigned int -pass_ch::execute (function *fun) +pass_ch_vect::execute (function *fun) { struct loop *loop; basic_block header; @@ -186,13 +236,8 @@ pass_ch::execute (function *fun) unsigned bbs_size; bool changed = false; - loop_optimizer_init (LOOPS_HAVE_PREHEADERS - | LOOPS_HAVE_SIMPLE_LATCHES); if (number_of_loops (fun) <= 1) - { - loop_optimizer_finalize (); return 0; - } bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun)); copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun)); @@ -296,17 +341,35 @@ pass_ch::execute (function *fun) changed = true; } - update_ssa (TODO_update_ssa); + if (changed) + update_ssa (TODO_update_ssa); free (bbs); free (copied_bbs); - loop_optimizer_finalize (); return changed ? TODO_cleanup_cfg : 0; } +unsigned int +pass_ch::execute (function *fun) +{ + loop_optimizer_init (LOOPS_HAVE_PREHEADERS + | LOOPS_HAVE_SIMPLE_LATCHES); + + unsigned int res = pass_ch_vect::execute (fun); + + loop_optimizer_finalize (); + return res; +} + } // anon namespace gimple_opt_pass * +make_pass_ch_vect (gcc::context *ctxt) +{ + return new pass_ch_vect (ctxt); +} + +gimple_opt_pass * make_pass_ch (gcc::context *ctxt) { return new pass_ch (ctxt); diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index ccb8f97..a5029bb 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -234,7 +234,9 @@ unsigned int pass_tree_loop_init::execute (function *fun ATTRIBUTE_UNUSED) { loop_optimizer_init (LOOPS_NORMAL - | LOOPS_HAVE_RECORDED_EXITS); + | LOOPS_HAVE_RECORDED_EXITS + | LOOPS_HAVE_PREHEADERS + | LOOPS_HAVE_SIMPLE_LATCHES); rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); /* We might discover new loops, e.g. when turning irreducible