From patchwork Fri Jan 22 20:21:10 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Law X-Patchwork-Id: 571790 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 477481402DD for ; Sat, 23 Jan 2016 07:21:31 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=a705Y6Fd; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:message-id:date:mime-version:content-type; q=dns; s= default; b=mJZbVMdwJnCyZaUkBL7lewmazdmJ9Jp80OYsRXu/W9BnknjA7puL0 n+Q6GGGMXVmM/HVWvhdSXFHCn/1LONarYaLBFRHFMM275LJkZHKIsbbUaVw0Dgpp LQHpLZ8A+QIPLQ/ZWgaPHOLgr3O0si1egXmBC+NoPWi3RR84Mq0DTg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:message-id:date:mime-version:content-type; s= default; bh=LllGgNidGfVlcYYIGtAUtMkSubI=; b=a705Y6FdXCT7LsEYfro5 l3ZeLTnn3xP03OtoFHQiGIHVQFoieo9KeFDW6Y/lMydJx8U9iTmL4oAsPeXByUIw kqGu0Oflefn7MJQ39UZV3ib90Ig0eKop7LoM5if6e5PfeeXBd2D6Ln7qXJBRSwdt 0FUBvb2sDNXZsTWCo83QTIE= Received: (qmail 117978 invoked by alias); 22 Jan 2016 20:21:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 117962 invoked by uid 89); 22 Jan 2016 20:21:14 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=cached, derive, bitmap_bit_p, bitmap_set_bit X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Fri, 22 Jan 2016 20:21:12 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (Postfix) with ESMTPS id 53BD78F013 for ; Fri, 22 Jan 2016 20:21:11 +0000 (UTC) Received: from slagheap.utah.redhat.com (ovpn-113-74.phx2.redhat.com [10.3.113.74]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u0MKLAra032559 for ; Fri, 22 Jan 2016 15:21:11 -0500 From: Jeff Law To: gcc-patches Subject: [PATCH][PR tree-optimization/69347] Speedup DOM slightly Message-ID: <56A28F36.9000807@redhat.com> Date: Fri, 22 Jan 2016 13:21:10 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 X-IsSubscribed: yes So as noted in BZ69347 we have regressed a bit in the amount of time spent in DOM for included testcase. My previous patch helped a little, but there's still some low hanging fruit. One of the things that was added during this cycle was the ability to do a bit of secondary equivalence discovery. ie, we discover an equivalence due to an edge traversal -- we can, in limited circumstances, back propagate the known value backwards and discover additional equivalences. That code turns out to be surprisingly expensive. After a fair amount poking with perf & gprof a few things stick out. While dominance testing is relatively cheap, when done often enough it gets expensive. We can do a bit better. Essentially we're walking over a set of immediate uses and seeing which of those uses dominate a given block. Instead we can take the given block and compute its dominators into a bitmap. We then check if the immediate use blocks are in the bitmap. That turns out to be considerably faster, I believe that's because the immediate uses are typically clustered, thus there's a single element in the bitmap. Testing is two memory loads and a memory bit test. Contrast that to 9 memory loads for dominated_by_p if I'm counting correctly. That cuts the amount of time spent in DOM in half for the 69347 testcase. The second change is in cprop_into_successor_phis and was found as I was wandering the perf report. Essentially SSA_NAME_VALUE will always be null, an SSA_NAME or a constant (min_invariant). Thus the test that it's an SSA_NAME || is_gimple_min_invariant is totally useless. Bootstrapped and regression tested on x86. Also verified that for my bucket of .i files, there was no change in the resulting code. Given the major issues are resolved, I'm moving this to a P4. There's still a compile-time regression that's not accounted for, but it's relatively small and I suspect it's related to the computed goto used for a dispatch table -- which is certainly supported, but it's not the typical code pushed through GCC. The big regression left is intentional as we've intentionally upped the parameter for when to avoid gcse. Installed on the trunk. Jeff commit c3b7df35046db818c4c4b7d808b5a3471a0ee9b9 Author: Jeff Law Date: Fri Jan 22 13:17:54 2016 -0700 PR middle-end/69347 * tree-ssa-dom.c (back_propagate_equivalences): Factored out of record_temporary_equivalences. Rewritten to avoid unnecessary calls into dominated_by_p. (cprop_into_successor_phis): Avoid unnecessary tests. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 85cde94..73df84f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2016-01-21 Jeff Law + + PR middle-end/69347 + * tree-ssa-dom.c (back_propagate_equivalences): Factored out of + record_temporary_equivalences. Rewritten to avoid unnecessary calls + into dominated_by_p. + (cprop_into_successor_phis): Avoid unnecessary tests. + 2016-01-22 Richard Henderson PR target/69416 diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 84c9a6a..b690d92 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -819,6 +819,74 @@ dom_valueize (tree t) return t; } +/* We have just found an equivalence for LHS on an edge E. + Look backwards to other uses of LHS and see if we can derive + additional equivalences that are valid on edge E. */ +static void +back_propagate_equivalences (tree lhs, edge e, + class const_and_copies *const_and_copies) +{ + use_operand_p use_p; + imm_use_iterator iter; + bitmap domby = NULL; + basic_block dest = e->dest; + + /* Iterate over the uses of LHS to see if any dominate E->dest. + If so, they may create useful equivalences too. + + ??? If the code gets re-organized to a worklist to catch more + indirect opportunities and it is made to handle PHIs then this + should only consider use_stmts in basic-blocks we have already visited. */ + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) + { + gimple *use_stmt = USE_STMT (use_p); + + /* Often the use is in DEST, which we trivially know we can't use. + This is cheaper than the dominator set tests below. */ + if (dest == gimple_bb (use_stmt)) + continue; + + /* Filter out statements that can never produce a useful + equivalence. */ + tree lhs2 = gimple_get_lhs (use_stmt); + if (!lhs2 || TREE_CODE (lhs2) != SSA_NAME) + continue; + + /* Profiling has shown the domination tests here can be fairly + expensive. We get significant improvements by building the + set of blocks that dominate BB. We can then just test + for set membership below. + + We also initialize the set lazily since often the only uses + are going to be in the same block as DEST. */ + if (!domby) + { + domby = BITMAP_ALLOC (NULL); + basic_block bb = get_immediate_dominator (CDI_DOMINATORS, dest); + while (bb) + { + bitmap_set_bit (domby, bb->index); + bb = get_immediate_dominator (CDI_DOMINATORS, bb); + } + } + + /* This tests if USE_STMT does not dominate DEST. */ + if (!bitmap_bit_p (domby, gimple_bb (use_stmt)->index)) + continue; + + /* At this point USE_STMT dominates DEST and may result in a + useful equivalence. Try to simplify its RHS to a constant + or SSA_NAME. */ + tree res = gimple_fold_stmt_to_constant_1 (use_stmt, dom_valueize, + no_follow_ssa_edges); + if (res && (TREE_CODE (res) == SSA_NAME || is_gimple_min_invariant (res))) + record_equality (lhs2, res, const_and_copies); + } + + if (domby) + BITMAP_FREE (domby); +} + /* Record into CONST_AND_COPIES and AVAIL_EXPRS_STACK any equivalences implied by traversing edge E (which are cached in E->aux). @@ -836,19 +904,23 @@ record_temporary_equivalences (edge e, if (edge_info) { cond_equivalence *eq; + /* If we have 0 = COND or 1 = COND equivalences, record them + into our expression hash tables. */ + for (i = 0; edge_info->cond_equivalences.iterate (i, &eq); ++i) + record_cond (eq, avail_exprs_stack); + tree lhs = edge_info->lhs; - tree rhs = edge_info->rhs; + if (!lhs || TREE_CODE (lhs) != SSA_NAME) + return; - /* If we have a simple NAME = VALUE equivalence, record it. */ - if (lhs) - record_equality (lhs, rhs, const_and_copies); + /* Record the simple NAME = VALUE equivalence. */ + tree rhs = edge_info->rhs; + record_equality (lhs, rhs, const_and_copies); /* If LHS is an SSA_NAME and RHS is a constant integer and LHS was set via a widening type conversion, then we may be able to record additional equivalences. */ - if (lhs - && TREE_CODE (lhs) == SSA_NAME - && TREE_CODE (rhs) == INTEGER_CST) + if (TREE_CODE (rhs) == INTEGER_CST) { gimple *defstmt = SSA_NAME_DEF_STMT (lhs); @@ -875,45 +947,10 @@ record_temporary_equivalences (edge e, } } - /* If LHS is an SSA_NAME with a new equivalency then try if - stmts with uses of that LHS that dominate the edge destination - simplify and allow further equivalences to be recorded. */ - if (lhs && TREE_CODE (lhs) == SSA_NAME) - { - use_operand_p use_p; - imm_use_iterator iter; - FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) - { - gimple *use_stmt = USE_STMT (use_p); - - /* Only bother to record more equivalences for lhs that - can be directly used by e->dest. - ??? If the code gets re-organized to a worklist to - catch more indirect opportunities and it is made to - handle PHIs then this should only consider use_stmts - in basic-blocks we have already visited. */ - if (e->dest == gimple_bb (use_stmt) - || !dominated_by_p (CDI_DOMINATORS, - e->dest, gimple_bb (use_stmt))) - continue; - tree lhs2 = gimple_get_lhs (use_stmt); - if (lhs2 && TREE_CODE (lhs2) == SSA_NAME) - { - tree res - = gimple_fold_stmt_to_constant_1 (use_stmt, dom_valueize, - no_follow_ssa_edges); - if (res - && (TREE_CODE (res) == SSA_NAME - || is_gimple_min_invariant (res))) - record_equality (lhs2, res, const_and_copies); - } - } - } - - /* If we have 0 = COND or 1 = COND equivalences, record them - into our expression hash tables. */ - for (i = 0; edge_info->cond_equivalences.iterate (i, &eq); ++i) - record_cond (eq, avail_exprs_stack); + /* Any equivalence found for LHS may result in additional + equivalences for other uses of LHS that we have already + processed. */ + back_propagate_equivalences (lhs, e, const_and_copies); } } @@ -1321,8 +1358,6 @@ cprop_into_successor_phis (basic_block bb, new_val = SSA_NAME_VALUE (orig_val); if (new_val && new_val != orig_val - && (TREE_CODE (new_val) == SSA_NAME - || is_gimple_min_invariant (new_val)) && may_propagate_copy (orig_val, new_val)) propagate_value (orig_p, new_val); }