From patchwork Tue Jul 19 21:42:35 2011
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 105547
Subject: Re: [PATCH] Address lowering [1/3] Main patch
From: "William J. Schmidt"
To: rguenth@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com
Date: Tue, 19 Jul 2011 16:42:35 -0500
Message-ID: <1311111755.4429.16.camel@oc2474580526.ibm.com>
References: <1309444782.26980.52.camel@oc2474580526.ibm.com>
	<1309874343.5402.34.camel@oc2474580526.ibm.com>
	<1309962495.3758.29.camel@oc2474580526.ibm.com>

I've been distracted by other things, but got back to this today...

On Wed, 2011-07-06 at 16:58 +0200, Richard Guenther wrote:

> Ah, so we still have the ARRAY_REFs here.  Yeah, well - then the
> issue boils down to get_inner_reference canonicalizing the offset
> according to what fold-const.c implements, and we simply emit
> the offset in that form (if you look at the normal_inner_ref: case
> in expand_expr_real*).  Thus with the above form at expansion
> time the matching would be applied there (or during our RTL
> address-generation in fwprop.c ...).

I put together a simple patch to match the specific pattern from the
original problem report in expand_expr_real_1.  This restores the code
generation for that specific pattern to what it was a few years ago,
but has little effect on SPEC CPU2000 results.  A small handful of
benchmarks improve, but most results are in the noise range.  So
solving the original problem alone doesn't account for much of the
improvement we're getting with the full address lowering.

Note that this patch only converts the basic pattern; I did not attempt
the extra "zero-offset mem-ref" optimization, which I would have had to
rewrite for an RTL phase.
I don't see much point in pursuing that, given these results.

> Another idea is of course to lower all handled_component_p
> memory accesses - something I did on the old mem-ref branch
> and I believe I also suggested at some point.  Without such lowering
> all the address-generation isn't exposed to the GIMPLE optimizers.
>
> The simplest way of doing the lowering is to do sth like
>
>   base = get_inner_reference (rhs, ..., &bitpos, &offset, ...);
>   base = fold_build2 (POINTER_PLUS_EXPR, ...,
>                       base, offset);
>   base = force_gimple_operand (base, ... is_gimple_mem_ref_addr);
>   tem = fold_build2 (MEM_REF, TREE_TYPE (rhs),
>                      base,
>                      build_int_cst (get_alias_ptr_type (rhs), bitpos));
>
> at some point.  I didn't end up doing that because of the loss of
> alias information.

My next experiment will be to try something simple along these lines.
I'll look at the old mem-ref branch to see what you were doing there.
It will be interesting to see whether the gains generally outweigh the
small loss of TBAA information, as we saw using the affine-combination
approach.

As an FYI, this is the patch I used for my experiment.  Let me know if
you see anything horrifically wrong that might have impacted the
results.  I didn't make any attempt to figure out whether the pattern
rewrite is appropriate for the target machine, since this was just an
experiment.

Thanks,
Bill


Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 176422)
+++ gcc/expr.c	(working copy)
@@ -7152,7 +7152,64 @@ expand_constructor (tree exp, rtx target, enum exp
   return target;
 }
 
+/* Look for specific patterns in the inner reference BASE and its
+   OFFSET expression.  If found, return a single tree of type EXPTYPE
+   that reassociates the address computation to combine constants.  */
+
+static tree
+reassociate_base_and_offset (tree base, tree offset, tree exptype)
+{
+  tree mult_op0, mult_op1, t1, t2;
+  tree mult_expr, add_expr, mem_ref;
+  tree offset_type;
+  HOST_WIDE_INT c1, c2, c3, c;
+
+  /* Currently we look for the following pattern, where Tj is an
+     arbitrary tree, and Ck is an integer constant:
+
+       base:    MEM_REF (T1, C1)
+       offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+
+     This is converted to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + C2*C3)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c = c1 + c2 * c3;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = build2 (MULT_EXPR, sizetype, t2,
+                      build_int_cst (offset_type, c3));
+
+  add_expr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  mem_ref = build2 (MEM_REF, exptype, add_expr,
+                    build_int_cst (offset_type, c));
+
+  return mem_ref;
+}
+
+
 /* expand_expr: generate code for computing expression EXP.
    An rtx for the computed value is returned.  The value is never null.
    In the case of a void EXP, const0_rtx is returned.
@@ -9034,7 +9091,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
     {
       enum machine_mode mode1, mode2;
       HOST_WIDE_INT bitsize, bitpos;
-      tree offset;
+      tree offset, tem2;
       int volatilep = 0, must_force_mem;
       bool packedp = false;
       tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
@@ -9051,6 +9108,15 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
              && DECL_PACKED (TREE_OPERAND (exp, 1))))
        packedp = true;
 
+      /* Experimental: Attempt to reassociate the base and offset
+         to combine constants.  */
+      if ((tem2 = reassociate_base_and_offset (tem, offset, TREE_TYPE (exp)))
+          != NULL_TREE)
+        {
+          tem = tem2;
+          offset = NULL_TREE;
+        }
+
       /* If TEM's type is a union of variable size, pass TARGET to the
         inner computation, since it will need a temporary and TARGET is known
        to have to do.  This occurs in unchecked conversion in Ada.  */
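
[Editorial note, not part of the patch: for readers not steeped in the tree
codes above, the rewrite in reassociate_base_and_offset is plain constant
reassociation of the address computation.  The standalone C sketch below is
not GCC code; the struct layout, index values, and names are hypothetical,
chosen only to show the arithmetic: the canonical form
base + C1 + (T2 + C2)*C3 becomes base + T2*C3 + (C1 + C2*C3), exposing a
single constant displacement.]

/* Standalone illustration only; struct REC and all constants below are
   made up for this example and do not appear in the patch.  */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct rec
{
  int pad;      /* the array starts at some constant byte offset C1 */
  int a[16];    /* elements have some constant size C3 */
};

int
main (void)
{
  struct rec r;
  long i = 5;                           /* T2, the variable index part */
  long c1 = offsetof (struct rec, a);   /* C1, the base displacement */
  long c2 = 2;                          /* C2, as in the access r.a[i + 2] */
  long c3 = sizeof (r.a[0]);            /* C3, the element size */

  /* Canonical form produced by get_inner_reference:
       base:    MEM_REF (&r, C1)
       offset:  MULT_EXPR (PLUS_EXPR (i, C2), C3)  */
  char *canonical = (char *) &r + c1 + (i + c2) * c3;

  /* Reassociated form: the variable part i * C3 is folded into the
     pointer, and the constants collapse to one displacement C1 + C2*C3.  */
  char *reassociated = (char *) &r + i * c3 + (c1 + c2 * c3);

  assert (canonical == reassociated);
  assert (canonical == (char *) &r.a[i + c2]);

  /* Prints 12 on a typical target where int is 4 bytes.  */
  printf ("combined displacement: %ld\n", c1 + c2 * c3);
  return 0;
}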