From patchwork Tue Jul 19 21:42:35 2011
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 105547
Subject: Re: [PATCH] Address lowering [1/3] Main patch
From: "William J. Schmidt"
To: rguenth@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com
Date: Tue, 19 Jul 2011 16:42:35 -0500
Message-ID: <1311111755.4429.16.camel@oc2474580526.ibm.com>
References: <1309444782.26980.52.camel@oc2474580526.ibm.com>
	<1309874343.5402.34.camel@oc2474580526.ibm.com>
	<1309962495.3758.29.camel@oc2474580526.ibm.com>

I've been distracted by other things, but got back to this today...

On Wed, 2011-07-06 at 16:58 +0200, Richard Guenther wrote:

> Ah, so we still have the ARRAY_REFs here.  Yeah, well - then the
> issue boils down to get_inner_reference canonicalizing the offset
> according to what fold-const.c implements, and we simply emit
> the offset in that form (if you look at the normal_inner_ref: case
> in expand_expr_real*).  Thus with the above form at expansion
> time the matching would be applied there (or during our RTL
> address-generation in fwprop.c ...).

I put together a simple patch to match the specific pattern from the
original problem report in expand_expr_real_1.  This restores the code
generation for that specific pattern to what it was a few years ago,
but has little effect on SPEC CPU2000 results.  A small handful of
benchmarks improve, but most results are in the noise range.  So
solving the original problem alone doesn't account for much of the
improvement we're getting with the full address lowering.

Note that this patch only converts the basic pattern; I did not attempt
the extra "zero-offset mem-ref" optimization, which I would have had to
rewrite for an RTL phase.
I don't see much point in pursuing that, given these results.

> Another idea is of course to lower all handled_component_p
> memory accesses - something I did on the old mem-ref branch
> and I believe I also suggested at some point.  Without such lowering
> all the address-generation isn't exposed to the GIMPLE optimizers.
>
> The simplest way of doing the lowering is to do sth like
>
>   base = get_inner_reference (rhs, ..., &bitpos, &offset, ...);
>   base = fold_build2 (POINTER_PLUS_EXPR, ...,
>                       base, offset);
>   base = force_gimple_operand (base, ... is_gimple_mem_ref_addr);
>   tem = fold_build2 (MEM_REF, TREE_TYPE (rhs),
>                      base,
>                      build_int_cst (get_alias_ptr_type (rhs), bitpos));
>
> at some point.  I didn't end up doing that because of the loss of
> alias information.

My next experiment will be to try something simple along these lines.
I'll look at the old mem-ref branch to see what you were doing there.
It will be interesting to see whether the gains generally outweigh the
small loss of TBAA information, as we saw using the affine-combination
approach.

As an FYI, this is the patch I used for my experiment.  Let me know if
you see anything horrifically wrong that might have impacted the
results.  I didn't make any attempt to figure out whether the pattern
rewrite is appropriate for the target machine, since this was just an
experiment.

Thanks,
Bill


Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 176422)
+++ gcc/expr.c	(working copy)
@@ -7152,7 +7152,64 @@ expand_constructor (tree exp, rtx target, enum exp
   return target;
 }
 
+/* Look for specific patterns in the inner reference BASE and its
+   OFFSET expression.  If found, return a single tree of type EXPTYPE
+   that reassociates the address computation to combine constants.  */
+
+static tree
+reassociate_base_and_offset (tree base, tree offset, tree exptype)
+{
+  tree mult_op0, mult_op1, t1, t2;
+  tree mult_expr, add_expr, mem_ref;
+  tree offset_type;
+  HOST_WIDE_INT c1, c2, c3, c;
+
+  /* Currently we look for the following pattern, where Tj is an
+     arbitrary tree, and Ck is an integer constant:
+
+       base:    MEM_REF (T1, C1)
+       offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+
+     This is converted to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + C2*C3)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c = c1 + c2 * c3;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = build2 (MULT_EXPR, sizetype, t2,
+                      build_int_cst (offset_type, c3));
+
+  add_expr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  mem_ref = build2 (MEM_REF, exptype, add_expr,
+                    build_int_cst (offset_type, c));
+
+  return mem_ref;
+}
+
+
 /* expand_expr: generate code for computing expression EXP.
    An rtx for the computed value is returned.  The value is never null.
    In the case of a void EXP, const0_rtx is returned.
@@ -9034,7 +9091,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
     {
       enum machine_mode mode1, mode2;
       HOST_WIDE_INT bitsize, bitpos;
-      tree offset;
+      tree offset, tem2;
       int volatilep = 0, must_force_mem;
       bool packedp = false;
       tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
@@ -9051,6 +9108,15 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
              && DECL_PACKED (TREE_OPERAND (exp, 1))))
        packedp = true;
 
+      /* Experimental: Attempt to reassociate the base and offset
+         to combine constants.  */
+      if ((tem2 = reassociate_base_and_offset (tem, offset, TREE_TYPE (exp)))
+          != NULL_TREE)
+        {
+          tem = tem2;
+          offset = NULL_TREE;
+        }
+
       /* If TEM's type is a union of variable size, pass TARGET to the
         inner computation, since it will need a temporary and TARGET is known
        to have to do.  This occurs in unchecked conversion in Ada.  */
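
[Editorial note, not part of the patch: for readers not steeped in the tree
codes above, the rewrite in reassociate_base_and_offset is plain constant
reassociation of the address computation.  The standalone C sketch below is
not GCC code; the struct layout, index values, and names are hypothetical,
chosen only to show the arithmetic: the canonical form
base + C1 + (T2 + C2)*C3 becomes base + T2*C3 + (C1 + C2*C3), exposing a
single constant displacement.]

/* Standalone illustration only; struct REC and all constants below are
   made up for this example and do not appear in the patch.  */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct rec
{
  int pad;      /* the array starts at some constant byte offset C1 */
  int a[16];    /* elements have some constant size C3 */
};

int
main (void)
{
  struct rec r;
  long i = 5;                           /* T2, the variable index part */
  long c1 = offsetof (struct rec, a);   /* C1, the base displacement */
  long c2 = 2;                          /* C2, as in the access r.a[i + 2] */
  long c3 = sizeof (r.a[0]);            /* C3, the element size */

  /* Canonical form produced by get_inner_reference:
       base:    MEM_REF (&r, C1)
       offset:  MULT_EXPR (PLUS_EXPR (i, C2), C3)  */
  char *canonical = (char *) &r + c1 + (i + c2) * c3;

  /* Reassociated form: the variable part i * C3 is folded into the
     pointer, and the constants collapse to one displacement C1 + C2*C3.  */
  char *reassociated = (char *) &r + i * c3 + (c1 + c2 * c3);

  assert (canonical == reassociated);
  assert (canonical == (char *) &r.a[i + c2]);

  /* Prints 12 on a typical target where int is 4 bytes.  */
  printf ("combined displacement: %ld\n", c1 + c2 * c3);
  return 0;
}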