Message ID: 1311111755.4429.16.camel@oc2474580526.ibm.com
State: New
On Tue, 19 Jul 2011, William J. Schmidt wrote:

> I've been distracted by other things, but got back to this today...
>
> On Wed, 2011-07-06 at 16:58 +0200, Richard Guenther wrote:
> > Ah, so we still have the ARRAY_REFs here.  Yeah, well - then the
> > issue boils down to get_inner_reference canonicalizing the offset
> > according to what fold-const.c implements, and we simply emit
> > the offset in that form (if you look at the normal_inner_ref: case
> > in expand_expr_real*).  Thus with the above form at expansion
> > time the matching would be applied there (or during our RTL
> > address-generation in fwprop.c ...).
>
> I put together a simple patch to match the specific pattern from the
> original problem report in expand_expr_real_1.  This returns the code
> generation for that specific pattern to what it was a few years ago,
> but has little effect on SPEC cpu2000 results.  A small handful of
> benchmarks are improved, but most results are in the noise range.  So
> solving the original problem alone doesn't account for very much of
> the improvement we're getting with the full address lowering.
>
> Note that this only converted the basic pattern; I did not attempt to
> do the extra "zero-offset mem-ref" optimization, which I would have
> had to rewrite for an RTL phase.  I don't see much point in pursuing
> that, given these results.
>
> > Another idea is of course to lower all handled_component_p
> > memory accesses - something I did on the old mem-ref branch
> > and I believe I also suggested at some point.  Without such lowering
> > all the address-generation isn't exposed to the GIMPLE optimizers.
> >
> > The simplest way of doing the lowering is to do sth like
> >
> >   base = get_inner_reference (rhs, ..., &bitpos, &offset, ...);
> >   base = fold_build2 (POINTER_PLUS_EXPR, ..., base, offset);
> >   base = force_gimple_operand (base, ... is_gimple_mem_ref_addr);
> >   tem = fold_build2 (MEM_REF, TREE_TYPE (rhs), base,
> >                      build_int_cst (get_alias_ptr_type (rhs),
> >                                     bitpos));
> >
> > at some point.  I didn't end up doing that because of the loss of
> > alias information.
>
> My next experiment will be to try something simple along these lines.
> I'll look at the old mem-ref branch to see what you were doing there.
> It will be interesting to see whether the gains generally outweigh
> the small loss of TBAA information, as we saw using the
> affine-combination approach.
>
> As an FYI, this is the patch I used for my experiment.  Let me know
> if you see anything horrifically wrong that might have impacted the
> results.  I didn't make any attempt to figure out whether the pattern
> rewrite is appropriate for the target machine, since this was just an
> experiment.

I wonder if the code below triggered at all, since we expand from SSA
and no longer see the larger trees in-place; you have to look them up
via SSA defs using get_gimple_for_ssa_name (or the helper
get_def_for_expr).  So I expect that the check for a MULT_EXPR offset
will trigger only if the memory reference was still in the form of a
variable ARRAY_REF (then get_inner_reference will produce an offset of
the form i * sizeof (element)).  I think what we want to catch here is
also the case of *(ptr + i), which will not show up as an ARRAY_REF
but as MEM[ptr + 0] where ptr is an SSA name with a defining statement
doing ptr' + i * sizeof (element).

Richard.

> Thanks,
> Bill
On Wed, 2011-07-20 at 11:03 +0200, Richard Guenther wrote:
> I wonder if the code below triggered at all, since we expand from
> SSA and no longer see the larger trees in-place; you have to look
> them up via SSA defs using get_gimple_for_ssa_name (or the helper
> get_def_for_expr).  So I expect that the check for a MULT_EXPR
> offset will trigger only if the memory reference was still in the
> form of a variable ARRAY_REF (then get_inner_reference will produce
> an offset of the form i * sizeof (element)).  I think what we want
> to catch here is also the case of *(ptr + i), which will not show up
> as an ARRAY_REF but as MEM[ptr + 0] where ptr is an SSA name with a
> defining statement doing ptr' + i * sizeof (element).
>
> Richard.

Confirmed.  The compiler is pretty smart about reconstructing
ARRAY_REFs out of *(p + i) if p is known to point to an array, but if
p is a scalar pointer with an unknown target that code doesn't kick
in.  I'll cobble up a version that tests for both patterns and get
some new numbers.

Thanks,
Bill
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 176422)
+++ gcc/expr.c	(working copy)
@@ -7152,7 +7152,64 @@ expand_constructor (tree exp, rtx target, enum exp
   return target;
 }
 
+/* Look for specific patterns in the inner reference BASE and its
+   OFFSET expression.  If found, return a single tree of type EXPTYPE
+   that reassociates the addressability to combine constants.  */
+static tree
+reassociate_base_and_offset (tree base, tree offset, tree exptype)
+{
+  tree mult_op0, mult_op1, t1, t2;
+  tree mult_expr, add_expr, mem_ref;
+  tree offset_type;
+  HOST_WIDE_INT c1, c2, c3, c;
+
+  /* Currently we look for the following pattern, where Tj is an
+     arbitrary tree, and Ck is an integer constant:
+
+       base:    MEM_REF (T1, C1)
+       offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+
+     This is converted to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + C2*C3)
+  */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c = c1 + c2 * c3;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = build2 (MULT_EXPR, sizetype, t2,
+                      build_int_cst (offset_type, c3));
+
+  add_expr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  mem_ref = build2 (MEM_REF, exptype, add_expr,
+                    build_int_cst (offset_type, c));
+
+  return mem_ref;
+}
+
+
 /* expand_expr: generate code for computing expression EXP.
    An rtx for the computed value is returned.  The value is never null.
    In the case of a void EXP, const0_rtx is returned.
@@ -9034,7 +9091,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
     {
       enum machine_mode mode1, mode2;
       HOST_WIDE_INT bitsize, bitpos;
-      tree offset;
+      tree offset, tem2;
       int volatilep = 0, must_force_mem;
       bool packedp = false;
       tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
@@ -9051,6 +9108,15 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
               && DECL_PACKED (TREE_OPERAND (exp, 1))))
         packedp = true;
 
+      /* Experimental: Attempt to reassociate the base and offset
+         to combine constants.  */
+      if ((tem2 = reassociate_base_and_offset (tem, offset, TREE_TYPE (exp)))
+          != NULL_TREE)
+        {
+          tem = tem2;
+          offset = NULL_TREE;
+        }
+
       /* If TEM's type is a union of variable size, pass TARGET to the inner
          computation, since it will need a temporary and TARGET is known
          to have to do.  This occurs in unchecked conversion in Ada.  */