diff mbox

Fix PR46932: Block auto increment on frame pointer

Message ID AM5PR0802MB2610C93E52AAFEFD3A1DA5B283A70@AM5PR0802MB2610.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Wilco Dijkstra July 20, 2017, 2:49 p.m. UTC
Block auto increment on frame pointer references.  This is never
beneficial since the SFP expands into SP+C or FP+C during register
allocation.  The generated code for the testcase is now as expected:

	str	x30, [sp, -32]!
	strb	w0, [sp, 31]
	add	x0, sp, 31
	bl	foo3
	ldr	x30, [sp], 32
	ret

On SPEC2017 codesize improves uniformly across the board.

ChangeLog:
2017-07-20  Wilco Dijkstra  <wdijkstr@arm.com>

	PR middle-end/46932
	* auto-inc-dec.c (parse_add_or_inc): Block autoinc on sfp.

gcc/testsuite/

	* gcc.dg/pr46932.c: New testcase.

--

Comments

Jeff Law July 25, 2017, 8:42 p.m. UTC | #1
On 07/20/2017 08:49 AM, Wilco Dijkstra wrote:
> Block auto increment on frame pointer references.  This is never
> beneficial since the SFP expands into SP+C or FP+C during register
> allocation.  The generated code for the testcase is now as expected:
> 
> 	str	x30, [sp, -32]!
> 	strb	w0, [sp, 31]
> 	add	x0, sp, 31
> 	bl	foo3
> 	ldr	x30, [sp], 32
> 	ret
> 
> On SPEC2017 codesize improves uniformly across the board.
> 
> ChangeLog:
> 2017-07-20  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	PR middle-end/46932
> 	* auto-inc-dec.c (parse_add_or_inc): Block autoinc on sfp.
> 
> gcc/testsuite/
> 
> 	* gcc.dg/pr46932.c: New testcase.
My only concern here would be cases where we don't end up eliminating FP
to SP.  But I'd think it's unlikely that we'd have enough auto-inc
opportunities on the frame pointer for it to matter much anyway.

OK for the trunk.

jeff
Wilco Dijkstra July 25, 2017, 9:25 p.m. UTC | #2
Jeff Law wrote:

> My only concern here would be cases where we don't end up eliminating FP
> to SP.  But I'd think it's unlikely that we'd have enough auto-inc
> opportunities on the frame pointer for it to matter much anyway.

What kind of case are you thinking of? Whether it is SP or FP doesn't matter,
we cannot copy propagate the pointer:

        add     x1, fp, 32
        strb    w0, [x1, -1]!
    
Not even if the elimination offset is zero:

        mov     x1, fp
        strb    w0, [x1, 31]!

Basically an independent add is better than the above:

        strb    w0, [fp, 31]
        add     x1, fp, 31

I think the phase is still overly aggressive and there are more cases where it 
doesn't make sense to use auto increment. For example you will extra moves
just like the 2nd case above when it's not the last use of the source.

Wilco
Jeff Law July 25, 2017, 10:29 p.m. UTC | #3
On 07/25/2017 03:25 PM, Wilco Dijkstra wrote:
> Jeff Law wrote:
> 
>> My only concern here would be cases where we don't end up eliminating FP
>> to SP.  But I'd think it's unlikely that we'd have enough auto-inc
>> opportunities on the frame pointer for it to matter much anyway.
> 
> What kind of case are you thinking of? Whether it is SP or FP doesn't matter,
> we cannot copy propagate the pointer:
None in particular.  My point was that we're unlikely to even see that
many cases where an autoinc opportunity exists using the frame pointer
to begin with.  So I don't see that restricting auto-incs involving FP
should ever cause significant problems.




> 
>         add     x1, fp, 32
>         strb    w0, [x1, -1]!
>     
> Not even if the elimination offset is zero:
> 
>         mov     x1, fp
>         strb    w0, [x1, 31]!
> 
> Basically an independent add is better than the above:
> 
>         strb    w0, [fp, 31]
>         add     x1, fp, 31
> 
> I think the phase is still overly aggressive and there are more cases where it 
> doesn't make sense to use auto increment. For example you will extra moves
> just like the 2nd case above when it's not the last use of the source.
Certainly possible in the general sense, though we have some targets (SH
IIRC) that try really hard to exploit autoinc addressing modes.    On
modern processors I would think the independent add is usually a better
choice.

Instructions with multiple outputs are problematic for out-of-order
processors.  You often end up holding resources within the CPU until the
instruction can fully retire and often no dependent insns are allowed to
move forward either, even if they just depended on the embedded
arithmetic which may be ready to retire had it been issued as a separate
instruction.

Jeff
diff mbox

Patch

diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index 91fa5cf0bbe04b8a2c2c2b9209d245a327de0ffd..db1bd5bba2cee9fbf24d6d522505ac292688aab9 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -769,6 +769,12 @@  parse_add_or_inc (rtx_insn *insn, bool before_mem)
   inc_insn.pat = pat;
   inc_insn.reg_res = SET_DEST (pat);
   inc_insn.reg0 = XEXP (SET_SRC (pat), 0);
+
+  /* Block any auto increment of the frame pointer since it expands into
+     an addition and cannot be removed by copy propagation.  */
+  if (inc_insn.reg0 == frame_pointer_rtx)
+    return false;
+
   if (rtx_equal_p (inc_insn.reg_res, inc_insn.reg0))
     inc_insn.form = before_mem ? FORM_PRE_INC : FORM_POST_INC;
   else
diff --git a/gcc/testsuite/gcc.dg/pr46932.c b/gcc/testsuite/gcc.dg/pr46932.c
new file mode 100644
index 0000000000000000000000000000000000000000..b96febcc095fbec2cc4e02d1155871220d17cc99
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr46932.c
@@ -0,0 +1,13 @@ 
+/* { dg-options "-O2 -fdump-rtl-auto_inc_dec" } */
+
+/* Check that accesses based on the frame pointer do not
+   use auto increment.  */
+
+extern void foo (char*);
+void t01 (char t)
+{
+  char c = t;
+  foo (&c);
+}
+
+/* { dg-final { scan-rtl-dump-not "success" "auto_inc_dec" } } */