diff mbox

Reduce stack usage in sha512 (PR target/77308)

Message ID AM4PR0701MB21629E281C1C4538834D9806E4C10@AM4PR0701MB2162.eurprd07.prod.outlook.com
State New
Headers show

Commit Message

Bernd Edlinger Sept. 30, 2016, 7:48 a.m. UTC
Hi,

this patch mitigates the excessive stack usage on arm in code
that does lots of int64 shift ops like sha512.

It reduces the stack usage in that example from 4K to 2K while
less than 0.5K would be expected.

In all cases the additional set instructions are optimized later
so that this caused no code size increase, but just made
LRA's job a bit easier.

It does certainly not solve the problem completely but at least
improve the stability, in an area that I'd call security relevant.


Boot-strapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

Comments

Richard Biener Sept. 30, 2016, 8:39 a.m. UTC | #1
On Fri, Sep 30, 2016 at 9:48 AM, Bernd Edlinger
<bernd.edlinger@hotmail.de> wrote:
> Hi,
>
> this patch mitigates the excessive stack usage on arm in code
> that does lots of int64 shift ops like sha512.
>
> It reduces the stack usage in that example from 4K to 2K while
> less than 0.5K would be expected.
>
> In all cases the additional set instructions are optimized later
> so that this caused no code size increase, but just made
> LRA's job a bit easier.
>
> It does certainly not solve the problem completely but at least
> improve the stability, in an area that I'd call security relevant.
>
>
> Boot-strapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?

A comment before the SETs and a testcase would be nice.  IIRC
we do have stack size testcases via using -fstack-usage.

Richard.

>
> Thanks
> Bernd.
Eric Botcazou Sept. 30, 2016, 9:31 a.m. UTC | #2
> A comment before the SETs and a testcase would be nice.  IIRC
> we do have stack size testcases via using -fstack-usage.

Or -Wstack-usage, which might be more appropriate here.
Bernd Edlinger Sept. 30, 2016, 10:14 a.m. UTC | #3
Eric Botcazou wrote:
>> A comment before the SETs and a testcase would be nice.  IIRC
>> we do have stack size testcases via using -fstack-usage.
>
>Or -Wstack-usage, which might be more appropriate here.

Yes.  good idea.  I was not aware that we already have that kind of tests.

When trying to write this test. I noticed, that I did not try -Os so far.
But for -Os the stack is still the unchanged 3500 bytes.

However for embedded targets I am often inclined to use -Os, and
would certainly not expect the stack to explode...

I see in arm.md there are places like

      /* If we're optimizing for size, we prefer the libgcc calls.  */
      if (optimize_function_for_size_p (cfun))
        FAIL;

      /* Expand operation using core-registers.
         'FAIL' would achieve the same thing, but this is a bit smarter.  */
      scratch1 = gen_reg_rtx (SImode);
      scratch2 = gen_reg_rtx (SImode);
      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
                                     operands[2], scratch1, scratch2);


.. that explains why this happens.  I think it would be better to
use the emit_coreregs for shift count >= 32, because these are
effectively 32-bit shifts.

Will try if that can be improved, and come back with the
results.


Thanks
Bernd.
diff mbox

Patch

2016-09-29  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Clear the result
	register explicitly.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 239624)
+++ gcc/config/arm/arm.c	(working copy)
@@ -29159,6 +29159,7 @@  arm_emit_coreregs_64bit_shift (enum rtx_code code,
 	  /* Shifts by a constant less than 32.  */
 	  rtx reverse_amount = GEN_INT (32 - INTVAL (amount));
 
+	  emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
 	  emit_insn (SET (out_down,
 			  ORR (REV_LSHIFT (code, in_up, reverse_amount),
@@ -29170,12 +29171,11 @@  arm_emit_coreregs_64bit_shift (enum rtx_code code,
 	  /* Shifts by a constant greater than 31.  */
 	  rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
 
+	  emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
 	  if (code == ASHIFTRT)
 	    emit_insn (gen_ashrsi3 (out_up, in_up,
 				    GEN_INT (31)));
-	  else
-	    emit_insn (SET (out_up, const0_rtx));
 	}
     }
   else