Message ID: AM4PR0701MB21629E281C1C4538834D9806E4C10@AM4PR0701MB2162.eurprd07.prod.outlook.com
State: New
On Fri, Sep 30, 2016 at 9:48 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> Hi,
>
> this patch mitigates the excessive stack usage on arm in code
> that does lots of int64 shift ops, like sha512.
>
> It reduces the stack usage in that example from 4K to 2K, while
> less than 0.5K would be expected.
>
> In all cases the additional set instructions are optimized away later,
> so this caused no code size increase, but just made LRA's job
> a bit easier.
>
> It certainly does not solve the problem completely, but it at least
> improves stability in an area that I'd call security relevant.
>
> Boot-strapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?

A comment before the SETs and a testcase would be nice.  IIRC
we do have stack size testcases via using -fstack-usage.

Richard.

> Thanks
> Bernd.
> A comment before the SETs and a testcase would be nice.  IIRC
> we do have stack size testcases via using -fstack-usage.

Or -Wstack-usage, which might be more appropriate here.
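Such a test usually pins a stack limit with -Wstack-usage on a kernel that is heavy in 64-bit shifts. A hypothetical sketch of what it could look like (the dg directives, the limit, and the function body are illustrative only, not the test that was actually committed for this PR):

```c
/* Hypothetical sketch of a -Wstack-usage test for this PR; the limit
   and the kernel below are illustrative only.
   { dg-do compile }
   { dg-options "-O2 -Wstack-usage=2500" }  */
#include <stdint.h>

/* sha512-like kernel: many 64-bit rotates on a 32-bit target go through
   the DImode shift expanders and put pressure on the register allocator.  */
uint64_t
mix (uint64_t a, uint64_t b, uint64_t c)
{
  a ^= (b >> 28) | (b << 36);
  a += (c >> 34) | (c << 30);
  a ^= (b >> 39) | (b << 25);
  a += (c >> 14) | (c << 50);
  return a ^ (c >> 41) ^ (c << 23);
}
```

The test then fails if the function's frame ever grows past the limit again, which is exactly the regression mode this patch is about.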
Eric Botcazou wrote:
>> A comment before the SETs and a testcase would be nice.  IIRC
>> we do have stack size testcases via using -fstack-usage.
>
> Or -Wstack-usage, which might be more appropriate here.

Yes, good idea.  I was not aware that we already have that kind of test.

While trying to write this test, I noticed that I had not tried -Os so far.
With -Os the stack usage is still an unchanged 3500 bytes.  However, for
embedded targets I am often inclined to use -Os, and would certainly not
expect the stack to explode there...

I see in arm.md there are places like

  /* If we're optimizing for size, we prefer the libgcc calls.  */
  if (optimize_function_for_size_p (cfun))
    FAIL;

  /* Expand operation using core-registers.
     'FAIL' would achieve the same thing, but this is a bit smarter.  */
  scratch1 = gen_reg_rtx (SImode);
  scratch2 = gen_reg_rtx (SImode);
  arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
                                 operands[2], scratch1, scratch2);

which explains why this happens.

I think it would be better to use arm_emit_coreregs_64bit_shift for
shift counts >= 32, because these are effectively 32-bit shifts.

I will try whether that can be improved, and come back with the results.

Thanks
Bernd.
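The observation that a 64-bit shift by a constant in [32, 63] is really just a 32-bit shift on the word halves can be checked in plain C. This is a sketch to illustrate the point, not GCC code; the helper names and the lo/hi decomposition are my own:

```c
#include <stdint.h>

/* Logical right shift of a 64-bit value by a constant n in [32, 63],
   using only 32-bit operations on the two word halves (illustrative
   helpers, not GCC code): the result's low word is hi >> (n - 32) and
   its high word is always zero, so one 32-bit shift plus one register
   clear is enough -- no scratch registers are needed.  */
static uint32_t
lshr64_ge32_low (uint32_t lo, uint32_t hi, unsigned n)
{
  (void) lo;               /* the low input word cannot reach the result */
  return hi >> (n - 32);   /* out_down = in_up >> (amount - 32) */
}

static uint32_t
lshr64_ge32_high (void)
{
  return 0;                /* out_up is always zero for a logical shift */
}
```

For example, with x = 0x8765432112345678 and n = 40, x >> n is 0x876543, and lshr64_ge32_low (0x12345678, 0x87654321, 40) yields the same value. For an arithmetic shift the high word would instead be a copy of the sign, in_up >> 31, which matches the gen_ashrsi3 case in the patch below.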
2016-09-29  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/77308
	* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Clear
	the result register explicitly.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 239624)
+++ gcc/config/arm/arm.c	(working copy)
@@ -29159,6 +29159,7 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code,
 	      /* Shifts by a constant less than 32.  */
 	      rtx reverse_amount = GEN_INT (32 - INTVAL (amount));
 
+	      emit_insn (SET (out, const0_rtx));
 	      emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
 	      emit_insn (SET (out_down,
 			      ORR (REV_LSHIFT (code, in_up, reverse_amount),
@@ -29170,12 +29171,11 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code,
 	      /* Shifts by a constant greater than 31.  */
 	      rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
 
+	      emit_insn (SET (out, const0_rtx));
 	      emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
 	      if (code == ASHIFTRT)
 		emit_insn (gen_ashrsi3 (out_up, in_up, GEN_INT (31)));
-	      else
-		emit_insn (SET (out_up, const0_rtx));
 	    }
 	}
       else