Message ID | 20190905203409.759-1-jimw@sifive.com |
---|---|
State | New |
Headers | show |
Series | RISC-V: Fix bad insn splits with paradoxical subregs. | expand |
Hi Jim, On Thu, Sep 05, 2019 at 01:34:09PM -0700, Jim Wilson wrote: > Shifting by more than the size of a SUBREG_REG doesn't work, so we either > need to disable splits if an input is paradoxical, or else we need to > generate a clean temporary for intermediate results. My big question is why do other targets not have this problem? Or what is it they do differently? Segher
On Thu, Sep 5, 2019 at 3:03 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > My big question is why do other targets not have this problem? Or what > is it they do differently? RISC-V doesn't have convenient sign/zero extend instructions. We also have a 12-bit immediate, whereas most other RISCs have a 16-bit immediate, so we can't easily load a constant and mask to zero-extend an HImode value. Thus we have a fair amount of splitters that emit two shifts for zero/sign extend operations, or try to optimize various combinations of shifts or AND operations that can map to instructions that we do have. We also have two splitters that were reusing input registers for intermediate values when they should have been using a clobber to get a temporary. The failure case is when we end up generating something like (lshiftrt:DI (subreg:DI (reg:SI)) (const_int 48)) after a 48 bit left shift with a paradoxical reg input reused for an intermediate result, and the rtl simplifier says that must be zero because an SImode reg doesn't have any valid bits in the upper 32-bits. This is may also be tied into the fact that we have WORD_REGISTER_OPERATIONS defined. Some funny stuff can happen with that defined, and I think most of the other major ports no longer define it. I haven't yet been tempted enough to try to remove that define from the RISC-V port yet, but suspect it will eventually be necessary. I think the WORD_REGISTER_OPERATIONS define is so badly misused by the optimizer nowadays that it isn't worth the trouble anymore. But removing it will be a lot of work to check for performance and code size regressions, so I haven't looked at it yet, and may not for a long while. The other three patterns that were fixed by adding a check for paradoxical regs have no known failure case, and were only fixed for completeness. These are hypothetically necessary changes, and may not in fact be actually necessary, but they appear to be completely harmless so I saw no harm in adding them. In the testing I did, I never saw a paradoxical reg here. But that isn't proof that a paradoxical reg can never occur here. Jim
Hi Jim, On Thu, Sep 05, 2019 at 04:40:56PM -0700, Jim Wilson wrote: > On Thu, Sep 5, 2019 at 3:03 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > My big question is why do other targets not have this problem? Or what > > is it they do differently? > > RISC-V doesn't have convenient sign/zero extend instructions. We also > have a 12-bit immediate, whereas most other RISCs have a 16-bit > immediate, so we can't easily load a constant and mask to zero-extend > an HImode value. Thus we have a fair amount of splitters that emit > two shifts for zero/sign extend operations, or try to optimize various > combinations of shifts or AND operations that can map to instructions > that we do have. There are other targets in that same boat, it's not *so* unusual. > We also have two splitters that were reusing input > registers for intermediate values when they should have been using a > clobber to get a temporary. Ah, that is just it I guess. And the problem happens with *pseudos*, which have just their one mode always (but can be used in another mode). And then things go wrong if you use one in a bigger mode than it is. Hrm I wonder if we could (or should) protect against this kind of thing in generic code. combine could have seen you did something undefined I suppose. > The failure case is when we end up > generating something like (lshiftrt:DI (subreg:DI (reg:SI)) (const_int > 48)) after a 48 bit left shift with a paradoxical reg input reused for > an intermediate result, and the rtl simplifier says that must be zero > because an SImode reg doesn't have any valid bits in the upper > 32-bits. Since it's a pseudo, not a hard reg, yeah. > This is may also be tied into the fact that we have > WORD_REGISTER_OPERATIONS defined. Some funny stuff can happen with > that defined, and I think most of the other major ports no longer > define it. Yeah. Not only because of problems, but also because it doesn't help much, and hurts in cases too, so it depends what way the balance swings for your target. > I haven't yet been tempted enough to try to remove that > define from the RISC-V port yet, but suspect it will eventually be > necessary. I think the WORD_REGISTER_OPERATIONS define is so badly > misused by the optimizer nowadays that it isn't worth the trouble > anymore. But removing it will be a lot of work to check for > performance and code size regressions, so I haven't looked at it yet, > and may not for a long while. I tested it yesterday with builds of Linux. Compiler from Sunday, kernel tree a few weeks old. riscv32 and riscv64, both with their defconfigs. Removing WORD_REGISTER_OPERATIONS results in 0.001% smaller code for riscv32, and 0.036% larger code for riscv64. For the latter, shift/shift combos were often involved. But it is pretty minimal either way :-) > The other three patterns that were fixed by adding a check for > paradoxical regs have no known failure case, and were only fixed for > completeness. These are hypothetically necessary changes, and may not > in fact be actually necessary, but they appear to be completely > harmless so I saw no harm in adding them. In the testing I did, I > never saw a paradoxical reg here. But that isn't proof that a > paradoxical reg can never occur here. Thanks for the explanations, that clears things up :-) Segher
Hi Jakub, Richard: This commit is fixing wrong code gen for RISC-V, does it OK to backport to GCC 9 branch? On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson <jimw@sifive.com> wrote: > > Shifting by more than the size of a SUBREG_REG doesn't work, so we either > need to disable splits if an input is paradoxical, or else we need to > generate a clean temporary for intermediate results. > > This was tested with rv32i/newlib and rv64gc/linux cross builds and checks. > There were no regressions. The new pr91635.c testcase works with the patch > and fails without it. Also, code size for libc and libstdc++ was checked > and is smaller or equal sized with the patch, ignoring alignment padding. > The shift-shift-4.c testcase gives better code size with this patch. > The shift-shift-5.c testcase gave worse code size with a draft version > of this patch and is OK with the final version of this patch. > > Jakub wrote the first version of this patch, so gets primary credit for it. > > Committed. > > Jim > > gcc/ > PR target/91635 > * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2, > extend<SHORT:mode><SUPERQI:mode>2): Don't split if > paradoxical_subreg_p (operands[0]). > (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and > use as intermediate value. > > gcc/testsuite/ > PR target/91635 > * gcc.c-torture/execute/pr91635.c: New test. > * gcc.target/riscv/shift-shift-4.c: New test. > * gcc.target/riscv/shift-shift-5.c: New test. > --- > gcc/config/riscv/riscv.md | 30 +++++++--- > gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++ > .../gcc.target/riscv/shift-shift-4.c | 13 +++++ > .../gcc.target/riscv/shift-shift-5.c | 16 ++++++ > 4 files changed, 107 insertions(+), 9 deletions(-) > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md > index 78260fcf6fd..744a027a1b7 100644 > --- a/gcc/config/riscv/riscv.md > +++ b/gcc/config/riscv/riscv.md > @@ -1051,7 +1051,9 @@ > "@ > # > lwu\t%0,%1" > - "&& reload_completed && REG_P (operands[1])" > + "&& reload_completed > + && REG_P (operands[1]) > + && !paradoxical_subreg_p (operands[0])" > [(set (match_dup 0) > (ashift:DI (match_dup 1) (const_int 32))) > (set (match_dup 0) > @@ -1068,7 +1070,9 @@ > "@ > # > lhu\t%0,%1" > - "&& reload_completed && REG_P (operands[1])" > + "&& reload_completed > + && REG_P (operands[1]) > + && !paradoxical_subreg_p (operands[0])" > [(set (match_dup 0) > (ashift:GPR (match_dup 1) (match_dup 2))) > (set (match_dup 0) > @@ -1117,7 +1121,9 @@ > "@ > # > l<SHORT:size>\t%0,%1" > - "&& reload_completed && REG_P (operands[1])" > + "&& reload_completed > + && REG_P (operands[1]) > + && !paradoxical_subreg_p (operands[0])" > [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2))) > (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))] > { > @@ -1766,15 +1772,20 @@ > ;; Handle AND with 2^N-1 for N from 12 to XLEN. This can be split into > ;; two logical shifts. Otherwise it requires 3 instructions: lui, > ;; xor/addi/srli, and. > + > +;; Generating a temporary for the shift output gives better combiner results; > +;; and also fixes a problem where op0 could be a paradoxical reg and shifting > +;; by amounts larger than the size of the SUBREG_REG doesn't work. > (define_split > [(set (match_operand:GPR 0 "register_operand") > (and:GPR (match_operand:GPR 1 "register_operand") > - (match_operand:GPR 2 "p2m1_shift_operand")))] > + (match_operand:GPR 2 "p2m1_shift_operand"))) > + (clobber (match_operand:GPR 3 "register_operand"))] > "" > - [(set (match_dup 0) > + [(set (match_dup 3) > (ashift:GPR (match_dup 1) (match_dup 2))) > (set (match_dup 0) > - (lshiftrt:GPR (match_dup 0) (match_dup 2)))] > + (lshiftrt:GPR (match_dup 3) (match_dup 2)))] > { > /* Op2 is a VOIDmode constant, so get the mode size from op1. */ > operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) > @@ -1786,12 +1797,13 @@ > (define_split > [(set (match_operand:DI 0 "register_operand") > (and:DI (match_operand:DI 1 "register_operand") > - (match_operand:DI 2 "high_mask_shift_operand")))] > + (match_operand:DI 2 "high_mask_shift_operand"))) > + (clobber (match_operand:DI 3 "register_operand"))] > "TARGET_64BIT" > - [(set (match_dup 0) > + [(set (match_dup 3) > (lshiftrt:DI (match_dup 1) (match_dup 2))) > (set (match_dup 0) > - (ashift:DI (match_dup 0) (match_dup 2)))] > + (ashift:DI (match_dup 3) (match_dup 2)))] > { > operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2]))); > }) > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c > new file mode 100644 > index 00000000000..878a491fc36 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c > @@ -0,0 +1,57 @@ > +/* PR target/91635 */ > + > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ > + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 > +unsigned short b, c; > +int u, v, w, x; > + > +__attribute__ ((noipa)) int > +foo (unsigned short c) > +{ > + c <<= __builtin_add_overflow (-c, -1, &b); > + c >>= 1; > + return c; > +} > + > +__attribute__ ((noipa)) int > +bar (unsigned short b) > +{ > + b <<= -14 & 15; > + b = b >> -~1; > + return b; > +} > + > +__attribute__ ((noipa)) int > +baz (unsigned short e) > +{ > + e <<= 1; > + e >>= __builtin_add_overflow (8719476735, u, &v); > + return e; > +} > + > +__attribute__ ((noipa)) int > +qux (unsigned int e) > +{ > + c = ~1; > + c *= e; > + c = c >> (-15 & 5); > + return c + w + x; > +} > +#endif > + > +int > +main () > +{ > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ > + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 > + if (foo (0xffff) != 0x7fff) > + __builtin_abort (); > + if (bar (5) != 5) > + __builtin_abort (); > + if (baz (~0) != 0x7fff) > + __builtin_abort (); > + if (qux (2) != 0x7ffe) > + __builtin_abort (); > +#endif > + return 0; > +} > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c > new file mode 100644 > index 00000000000..72a45ee87ae > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */ > + > +/* One zero-extend shift can be eliminated by modifying the constant in the > + greater than test. Started working after modifying the splitter > + lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest. */ > +int > +sub (int i) > +{ > + i &= 0x7fffffff; > + return i > 0x7f800000; > +} > +/* { dg-final { scan-assembler-not "srli" } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c > new file mode 100644 > index 00000000000..5b2ae89a471 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */ > + > +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST > + note. */ > +unsigned long > +sub (long l) > +{ > + union u { > + struct s { int a : 19; unsigned int b : 13; int x; } s; > + long l; > + } u; > + u.l = l; > + return u.s.b; > +} > +/* { dg-final { scan-assembler "srliw" } } */ > -- > 2.17.1 >
On Wed, 18 Sep 2019, Kito Cheng wrote: > Hi Jakub, Richard: > > This commit is fixing wrong code gen for RISC-V, does it OK to > backport to GCC 9 branch? Since it is target specific and for non-primary/secondary targets it's the RISC-V maintainers call whether to allow backporting this. Generally wrong-code issues can be backported even if they are not regressions if the chance they introduce other issues is low. Richard. > On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson <jimw@sifive.com> wrote: > > > > Shifting by more than the size of a SUBREG_REG doesn't work, so we either > > need to disable splits if an input is paradoxical, or else we need to > > generate a clean temporary for intermediate results. > > > > This was tested with rv32i/newlib and rv64gc/linux cross builds and checks. > > There were no regressions. The new pr91635.c testcase works with the patch > > and fails without it. Also, code size for libc and libstdc++ was checked > > and is smaller or equal sized with the patch, ignoring alignment padding. > > The shift-shift-4.c testcase gives better code size with this patch. > > The shift-shift-5.c testcase gave worse code size with a draft version > > of this patch and is OK with the final version of this patch. > > > > Jakub wrote the first version of this patch, so gets primary credit for it. > > > > Committed. > > > > Jim > > > > gcc/ > > PR target/91635 > > * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2, > > extend<SHORT:mode><SUPERQI:mode>2): Don't split if > > paradoxical_subreg_p (operands[0]). > > (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and > > use as intermediate value. > > > > gcc/testsuite/ > > PR target/91635 > > * gcc.c-torture/execute/pr91635.c: New test. > > * gcc.target/riscv/shift-shift-4.c: New test. > > * gcc.target/riscv/shift-shift-5.c: New test. > > --- > > gcc/config/riscv/riscv.md | 30 +++++++--- > > gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++ > > .../gcc.target/riscv/shift-shift-4.c | 13 +++++ > > .../gcc.target/riscv/shift-shift-5.c | 16 ++++++ > > 4 files changed, 107 insertions(+), 9 deletions(-) > > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md > > index 78260fcf6fd..744a027a1b7 100644 > > --- a/gcc/config/riscv/riscv.md > > +++ b/gcc/config/riscv/riscv.md > > @@ -1051,7 +1051,9 @@ > > "@ > > # > > lwu\t%0,%1" > > - "&& reload_completed && REG_P (operands[1])" > > + "&& reload_completed > > + && REG_P (operands[1]) > > + && !paradoxical_subreg_p (operands[0])" > > [(set (match_dup 0) > > (ashift:DI (match_dup 1) (const_int 32))) > > (set (match_dup 0) > > @@ -1068,7 +1070,9 @@ > > "@ > > # > > lhu\t%0,%1" > > - "&& reload_completed && REG_P (operands[1])" > > + "&& reload_completed > > + && REG_P (operands[1]) > > + && !paradoxical_subreg_p (operands[0])" > > [(set (match_dup 0) > > (ashift:GPR (match_dup 1) (match_dup 2))) > > (set (match_dup 0) > > @@ -1117,7 +1121,9 @@ > > "@ > > # > > l<SHORT:size>\t%0,%1" > > - "&& reload_completed && REG_P (operands[1])" > > + "&& reload_completed > > + && REG_P (operands[1]) > > + && !paradoxical_subreg_p (operands[0])" > > [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2))) > > (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))] > > { > > @@ -1766,15 +1772,20 @@ > > ;; Handle AND with 2^N-1 for N from 12 to XLEN. This can be split into > > ;; two logical shifts. Otherwise it requires 3 instructions: lui, > > ;; xor/addi/srli, and. > > + > > +;; Generating a temporary for the shift output gives better combiner results; > > +;; and also fixes a problem where op0 could be a paradoxical reg and shifting > > +;; by amounts larger than the size of the SUBREG_REG doesn't work. > > (define_split > > [(set (match_operand:GPR 0 "register_operand") > > (and:GPR (match_operand:GPR 1 "register_operand") > > - (match_operand:GPR 2 "p2m1_shift_operand")))] > > + (match_operand:GPR 2 "p2m1_shift_operand"))) > > + (clobber (match_operand:GPR 3 "register_operand"))] > > "" > > - [(set (match_dup 0) > > + [(set (match_dup 3) > > (ashift:GPR (match_dup 1) (match_dup 2))) > > (set (match_dup 0) > > - (lshiftrt:GPR (match_dup 0) (match_dup 2)))] > > + (lshiftrt:GPR (match_dup 3) (match_dup 2)))] > > { > > /* Op2 is a VOIDmode constant, so get the mode size from op1. */ > > operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) > > @@ -1786,12 +1797,13 @@ > > (define_split > > [(set (match_operand:DI 0 "register_operand") > > (and:DI (match_operand:DI 1 "register_operand") > > - (match_operand:DI 2 "high_mask_shift_operand")))] > > + (match_operand:DI 2 "high_mask_shift_operand"))) > > + (clobber (match_operand:DI 3 "register_operand"))] > > "TARGET_64BIT" > > - [(set (match_dup 0) > > + [(set (match_dup 3) > > (lshiftrt:DI (match_dup 1) (match_dup 2))) > > (set (match_dup 0) > > - (ashift:DI (match_dup 0) (match_dup 2)))] > > + (ashift:DI (match_dup 3) (match_dup 2)))] > > { > > operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2]))); > > }) > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c > > new file mode 100644 > > index 00000000000..878a491fc36 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c > > @@ -0,0 +1,57 @@ > > +/* PR target/91635 */ > > + > > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ > > + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 > > +unsigned short b, c; > > +int u, v, w, x; > > + > > +__attribute__ ((noipa)) int > > +foo (unsigned short c) > > +{ > > + c <<= __builtin_add_overflow (-c, -1, &b); > > + c >>= 1; > > + return c; > > +} > > + > > +__attribute__ ((noipa)) int > > +bar (unsigned short b) > > +{ > > + b <<= -14 & 15; > > + b = b >> -~1; > > + return b; > > +} > > + > > +__attribute__ ((noipa)) int > > +baz (unsigned short e) > > +{ > > + e <<= 1; > > + e >>= __builtin_add_overflow (8719476735, u, &v); > > + return e; > > +} > > + > > +__attribute__ ((noipa)) int > > +qux (unsigned int e) > > +{ > > + c = ~1; > > + c *= e; > > + c = c >> (-15 & 5); > > + return c + w + x; > > +} > > +#endif > > + > > +int > > +main () > > +{ > > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ > > + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 > > + if (foo (0xffff) != 0x7fff) > > + __builtin_abort (); > > + if (bar (5) != 5) > > + __builtin_abort (); > > + if (baz (~0) != 0x7fff) > > + __builtin_abort (); > > + if (qux (2) != 0x7ffe) > > + __builtin_abort (); > > +#endif > > + return 0; > > +} > > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c > > new file mode 100644 > > index 00000000000..72a45ee87ae > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c > > @@ -0,0 +1,13 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */ > > + > > +/* One zero-extend shift can be eliminated by modifying the constant in the > > + greater than test. Started working after modifying the splitter > > + lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest. */ > > +int > > +sub (int i) > > +{ > > + i &= 0x7fffffff; > > + return i > 0x7f800000; > > +} > > +/* { dg-final { scan-assembler-not "srli" } } */ > > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > new file mode 100644 > > index 00000000000..5b2ae89a471 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > @@ -0,0 +1,16 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */ > > + > > +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST > > + note. */ > > +unsigned long > > +sub (long l) > > +{ > > + union u { > > + struct s { int a : 19; unsigned int b : 13; int x; } s; > > + long l; > > + } u; > > + u.l = l; > > + return u.s.b; > > +} > > +/* { dg-final { scan-assembler "srliw" } } */ > > -- > > 2.17.1 > > >
Hi Richard: Got it, thanks :) On Wed, Sep 18, 2019 at 6:25 PM Richard Biener <rguenther@suse.de> wrote: > > On Wed, 18 Sep 2019, Kito Cheng wrote: > > > Hi Jakub, Richard: > > > > This commit is fixing wrong code gen for RISC-V, does it OK to > > backport to GCC 9 branch? > > Since it is target specific and for non-primary/secondary targets > it's the RISC-V maintainers call whether to allow backporting this. > Generally wrong-code issues can be backported even if they are not > regressions if the chance they introduce other issues is low. > > Richard. > > > On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson <jimw@sifive.com> wrote: > > > > > > Shifting by more than the size of a SUBREG_REG doesn't work, so we either > > > need to disable splits if an input is paradoxical, or else we need to > > > generate a clean temporary for intermediate results. > > > > > > This was tested with rv32i/newlib and rv64gc/linux cross builds and checks. > > > There were no regressions. The new pr91635.c testcase works with the patch > > > and fails without it. Also, code size for libc and libstdc++ was checked > > > and is smaller or equal sized with the patch, ignoring alignment padding. > > > The shift-shift-4.c testcase gives better code size with this patch. > > > The shift-shift-5.c testcase gave worse code size with a draft version > > > of this patch and is OK with the final version of this patch. > > > > > > Jakub wrote the first version of this patch, so gets primary credit for it. > > > > > > Committed. > > > > > > Jim > > > > > > gcc/ > > > PR target/91635 > > > * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2, > > > extend<SHORT:mode><SUPERQI:mode>2): Don't split if > > > paradoxical_subreg_p (operands[0]). > > > (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and > > > use as intermediate value. > > > > > > gcc/testsuite/ > > > PR target/91635 > > > * gcc.c-torture/execute/pr91635.c: New test. > > > * gcc.target/riscv/shift-shift-4.c: New test. > > > * gcc.target/riscv/shift-shift-5.c: New test. > > > --- > > > gcc/config/riscv/riscv.md | 30 +++++++--- > > > gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++ > > > .../gcc.target/riscv/shift-shift-4.c | 13 +++++ > > > .../gcc.target/riscv/shift-shift-5.c | 16 ++++++ > > > 4 files changed, 107 insertions(+), 9 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c > > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c > > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > > > > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md > > > index 78260fcf6fd..744a027a1b7 100644 > > > --- a/gcc/config/riscv/riscv.md > > > +++ b/gcc/config/riscv/riscv.md > > > @@ -1051,7 +1051,9 @@ > > > "@ > > > # > > > lwu\t%0,%1" > > > - "&& reload_completed && REG_P (operands[1])" > > > + "&& reload_completed > > > + && REG_P (operands[1]) > > > + && !paradoxical_subreg_p (operands[0])" > > > [(set (match_dup 0) > > > (ashift:DI (match_dup 1) (const_int 32))) > > > (set (match_dup 0) > > > @@ -1068,7 +1070,9 @@ > > > "@ > > > # > > > lhu\t%0,%1" > > > - "&& reload_completed && REG_P (operands[1])" > > > + "&& reload_completed > > > + && REG_P (operands[1]) > > > + && !paradoxical_subreg_p (operands[0])" > > > [(set (match_dup 0) > > > (ashift:GPR (match_dup 1) (match_dup 2))) > > > (set (match_dup 0) > > > @@ -1117,7 +1121,9 @@ > > > "@ > > > # > > > l<SHORT:size>\t%0,%1" > > > - "&& reload_completed && REG_P (operands[1])" > > > + "&& reload_completed > > > + && REG_P (operands[1]) > > > + && !paradoxical_subreg_p (operands[0])" > > > [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2))) > > > (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))] > > > { > > > @@ -1766,15 +1772,20 @@ > > > ;; Handle AND with 2^N-1 for N from 12 to XLEN. This can be split into > > > ;; two logical shifts. Otherwise it requires 3 instructions: lui, > > > ;; xor/addi/srli, and. > > > + > > > +;; Generating a temporary for the shift output gives better combiner results; > > > +;; and also fixes a problem where op0 could be a paradoxical reg and shifting > > > +;; by amounts larger than the size of the SUBREG_REG doesn't work. > > > (define_split > > > [(set (match_operand:GPR 0 "register_operand") > > > (and:GPR (match_operand:GPR 1 "register_operand") > > > - (match_operand:GPR 2 "p2m1_shift_operand")))] > > > + (match_operand:GPR 2 "p2m1_shift_operand"))) > > > + (clobber (match_operand:GPR 3 "register_operand"))] > > > "" > > > - [(set (match_dup 0) > > > + [(set (match_dup 3) > > > (ashift:GPR (match_dup 1) (match_dup 2))) > > > (set (match_dup 0) > > > - (lshiftrt:GPR (match_dup 0) (match_dup 2)))] > > > + (lshiftrt:GPR (match_dup 3) (match_dup 2)))] > > > { > > > /* Op2 is a VOIDmode constant, so get the mode size from op1. */ > > > operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) > > > @@ -1786,12 +1797,13 @@ > > > (define_split > > > [(set (match_operand:DI 0 "register_operand") > > > (and:DI (match_operand:DI 1 "register_operand") > > > - (match_operand:DI 2 "high_mask_shift_operand")))] > > > + (match_operand:DI 2 "high_mask_shift_operand"))) > > > + (clobber (match_operand:DI 3 "register_operand"))] > > > "TARGET_64BIT" > > > - [(set (match_dup 0) > > > + [(set (match_dup 3) > > > (lshiftrt:DI (match_dup 1) (match_dup 2))) > > > (set (match_dup 0) > > > - (ashift:DI (match_dup 0) (match_dup 2)))] > > > + (ashift:DI (match_dup 3) (match_dup 2)))] > > > { > > > operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2]))); > > > }) > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c > > > new file mode 100644 > > > index 00000000000..878a491fc36 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c > > > @@ -0,0 +1,57 @@ > > > +/* PR target/91635 */ > > > + > > > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ > > > + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 > > > +unsigned short b, c; > > > +int u, v, w, x; > > > + > > > +__attribute__ ((noipa)) int > > > +foo (unsigned short c) > > > +{ > > > + c <<= __builtin_add_overflow (-c, -1, &b); > > > + c >>= 1; > > > + return c; > > > +} > > > + > > > +__attribute__ ((noipa)) int > > > +bar (unsigned short b) > > > +{ > > > + b <<= -14 & 15; > > > + b = b >> -~1; > > > + return b; > > > +} > > > + > > > +__attribute__ ((noipa)) int > > > +baz (unsigned short e) > > > +{ > > > + e <<= 1; > > > + e >>= __builtin_add_overflow (8719476735, u, &v); > > > + return e; > > > +} > > > + > > > +__attribute__ ((noipa)) int > > > +qux (unsigned int e) > > > +{ > > > + c = ~1; > > > + c *= e; > > > + c = c >> (-15 & 5); > > > + return c + w + x; > > > +} > > > +#endif > > > + > > > +int > > > +main () > > > +{ > > > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ > > > + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 > > > + if (foo (0xffff) != 0x7fff) > > > + __builtin_abort (); > > > + if (bar (5) != 5) > > > + __builtin_abort (); > > > + if (baz (~0) != 0x7fff) > > > + __builtin_abort (); > > > + if (qux (2) != 0x7ffe) > > > + __builtin_abort (); > > > +#endif > > > + return 0; > > > +} > > > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c > > > new file mode 100644 > > > index 00000000000..72a45ee87ae > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c > > > @@ -0,0 +1,13 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */ > > > + > > > +/* One zero-extend shift can be eliminated by modifying the constant in the > > > + greater than test. Started working after modifying the splitter > > > + lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest. */ > > > +int > > > +sub (int i) > > > +{ > > > + i &= 0x7fffffff; > > > + return i > 0x7f800000; > > > +} > > > +/* { dg-final { scan-assembler-not "srli" } } */ > > > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > > new file mode 100644 > > > index 00000000000..5b2ae89a471 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c > > > @@ -0,0 +1,16 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */ > > > + > > > +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST > > > + note. */ > > > +unsigned long > > > +sub (long l) > > > +{ > > > + union u { > > > + struct s { int a : 19; unsigned int b : 13; int x; } s; > > > + long l; > > > + } u; > > > + u.l = l; > > > + return u.s.b; > > > +} > > > +/* { dg-final { scan-assembler "srliw" } } */ > > > -- > > > 2.17.1 > > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 247165 (AG München)
On Wed, Sep 18, 2019 at 3:27 AM Kito Cheng <kito.cheng@gmail.com> wrote: > On Wed, Sep 18, 2019 at 6:25 PM Richard Biener <rguenther@suse.de> wrote: > > Since it is target specific and for non-primary/secondary targets > > it's the RISC-V maintainers call whether to allow backporting this. > > Generally wrong-code issues can be backported even if they are not > > regressions if the chance they introduce other issues is low. Speaking as a RISC-V maintainer, I'd like to see it backported. That is why I left the bug report open. Jim
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 78260fcf6fd..744a027a1b7 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -1051,7 +1051,9 @@ "@ # lwu\t%0,%1" - "&& reload_completed && REG_P (operands[1])" + "&& reload_completed + && REG_P (operands[1]) + && !paradoxical_subreg_p (operands[0])" [(set (match_dup 0) (ashift:DI (match_dup 1) (const_int 32))) (set (match_dup 0) @@ -1068,7 +1070,9 @@ "@ # lhu\t%0,%1" - "&& reload_completed && REG_P (operands[1])" + "&& reload_completed + && REG_P (operands[1]) + && !paradoxical_subreg_p (operands[0])" [(set (match_dup 0) (ashift:GPR (match_dup 1) (match_dup 2))) (set (match_dup 0) @@ -1117,7 +1121,9 @@ "@ # l<SHORT:size>\t%0,%1" - "&& reload_completed && REG_P (operands[1])" + "&& reload_completed + && REG_P (operands[1]) + && !paradoxical_subreg_p (operands[0])" [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2))) (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))] { @@ -1766,15 +1772,20 @@ ;; Handle AND with 2^N-1 for N from 12 to XLEN. This can be split into ;; two logical shifts. Otherwise it requires 3 instructions: lui, ;; xor/addi/srli, and. + +;; Generating a temporary for the shift output gives better combiner results; +;; and also fixes a problem where op0 could be a paradoxical reg and shifting +;; by amounts larger than the size of the SUBREG_REG doesn't work. (define_split [(set (match_operand:GPR 0 "register_operand") (and:GPR (match_operand:GPR 1 "register_operand") - (match_operand:GPR 2 "p2m1_shift_operand")))] + (match_operand:GPR 2 "p2m1_shift_operand"))) + (clobber (match_operand:GPR 3 "register_operand"))] "" - [(set (match_dup 0) + [(set (match_dup 3) (ashift:GPR (match_dup 1) (match_dup 2))) (set (match_dup 0) - (lshiftrt:GPR (match_dup 0) (match_dup 2)))] + (lshiftrt:GPR (match_dup 3) (match_dup 2)))] { /* Op2 is a VOIDmode constant, so get the mode size from op1. */ operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) @@ -1786,12 +1797,13 @@ (define_split [(set (match_operand:DI 0 "register_operand") (and:DI (match_operand:DI 1 "register_operand") - (match_operand:DI 2 "high_mask_shift_operand")))] + (match_operand:DI 2 "high_mask_shift_operand"))) + (clobber (match_operand:DI 3 "register_operand"))] "TARGET_64BIT" - [(set (match_dup 0) + [(set (match_dup 3) (lshiftrt:DI (match_dup 1) (match_dup 2))) (set (match_dup 0) - (ashift:DI (match_dup 0) (match_dup 2)))] + (ashift:DI (match_dup 3) (match_dup 2)))] { operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2]))); }) diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c new file mode 100644 index 00000000000..878a491fc36 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c @@ -0,0 +1,57 @@ +/* PR target/91635 */ + +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 +unsigned short b, c; +int u, v, w, x; + +__attribute__ ((noipa)) int +foo (unsigned short c) +{ + c <<= __builtin_add_overflow (-c, -1, &b); + c >>= 1; + return c; +} + +__attribute__ ((noipa)) int +bar (unsigned short b) +{ + b <<= -14 & 15; + b = b >> -~1; + return b; +} + +__attribute__ ((noipa)) int +baz (unsigned short e) +{ + e <<= 1; + e >>= __builtin_add_overflow (8719476735, u, &v); + return e; +} + +__attribute__ ((noipa)) int +qux (unsigned int e) +{ + c = ~1; + c *= e; + c = c >> (-15 & 5); + return c + w + x; +} +#endif + +int +main () +{ +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \ + && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 + if (foo (0xffff) != 0x7fff) + __builtin_abort (); + if (bar (5) != 5) + __builtin_abort (); + if (baz (~0) != 0x7fff) + __builtin_abort (); + if (qux (2) != 0x7ffe) + __builtin_abort (); +#endif + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c new file mode 100644 index 00000000000..72a45ee87ae --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */ + +/* One zero-extend shift can be eliminated by modifying the constant in the + greater than test. Started working after modifying the splitter + lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest. */ +int +sub (int i) +{ + i &= 0x7fffffff; + return i > 0x7f800000; +} +/* { dg-final { scan-assembler-not "srli" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c new file mode 100644 index 00000000000..5b2ae89a471 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */ + +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST + note. */ +unsigned long +sub (long l) +{ + union u { + struct s { int a : 19; unsigned int b : 13; int x; } s; + long l; + } u; + u.l = l; + return u.s.b; +} +/* { dg-final { scan-assembler "srliw" } } */