diff mbox series

RISC-V: Fix bad insn splits with paradoxical subregs.

Message ID 20190905203409.759-1-jimw@sifive.com
State New
Headers show
Series RISC-V: Fix bad insn splits with paradoxical subregs. | expand

Commit Message

Jim Wilson Sept. 5, 2019, 8:34 p.m. UTC
Shifting by more than the size of a SUBREG_REG doesn't work, so we either
need to disable splits if an input is paradoxical, or else we need to
generate a clean temporary for intermediate results.

This was tested with rv32i/newlib and rv64gc/linux cross builds and checks.
There were no regressions.  The new pr91635.c testcase works with the patch
and fails without it.  Also, code size for libc and libstdc++ was checked
and is smaller or equal sized with the patch, ignoring alignment padding.
The shift-shift-4.c testcase gives better code size with this patch.
The shift-shift-5.c testcase gave worse code size with a draft version
of this patch and is OK with the final version of this patch.

Jakub wrote the first version of this patch, so gets primary credit for it.

Committed.

Jim

	gcc/
	PR target/91635
	* config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2,
	extend<SHORT:mode><SUPERQI:mode>2): Don't split if
	paradoxical_subreg_p (operands[0]).
	(*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
	use as intermediate value.

	gcc/testsuite/
	PR target/91635
	* gcc.c-torture/execute/pr91635.c: New test.
	* gcc.target/riscv/shift-shift-4.c: New test.
	* gcc.target/riscv/shift-shift-5.c: New test.
---
 gcc/config/riscv/riscv.md                     | 30 +++++++---
 gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++
 .../gcc.target/riscv/shift-shift-4.c          | 13 +++++
 .../gcc.target/riscv/shift-shift-5.c          | 16 ++++++
 4 files changed, 107 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c

Comments

Segher Boessenkool Sept. 5, 2019, 10:03 p.m. UTC | #1
Hi Jim,

On Thu, Sep 05, 2019 at 01:34:09PM -0700, Jim Wilson wrote:
> Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> need to disable splits if an input is paradoxical, or else we need to
> generate a clean temporary for intermediate results.

My big question is why do other targets not have this problem?  Or what
is it they do differently?


Segher
Jim Wilson Sept. 5, 2019, 11:40 p.m. UTC | #2
On Thu, Sep 5, 2019 at 3:03 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> My big question is why do other targets not have this problem?  Or what
> is it they do differently?

RISC-V doesn't have convenient sign/zero extend instructions.  We also
have a 12-bit immediate, whereas most other RISCs have a 16-bit
immediate, so we can't easily load a constant and mask to zero-extend
an HImode value.  Thus we have a fair amount of splitters that emit
two shifts for zero/sign extend operations, or try to optimize various
combinations of shifts or AND operations that can map to instructions
that we do have.  We also have two splitters that were reusing input
registers for intermediate values when they should have been using a
clobber to get a temporary.  The failure case is when we end up
generating something like (lshiftrt:DI (subreg:DI (reg:SI)) (const_int
48)) after a 48 bit left shift with a paradoxical reg input reused for
an intermediate result, and the rtl simplifier says that must be zero
because an SImode reg doesn't have any valid bits in the upper
32-bits.  This is may also be tied into the fact that we have
WORD_REGISTER_OPERATIONS defined.  Some funny stuff can happen with
that defined, and I think most of the other major ports no longer
define it.  I haven't yet been tempted enough to try to remove that
define from the RISC-V port yet, but suspect it will eventually be
necessary.  I think the WORD_REGISTER_OPERATIONS define is so badly
misused by the optimizer nowadays that it isn't worth the trouble
anymore.  But removing it will be a lot of work to check for
performance and code size regressions, so I haven't looked at it yet,
and may not for a long while.

The other three patterns that were fixed by adding a check for
paradoxical regs have no known failure case, and were only fixed for
completeness.  These are hypothetically necessary changes, and may not
in fact be actually necessary, but they appear to be completely
harmless so I saw no harm in adding them.  In the testing I did, I
never saw a paradoxical reg here.  But that isn't proof that a
paradoxical reg can never occur here.

Jim
Segher Boessenkool Sept. 6, 2019, 12:38 p.m. UTC | #3
Hi Jim,

On Thu, Sep 05, 2019 at 04:40:56PM -0700, Jim Wilson wrote:
> On Thu, Sep 5, 2019 at 3:03 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > My big question is why do other targets not have this problem?  Or what
> > is it they do differently?
> 
> RISC-V doesn't have convenient sign/zero extend instructions.  We also
> have a 12-bit immediate, whereas most other RISCs have a 16-bit
> immediate, so we can't easily load a constant and mask to zero-extend
> an HImode value.  Thus we have a fair amount of splitters that emit
> two shifts for zero/sign extend operations, or try to optimize various
> combinations of shifts or AND operations that can map to instructions
> that we do have.

There are other targets in that same boat, it's not *so* unusual.

> We also have two splitters that were reusing input
> registers for intermediate values when they should have been using a
> clobber to get a temporary.

Ah, that is just it I guess.  And the problem happens with *pseudos*,
which have just their one mode always (but can be used in another mode).
And then things go wrong if you use one in a bigger mode than it is.

Hrm I wonder if we could (or should) protect against this kind of thing
in generic code.  combine could have seen you did something undefined I
suppose.

> The failure case is when we end up
> generating something like (lshiftrt:DI (subreg:DI (reg:SI)) (const_int
> 48)) after a 48 bit left shift with a paradoxical reg input reused for
> an intermediate result, and the rtl simplifier says that must be zero
> because an SImode reg doesn't have any valid bits in the upper
> 32-bits.

Since it's a pseudo, not a hard reg, yeah.

> This is may also be tied into the fact that we have
> WORD_REGISTER_OPERATIONS defined.  Some funny stuff can happen with
> that defined, and I think most of the other major ports no longer
> define it.

Yeah.  Not only because of problems, but also because it doesn't help
much, and hurts in cases too, so it depends what way the balance swings
for your target.

> I haven't yet been tempted enough to try to remove that
> define from the RISC-V port yet, but suspect it will eventually be
> necessary.  I think the WORD_REGISTER_OPERATIONS define is so badly
> misused by the optimizer nowadays that it isn't worth the trouble
> anymore.  But removing it will be a lot of work to check for
> performance and code size regressions, so I haven't looked at it yet,
> and may not for a long while.

I tested it yesterday with builds of Linux.  Compiler from Sunday,
kernel tree a few weeks old.  riscv32 and riscv64, both with their
defconfigs.  Removing WORD_REGISTER_OPERATIONS results in 0.001%
smaller code for riscv32, and 0.036% larger code for riscv64.  For
the latter, shift/shift combos were often involved.  But it is
pretty minimal either way :-)

> The other three patterns that were fixed by adding a check for
> paradoxical regs have no known failure case, and were only fixed for
> completeness.  These are hypothetically necessary changes, and may not
> in fact be actually necessary, but they appear to be completely
> harmless so I saw no harm in adding them.  In the testing I did, I
> never saw a paradoxical reg here.  But that isn't proof that a
> paradoxical reg can never occur here.

Thanks for the explanations, that clears things up :-)


Segher
Kito Cheng Sept. 18, 2019, 10:23 a.m. UTC | #4
Hi Jakub, Richard:

This commit is fixing wrong code gen for RISC-V, does it OK to
backport to GCC 9 branch?

On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson <jimw@sifive.com> wrote:
>
> Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> need to disable splits if an input is paradoxical, or else we need to
> generate a clean temporary for intermediate results.
>
> This was tested with rv32i/newlib and rv64gc/linux cross builds and checks.
> There were no regressions.  The new pr91635.c testcase works with the patch
> and fails without it.  Also, code size for libc and libstdc++ was checked
> and is smaller or equal sized with the patch, ignoring alignment padding.
> The shift-shift-4.c testcase gives better code size with this patch.
> The shift-shift-5.c testcase gave worse code size with a draft version
> of this patch and is OK with the final version of this patch.
>
> Jakub wrote the first version of this patch, so gets primary credit for it.
>
> Committed.
>
> Jim
>
>         gcc/
>         PR target/91635
>         * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2,
>         extend<SHORT:mode><SUPERQI:mode>2): Don't split if
>         paradoxical_subreg_p (operands[0]).
>         (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
>         use as intermediate value.
>
>         gcc/testsuite/
>         PR target/91635
>         * gcc.c-torture/execute/pr91635.c: New test.
>         * gcc.target/riscv/shift-shift-4.c: New test.
>         * gcc.target/riscv/shift-shift-5.c: New test.
> ---
>  gcc/config/riscv/riscv.md                     | 30 +++++++---
>  gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++
>  .../gcc.target/riscv/shift-shift-4.c          | 13 +++++
>  .../gcc.target/riscv/shift-shift-5.c          | 16 ++++++
>  4 files changed, 107 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 78260fcf6fd..744a027a1b7 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -1051,7 +1051,9 @@
>    "@
>     #
>     lwu\t%0,%1"
> -  "&& reload_completed && REG_P (operands[1])"
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && !paradoxical_subreg_p (operands[0])"
>    [(set (match_dup 0)
>         (ashift:DI (match_dup 1) (const_int 32)))
>     (set (match_dup 0)
> @@ -1068,7 +1070,9 @@
>    "@
>     #
>     lhu\t%0,%1"
> -  "&& reload_completed && REG_P (operands[1])"
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && !paradoxical_subreg_p (operands[0])"
>    [(set (match_dup 0)
>         (ashift:GPR (match_dup 1) (match_dup 2)))
>     (set (match_dup 0)
> @@ -1117,7 +1121,9 @@
>    "@
>     #
>     l<SHORT:size>\t%0,%1"
> -  "&& reload_completed && REG_P (operands[1])"
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && !paradoxical_subreg_p (operands[0])"
>    [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
>     (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
>  {
> @@ -1766,15 +1772,20 @@
>  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
>  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
>  ;; xor/addi/srli, and.
> +
> +;; Generating a temporary for the shift output gives better combiner results;
> +;; and also fixes a problem where op0 could be a paradoxical reg and shifting
> +;; by amounts larger than the size of the SUBREG_REG doesn't work.
>  (define_split
>    [(set (match_operand:GPR 0 "register_operand")
>         (and:GPR (match_operand:GPR 1 "register_operand")
> -                (match_operand:GPR 2 "p2m1_shift_operand")))]
> +                (match_operand:GPR 2 "p2m1_shift_operand")))
> +   (clobber (match_operand:GPR 3 "register_operand"))]
>    ""
> - [(set (match_dup 0)
> + [(set (match_dup 3)
>         (ashift:GPR (match_dup 1) (match_dup 2)))
>    (set (match_dup 0)
> -       (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
> +       (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
>  {
>    /* Op2 is a VOIDmode constant, so get the mode size from op1.  */
>    operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> @@ -1786,12 +1797,13 @@
>  (define_split
>    [(set (match_operand:DI 0 "register_operand")
>         (and:DI (match_operand:DI 1 "register_operand")
> -               (match_operand:DI 2 "high_mask_shift_operand")))]
> +               (match_operand:DI 2 "high_mask_shift_operand")))
> +   (clobber (match_operand:DI 3 "register_operand"))]
>    "TARGET_64BIT"
> -  [(set (match_dup 0)
> +  [(set (match_dup 3)
>         (lshiftrt:DI (match_dup 1) (match_dup 2)))
>     (set (match_dup 0)
> -       (ashift:DI (match_dup 0) (match_dup 2)))]
> +       (ashift:DI (match_dup 3) (match_dup 2)))]
>  {
>    operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2])));
>  })
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
> new file mode 100644
> index 00000000000..878a491fc36
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
> @@ -0,0 +1,57 @@
> +/* PR target/91635 */
> +
> +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
> +    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> +unsigned short b, c;
> +int u, v, w, x;
> +
> +__attribute__ ((noipa)) int
> +foo (unsigned short c)
> +{
> +  c <<= __builtin_add_overflow (-c, -1, &b);
> +  c >>= 1;
> +  return c;
> +}
> +
> +__attribute__ ((noipa)) int
> +bar (unsigned short b)
> +{
> +  b <<= -14 & 15;
> +  b = b >> -~1;
> +  return b;
> +}
> +
> +__attribute__ ((noipa)) int
> +baz (unsigned short e)
> +{
> +  e <<= 1;
> +  e >>= __builtin_add_overflow (8719476735, u, &v);
> +  return e;
> +}
> +
> +__attribute__ ((noipa)) int
> +qux (unsigned int e)
> +{
> +  c = ~1;
> +  c *= e;
> +  c = c >> (-15 & 5);
> +  return c + w + x;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
> +    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> +  if (foo (0xffff) != 0x7fff)
> +    __builtin_abort ();
> +  if (bar (5) != 5)
> +    __builtin_abort ();
> +  if (baz (~0) != 0x7fff)
> +    __builtin_abort ();
> +  if (qux (2) != 0x7ffe)
> +    __builtin_abort ();
> +#endif
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> new file mode 100644
> index 00000000000..72a45ee87ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */
> +
> +/* One zero-extend shift can be eliminated by modifying the constant in the
> +   greater than test.  Started working after modifying the splitter
> +   lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest.  */
> +int
> +sub (int i)
> +{
> +  i &= 0x7fffffff;
> +  return i > 0x7f800000;
> +}
> +/* { dg-final { scan-assembler-not "srli" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> new file mode 100644
> index 00000000000..5b2ae89a471
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */
> +
> +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST
> +   note.  */
> +unsigned long
> +sub (long l)
> +{
> +  union u {
> +    struct s { int a : 19; unsigned int b : 13; int x; } s;
> +    long l;
> +  } u;
> +  u.l = l;
> +  return u.s.b;
> +}
> +/* { dg-final { scan-assembler "srliw" } } */
> --
> 2.17.1
>
Richard Biener Sept. 18, 2019, 10:25 a.m. UTC | #5
On Wed, 18 Sep 2019, Kito Cheng wrote:

> Hi Jakub, Richard:
> 
> This commit is fixing wrong code gen for RISC-V, does it OK to
> backport to GCC 9 branch?

Since it is target specific and for non-primary/secondary targets
it's the RISC-V maintainers call whether to allow backporting this.
Generally wrong-code issues can be backported even if they are not
regressions if the chance they introduce other issues is low.

Richard.

> On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson <jimw@sifive.com> wrote:
> >
> > Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> > need to disable splits if an input is paradoxical, or else we need to
> > generate a clean temporary for intermediate results.
> >
> > This was tested with rv32i/newlib and rv64gc/linux cross builds and checks.
> > There were no regressions.  The new pr91635.c testcase works with the patch
> > and fails without it.  Also, code size for libc and libstdc++ was checked
> > and is smaller or equal sized with the patch, ignoring alignment padding.
> > The shift-shift-4.c testcase gives better code size with this patch.
> > The shift-shift-5.c testcase gave worse code size with a draft version
> > of this patch and is OK with the final version of this patch.
> >
> > Jakub wrote the first version of this patch, so gets primary credit for it.
> >
> > Committed.
> >
> > Jim
> >
> >         gcc/
> >         PR target/91635
> >         * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2,
> >         extend<SHORT:mode><SUPERQI:mode>2): Don't split if
> >         paradoxical_subreg_p (operands[0]).
> >         (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
> >         use as intermediate value.
> >
> >         gcc/testsuite/
> >         PR target/91635
> >         * gcc.c-torture/execute/pr91635.c: New test.
> >         * gcc.target/riscv/shift-shift-4.c: New test.
> >         * gcc.target/riscv/shift-shift-5.c: New test.
> > ---
> >  gcc/config/riscv/riscv.md                     | 30 +++++++---
> >  gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++
> >  .../gcc.target/riscv/shift-shift-4.c          | 13 +++++
> >  .../gcc.target/riscv/shift-shift-5.c          | 16 ++++++
> >  4 files changed, 107 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 78260fcf6fd..744a027a1b7 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -1051,7 +1051,9 @@
> >    "@
> >     #
> >     lwu\t%0,%1"
> > -  "&& reload_completed && REG_P (operands[1])"
> > +  "&& reload_completed
> > +   && REG_P (operands[1])
> > +   && !paradoxical_subreg_p (operands[0])"
> >    [(set (match_dup 0)
> >         (ashift:DI (match_dup 1) (const_int 32)))
> >     (set (match_dup 0)
> > @@ -1068,7 +1070,9 @@
> >    "@
> >     #
> >     lhu\t%0,%1"
> > -  "&& reload_completed && REG_P (operands[1])"
> > +  "&& reload_completed
> > +   && REG_P (operands[1])
> > +   && !paradoxical_subreg_p (operands[0])"
> >    [(set (match_dup 0)
> >         (ashift:GPR (match_dup 1) (match_dup 2)))
> >     (set (match_dup 0)
> > @@ -1117,7 +1121,9 @@
> >    "@
> >     #
> >     l<SHORT:size>\t%0,%1"
> > -  "&& reload_completed && REG_P (operands[1])"
> > +  "&& reload_completed
> > +   && REG_P (operands[1])
> > +   && !paradoxical_subreg_p (operands[0])"
> >    [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
> >     (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
> >  {
> > @@ -1766,15 +1772,20 @@
> >  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
> >  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
> >  ;; xor/addi/srli, and.
> > +
> > +;; Generating a temporary for the shift output gives better combiner results;
> > +;; and also fixes a problem where op0 could be a paradoxical reg and shifting
> > +;; by amounts larger than the size of the SUBREG_REG doesn't work.
> >  (define_split
> >    [(set (match_operand:GPR 0 "register_operand")
> >         (and:GPR (match_operand:GPR 1 "register_operand")
> > -                (match_operand:GPR 2 "p2m1_shift_operand")))]
> > +                (match_operand:GPR 2 "p2m1_shift_operand")))
> > +   (clobber (match_operand:GPR 3 "register_operand"))]
> >    ""
> > - [(set (match_dup 0)
> > + [(set (match_dup 3)
> >         (ashift:GPR (match_dup 1) (match_dup 2)))
> >    (set (match_dup 0)
> > -       (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
> > +       (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
> >  {
> >    /* Op2 is a VOIDmode constant, so get the mode size from op1.  */
> >    operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> > @@ -1786,12 +1797,13 @@
> >  (define_split
> >    [(set (match_operand:DI 0 "register_operand")
> >         (and:DI (match_operand:DI 1 "register_operand")
> > -               (match_operand:DI 2 "high_mask_shift_operand")))]
> > +               (match_operand:DI 2 "high_mask_shift_operand")))
> > +   (clobber (match_operand:DI 3 "register_operand"))]
> >    "TARGET_64BIT"
> > -  [(set (match_dup 0)
> > +  [(set (match_dup 3)
> >         (lshiftrt:DI (match_dup 1) (match_dup 2)))
> >     (set (match_dup 0)
> > -       (ashift:DI (match_dup 0) (match_dup 2)))]
> > +       (ashift:DI (match_dup 3) (match_dup 2)))]
> >  {
> >    operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2])));
> >  })
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
> > new file mode 100644
> > index 00000000000..878a491fc36
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
> > @@ -0,0 +1,57 @@
> > +/* PR target/91635 */
> > +
> > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
> > +    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> > +unsigned short b, c;
> > +int u, v, w, x;
> > +
> > +__attribute__ ((noipa)) int
> > +foo (unsigned short c)
> > +{
> > +  c <<= __builtin_add_overflow (-c, -1, &b);
> > +  c >>= 1;
> > +  return c;
> > +}
> > +
> > +__attribute__ ((noipa)) int
> > +bar (unsigned short b)
> > +{
> > +  b <<= -14 & 15;
> > +  b = b >> -~1;
> > +  return b;
> > +}
> > +
> > +__attribute__ ((noipa)) int
> > +baz (unsigned short e)
> > +{
> > +  e <<= 1;
> > +  e >>= __builtin_add_overflow (8719476735, u, &v);
> > +  return e;
> > +}
> > +
> > +__attribute__ ((noipa)) int
> > +qux (unsigned int e)
> > +{
> > +  c = ~1;
> > +  c *= e;
> > +  c = c >> (-15 & 5);
> > +  return c + w + x;
> > +}
> > +#endif
> > +
> > +int
> > +main ()
> > +{
> > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
> > +    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> > +  if (foo (0xffff) != 0x7fff)
> > +    __builtin_abort ();
> > +  if (bar (5) != 5)
> > +    __builtin_abort ();
> > +  if (baz (~0) != 0x7fff)
> > +    __builtin_abort ();
> > +  if (qux (2) != 0x7ffe)
> > +    __builtin_abort ();
> > +#endif
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> > new file mode 100644
> > index 00000000000..72a45ee87ae
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */
> > +
> > +/* One zero-extend shift can be eliminated by modifying the constant in the
> > +   greater than test.  Started working after modifying the splitter
> > +   lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest.  */
> > +int
> > +sub (int i)
> > +{
> > +  i &= 0x7fffffff;
> > +  return i > 0x7f800000;
> > +}
> > +/* { dg-final { scan-assembler-not "srli" } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> > new file mode 100644
> > index 00000000000..5b2ae89a471
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */
> > +
> > +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST
> > +   note.  */
> > +unsigned long
> > +sub (long l)
> > +{
> > +  union u {
> > +    struct s { int a : 19; unsigned int b : 13; int x; } s;
> > +    long l;
> > +  } u;
> > +  u.l = l;
> > +  return u.s.b;
> > +}
> > +/* { dg-final { scan-assembler "srliw" } } */
> > --
> > 2.17.1
> >
>
Kito Cheng Sept. 18, 2019, 10:26 a.m. UTC | #6
Hi Richard:

Got it, thanks :)

On Wed, Sep 18, 2019 at 6:25 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 18 Sep 2019, Kito Cheng wrote:
>
> > Hi Jakub, Richard:
> >
> > This commit is fixing wrong code gen for RISC-V, does it OK to
> > backport to GCC 9 branch?
>
> Since it is target specific and for non-primary/secondary targets
> it's the RISC-V maintainers call whether to allow backporting this.
> Generally wrong-code issues can be backported even if they are not
> regressions if the chance they introduce other issues is low.
>
> Richard.
>
> > On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson <jimw@sifive.com> wrote:
> > >
> > > Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> > > need to disable splits if an input is paradoxical, or else we need to
> > > generate a clean temporary for intermediate results.
> > >
> > > This was tested with rv32i/newlib and rv64gc/linux cross builds and checks.
> > > There were no regressions.  The new pr91635.c testcase works with the patch
> > > and fails without it.  Also, code size for libc and libstdc++ was checked
> > > and is smaller or equal sized with the patch, ignoring alignment padding.
> > > The shift-shift-4.c testcase gives better code size with this patch.
> > > The shift-shift-5.c testcase gave worse code size with a draft version
> > > of this patch and is OK with the final version of this patch.
> > >
> > > Jakub wrote the first version of this patch, so gets primary credit for it.
> > >
> > > Committed.
> > >
> > > Jim
> > >
> > >         gcc/
> > >         PR target/91635
> > >         * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi<GPR:mode>2,
> > >         extend<SHORT:mode><SUPERQI:mode>2): Don't split if
> > >         paradoxical_subreg_p (operands[0]).
> > >         (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
> > >         use as intermediate value.
> > >
> > >         gcc/testsuite/
> > >         PR target/91635
> > >         * gcc.c-torture/execute/pr91635.c: New test.
> > >         * gcc.target/riscv/shift-shift-4.c: New test.
> > >         * gcc.target/riscv/shift-shift-5.c: New test.
> > > ---
> > >  gcc/config/riscv/riscv.md                     | 30 +++++++---
> > >  gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++++++++++++++++++
> > >  .../gcc.target/riscv/shift-shift-4.c          | 13 +++++
> > >  .../gcc.target/riscv/shift-shift-5.c          | 16 ++++++
> > >  4 files changed, 107 insertions(+), 9 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> > >
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index 78260fcf6fd..744a027a1b7 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -1051,7 +1051,9 @@
> > >    "@
> > >     #
> > >     lwu\t%0,%1"
> > > -  "&& reload_completed && REG_P (operands[1])"
> > > +  "&& reload_completed
> > > +   && REG_P (operands[1])
> > > +   && !paradoxical_subreg_p (operands[0])"
> > >    [(set (match_dup 0)
> > >         (ashift:DI (match_dup 1) (const_int 32)))
> > >     (set (match_dup 0)
> > > @@ -1068,7 +1070,9 @@
> > >    "@
> > >     #
> > >     lhu\t%0,%1"
> > > -  "&& reload_completed && REG_P (operands[1])"
> > > +  "&& reload_completed
> > > +   && REG_P (operands[1])
> > > +   && !paradoxical_subreg_p (operands[0])"
> > >    [(set (match_dup 0)
> > >         (ashift:GPR (match_dup 1) (match_dup 2)))
> > >     (set (match_dup 0)
> > > @@ -1117,7 +1121,9 @@
> > >    "@
> > >     #
> > >     l<SHORT:size>\t%0,%1"
> > > -  "&& reload_completed && REG_P (operands[1])"
> > > +  "&& reload_completed
> > > +   && REG_P (operands[1])
> > > +   && !paradoxical_subreg_p (operands[0])"
> > >    [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
> > >     (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
> > >  {
> > > @@ -1766,15 +1772,20 @@
> > >  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
> > >  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
> > >  ;; xor/addi/srli, and.
> > > +
> > > +;; Generating a temporary for the shift output gives better combiner results;
> > > +;; and also fixes a problem where op0 could be a paradoxical reg and shifting
> > > +;; by amounts larger than the size of the SUBREG_REG doesn't work.
> > >  (define_split
> > >    [(set (match_operand:GPR 0 "register_operand")
> > >         (and:GPR (match_operand:GPR 1 "register_operand")
> > > -                (match_operand:GPR 2 "p2m1_shift_operand")))]
> > > +                (match_operand:GPR 2 "p2m1_shift_operand")))
> > > +   (clobber (match_operand:GPR 3 "register_operand"))]
> > >    ""
> > > - [(set (match_dup 0)
> > > + [(set (match_dup 3)
> > >         (ashift:GPR (match_dup 1) (match_dup 2)))
> > >    (set (match_dup 0)
> > > -       (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
> > > +       (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
> > >  {
> > >    /* Op2 is a VOIDmode constant, so get the mode size from op1.  */
> > >    operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> > > @@ -1786,12 +1797,13 @@
> > >  (define_split
> > >    [(set (match_operand:DI 0 "register_operand")
> > >         (and:DI (match_operand:DI 1 "register_operand")
> > > -               (match_operand:DI 2 "high_mask_shift_operand")))]
> > > +               (match_operand:DI 2 "high_mask_shift_operand")))
> > > +   (clobber (match_operand:DI 3 "register_operand"))]
> > >    "TARGET_64BIT"
> > > -  [(set (match_dup 0)
> > > +  [(set (match_dup 3)
> > >         (lshiftrt:DI (match_dup 1) (match_dup 2)))
> > >     (set (match_dup 0)
> > > -       (ashift:DI (match_dup 0) (match_dup 2)))]
> > > +       (ashift:DI (match_dup 3) (match_dup 2)))]
> > >  {
> > >    operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2])));
> > >  })
> > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
> > > new file mode 100644
> > > index 00000000000..878a491fc36
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
> > > @@ -0,0 +1,57 @@
> > > +/* PR target/91635 */
> > > +
> > > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
> > > +    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> > > +unsigned short b, c;
> > > +int u, v, w, x;
> > > +
> > > +__attribute__ ((noipa)) int
> > > +foo (unsigned short c)
> > > +{
> > > +  c <<= __builtin_add_overflow (-c, -1, &b);
> > > +  c >>= 1;
> > > +  return c;
> > > +}
> > > +
> > > +__attribute__ ((noipa)) int
> > > +bar (unsigned short b)
> > > +{
> > > +  b <<= -14 & 15;
> > > +  b = b >> -~1;
> > > +  return b;
> > > +}
> > > +
> > > +__attribute__ ((noipa)) int
> > > +baz (unsigned short e)
> > > +{
> > > +  e <<= 1;
> > > +  e >>= __builtin_add_overflow (8719476735, u, &v);
> > > +  return e;
> > > +}
> > > +
> > > +__attribute__ ((noipa)) int
> > > +qux (unsigned int e)
> > > +{
> > > +  c = ~1;
> > > +  c *= e;
> > > +  c = c >> (-15 & 5);
> > > +  return c + w + x;
> > > +}
> > > +#endif
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
> > > +    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> > > +  if (foo (0xffff) != 0x7fff)
> > > +    __builtin_abort ();
> > > +  if (bar (5) != 5)
> > > +    __builtin_abort ();
> > > +  if (baz (~0) != 0x7fff)
> > > +    __builtin_abort ();
> > > +  if (qux (2) != 0x7ffe)
> > > +    __builtin_abort ();
> > > +#endif
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> > > new file mode 100644
> > > index 00000000000..72a45ee87ae
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */
> > > +
> > > +/* One zero-extend shift can be eliminated by modifying the constant in the
> > > +   greater than test.  Started working after modifying the splitter
> > > +   lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest.  */
> > > +int
> > > +sub (int i)
> > > +{
> > > +  i &= 0x7fffffff;
> > > +  return i > 0x7f800000;
> > > +}
> > > +/* { dg-final { scan-assembler-not "srli" } } */
> > > diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> > > new file mode 100644
> > > index 00000000000..5b2ae89a471
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */
> > > +
> > > +/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST
> > > +   note.  */
> > > +unsigned long
> > > +sub (long l)
> > > +{
> > > +  union u {
> > > +    struct s { int a : 19; unsigned int b : 13; int x; } s;
> > > +    long l;
> > > +  } u;
> > > +  u.l = l;
> > > +  return u.s.b;
> > > +}
> > > +/* { dg-final { scan-assembler "srliw" } } */
> > > --
> > > 2.17.1
> > >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 247165 (AG München)
Jim Wilson Sept. 18, 2019, 8:34 p.m. UTC | #7
On Wed, Sep 18, 2019 at 3:27 AM Kito Cheng <kito.cheng@gmail.com> wrote:
> On Wed, Sep 18, 2019 at 6:25 PM Richard Biener <rguenther@suse.de> wrote:
> > Since it is target specific and for non-primary/secondary targets
> > it's the RISC-V maintainers call whether to allow backporting this.
> > Generally wrong-code issues can be backported even if they are not
> > regressions if the chance they introduce other issues is low.

Speaking as a RISC-V maintainer, I'd like to see it backported.  That
is why I left the bug report open.

Jim
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 78260fcf6fd..744a027a1b7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1051,7 +1051,9 @@ 
   "@
    #
    lwu\t%0,%1"
-  "&& reload_completed && REG_P (operands[1])"
+  "&& reload_completed
+   && REG_P (operands[1])
+   && !paradoxical_subreg_p (operands[0])"
   [(set (match_dup 0)
 	(ashift:DI (match_dup 1) (const_int 32)))
    (set (match_dup 0)
@@ -1068,7 +1070,9 @@ 
   "@
    #
    lhu\t%0,%1"
-  "&& reload_completed && REG_P (operands[1])"
+  "&& reload_completed
+   && REG_P (operands[1])
+   && !paradoxical_subreg_p (operands[0])"
   [(set (match_dup 0)
 	(ashift:GPR (match_dup 1) (match_dup 2)))
    (set (match_dup 0)
@@ -1117,7 +1121,9 @@ 
   "@
    #
    l<SHORT:size>\t%0,%1"
-  "&& reload_completed && REG_P (operands[1])"
+  "&& reload_completed
+   && REG_P (operands[1])
+   && !paradoxical_subreg_p (operands[0])"
   [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
    (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
 {
@@ -1766,15 +1772,20 @@ 
 ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
 ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
 ;; xor/addi/srli, and.
+
+;; Generating a temporary for the shift output gives better combiner results;
+;; and also fixes a problem where op0 could be a paradoxical reg and shifting
+;; by amounts larger than the size of the SUBREG_REG doesn't work.
 (define_split
   [(set (match_operand:GPR 0 "register_operand")
 	(and:GPR (match_operand:GPR 1 "register_operand")
-		 (match_operand:GPR 2 "p2m1_shift_operand")))]
+		 (match_operand:GPR 2 "p2m1_shift_operand")))
+   (clobber (match_operand:GPR 3 "register_operand"))]
   ""
- [(set (match_dup 0)
+ [(set (match_dup 3)
        (ashift:GPR (match_dup 1) (match_dup 2)))
   (set (match_dup 0)
-       (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
+       (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
 {
   /* Op2 is a VOIDmode constant, so get the mode size from op1.  */
   operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
@@ -1786,12 +1797,13 @@ 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
 	(and:DI (match_operand:DI 1 "register_operand")
-		(match_operand:DI 2 "high_mask_shift_operand")))]
+		(match_operand:DI 2 "high_mask_shift_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
   "TARGET_64BIT"
-  [(set (match_dup 0)
+  [(set (match_dup 3)
 	(lshiftrt:DI (match_dup 1) (match_dup 2)))
    (set (match_dup 0)
-	(ashift:DI (match_dup 0) (match_dup 2)))]
+	(ashift:DI (match_dup 3) (match_dup 2)))]
 {
   operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2])));
 })
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr91635.c b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
new file mode 100644
index 00000000000..878a491fc36
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr91635.c
@@ -0,0 +1,57 @@ 
+/* PR target/91635 */
+
+#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
+    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
+unsigned short b, c;
+int u, v, w, x;
+
+__attribute__ ((noipa)) int
+foo (unsigned short c)
+{
+  c <<= __builtin_add_overflow (-c, -1, &b);
+  c >>= 1;
+  return c;
+}
+
+__attribute__ ((noipa)) int
+bar (unsigned short b)
+{
+  b <<= -14 & 15;
+  b = b >> -~1;
+  return b;
+}
+
+__attribute__ ((noipa)) int
+baz (unsigned short e)
+{
+  e <<= 1;
+  e >>= __builtin_add_overflow (8719476735, u, &v);
+  return e;
+}
+
+__attribute__ ((noipa)) int
+qux (unsigned int e)
+{
+  c = ~1;
+  c *= e;
+  c = c >> (-15 & 5);
+  return c + w + x;
+}
+#endif
+
+int
+main ()
+{
+#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 \
+    && __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
+  if (foo (0xffff) != 0x7fff)
+    __builtin_abort ();
+  if (bar (5) != 5)
+    __builtin_abort ();
+  if (baz (~0) != 0x7fff)
+    __builtin_abort ();
+  if (qux (2) != 0x7ffe)
+    __builtin_abort ();
+#endif
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-4.c b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
new file mode 100644
index 00000000000..72a45ee87ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-shift-4.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv32i -mabi=ilp32 -O2" } */
+
+/* One zero-extend shift can be eliminated by modifying the constant in the
+   greater than test.  Started working after modifying the splitter
+   lshrsi3_zero_extend_3+1 to use a temporary reg for the first split dest.  */
+int
+sub (int i)
+{
+  i &= 0x7fffffff;
+  return i > 0x7f800000;
+}
+/* { dg-final { scan-assembler-not "srli" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-5.c b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
new file mode 100644
index 00000000000..5b2ae89a471
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-shift-5.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O2" } */
+
+/* Fails if lshrsi3_zero_extend_3+1 uses a temp reg which has no REG_DEST
+   note.  */
+unsigned long
+sub (long l)
+{
+  union u {
+    struct s { int a : 19; unsigned int b : 13; int x; } s;
+    long l;
+  } u;
+  u.l = l;
+  return u.s.b;
+}
+/* { dg-final { scan-assembler "srliw" } } */