diff mbox series

PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

Message ID 00fd01d97620$36a11e20$a3e35a60$@nextmovesoftware.com
State New
Headers show
Series PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG. | expand

Commit Message

Roger Sayle April 23, 2023, 8:14 p.m. UTC
This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR.  The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into a pair of QImode SUBREG assignments:

(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions.  The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.

To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).

This patch has been tested on x86_64-pc-linux-gnu with a make
bootstrap and make -k check, both 64-bit and 32-bit, with no
new regressions.  Many thanks to Jeff Law for testing this patch
on his build farm, which spotted an issue on xstormy16, which
should now be fixed by (either of) my recent xstormy16 patches.
Ok for mainline?


2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/109476
	* lower-subreg.cc: Include explow.h for force_reg.
	(find_decomposable_shift_zext): Pass an additional SPEED_P argument.
	If decomposing a suitable LSHIFTRT and we're not splitting
	ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
	instead of setting a high part SUBREG to zero, which helps combine.
	(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
	PR rtl-optimization/109476
	* gcc.target/avr/mmcu/pr109476.c: New test case.


Thanks in advance,
Roger
--

Comments

Richard Biener April 28, 2023, 7:08 a.m. UTC | #1
On Sun, Apr 23, 2023 at 10:14 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch fixes PR rtl-optimization/109476, which is a code quality
> regression affecting AVR.  The cause is that the lower-subreg pass is
> sometimes overly aggressive, lowering the LSHIFTRT below:
>
> (insn 7 4 8 2 (set (reg:HI 51)
>         (lshiftrt:HI (reg/v:HI 49 [ b ])
>             (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
>      (nil))
>
> into a pair of QImode SUBREG assignments:
>
> (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
>         (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
>      (nil))
> (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
>         (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
>      (nil))
>
> but this idiom, SETs of SUBREGs, interferes with combine's ability
> to associate/fuse instructions.  The solution, on targets that
> have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
> wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
> is to split/lower LSHIFTRT to a ZERO_EXTEND.
>
> To answer Richard's question in comment #10 of the bugzilla PR,
> the function resolve_shift_zext is called with one of four RTX
> codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
> LSHIFTRT can the setting of low_part and high_part SUBREGs be
> replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
> extension, so don't set the high_part to zero; if we're splitting
> a ZERO_EXTEND then it doesn't make sense to replace it with a
> ZERO_EXTEND, and for ASHIFT we've played games to swap the
> high_part and low_part SUBREGs, so that we assign the low_part
> to zero (for double word shifts by greater than word size bits).
>
> This patch has been tested on x86_64-pc-linux-gnu with a make
> bootstrap and make -k check, both 64-bit and 32-bit, with no
> new regressions.  Many thanks to Jeff Law for testing this patch
> on his build farm, which spotted an issue on xstormy16, which
> should now be fixed by (either of) my recent xstormy16 patches.
> Ok for mainline?

OK.

Thanks,
Richard.

>
> 2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR rtl-optimization/109476
>         * lower-subreg.cc: Include explow.h for force_reg.
>         (find_decomposable_shift_zext): Pass an additional SPEED_P argument.
>         If decomposing a suitable LSHIFTRT and we're not splitting
>         ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
>         instead of setting a high part SUBREG to zero, which helps combine.
>         (decompose_multiword_subregs): Update call to resolve_shift_zext.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/109476
>         * gcc.target/avr/mmcu/pr109476.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
diff mbox series

Patch

diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 481e1e8..81fc5380 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -37,6 +37,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cfgbuild.h"
 #include "dce.h"
 #include "expr.h"
+#include "explow.h"
 #include "tree-pass.h"
 #include "lower-subreg.h"
 #include "rtl-iter.h"
@@ -1299,11 +1300,12 @@  find_decomposable_shift_zext (rtx_insn *insn, bool speed_p)
 
 /* Decompose a more than word wide shift (in INSN) of a multiword
    pseudo or a multiword zero-extend of a wordmode pseudo into a move
-   and 'set to zero' insn.  Return a pointer to the new insn when a
-   replacement was done.  */
+   and 'set to zero' insn.  SPEED_P says whether we are optimizing
+   for speed or size, when checking if a ZERO_EXTEND is preferable.
+   Return a pointer to the new insn when a replacement was done.  */
 
 static rtx_insn *
-resolve_shift_zext (rtx_insn *insn)
+resolve_shift_zext (rtx_insn *insn, bool speed_p)
 {
   rtx set;
   rtx op;
@@ -1378,14 +1380,29 @@  resolve_shift_zext (rtx_insn *insn)
 				dest_reg, GET_CODE (op) != ASHIFTRT);
     }
 
-  if (dest_reg != src_reg)
-    emit_move_insn (dest_reg, src_reg);
-  if (GET_CODE (op) != ASHIFTRT)
-    emit_move_insn (dest_upper, CONST0_RTX (word_mode));
-  else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
-    emit_move_insn (dest_upper, copy_rtx (src_reg));
+  /* Consider using ZERO_EXTEND instead of setting DEST_UPPER to zero
+     if this is considered reasonable.  */
+  if (GET_CODE (op) == LSHIFTRT
+      && GET_MODE (op) == twice_word_mode
+      && REG_P (SET_DEST (set))
+      && !choices[speed_p].splitting_zext)
+    {
+      rtx tmp = force_reg (word_mode, copy_rtx (src_reg));
+      tmp = simplify_gen_unary (ZERO_EXTEND, twice_word_mode, tmp, word_mode);
+      emit_move_insn (SET_DEST (set), tmp);
+    }
   else
-    emit_move_insn (dest_upper, upper_src);
+    {
+      if (dest_reg != src_reg)
+	emit_move_insn (dest_reg, src_reg);
+      if (GET_CODE (op) != ASHIFTRT)
+	emit_move_insn (dest_upper, CONST0_RTX (word_mode));
+      else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
+	emit_move_insn (dest_upper, copy_rtx (src_reg));
+      else
+	emit_move_insn (dest_upper, upper_src);
+    }
+
   insns = get_insns ();
 
   end_sequence ();
@@ -1670,7 +1687,7 @@  decompose_multiword_subregs (bool decompose_copies)
 		    {
 		      rtx_insn *decomposed_shift;
 
-		      decomposed_shift = resolve_shift_zext (insn);
+		      decomposed_shift = resolve_shift_zext (insn, speed_p);
 		      if (decomposed_shift != NULL_RTX)
 			{
 			  insn = decomposed_shift;
diff --git a/gcc/testsuite/gcc.target/avr/mmcu/pr109476.c b/gcc/testsuite/gcc.target/avr/mmcu/pr109476.c
new file mode 100644
index 0000000..6e2269a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/mmcu/pr109476.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os -mmcu=avrxmega3" } */
+
+unsigned short foo(unsigned char a, unsigned short b) {
+    return (unsigned char)((b >> 8) + 0) * a ;
+}
+
+/* { dg-final { scan-assembler-times "mul" 1 } } */
+/* { dg-final { scan-assembler-times "mov" 1 } } */
+/* { dg-final { scan-assembler-not "add" } } */
+/* { dg-final { scan-assembler-not "ldi" } } */