diff mbox series

[rs6000] Add subreg patterns for SImode rotate and mask insert

Message ID bfcff14a-49fc-4f47-9a1e-2c04d9cd3458@linux.ibm.com
State New
Headers show
Series [rs6000] Add subreg patterns for SImode rotate and mask insert | expand

Commit Message

HAO CHEN GUI March 1, 2024, 2:41 a.m. UTC
Hi,
  This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
lshiftrt with an out AND. It matches a DImode rotate and mask insert on
rs6000.

Trying 2 -> 7:
    2: r122:DI=r129:DI
      REG_DEAD r129:DI
    7: r125:SI=r122:DI#0 0>>0x1f
      REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
    (zero_extract:DI (reg:DI 129)
        (const_int 32 [0x20])
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
    (and:DI (lshiftrt:DI (reg:DI 129)
            (const_int 31 [0x1f]))
        (const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination which combines to a SImode
rotate and mask insert insn.

Trying 9, 7 -> 10:
    9: r127:SI=r130:DI#0&0xfffffffffffffffe
      REG_DEAD r130:DI
    7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
      REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
      REG_DEAD r125:SI
      REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
    (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (zero_extract:DI (reg:DI 129)
                (const_int 32 [0x20])
                (const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
    (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
            (const_int -2 [0xfffffffffffffffe]))
        (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
                    (const_int 31 [0x1f]))
                (const_int 4294967295 [0xffffffff])) 0)))

  The root cause of the issue is if it's necessary to do the widen mode for
lshiftrt when the target already has the narrow mode lshiftrt and its cost
is not high. My former patch tried to fix the problem but not accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns of SImode rotate and mask insert. It actually does reversed
things and narrow the mode for lshiftrt so that it can matches the SImode
rotate and mask insert.

  The case "rlwimi-2.c" is fixed and restore the corresponding number of
insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
a regression as the total number of insns isn't changed.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
lshiftrt with an AND.  The new pattern matches rotate and mask insert on
rs6000.  Thus it blocks the pattern to be further combined to a SImode rotate
and mask insert pattern.  This patch fixes the problem by adding two subreg
pattern for SImode rotate and mask insert patterns.

gcc/
	PR target/93738
	* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
	(*rotlsi3_insert_8): New.

gcc/testsuite/
	PR target/93738
	* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
	rotate instructions.
	* gcc.target/powerpc/rlwinm-0.c: Likewise.

patch.diff

Comments

Kewen.Lin March 6, 2024, 9:26 a.m. UTC | #1
Hi,

on 2024/3/1 10:41, HAO CHEN GUI wrote:
> Hi,
>   This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
> combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an out AND. It matches a DImode rotate and mask insert on
> rs6000.
> 
> Trying 2 -> 7:
>     2: r122:DI=r129:DI
>       REG_DEAD r129:DI
>     7: r125:SI=r122:DI#0 0>>0x1f
>       REG_DEAD r122:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
>     (zero_extract:DI (reg:DI 129)
>         (const_int 32 [0x20])
>         (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
>     (and:DI (lshiftrt:DI (reg:DI 129)
>             (const_int 31 [0x1f]))
>         (const_int 4294967295 [0xffffffff])))
> 
> This conversion blocks the further combination which combines to a SImode
> rotate and mask insert insn.
> 
> Trying 9, 7 -> 10:
>     9: r127:SI=r130:DI#0&0xfffffffffffffffe
>       REG_DEAD r130:DI
>     7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
>       REG_DEAD r129:DI
>    10: r124:SI=r127:SI|r125:SI
>       REG_DEAD r125:SI
>       REG_DEAD r127:SI
> Failed to match this instruction:
> (set (reg:SI 124)
>     (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
>             (const_int -2 [0xfffffffffffffffe]))
>         (subreg:SI (zero_extract:DI (reg:DI 129)
>                 (const_int 32 [0x20])
>                 (const_int 1 [0x1])) 0)))
> Failed to match this instruction:
> (set (reg:SI 124)
>     (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
>             (const_int -2 [0xfffffffffffffffe]))
>         (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
>                     (const_int 31 [0x1f]))
>                 (const_int 4294967295 [0xffffffff])) 0)))
> 
>   The root cause of the issue is if it's necessary to do the widen mode for
> lshiftrt when the target already has the narrow mode lshiftrt and its cost
> is not high. My former patch tried to fix the problem but not accepted yet.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Hope Segher can chime in this proposal updating combine pass, I can understand
the new proposal by introducing new patterns in target code is able to fix the
issue, but IMHO it's likely there are some other mis-optimization which don't
get noticed and they need some similar pattern extension (duplicate some pattern
& adjust with subreg) to optimize, from this perspective, it would be nice if
it's possible to have a more general fix.

Some minor comments for this patch itself are inlined.

> 
>   As it's stage 4 now, I drafted this patch to fix the regression by adding
> subreg patterns of SImode rotate and mask insert. It actually does reversed
> things and narrow the mode for lshiftrt so that it can matches the SImode
> rotate and mask insert.
> 
>   The case "rlwimi-2.c" is fixed and restore the corresponding number of
> insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
> is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
> a regression as the total number of insns isn't changed.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add subreg patterns for SImode rotate and mask insert
> 
> In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an AND.  The new pattern matches rotate and mask insert on
> rs6000.  Thus it blocks the pattern to be further combined to a SImode rotate
> and mask insert pattern.  This patch fixes the problem by adding two subreg
> pattern for SImode rotate and mask insert patterns.
> 
> gcc/
> 	PR target/93738
> 	* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
> 	(*rotlsi3_insert_8): New.
> 
> gcc/testsuite/
> 	PR target/93738
> 	* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
> 	rotate instructions.
> 	* gcc.target/powerpc/rlwinm-0.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bc8bc6ab060..b0b40f91e3e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
>  ; difference between rlwimi and rldimi.  We also might want dot forms,
>  ; but not for rlwimi on POWER4 and similar processors.
> 
> +; Subreg pattern of insn "*rotlsi3_insert"
> +(define_insn_and_split "*rotlsi3_insert_9"

Nit: "*rotlsi3_insert_subreg" seems a better name, ...

> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> +	(ior:SI (and:SI
> +		 (match_operator:SI 8 "lowpart_subreg_operator"
> +		  [(and:DI (match_operator:DI 4 "rotate_mask_operator"
> +			    [(match_operand:DI 1 "gpc_reg_operand" "r")
> +			     (match_operand:SI 2 "const_int_operand" "n")])
> +			   (match_operand:DI 3 "const_int_operand" "n"))])
> +		 (match_operand:SI 5 "const_int_operand" "n"))
> +		(and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
> +			(match_operand:SI 7 "const_int_operand" "n"))))]
> +  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
> +   && GET_CODE (operands[4]) == LSHIFTRT
> +   && INTVAL (operands[3]) == 0xffffffff
> +   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +	(ior:SI (and:SI (lshiftrt:SI (match_dup 9)
> +				     (match_dup 2))
> +			(match_dup 5))
> +		(and:SI (match_dup 6)
> +			(match_dup 7))))]
> +{
> +  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
> +  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
> +}
> +  [(set_attr "type" "insert")])
> +
>  (define_insn "*rotl<mode>3_insert_2"
>    [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>  	(ior:GPR (and:GPR (match_operand:GPR 5 "gpc_reg_operand" "0")
> @@ -4331,6 +4361,31 @@ (define_insn "*rotlsi3_insert_4"
>    "rlwimi %0,%1,32-%h2,%h2,31"
>    [(set_attr "type" "insert")])
> 
> +; Subreg pattern of insn "*rotlsi3_insert_4"
> +(define_insn_and_split "*rotlsi3_insert_8"

..., and "*rotlsi3_insert_4_subreg" for this.

> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> +	(ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
> +			(match_operand:SI 4 "const_int_operand" "n"))
> +		(match_operator:SI 6 "lowpart_subreg_operator"
> +		  [(and:DI
> +		    (lshiftrt:DI (match_operand:DI 1 "gpc_reg_operand" "r")
> +				 (match_operand:SI 2 "const_int_operand" "n"))
> +		    (match_operand:DI 5 "const_int_operand" "n"))])))]
> +  "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32
> +   && INTVAL (operands[5]) == 0xffffffff"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +	(ior:SI (and:SI (match_dup 3)
> +			(match_dup 4))
> +		(lshiftrt:SI (match_dup 7)
> +			     (match_dup 2))))]
> +{
> +  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
> +  operands[7] = gen_rtx_SUBREG (SImode, operands[1], offset);
> +}
> +  [(set_attr "type" "insert")])
> +
>  (define_insn "*rotlsi3_insert_5"
>    [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>  	(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> index bafa371db73..62344a95aa0 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> @@ -6,10 +6,9 @@
>  /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } } } */
> 
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */
> 
>  /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> index 4f4fca2d8ef..a10d9174306 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> @@ -4,10 +4,10 @@
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } } } */
> 
>  /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */

Peter's commit r14-9085-g81e5f276c59897 has fixed this with same counts, not rebased?

BR,
Kewen
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..b0b40f91e3e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@  (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_9"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+	(ior:SI (and:SI
+		 (match_operator:SI 8 "lowpart_subreg_operator"
+		  [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+			    [(match_operand:DI 1 "gpc_reg_operand" "r")
+			     (match_operand:SI 2 "const_int_operand" "n")])
+			   (match_operand:DI 3 "const_int_operand" "n"))])
+		 (match_operand:SI 5 "const_int_operand" "n"))
+		(and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+			(match_operand:SI 7 "const_int_operand" "n"))))]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0xffffffff
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+				     (match_dup 2))
+			(match_dup 5))
+		(and:SI (match_dup 6)
+			(match_dup 7))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
+}
+  [(set_attr "type" "insert")])
+
 (define_insn "*rotl<mode>3_insert_2"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(ior:GPR (and:GPR (match_operand:GPR 5 "gpc_reg_operand" "0")
@@ -4331,6 +4361,31 @@  (define_insn "*rotlsi3_insert_4"
   "rlwimi %0,%1,32-%h2,%h2,31"
   [(set_attr "type" "insert")])

+; Subreg pattern of insn "*rotlsi3_insert_4"
+(define_insn_and_split "*rotlsi3_insert_8"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+	(ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
+			(match_operand:SI 4 "const_int_operand" "n"))
+		(match_operator:SI 6 "lowpart_subreg_operator"
+		  [(and:DI
+		    (lshiftrt:DI (match_operand:DI 1 "gpc_reg_operand" "r")
+				 (match_operand:SI 2 "const_int_operand" "n"))
+		    (match_operand:DI 5 "const_int_operand" "n"))])))]
+  "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32
+   && INTVAL (operands[5]) == 0xffffffff"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(ior:SI (and:SI (match_dup 3)
+			(match_dup 4))
+		(lshiftrt:SI (match_dup 7)
+			     (match_dup 2))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[7] = gen_rtx_SUBREG (SImode, operands[1], offset);
+}
+  [(set_attr "type" "insert")])
+
 (define_insn "*rotlsi3_insert_5"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
 	(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
index bafa371db73..62344a95aa0 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
@@ -6,10 +6,9 @@ 
 /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } } } */

-/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */

 /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
index 4f4fca2d8ef..a10d9174306 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
@@ -4,10 +4,10 @@ 
 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } } } */

 /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */