[Patch-2v2,rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

Message ID e7afc1e1-3faf-47f7-9460-237e3d070b1c@linux.ibm.com
State New

Commit Message

HAO CHEN GUI June 12, 2024, 2:47 a.m. UTC
Hi,
  This patch adds an insn_and_split pattern that lets the fwprop pass replace
the source pseudo of a store insn with the duplicated constant vector.  The
store can then be implemented by a single stxvd2x, which eliminates the
unnecessary byte swap insn on P8 LE.  The test case shows the optimization.
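
  The swap can be dropped because exchanging the two doublewords of a vector
whose elements are all identical gives back the same vector.  A minimal C
sketch of that reasoning follows.  It models the (2 3 0 1) selector on plain
arrays rather than on real VSX registers, and the helper names
swap_doublewords and swap_is_noop are made up for illustration; they are not
part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Model of a vec_select with selector (2 3 0 1) on a four-word vector:
   it exchanges the two 64-bit halves.  Illustration only.  */
static void
swap_doublewords (uint32_t out[4], const uint32_t in[4])
{
  static const int sel[4] = { 2, 3, 0, 1 };

  for (int i = 0; i < 4; i++)
    out[i] = in[sel[i]];
}

/* Return 1 if swapping the doublewords of IN leaves it unchanged.  */
static int
swap_is_noop (const uint32_t in[4])
{
  uint32_t out[4];

  swap_doublewords (out, in);
  return memcmp (out, in, sizeof out) == 0;
}
```

  For a duplicated vector such as { 7, 7, 7, 7 } swap_is_noop returns 1, while
for { 1, 2, 3, 4 } it returns 0, which is consistent with restricting the
pattern to const_vec_duplicate_p constants.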

  The patch depends on the first generic patch which uses insn cost in fwprop.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654276.html

  Compared to the previous version, the main change is to remove the
predicate and put the check in the insn condition and a gcc assertion.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
	PR target/113325
	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff

Comments

Kewen.Lin June 12, 2024, 6:15 a.m. UTC | #1
Hi Haochen,

on 2024/6/12 10:47, HAO CHEN GUI wrote:
> Hi,
>   This patch adds an insn_and_split pattern that lets the fwprop pass replace
> the source pseudo of a store insn with the duplicated constant vector.  The
> store can then be implemented by a single stxvd2x, which eliminates the
> unnecessary byte swap insn on P8 LE.  The test case shows the optimization.
> 
>   The patch depends on the first generic patch which uses insn cost in fwprop.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654276.html
> 
>   Compared to the previous version, the main change is to remove the
> predicate and put the check in the insn condition and a gcc assertion.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions.
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store
> 
> gcc/
> 	PR target/113325
> 	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.
> 
> gcc/testsuite/
> 	PR target/113325
> 	* gcc.target/powerpc/pr113325.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..89eb32a0758 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3368,6 +3368,32 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
>    "stxvd2x %x1,%y0"
>    [(set_attr "type" "vecstore")])
> 
> +(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> +	(match_operand:VSX_W 1 "immediate_operand" "W"))]
> +  "!BYTES_BIG_ENDIAN
> +   && VECTOR_MEM_VSX_P (<MODE>mode)
> +   && !TARGET_P9_VECTOR
> +   && const_vec_duplicate_p (operands[1])"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 2)
> +	(match_dup 1))
> +   (set (match_dup 0)
> +	(vec_select:VSX_W
> +	  (match_dup 2)
> +	  (parallel [(const_int 2) (const_int 3)
> +		     (const_int 0) (const_int 1)])))]
> +{
> +  /* Here all the constants must be loaded without memory.  */
> +  gcc_assert (easy_altivec_constant (operands[1], <MODE>mode));
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> +					 : operands[1];

In the !can_create_pseudo_p () case, operands[2] would be a constant vector.
Does that match any existing pattern?  If not, I think we want to add
can_create_pseudo_p () to the condition as well.

The rest looks good to me, thanks!

BR,
Kewen

> +
> +}
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "8")])
> +
>  (define_insn "*vsx_stxvd2x8_le_V8HI"
>    [(set (match_operand:V8HI 0 "memory_operand" "=Z")
>          (vec_select:V8HI
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 00000000000..3ca1fcbc9ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}
Patch

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..89eb32a0758 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3368,6 +3368,32 @@  (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+	(match_operand:VSX_W 1 "immediate_operand" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && !TARGET_P9_VECTOR
+   && const_vec_duplicate_p (operands[1])"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(vec_select:VSX_W
+	  (match_dup 2)
+	  (parallel [(const_int 2) (const_int 3)
+		     (const_int 0) (const_int 1)])))]
+{
+  /* Here all the constants must be loaded without memory.  */
+  gcc_assert (easy_altivec_constant (operands[1], <MODE>mode));
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+					 : operands[1];
+
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 00000000000..3ca1fcbc9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
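
The test above is compile-only: it scans the assembly for xxpermdi and does
not check that the stored bytes are correct.  A runtime companion could look
like the sketch below.  It is not part of the patch; the helper all_zero is
hypothetical and is named here only for illustration.

```c
#include <stddef.h>

/* Same function as in pr113325.c.  */
void *
foo (void *s1)
{
  return __builtin_memset (s1, 0, 32);
}

/* Hypothetical helper: return 1 if the first N bytes of BUF are zero.  */
static int
all_zero (const unsigned char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != 0)
      return 0;
  return 1;
}
```

A caller would fill a 32-byte buffer with a non-zero value, call foo on it,
and then check both that foo returns the buffer address and that all_zero
reports the 32 bytes as cleared.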