diff mbox series

[V2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

Message ID 20240605042217.563013-1-hongtao.liu@intel.com
State New
Headers show
Series [V2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode. | expand

Commit Message

liuhongt June 5, 2024, 4:22 a.m. UTC
> Can you add a testcase for this?  I don't mind if it's x86 specific and
> does a bit of asm scanning.
>
> Also note that the context for this patch has changed, so it won't
> automatically apply.  So be extra careful when updating so that it goes
> into the right place (all the more reason to have a testcase validating
> that the optimization works correctly).
>
>
> I think the patch itself is fine.  So further review is just for the
> testcase and should be easy.
rebased and add a testcase.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?


When mask is (1 << (prec - imm) - 1) which is used to clear upper bits
of A, then it can be simplified to LSHIFTRT.

i.e Simplify
(and:v8hi
  (ashifrt:v8hi A 8)
  (const_vector 0xff x8))
to
(lshifrt:v8hi A 8)

gcc/ChangeLog:

	PR target/114428
	* simplify-rtx.cc
	(simplify_context::simplify_binary_operation_1):
	Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
	specific mask.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr114428-1.c: New test.
---
 gcc/simplify-rtx.cc                        | 25 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr114428-1.c | 39 ++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114428-1.c

Comments

Jeff Law June 5, 2024, 2:43 p.m. UTC | #1
On 6/4/24 10:22 PM, liuhongt wrote:
>> Can you add a testcase for this?  I don't mind if it's x86 specific and
>> does a bit of asm scanning.
>>
>> Also note that the context for this patch has changed, so it won't
>> automatically apply.  So be extra careful when updating so that it goes
>> into the right place (all the more reason to have a testcase validating
>> that the optimization works correctly).
>>
>>
>> I think the patch itself is fine.  So further review is just for the
>> testcase and should be easy.
> rebased and add a testcase.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> 
> When mask is (1 << (prec - imm) - 1) which is used to clear upper bits
> of A, then it can be simplified to LSHIFTRT.
> 
> i.e Simplify
> (and:v8hi
>    (ashifrt:v8hi A 8)
>    (const_vector 0xff x8))
> to
> (lshifrt:v8hi A 8)
> 
> gcc/ChangeLog:
> 
> 	PR target/114428
> 	* simplify-rtx.cc
> 	(simplify_context::simplify_binary_operation_1):
> 	Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
> 	specific mask.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/i386/pr114428-1.c: New test.
OK.

Being x264 related, I took a quick glance at RISC-V before/after and 
seems to be slightly better as well.

Jeff
Hongtao Liu June 6, 2024, 12:32 a.m. UTC | #2
On Wed, Jun 5, 2024 at 10:44 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 6/4/24 10:22 PM, liuhongt wrote:
> >> Can you add a testcase for this?  I don't mind if it's x86 specific and
> >> does a bit of asm scanning.
> >>
> >> Also note that the context for this patch has changed, so it won't
> >> automatically apply.  So be extra careful when updating so that it goes
> >> into the right place (all the more reason to have a testcase validating
> >> that the optimization works correctly).
> >>
> >>
> >> I think the patch itself is fine.  So further review is just for the
> >> testcase and should be easy.
> > rebased and add a testcase.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> >
> > When mask is (1 << (prec - imm) - 1) which is used to clear upper bits
> > of A, then it can be simplified to LSHIFTRT.
> >
> > i.e Simplify
> > (and:v8hi
> >    (ashifrt:v8hi A 8)
> >    (const_vector 0xff x8))
> > to
> > (lshifrt:v8hi A 8)
> >
> > gcc/ChangeLog:
> >
> >       PR target/114428
> >       * simplify-rtx.cc
> >       (simplify_context::simplify_binary_operation_1):
> >       Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
> >       specific mask.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/pr114428-1.c: New test.
> OK.
>
> Being x264 related, I took a quick glance at RISC-V before/after and
> seems to be slightly better as well.
Great, thanks for the review :)
>
> Jeff
diff mbox series

Patch

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 5caf1dfd957..05d410898b3 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4050,6 +4050,31 @@  simplify_context::simplify_binary_operation_1 (rtx_code code,
 	    return tem;
 	}
 
+      /* (and:v4si
+	   (ashiftrt:v4si A 16)
+	   (const_vector: 0xffff x4))
+	 is just (lshiftrt:v4si A 16).  */
+      if (VECTOR_MODE_P (mode) && GET_CODE (op0) == ASHIFTRT
+	  && (CONST_INT_P (XEXP (op0, 1))
+	      || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR
+		  && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1))))
+	  && GET_CODE (op1) == CONST_VECTOR
+	  && CONST_VECTOR_DUPLICATE_P (op1))
+	{
+	  unsigned HOST_WIDE_INT shift_count
+	    = (CONST_INT_P (XEXP (op0, 1))
+	       ? UINTVAL (XEXP (op0, 1))
+	       : UINTVAL (XVECEXP (XEXP (op0, 1), 0, 0)));
+	  unsigned HOST_WIDE_INT inner_prec
+	    = GET_MODE_PRECISION (GET_MODE_INNER (mode));
+
+	  /* Avoid UD shift count.  */
+	  if (shift_count < inner_prec
+	      && (UINTVAL (XVECEXP (op1, 0, 0))
+		  == (HOST_WIDE_INT_1U << (inner_prec - shift_count)) - 1))
+	    return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP (op0, 1));
+	}
+
       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
diff --git a/gcc/testsuite/gcc.target/i386/pr114428-1.c b/gcc/testsuite/gcc.target/i386/pr114428-1.c
new file mode 100644
index 00000000000..927476f2269
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114428-1.c
@@ -0,0 +1,39 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-final { scan-assembler-times "psrlw" 1 } } */
+/* { dg-final { scan-assembler-times "psrld" 1 } } */
+/* { dg-final { scan-assembler-times "psrlq" 1 { target { ! ia32 } } } } */
+
+
+#define SHIFTC 12
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef short v8hi __attribute__((vector_size(16)));
+typedef long long v2di __attribute__((vector_size(16)));
+
+v8hi
+foo1 (v8hi a)
+{
+  return
+    (a >> (16 - SHIFTC)) & (__extension__(v8hi){(1<<SHIFTC)-1, (1<<SHIFTC)-1,
+						(1<<SHIFTC)-1,(1<<SHIFTC)-1,
+						(1<<SHIFTC)-1, (1<<SHIFTC)-1,
+						(1<<SHIFTC)-1,(1<<SHIFTC)-1});
+}
+
+v4si
+foo2 (v4si a)
+{
+  return
+    (a >> (32 - SHIFTC)) & (__extension__(v4si){(1<<SHIFTC)-1, (1<<SHIFTC)-1,
+						(1<<SHIFTC)-1,(1<<SHIFTC)-1});
+}
+
+v2di
+__attribute__((target("avx512vl")))
+foo3 (v2di a)
+{
+  return
+    (a >> (long long)(64 - SHIFTC)) & (__extension__(v2di){(1ULL<<SHIFTC)-1,
+							   (1ULL<<SHIFTC)-1});
+}