Message ID | 20240605042217.563013-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | [V2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode. | expand |
On 6/4/24 10:22 PM, liuhongt wrote: >> Can you add a testcase for this? I don't mind if it's x86 specific and >> does a bit of asm scanning. >> >> Also note that the context for this patch has changed, so it won't >> automatically apply. So be extra careful when updating so that it goes >> into the right place (all the more reason to have a testcase validating >> that the optimization works correctly). >> >> >> I think the patch itself is fine. So further review is just for the >> testcase and should be easy. > rebased and add a testcase. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > > When mask is (1 << (prec - imm) - 1) which is used to clear upper bits > of A, then it can be simplified to LSHIFTRT. > > i.e Simplify > (and:v8hi > (ashifrt:v8hi A 8) > (const_vector 0xff x8)) > to > (lshifrt:v8hi A 8) > > gcc/ChangeLog: > > PR target/114428 > * simplify-rtx.cc > (simplify_context::simplify_binary_operation_1): > Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for > specific mask. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr114428-1.c: New test. OK. Being x264 related, I took a quick glance at RISC-V before/after and seems to be slightly better as well. Jeff
On Wed, Jun 5, 2024 at 10:44 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 6/4/24 10:22 PM, liuhongt wrote: > >> Can you add a testcase for this? I don't mind if it's x86 specific and > >> does a bit of asm scanning. > >> > >> Also note that the context for this patch has changed, so it won't > >> automatically apply. So be extra careful when updating so that it goes > >> into the right place (all the more reason to have a testcase validating > >> that the optimization works correctly). > >> > >> > >> I think the patch itself is fine. So further review is just for the > >> testcase and should be easy. > > rebased and add a testcase. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > > > When mask is (1 << (prec - imm) - 1) which is used to clear upper bits > > of A, then it can be simplified to LSHIFTRT. > > > > i.e Simplify > > (and:v8hi > > (ashifrt:v8hi A 8) > > (const_vector 0xff x8)) > > to > > (lshifrt:v8hi A 8) > > > > gcc/ChangeLog: > > > > PR target/114428 > > * simplify-rtx.cc > > (simplify_context::simplify_binary_operation_1): > > Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for > > specific mask. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr114428-1.c: New test. > OK. > > Being x264 related, I took a quick glance at RISC-V before/after and > seems to be slightly better as well. Great, thanks for the review :) > > Jeff
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 5caf1dfd957..05d410898b3 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -4050,6 +4050,31 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, return tem; } + /* (and:v4si + (ashiftrt:v4si A 16) + (const_vector: 0xffff x4)) + is just (lshiftrt:v4si A 16). */ + if (VECTOR_MODE_P (mode) && GET_CODE (op0) == ASHIFTRT + && (CONST_INT_P (XEXP (op0, 1)) + || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR + && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1)))) + && GET_CODE (op1) == CONST_VECTOR + && CONST_VECTOR_DUPLICATE_P (op1)) + { + unsigned HOST_WIDE_INT shift_count + = (CONST_INT_P (XEXP (op0, 1)) + ? UINTVAL (XEXP (op0, 1)) + : UINTVAL (XVECEXP (XEXP (op0, 1), 0, 0))); + unsigned HOST_WIDE_INT inner_prec + = GET_MODE_PRECISION (GET_MODE_INNER (mode)); + + /* Avoid UD shift count. */ + if (shift_count < inner_prec + && (UINTVAL (XVECEXP (op1, 0, 0)) + == (HOST_WIDE_INT_1U << (inner_prec - shift_count)) - 1)) + return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP (op0, 1)); + } + tem = simplify_byte_swapping_operation (code, mode, op0, op1); if (tem) return tem; diff --git a/gcc/testsuite/gcc.target/i386/pr114428-1.c b/gcc/testsuite/gcc.target/i386/pr114428-1.c new file mode 100644 index 00000000000..927476f2269 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr114428-1.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2" } */ +/* { dg-final { scan-assembler-times "psrlw" 1 } } */ +/* { dg-final { scan-assembler-times "psrld" 1 } } */ +/* { dg-final { scan-assembler-times "psrlq" 1 { target { ! ia32 } } } } */ + + +#define SHIFTC 12 + +typedef int v4si __attribute__((vector_size(16))); +typedef short v8hi __attribute__((vector_size(16))); +typedef long long v2di __attribute__((vector_size(16))); + +v8hi +foo1 (v8hi a) +{ + return + (a >> (16 - SHIFTC)) & (__extension__(v8hi){(1<<SHIFTC)-1, (1<<SHIFTC)-1, + (1<<SHIFTC)-1,(1<<SHIFTC)-1, + (1<<SHIFTC)-1, (1<<SHIFTC)-1, + (1<<SHIFTC)-1,(1<<SHIFTC)-1}); +} + +v4si +foo2 (v4si a) +{ + return + (a >> (32 - SHIFTC)) & (__extension__(v4si){(1<<SHIFTC)-1, (1<<SHIFTC)-1, + (1<<SHIFTC)-1,(1<<SHIFTC)-1}); +} + +v2di +__attribute__((target("avx512vl"))) +foo3 (v2di a) +{ + return + (a >> (long long)(64 - SHIFTC)) & (__extension__(v2di){(1ULL<<SHIFTC)-1, + (1ULL<<SHIFTC)-1}); +}