Message ID | 20240515100534.4120288-1-hongtao.liu@intel.com |
---|---|
State | New |
Series | [x86] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial. |
On Wed, May 15, 2024 at 12:05 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> pshufb is available under TARGET_SSSE3, so
> ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3.
> Without TARGET_SSSE3, if we set one_operand_p to true,
> ix86_expand_vec_perm_const_1 could return false.
>
> With the patch under -march=x86-64-v2
>
> v8qi
> foo (v8qi a)
> {
>   return a >> 5;
> }
>
> <       pmovsxbw  %xmm0, %xmm0
> <       psraw     $5, %xmm0
> <       pshufb    .LC0(%rip), %xmm0
> ---
> >       movdqa    %xmm0, %xmm1
> >       pcmpeqd   %xmm0, %xmm0
> >       pmovsxbw  %xmm1, %xmm1
> >       psrlw     $8, %xmm0
> >       psraw     $5, %xmm1
> >       pand      %xmm1, %xmm0
> >       packuswb  %xmm0, %xmm0
>
> Although there's a memory load from the constant pool, it should be
> better when it's inside a loop, where the load from the constant pool
> can be hoisted out. It's 1 instruction vs. 4 instructions.
>
> <       pshufb    .LC0(%rip), %xmm0
>
> vs.
>
> >       pcmpeqd   %xmm0, %xmm0
> >       psrlw     $8, %xmm0
> >       pand      %xmm1, %xmm0
> >       packuswb  %xmm0, %xmm0
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         PR target/114514
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
>         Set d.one_operand_p to true when TARGET_SSSE3.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr114514-shufb.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.cc                |  2 +-
>  .../gcc.target/i386/pr114514-shufb.c          | 35 +++++++++++++++++++
>  2 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114514-shufb.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index ab6631f51e3..ae2e9ab4e05 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24394,7 +24394,7 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>        d.op0 = d.op1 = qres;
>        d.vmode = V16QImode;
>        d.nelt = 16;
> -      d.one_operand_p = false;
> +      d.one_operand_p = TARGET_SSSE3;
>        d.testing_p = false;
>
>        for (i = 0; i < d.nelt; ++i)
> diff --git a/gcc/testsuite/gcc.target/i386/pr114514-shufb.c b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
> new file mode 100644
> index 00000000000..71fdc9d8daf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse4.1 -O2 -mno-avx512f" } */
> +/* { dg-final { scan-assembler-not "packuswb" } } */
> +/* { dg-final { scan-assembler-times "pshufb" 4 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "pshufb" 6 { target ia32 } } } */
> +
> +typedef unsigned char v8uqi __attribute__((vector_size(8)));
> +typedef char v8qi __attribute__((vector_size(8)));
> +typedef unsigned char v4uqi __attribute__((vector_size(4)));
> +typedef char v4qi __attribute__((vector_size(4)));
> +
> +v8qi
> +foo (v8qi a)
> +{
> +  return a >> 5;
> +}
> +
> +v8uqi
> +foo1 (v8uqi a)
> +{
> +  return a >> 5;
> +}
> +
> +v4qi
> +foo2 (v4qi a)
> +{
> +  return a >> 5;
> +}
> +
> +v4uqi
> +foo3 (v4uqi a)
> +{
> +  return a >> 5;
> +}
> +
> --
> 2.31.1
>
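For readers comparing the two instruction sequences quoted above, the following is a rough scalar model in C, not code from GCC. It assumes the `.LC0` shuffle mask selects the even-numbered bytes into the low eight lanes and zeroes the rest; the actual constant is not shown in the thread, so that mask is hypothetical, as are all the helper names (`vec128`, `widen_and_shift`, `model_pshufb`, `model_pand_packuswb`, `narrowings_agree`). The sketch only illustrates why a single byte shuffle can stand in for the `pcmpeqd`/`psrlw`/`pand`/`packuswb` tail when narrowing the shifted words back to bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a 128-bit XMM register: 16 little-endian bytes. */
typedef struct { uint8_t b[16]; } vec128;

/* Models pmovsxbw followed by psraw: sign-extend 8 bytes to 16-bit words,
   then shift each word right arithmetically by COUNT. */
static vec128 widen_and_shift(const int8_t in[8], int count)
{
    vec128 v;
    assert(count >= 0 && count < 16);
    for (int i = 0; i < 8; i++) {
        int w = in[i];
        /* Portable arithmetic shift: never applies >> to a negative value. */
        int s = w < 0 ? ~(~w >> count) : w >> count;
        uint16_t u = (uint16_t)s;
        v.b[2 * i] = (uint8_t)(u & 0xff);
        v.b[2 * i + 1] = (uint8_t)(u >> 8);
    }
    return v;
}

/* Models pshufb: a mask byte with bit 7 set yields 0; otherwise its low
   four bits select a source byte. */
static vec128 model_pshufb(vec128 src, const uint8_t mask[16])
{
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (mask[i] & 0x80) ? 0 : src.b[mask[i] & 0x0f];
    return r;
}

/* Models the pand (with 0x00ff words) + packuswb tail.  After masking,
   every word is in 0..255, so the saturation in packuswb never triggers. */
static vec128 model_pand_packuswb(vec128 src)
{
    vec128 r;
    for (int i = 0; i < 8; i++) {
        unsigned w = src.b[2 * i] | ((unsigned)src.b[2 * i + 1] << 8);
        w &= 0x00ffu;
        r.b[i] = (uint8_t)w;
        r.b[i + 8] = (uint8_t)w; /* packuswb x, x repeats the result */
    }
    return r;
}

/* Returns 1 if both narrowing sequences produce the same low 8 bytes, and
   those bytes equal floor(x / 2^count), for every input byte value. */
int narrowings_agree(int count)
{
    /* Hypothetical mask: even bytes into the low 8 lanes, rest zeroed. */
    static const uint8_t mask[16] = {
        0, 2, 4, 6, 8, 10, 12, 14,
        0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
    };
    int d = 1 << count;
    for (int x = -128; x <= 127; x++) {
        int8_t in[8];
        for (int i = 0; i < 8; i++)
            in[i] = (int8_t)((x + 128 + 37 * i) % 256 - 128);
        vec128 wide = widen_and_shift(in, count);
        vec128 a = model_pshufb(wide, mask);
        vec128 p = model_pand_packuswb(wide);
        for (int i = 0; i < 8; i++) {
            int e = in[i] >= 0 ? in[i] / d : -((-in[i] + d - 1) / d);
            if (a.b[i] != p.b[i] || a.b[i] != (uint8_t)e)
                return 0;
        }
    }
    return 1;
}
```

Calling `narrowings_agree(5)` exercises the shift count used in the quoted example. The model says nothing about latency or throughput; it only checks that the two tails compute the same low bytes under the stated mask assumption.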
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index ab6631f51e3..ae2e9ab4e05 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24394,7 +24394,7 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
       d.op0 = d.op1 = qres;
       d.vmode = V16QImode;
       d.nelt = 16;
-      d.one_operand_p = false;
+      d.one_operand_p = TARGET_SSSE3;
       d.testing_p = false;

       for (i = 0; i < d.nelt; ++i)
diff --git a/gcc/testsuite/gcc.target/i386/pr114514-shufb.c b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
new file mode 100644
index 00000000000..71fdc9d8daf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4.1 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-not "packuswb" } } */
+/* { dg-final { scan-assembler-times "pshufb" 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pshufb" 6 { target ia32 } } } */
+
+typedef unsigned char v8uqi __attribute__((vector_size(8)));
+typedef char v8qi __attribute__((vector_size(8)));
+typedef unsigned char v4uqi __attribute__((vector_size(4)));
+typedef char v4qi __attribute__((vector_size(4)));
+
+v8qi
+foo (v8qi a)
+{
+  return a >> 5;
+}
+
+v8uqi
+foo1 (v8uqi a)
+{
+  return a >> 5;
+}
+
+v4qi
+foo2 (v4qi a)
+{
+  return a >> 5;
+}
+
+v4uqi
+foo3 (v4uqi a)
+{
+  return a >> 5;
+}
+
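The new test only scans the generated assembly; it does not execute the functions. As a complement, here is a minimal run-time sketch, assuming a GCC- or Clang-compatible compiler with vector extensions, that checks the lane-wise meaning of `a >> 5` for signed and unsigned 8-byte vectors against a scalar loop. The names `sar5`, `shr5`, and `lanes_match_scalar` are hypothetical and not part of the patch, and `signed char` is used instead of plain `char` so the signed case does not depend on the target's default `char` signedness.

```c
#include <string.h>

/* Same shapes as the test's v8qi/v8uqi, with explicit signedness. */
typedef signed char v8qi __attribute__((vector_size(8)));
typedef unsigned char v8uqi __attribute__((vector_size(8)));

/* Lane-wise shifts, written the same way as foo and foo1 in the test. */
static v8qi sar5(v8qi a) { return a >> 5; }
static v8uqi shr5(v8uqi a) { return a >> 5; }

/* Returns 1 if each vector lane equals the scalar shift of that lane.
   Assumes >> on a negative int is an arithmetic shift, as GCC and Clang
   implement it. */
int lanes_match_scalar(void)
{
    const signed char in[8] = { -128, -33, -32, -1, 0, 31, 32, 127 };
    unsigned char uin[8], uout[8];
    signed char out[8];
    v8qi sv, sr;
    v8uqi uv, ur;

    /* Signed case: expect an arithmetic shift per lane. */
    memcpy(&sv, in, sizeof sv);
    sr = sar5(sv);
    memcpy(out, &sr, sizeof out);
    for (int i = 0; i < 8; i++)
        if (out[i] != (signed char)(in[i] >> 5))
            return 0;

    /* Unsigned case: expect a logical shift per lane. */
    for (int i = 0; i < 8; i++)
        uin[i] = (unsigned char)in[i];
    memcpy(&uv, uin, sizeof uv);
    ur = shr5(uv);
    memcpy(uout, &ur, sizeof uout);
    for (int i = 0; i < 8; i++)
        if (uout[i] != (unsigned char)(uin[i] >> 5))
            return 0;

    return 1;
}
```

A check like this is independent of which instruction sequence the compiler picks, so it can be built with or without SSSE3 enabled to compare behavior across both code paths.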