diff mbox series

i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

Message ID CAMZc-bzvJp+bbZsrk2DxvR7Gb+TdAJVBMSr4MQ7DpMUCq_Y+Cg@mail.gmail.com
State New
Headers show
Series i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461] | expand

Commit Message

Hongtao Liu Jan. 4, 2021, 5:56 a.m. UTC
Hi:
  The following patch adds define_insn_and_split to optimize

       vpmovmskb       %xmm0, %eax
-       movzwl  %ax, %eax
        notl    %eax

  Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
  Ok for trunk?

gcc/ChangeLog
        PR target/98461
        * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
        define_insn_and_split for zero_extend of subreg HI of pmovskb
        result.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md                         | 11 +++++++++++
 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

Comments

Uros Bizjak Jan. 4, 2021, 7:40 a.m. UTC | #1
On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   The following patch adds define_insn_and_split to optimize
>
>        vpmovmskb       %xmm0, %eax
> -       movzwl  %ax, %eax
>         notl    %eax
>
>   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
>   Ok for trunk?
>
> gcc/ChangeLog
>         PR target/98461
>         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
>         define_insn_and_split for zero_extend of subreg HI of pmovskb
>         result.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md                         | 11 +++++++++++
>  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
>  2 files changed, 24 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4ed6b9ae476 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +       (zero_extend:SI (subreg:HI (unspec:SI
> +         [(match_operand:V16QI 1 "register_operand")]
> +          UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2"

This needs ix86_pre_reload_split () in insn predicate.

Uros.

> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +       (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
>  (define_split
>    [(set (match_operand:SI 0 "register_operand")
>         (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 00000000000..60fc1f3e9c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,13 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
> +
> +#include <immintrin.h>
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao
Hongtao Liu Jan. 4, 2021, 7:54 a.m. UTC | #2
On Mon, Jan 4, 2021 at 3:40 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi:
> >   The following patch adds define_insn_and_split to optimize
> >
> >        vpmovmskb       %xmm0, %eax
> > -       movzwl  %ax, %eax
> >         notl    %eax
> >
> >   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog
> >         PR target/98461
> >         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> >         define_insn_and_split for zero_extend of subreg HI of pmovskb
> >         result.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/sse-pr98461-2.c: New test.
> > ---
> >  gcc/config/i386/sse.md                         | 11 +++++++++++
> >  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
> >  2 files changed, 24 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index d84103807ff..4ed6b9ae476 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> >     (set_attr "prefix" "maybe_vex")
> >     (set_attr "mode" "SI")])
> >
> > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +       (zero_extend:SI (subreg:HI (unspec:SI
> > +         [(match_operand:V16QI 1 "register_operand")]
> > +          UNSPEC_MOVMSK) 0)))]
> > +  "TARGET_SSE2"
>
> This needs ix86_pre_reload_split () in insn predicate.
>

Yes, there's subreg in the pattern.
Assume patch is pre-approved with that change and
regtested/bootstrapped on x86_64-linux-gnu{-m32,}.

> Uros.
>
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > +       (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> > +
> >  (define_split
> >    [(set (match_operand:SI 0 "register_operand")
> >         (unspec:SI
> > diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > new file mode 100644
> > index 00000000000..60fc1f3e9c1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > @@ -0,0 +1,13 @@
> > +/* PR target/98461 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
> > +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> > +/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
> > +
> > +#include <immintrin.h>
> > +
> > +unsigned int movemask_not1(__m128i logical) {
> > +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> > +  return ~res;
> > +}
> > --
> > 2.18.1
> >
> >
> > --
> > BR,
> > Hongtao
Uros Bizjak Jan. 4, 2021, 8:42 a.m. UTC | #3
On Mon, Jan 4, 2021 at 8:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 3:40 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > Hi:
> > >   The following patch adds define_insn_and_split to optimize
> > >
> > >        vpmovmskb       %xmm0, %eax
> > > -       movzwl  %ax, %eax
> > >         notl    %eax
> > >
> > >   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog
> > >         PR target/98461
> > >         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> > >         define_insn_and_split for zero_extend of subreg HI of pmovskb
> > >         result.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.target/i386/sse-pr98461-2.c: New test.
> > > ---
> > >  gcc/config/i386/sse.md                         | 11 +++++++++++
> > >  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +++++++++++++
> > >  2 files changed, 24 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index d84103807ff..4ed6b9ae476 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> > >     (set_attr "prefix" "maybe_vex")
> > >     (set_attr "mode" "SI")])
> > >
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +       (zero_extend:SI (subreg:HI (unspec:SI
> > > +         [(match_operand:V16QI 1 "register_operand")]
> > > +          UNSPEC_MOVMSK) 0)))]
> > > +  "TARGET_SSE2"
> >
> > This needs ix86_pre_reload_split () in insn predicate.
> >
>
> Yes, there's subreg in the pattern.

Also the insn pattern does not have operand constraints.

> Assume patch is pre-approved with that change and
> regtested/bootstrapped on x86_64-linux-gnu{-m32,}.

LGTM with the above addition.

Uros.
Jakub Jelinek Jan. 4, 2021, 8:49 a.m. UTC | #4
On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +       (zero_extend:SI (subreg:HI (unspec:SI
> +         [(match_operand:V16QI 1 "register_operand")]
> +          UNSPEC_MOVMSK) 0)))]

Also, please fix up formatting.  Should be:
	(zero_extend:SI
	  (subreg:HI
	    (unspec:SI
	      [(match_operand:V16QI 1 "register_operand")]
	      UNSPEC_MOVMSK) 0)))]
I think.

	Jakub
Hongtao Liu Jan. 4, 2021, 8:59 a.m. UTC | #5
On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +       (zero_extend:SI (subreg:HI (unspec:SI
> > +         [(match_operand:V16QI 1 "register_operand")]
> > +          UNSPEC_MOVMSK) 0)))]
>
> Also, please fix up formatting.  Should be:
>         (zero_extend:SI
>           (subreg:HI
>             (unspec:SI
>               [(match_operand:V16QI 1 "register_operand")]
>               UNSPEC_MOVMSK) 0)))]
> I think.
>
>         Jakub
>

Yes, thanks for the review both, and happy new year!
Hongtao Liu Jan. 5, 2021, 6:32 a.m. UTC | #6
On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +       (zero_extend:SI (subreg:HI (unspec:SI
> > > +         [(match_operand:V16QI 1 "register_operand")]
> > > +          UNSPEC_MOVMSK) 0)))]
> >
> > Also, please fix up formatting.  Should be:
> >         (zero_extend:SI
> >           (subreg:HI
> >             (unspec:SI
> >               [(match_operand:V16QI 1 "register_operand")]
> >               UNSPEC_MOVMSK) 0)))]
> > I think.
> >
> >         Jakub
> >
>
> Yes, thanks for the review both, and happy new year!
>
> --
> BR,
> Hongtao

Sorry for the bother, this is an incremental patch to split
(zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI)))) to

        pmovmskb        %xmm0, %eax
-       notl    %eax
-       movzwl  %ax, %eax
+       xorl    $65535, %eax


The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

The following patch adds define_insn_and_split to optimize

       vpmovmskb       %xmm0, %eax
-       movzwl  %ax, %eax
        notl    %eax

and combine splitter to optimize

        pmovmskb        %xmm0, %eax
-       notl    %eax
-       movzwl  %ax, %eax
+       xorl    $65535, %eax

gcc/ChangeLog
        PR target/98461
        * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
        define_insn_and_split for zero_extend of subreg HI of pmovskb
        result.
        (*sse2_pmovskb_zexthisi): Add new combine splitters for
        zero_extend of not of subreg HI of pmovskb result.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md                        | 32 +++++++++++++++++++
 .../gcc.target/i386/sse2-pr98461-2.c          | 25 +++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4fcff0800c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+        (zero_extend:SI
+          (subreg:HI
+            (unspec:SI
+              [(match_operand:V16QI 1 "register_operand")]
+              UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+        (zero_extend:SI
+          (not:HI
+            (subreg:HI
+              (unspec:SI
+                [(match_operand:V16QI 1 "register_operand")]
+                UNSPEC_MOVMSK) 0))))]
+  "TARGET_SSE2"
+  [(set (match_dup 2)
+        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
+   (set (match_dup 0)
+        (match_dup 3))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
+  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
         (unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 00000000000..330272c69bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,25 @@
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
+/* { dg-final { scan-assembler-times "\txorl" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}
+
+unsigned int movemask_not2(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  res = ~res;
+  return res;
+}
+
+unsigned int movemask_zero_extend(__m128i logical) {
+  unsigned int res = _mm_movemask_epi8(logical);
+  return res & 0xffff;
+}
Uros Bizjak Jan. 5, 2021, 7:04 a.m. UTC | #7
On Tue, Jan 5, 2021 at 7:30 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +       (zero_extend:SI (subreg:HI (unspec:SI
> > > > +         [(match_operand:V16QI 1 "register_operand")]
> > > > +          UNSPEC_MOVMSK) 0)))]
> > >
> > > Also, please fix up formatting.  Should be:
> > >         (zero_extend:SI
> > >           (subreg:HI
> > >             (unspec:SI
> > >               [(match_operand:V16QI 1 "register_operand")]
> > >               UNSPEC_MOVMSK) 0)))]
> > > I think.
> > >
> > >         Jakub
> > >
> >
> > Yes, thanks for the review both, and happy new year!
> >
> > --
> > BR,
> > Hongtao
>
> Sorry for the bother, this is an incremental patch to split
> (zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI)))) to
>
>         pmovmskb        %xmm0, %eax
> -       notl    %eax
> -       movzwl  %ax, %eax
> +       xorl    $65535, %eax
>
>
> The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> The following patch adds define_insn_and_split to optimize
>
>        vpmovmskb       %xmm0, %eax
> -       movzwl  %ax, %eax
>         notl    %eax
>
> and combine splitter to optimize
>
>         pmovmskb        %xmm0, %eax
> -       notl    %eax
> -       movzwl  %ax, %eax
> +       xorl    $65535, %eax
>
> gcc/ChangeLog
>         PR target/98461
>         * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
>         define_insn_and_split for zero_extend of subreg HI of pmovskb
>         result.
>         (*sse2_pmovskb_zexthisi): Add new combine splitters for
>         zero_extend of not of subreg HI of pmovskb result.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md                        | 32 +++++++++++++++++++
>  .../gcc.target/i386/sse2-pr98461-2.c          | 25 +++++++++++++++
>  2 files changed, 57 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4fcff0800c0 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +        (zero_extend:SI
> +          (subreg:HI
> +            (unspec:SI
> +              [(match_operand:V16QI 1 "register_operand")]
> +              UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
> +(define_split
> +  [(set (match_operand:SI 0 "register_operand")
> +        (zero_extend:SI
> +          (not:HI
> +            (subreg:HI
> +              (unspec:SI
> +                [(match_operand:V16QI 1 "register_operand")]
> +                UNSPEC_MOVMSK) 0))))]
> +  "TARGET_SSE2"
> +  [(set (match_dup 2)
> +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> +   (set (match_dup 0)
> +        (match_dup 3))]

Just write:

(set (match_dup 0)
    (xor:SI (match_dup 2)(const_int 65535))

Uros.

>
> +{
> +  operands[2] = gen_reg_rtx (SImode);
> +  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
> +  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
> +})
> +
>  (define_split
>    [(set (match_operand:SI 0 "register_operand")
>          (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 00000000000..330272c69bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,25 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
> +/* { dg-final { scan-assembler-times "\txorl" 1 } } */
> +
> +#include <immintrin.h>
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> +
> +unsigned int movemask_not2(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  res = ~res;
> +  return res;
> +}
> +
> +unsigned int movemask_zero_extend(__m128i logical) {
> +  unsigned int res = _mm_movemask_epi8(logical);
> +  return res & 0xffff;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao
Uros Bizjak Jan. 5, 2021, 7:19 a.m. UTC | #8
On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > +(define_split
> > +  [(set (match_operand:SI 0 "register_operand")
> > +        (zero_extend:SI
> > +          (not:HI
> > +            (subreg:HI
> > +              (unspec:SI
> > +                [(match_operand:V16QI 1 "register_operand")]
> > +                UNSPEC_MOVMSK) 0))))]
> > +  "TARGET_SSE2"
> > +  [(set (match_dup 2)
> > +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > +   (set (match_dup 0)
> > +        (match_dup 3))]
>
> Just write:
>
> (set (match_dup 0)
>     (xor:SI (match_dup 2)(const_int 65535))

BTW: This could be a universal combine splitter to simplify

unsigned int foo (unsigned short z)
{
    return (unsigned short)~z;
}

Trying 7 -> 8:
   7: r87:HI=~r88:SI#0
     REG_DEAD r88:SI
   8: r86:SI=zero_extend(r87:HI)
     REG_DEAD r87:HI
Failed to match this instruction:
(set (reg:SI 86)
   (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0))))

But combine does not "split" to one insns.

Uros.
Hongtao Liu Jan. 5, 2021, 10:28 a.m. UTC | #9
On Tue, Jan 5, 2021 at 3:20 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > +(define_split
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +        (zero_extend:SI
> > > +          (not:HI
> > > +            (subreg:HI
> > > +              (unspec:SI
> > > +                [(match_operand:V16QI 1 "register_operand")]
> > > +                UNSPEC_MOVMSK) 0))))]
> > > +  "TARGET_SSE2"
> > > +  [(set (match_dup 2)
> > > +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > > +   (set (match_dup 0)
> > > +        (match_dup 3))]
> >
> > Just write:
> >
> > (set (match_dup 0)
> >     (xor:SI (match_dup 2)(const_int 65535))
>

Yes, changed.

> BTW: This could be a universal combine splitter to simplify
>
> unsigned int foo (unsigned short z)
> {
>     return (unsigned short)~z;
> }
>
> Trying 7 -> 8:
>    7: r87:HI=~r88:SI#0
>      REG_DEAD r88:SI
>    8: r86:SI=zero_extend(r87:HI)
>      REG_DEAD r87:HI
> Failed to match this instruction:
> (set (reg:SI 86)
>    (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0))))
>
> But combine does not "split" to one insns.

Yes, according to PSabi, the top half of the register is not
necessarily 0, so if you add the splitter, it just changes from notl +
movzwl to xor + movzwl, which doesn't look better?

>
> Uros.
Uros Bizjak Jan. 5, 2021, 10:30 a.m. UTC | #10
On Tue, Jan 5, 2021 at 11:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Jan 5, 2021 at 3:20 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > +(define_split
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +        (zero_extend:SI
> > > > +          (not:HI
> > > > +            (subreg:HI
> > > > +              (unspec:SI
> > > > +                [(match_operand:V16QI 1 "register_operand")]
> > > > +                UNSPEC_MOVMSK) 0))))]
> > > > +  "TARGET_SSE2"
> > > > +  [(set (match_dup 2)
> > > > +        (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > > > +   (set (match_dup 0)
> > > > +        (match_dup 3))]
> > >
> > > Just write:
> > >
> > > (set (match_dup 0)
> > >     (xor:SI (match_dup 2)(const_int 65535))
> >
>
> Yes, changed.
>
> > BTW: This could be a universal combine splitter to simplify
> >
> > unsigned int foo (unsigned short z)
> > {
> >     return (unsigned short)~z;
> > }
> >
> > Trying 7 -> 8:
> >    7: r87:HI=~r88:SI#0
> >      REG_DEAD r88:SI
> >    8: r86:SI=zero_extend(r87:HI)
> >      REG_DEAD r87:HI
> > Failed to match this instruction:
> > (set (reg:SI 86)
> >    (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0))))
> >
> > But combine does not "split" to one insns.
>
> Yes, according to PSabi, the top half of the register is not
> necessarily 0, so if you add the splitter, it just changes from notl +
> movzwl to xor + movzwl, which doesn't look better?

Indeed.

The patch is OK.

Uros.
diff mbox series

Patch

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4ed6b9ae476 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,17 @@  (define_insn "*sse2_pmovmskb_ext"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+       (zero_extend:SI (subreg:HI (unspec:SI
+         [(match_operand:V16QI 1 "register_operand")]
+          UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+       (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
        (unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 00000000000..60fc1f3e9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,13 @@ 
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}