Message ID | CAMZc-bz3nqJmZ-042cYAzTw7tquvwE0fc-MOG73s-r+C+BKX6Q@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [AVX512] Lower AVX512 vector compare to AVX version when dest is vector | expand |
On Wed, Sep 2, 2020 at 2:33 AM Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi: > Add define_peephole2 to eliminate potential redundant conversion > from mask to vector. > Bootstrap is ok, regression test is ok for i386/x86-64 backend. > Ok for trunk? > > gcc/ChangeLog: > PR target/96891 > * config/i386/sse.md (VI_128_256): New mode iterator. > (define_peephole2): Lower avx512 vector compare to avx version > when dest is vector. > > gcc/testsuite/ChangeLog: Missing PR target/96891 > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-2.c: New test. > > -- > BR, > Hongtao
On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote: > Hi: > Add define_peephole2 to eliminate potential redundant conversion > from mask to vector. > Bootstrap is ok, regression test is ok for i386/x86-64 backend. > Ok for trunk? > > gcc/ChangeLog: > PR target/96891 > * config/i386/sse.md (VI_128_256): New mode iterator. > (define_peephole2): Lower avx512 vector compare to avx version > when dest is vector. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-2.c: New test. Aren't these the two insns in question: (insn 7 4 8 2 (set (reg:QI 86) (unspec:QI [ (reg:V8SF 90) (reg:V8SF 89) (const_int 2 [0x2]) ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3} (expr_list:REG_DEAD (reg:V8SF 90) (expr_list:REG_DEAD (reg:V8SF 89) (nil)))) (note 8 7 9 2 NOTE_INSN_DELETED) (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ]) (vec_merge:V8SI (const_vector:V8SI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ]) (const_vector:V8SI [ (const_int 0 [0]) repeated x8 ]) (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si} (expr_list:REG_DEAD (reg:QI 86) (nil))) Note there's a data dependency between them. insn 7 feeds insn 9. When there's a data dependency, combiner patterns are usually the better choice than peepholes. I think you'd be looking to match something likethis (from the . combine dump): (set (reg:V8SI 82 [ _2 ]) (vec_merge:V8SI (const_vector:V8SI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ]) (const_vector:V8SI [ (const_int 0 [0]) repeated x8 ]) (unspec:QI [ (reg:V8SF 90) (reg:V8SF 89) (const_int 2 [0x2]) ] UNSPEC_PCMP))) Jeff
On Tue, Nov 17, 2020 at 8:05 AM Jeff Law <law@redhat.com> wrote: > > > On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote: > > Hi: > > Add define_peephole2 to eliminate potential redundant conversion > > from mask to vector. > > Bootstrap is ok, regression test is ok for i386/x86-64 backend. > > Ok for trunk? > > > > gcc/ChangeLog: > > PR target/96891 > > * config/i386/sse.md (VI_128_256): New mode iterator. > > (define_peephole2): Lower avx512 vector compare to avx version > > when dest is vector. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > > * gcc.target/i386/avx512f-pr96891-1.c: New test. > > * gcc.target/i386/avx512f-pr96891-2.c: New test. > > Aren't these the two insns in question: > > > (insn 7 4 8 2 (set (reg:QI 86) > (unspec:QI [ > (reg:V8SF 90) > (reg:V8SF 89) > (const_int 2 [0x2]) > ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3} > (expr_list:REG_DEAD (reg:V8SF 90) > (expr_list:REG_DEAD (reg:V8SF 89) > (nil)))) > (note 8 7 9 2 NOTE_INSN_DELETED) > (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ]) > (vec_merge:V8SI (const_vector:V8SI [ > (const_int -1 [0xffffffffffffffff]) repeated x8 > ]) > (const_vector:V8SI [ > (const_int 0 [0]) repeated x8 > ]) > (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si} > (expr_list:REG_DEAD (reg:QI 86) > (nil))) > > > Note there's a data dependency between them. insn 7 feeds insn 9. When > there's a data dependency, combiner patterns are usually the better > choice than peepholes. I think you'd be looking to match something > likethis (from the . combine dump): > > (set (reg:V8SI 82 [ _2 ]) > (vec_merge:V8SI (const_vector:V8SI [ > (const_int -1 [0xffffffffffffffff]) repeated x8 > ]) > (const_vector:V8SI [ > (const_int 0 [0]) repeated x8 > ]) > (unspec:QI [ > (reg:V8SF 90) > (reg:V8SF 89) > (const_int 2 [0x2]) > ] UNSPEC_PCMP))) > > > Jeff > Yes, as discussed in [1], maybe it's better to refactor avx512 integer mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx could handle vector comparison more effectively. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4
On 11/16/20 8:10 PM, Hongtao Liu wrote: > On Tue, Nov 17, 2020 at 8:05 AM Jeff Law <law@redhat.com> wrote: >> >> On 9/2/20 3:34 AM, Hongtao Liu via Gcc-patches wrote: >>> Hi: >>> Add define_peephole2 to eliminate potential redundant conversion >>> from mask to vector. >>> Bootstrap is ok, regression test is ok for i386/x86-64 backend. >>> Ok for trunk? >>> >>> gcc/ChangeLog: >>> PR target/96891 >>> * config/i386/sse.md (VI_128_256): New mode iterator. >>> (define_peephole2): Lower avx512 vector compare to avx version >>> when dest is vector. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/i386/avx512bw-pr96891-1.c: New test. >>> * gcc.target/i386/avx512f-pr96891-1.c: New test. >>> * gcc.target/i386/avx512f-pr96891-2.c: New test. >> Aren't these the two insns in question: >> >> >> (insn 7 4 8 2 (set (reg:QI 86) >> (unspec:QI [ >> (reg:V8SF 90) >> (reg:V8SF 89) >> (const_int 2 [0x2]) >> ] UNSPEC_PCMP)) "j.c":4:14 1911 {avx512vl_cmpv8sf3} >> (expr_list:REG_DEAD (reg:V8SF 90) >> (expr_list:REG_DEAD (reg:V8SF 89) >> (nil)))) >> (note 8 7 9 2 NOTE_INSN_DELETED) >> (insn 9 8 14 2 (set (reg:V8SI 82 [ _2 ]) >> (vec_merge:V8SI (const_vector:V8SI [ >> (const_int -1 [0xffffffffffffffff]) repeated x8 >> ]) >> (const_vector:V8SI [ >> (const_int 0 [0]) repeated x8 >> ]) >> (reg:QI 86))) "j.c":4:14 2705 {*avx512vl_cvtmask2dv8si} >> (expr_list:REG_DEAD (reg:QI 86) >> (nil))) >> >> >> Note there's a data dependency between them. insn 7 feeds insn 9. When >> there's a data dependency, combiner patterns are usually the better >> choice than peepholes. I think you'd be looking to match something >> likethis (from the . combine dump): >> >> (set (reg:V8SI 82 [ _2 ]) >> (vec_merge:V8SI (const_vector:V8SI [ >> (const_int -1 [0xffffffffffffffff]) repeated x8 >> ]) >> (const_vector:V8SI [ >> (const_int 0 [0]) repeated x8 >> ]) >> (unspec:QI [ >> (reg:V8SF 90) >> (reg:V8SF 89) >> (const_int 2 [0x2]) >> ] UNSPEC_PCMP))) >> >> >> Jeff >> > Yes, as discussed in [1], maybe it's better to refactor avx512 integer > mask with VnBImode. Then unspec_pcmp could be dropped and simplify_rtx > could handle vector comparison more effectively. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521#c4 Thanks for the pointer. I didn't realize this patch was essentially abandoned. Jeff
> >> > >> Note there's a data dependency between them. insn 7 feeds insn 9. When > >> there's a data dependency, combiner patterns are usually the better > >> choice than peepholes. I think you'd be looking to match something > >> likethis (from the . combine dump): > >> Using combiner patterns, details is discussed in PR98348 Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk. gcc/ChangeLog: PR target/96891 PR target/98348 * config/i386/sse.md (VI_128_256): New mode iterator. (*avx_cmp<mode>3_1, *avx_cmp<mode>3_2, *avx_cmp<mode>3_3, *avx_cmp<mode>3_4, *avx2_eq<mode>3, *avx2_pcmp<mode>3_1, *avx2_pcmp<mode>3_2, *avx2_gt<mode>3): New define_insn_and_split to lower avx512 vector comparison to avx version when dest is vector. (*<avx512>_cmp<mode>3,*<avx512>_cmp<mode>3,*<avx512>_ucmp<mode>3): define_insn_and_split for negating the comparison result. * config/i386/predicates.md (float_vector_all_ones_operand): New predicate. * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use general NOT operator without UNSPEC_MASKOP. gcc/testsuite/ChangeLog: PR target/96891 PR target/98348 * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. * gcc.target/i386/avx512f-pr96891-3.c: New test. * g++.target/i386/avx512f-pr96891-1.C: New test. * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase. > > Jeff > -- BR, Hongtao
On Wed, Jan 06, 2021 at 11:34:32AM +0800, Hongtao Liu via Gcc-patches wrote: > > >> > > >> Note there's a data dependency between them. insn 7 feeds insn 9. When > > >> there's a data dependency, combiner patterns are usually the better > > >> choice than peepholes. I think you'd be looking to match something > > >> likethis (from the . combine dump): > > >> > > Using combiner patterns, details is discussed in PR98348 > > Boottrapped and regtested on x86_64-linux-gnu{-m32,} for both GCC10 and trunk. > gcc/ChangeLog: > > PR target/96891 > PR target/98348 > * config/i386/sse.md (VI_128_256): New mode iterator. > (*avx_cmp<mode>3_1, *avx_cmp<mode>3_2, *avx_cmp<mode>3_3, > *avx_cmp<mode>3_4, *avx2_eq<mode>3, *avx2_pcmp<mode>3_1, > *avx2_pcmp<mode>3_2, *avx2_gt<mode>3): New > define_insn_and_split to lower avx512 vector comparison to avx > version when dest is vector. > (*<avx512>_cmp<mode>3,*<avx512>_cmp<mode>3,*<avx512>_ucmp<mode>3): > define_insn_and_split for negating the comparison result. > * config/i386/predicates.md (float_vector_all_ones_operand): > New predicate. > * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use > general NOT operator without UNSPEC_MASKOP. > > gcc/testsuite/ChangeLog: > > PR target/96891 > PR target/98348 > * gcc.target/i386/avx512bw-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-1.c: New test. > * gcc.target/i386/avx512f-pr96891-2.c: New test. > * gcc.target/i386/avx512f-pr96891-3.c: New test. > * g++.target/i386/avx512f-pr96891-1.C: New test. > * gcc.target/i386/bitwise_mask_op-3.c: Adjust testcase. Ok for trunk. I'd prefer not to backport it to GCC 10. Jakub
From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Wed, 2 Sep 2020 17:14:39 +0800 Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is vector. gcc/ChangeLog: PR target/96891 * config/i386/sse.md (VI_128_256): New mode iterator. (define_peephole2): Lower avx512 vector compare to avx version when dest is vector. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. --- gcc/config/i386/sse.md | 93 +++++++++++++++++++ .../gcc.target/i386/avx512bw-pr96891-1.c | 36 +++++++ .../gcc.target/i386/avx512f-pr96891-1.c | 40 ++++++++ .../gcc.target/i386/avx512f-pr96891-2.c | 30 ++++++ 4 files changed, 199 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8250325e1a3..31e0dc2a600 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI]) ;; All 256bit vector integer modes (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) +;; All 128 and 256bit vector integer modes +(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI]) + ;; Various 128bit vector integer mode combinations (define_mode_iterator VI12_128 [V16QI V8HI]) (define_mode_iterator VI14_128 [V16QI V4SI]) @@ -6703,6 +6706,96 @@ (define_insn "*<avx512>_cvtmask2<ssemodesuffix><mode>" (set_attr "prefix" "evex") (set_attr "mode" "<sseinsnmode>")]) +/* Lower avx512 parallel floating compare to avx compare when dst is vector. */ +(define_peephole2 + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (unspec:<avx512fmaskmode> + [(match_operand:VF_128_256 1 "register_operand") + (match_operand:VF_128_256 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_31_operand")] + UNSPEC_PCMP)) + (set (match_operand:<sseintvecmode> 4 "register_operand") + (vec_merge:<sseintvecmode> + (match_operand:<sseintvecmode> 5 "vector_all_ones_operand") + (match_operand:<sseintvecmode> 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 7) + (unspec:VF_128_256 + [(match_dup 1) + (match_dup 2) + (match_dup 3)] UNSPEC_PCMP))] + "operands[7] = gen_rtx_REG (<MODE>mode, REGNO (operands[4]));") + +/* Lower avx512 parallel integral compare to avx compare when dst is vector. */ +(define_peephole2 + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (unspec:<avx512fmaskmode> + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand")] + UNSPEC_MASKED_EQ)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 4) + (eq:VI_128_256 + (match_dup 1) + (match_dup 2)))]) + +(define_peephole2 + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (unspec:<avx512fmaskmode> + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand")] + UNSPEC_MASKED_GT)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 4) + (gt:VI_128_256 + (match_dup 1) + (match_dup 2)))]) + +(define_peephole2 + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (unspec:<avx512fmaskmode> + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_7_operand")] + UNSPEC_PCMP)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "(INTVAL (operands[3]) == 0 || INTVAL (operands[3]) == 6) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(const_int 0)] +{ + enum rtx_code code = INTVAL (operands[3]) ? GT : EQ; + emit_move_insn (operands[4], gen_rtx_fmt_ee (code, <MODE>mode, + operands[1], operands[2])); + DONE; +}) + (define_insn "sse2_cvtps2pd<mask_name>" [(set (match_operand:V2DF 0 "register_operand" "=v") (float_extend:V2DF diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c new file mode 100644 index 00000000000..45efff4e0f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */ + +typedef char v16qi __attribute__ ((vector_size (16))); +typedef char v32qi __attribute__ ((vector_size (32))); +typedef short v8hi __attribute__ ((vector_size (16))); +typedef short v16hi __attribute__ ((vector_size (32))); +typedef int v4si __attribute__ ((vector_size (16))); +typedef int v8si __attribute__ ((vector_size (32))); +typedef long long v2di __attribute__ ((vector_size (16))); +typedef long long v4di __attribute__ ((vector_size (32))); + +#define FOO(VTYPE, OPNAME, OP) \ + VTYPE \ + foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b) \ + { \ + return a OP b; \ + } \ + +FOO (v16qi, eq, ==) +FOO (v16qi, gt, >) +FOO (v32qi, eq, ==) +FOO (v32qi, gt, >) +FOO (v8hi, eq, ==) +FOO (v8hi, gt, >) +FOO (v16hi, eq, ==) +FOO (v16hi, gt, >) +FOO (v4si, eq, ==) +FOO (v4si, gt, >) +FOO (v8si, eq, ==) +FOO (v8si, gt, >) +FOO (v2di, eq, ==) +FOO (v2di, gt, >) +FOO (v4di, eq, ==) +FOO (v4di, gt, >) diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c new file mode 100644 index 00000000000..48ba943e151 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */ + +typedef float v4sf __attribute__ ((vector_size (16))); +typedef float v8sf __attribute__ ((vector_size (32))); +typedef double v2df __attribute__ ((vector_size (16))); +typedef double v4df __attribute__ ((vector_size (32))); + +#define FOO(VTYPE, OPNAME, OP) \ + VTYPE \ + foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b) \ + { \ + return a OP b; \ + } \ + +FOO (v4sf, eq, ==) +FOO (v4sf, neq, !=) +FOO (v4sf, gt, >) +FOO (v4sf, ge, >=) +FOO (v4sf, lt, <) +FOO (v4sf, le, <=) +FOO (v8sf, eq, ==) +FOO (v8sf, neq, !=) +FOO (v8sf, gt, >) +FOO (v8sf, ge, >=) +FOO (v8sf, lt, <) +FOO (v8sf, le, <=) +FOO (v2df, eq, ==) +FOO (v2df, neq, !=) +FOO (v2df, gt, >) +FOO (v2df, ge, >=) +FOO (v2df, lt, <) +FOO (v2df, le, <=) +FOO (v4df, eq, ==) +FOO (v4df, neq, !=) +FOO (v4df, gt, >) +FOO (v4df, ge, >=) +FOO (v4df, lt, <) +FOO (v4df, le, <=) diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c new file mode 100644 index 00000000000..5192a00e0f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2" } */ +/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */ + +#include<immintrin.h> + +#define FOO(VTYPE,PREFIX,SUFFIX,OPNAME,MASK,LEN) \ + VTYPE \ + foo_##LEN##_##SUFFIX##_##OPNAME (VTYPE a, VTYPE b) \ + { \ + MASK m = _mm##PREFIX##_cmp##OPNAME##_##SUFFIX##_mask (a, b); \ + return _mm##PREFIX##_movm_##SUFFIX (m); \ + } \ + +FOO (__m128i,, epi8, eq, __mmask16, 128); +FOO (__m128i,, epi16, eq, __mmask8, 128); +FOO (__m128i,, epi32, eq, __mmask8, 128); +FOO (__m128i,, epi64, eq, __mmask8, 128); +FOO (__m128i,, epi8, gt, __mmask16, 128); +FOO (__m128i,, epi16, gt, __mmask8, 128); +FOO (__m128i,, epi32, gt, __mmask8, 128); +FOO (__m128i,, epi64, gt, __mmask8, 128); +FOO (__m256i, 256, epi8, eq, __mmask32, 256); +FOO (__m256i, 256, epi16, eq, __mmask16, 256); +FOO (__m256i, 256, epi32, eq, __mmask8, 256); +FOO (__m256i, 256, epi64, eq, __mmask8, 256); +FOO (__m256i, 256, epi8, gt, __mmask32, 256); +FOO (__m256i, 256, epi16, gt, __mmask16, 256); +FOO (__m256i, 256, epi32, gt, __mmask8, 256); +FOO (__m256i, 256, epi64, gt, __mmask8, 256); -- 2.18.1