Message ID | CAMZc-bx_S+zvcH6fr6KEZAkjw+5m6OxJSU1a5YVnjsBOvcEG_g@mail.gmail.com |
---|---|
State | New |
Series | [PR,target/97194,AVX2] Support variable index vec_set. |
On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   It's implemented as below:
> V setg (V v, int idx, T val)
>
> {
>   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V valv = (V){val, val, val, val, val, val, val, val};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }
>
> Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> Ok for trunk?

Hmm, I guess you're trying to keep the code for !AVX512BW simple
but isn't just splitting the compare into

 clow = {0, 1, 2, 3 ... } == idxv
 chigh = {16, 17, 18, ... } == idxv;
 cmp = {clow, chigh}

faster, smaller and eventually even easier during expansion?

+  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
+  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode, idxv, idx_tmp));

side-effects in gcc_assert is considered bad style, use

  ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
  gcc_assert (ok);

+  vec[5] = constv;
+  ix86_expand_int_vcond (vec);

this also returns a bool you probably should assert true.

Otherwise thanks for tackling this.

Richard.

> gcc/ChangeLog:
>
>         PR target/97194
>         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
>         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
>         * config/i386/predicates.md (vec_setm_operand): New predicate,
>         true for const_int_operand or register_operand under TARGET_AVX2.
>         * config/i386/sse.md (vec_set<mode>): Support both constant
>         and variable index vec_set.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx2-vec-set-1.c: New test.
>         * gcc.target/i386/avx2-vec-set-2.c: New test.
>         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
>         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
>         * gcc.target/i386/avx512f-vec-set-2.c: New test.
>         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
>
> --
> BR,
> Hongtao
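For readers who want to try the compare-and-blend idea from the quoted `setg` pseudo-code outside the compiler, here is a minimal sketch using the GNU C vector extension. The concrete type `v8hi` (eight 16-bit lanes in a 128-bit vector) and the cast of the index to `short` are assumptions made for this illustration; the patch itself works on RTL for many modes.

```c
/* Eight 16-bit lanes; requires the GNU C vector extension (GCC or Clang).  */
typedef short v8hi __attribute__ ((vector_size (16)));

/* Return V with lane IDX replaced by VAL, without branching on IDX.  */
static v8hi
setg (v8hi v, int idx, short val)
{
  short i = (short) idx;
  v8hi idxv = {i, i, i, i, i, i, i, i};
  v8hi valv = {val, val, val, val, val, val, val, val};
  v8hi lanes = {0, 1, 2, 3, 4, 5, 6, 7};
  /* A vector comparison yields all-ones (-1) in matching lanes and 0
     in the others, so MASK selects exactly the lane equal to IDX.  */
  v8hi mask = (lanes == idxv);
  return (v & ~mask) | (valv & mask);
}
```

In this sketch an index outside 0..7 matches no lane, so the input vector comes back unchanged.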
On Mon, Oct 19, 2020 at 5:07 PM Richard Biener <richard.guenther@gmail.com> wrote:
>
> Hmm, I guess you're trying to keep the code for !AVX512BW simple
> but isn't just splitting the compare into
>
>  clow = {0, 1, 2, 3 ... } == idxv
>  chigh = {16, 17, 18, ... } == idxv;
>  cmp = {clow, chigh}
>

We also don't have 512-bit byte/word blend instructions without
TARGET_AVX512BW, so how would we use a 512-bit cmp?

Cut from i386-expand.c (lines 3682-3694), in ix86_expand_sse_movcc:

    case E_V64QImode:
      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
      break;
    case E_V32HImode:
      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
      break;
    case E_V16SImode:
      gen = gen_avx512f_blendmv16si;
      break;
    case E_V8DImode:
      gen = gen_avx512f_blendmv8di;
      break;
    case E_V8DFmode:

> faster, smaller and eventually even easier during expansion?
>
> +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> idxv, idx_tmp));
>
> side-effects in gcc_assert is considered bad style, use
>
>   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
>   gcc_assert (ok);
>
> +  vec[5] = constv;
> +  ix86_expand_int_vcond (vec);
>
> this also returns a bool you probably should assert true.
>

Yes, will change.
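The style point about side effects inside an assertion can be illustrated generically. The sketch below uses hypothetical macros `CHECK_ON` and `CHECK_OFF` and a hypothetical `expand_step` function. It is not GCC's `gcc_assert`, and whether `gcc_assert` evaluates its argument in a given build depends on how GCC is configured. It only shows what goes wrong when an assertion macro is compiled out while the real work sits inside it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical assertion macros, NOT GCC's gcc_assert.  CHECK_ON aborts on
   failure; CHECK_OFF models a build where checks are compiled out: sizeof
   does not evaluate its operand, so the expression never runs.  */
#define CHECK_ON(expr)  ((expr) ? (void) 0 : abort ())
#define CHECK_OFF(expr) ((void) sizeof (expr))

static int steps_run;

/* Stand-in for an expander that emits code and reports success.  */
static bool
expand_step (void)
{
  steps_run++;
  return true;
}

/* Work hidden inside the assertion: lost when checks are off.  */
static int
side_effect_in_assert (void)
{
  steps_run = 0;
  CHECK_OFF (expand_step ());
  return steps_run;
}

/* Work done first, result asserted afterwards: always runs.  */
static int
assert_on_result (void)
{
  steps_run = 0;
  bool ok = expand_step ();
  CHECK_OFF (ok);
  return steps_run;
}
```

With checks compiled out, the first function performs no expansion step at all, while the second still performs exactly one, which is the shape of the `ok = ...; gcc_assert (ok);` form suggested in the review.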
On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> We also don't have 512-bit byte/word blend instructions without
> TARGET_AVX512BW, so how would we use a 512-bit cmp?

Oh, I see.  Guess two back-to-back vpternlog could emulate
the blend?  Not sure if important - I recall only knl didn't have bw?
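As a sanity check on the idea of emulating a blend with a ternary-logic instruction, here is a scalar model of a three-input truth-table operation. It assumes the usual convention that, per bit position, the bits of the three inputs form a 3-bit index (first input most significant) into an 8-bit immediate. The function names are hypothetical, and how the inputs `a`, `b` and `c` map onto the assembler operand order is deliberately left open here. Under that convention the immediate 0xD8 encodes "take `b` where `c` is set, else `a`", which is a bitwise blend.

```c
#include <stdint.h>

/* Scalar model of a three-input bitwise ternary-logic operation: for each
   bit position the bits of A, B and C form a 3-bit index (A is the most
   significant bit) that selects one bit of IMM8.  */
static uint32_t
ternlog32 (uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
  uint32_t r = 0;
  for (int bit = 0; bit < 32; bit++)
    {
      unsigned idx = (((a >> bit) & 1) << 2)
                     | (((b >> bit) & 1) << 1)
                     | ((c >> bit) & 1);
      r |= (uint32_t) ((imm8 >> idx) & 1) << bit;
    }
  return r;
}

/* Reference blend: take B where MASK bits are set, A elsewhere.  */
static uint32_t
blend32 (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & ~mask) | (b & mask);
}
```

Comparing `ternlog32 (a, b, mask, 0xD8)` against `blend32 (a, b, mask)` for a few inputs is a quick way to confirm the truth table before relying on it.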
On Mon, Oct 19, 2020 at 5:55 PM Richard Biener <richard.guenther@gmail.com> wrote:
>
> Oh, I see.  Guess two back-to-back vpternlog could emulate

Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
but since we don't have a 512-bit byte/word broadcast instruction,
it would need 2 broadcasts and 1 vec_concat to get one 512-bit vector.
It wouldn't save many instructions compared to my version (as below).

---
        leal    -16(%rsi), %eax
        vmovd   %edi, %xmm2
        vmovdqa .LC0(%rip), %ymm4
        vextracti64x4   $0x1, %zmm0, %ymm3
        vmovd   %eax, %xmm1
        vpbroadcastw    %xmm2, %ymm2
        vpbroadcastw    %xmm1, %ymm1
        vpcmpeqw        %ymm4, %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm3, %ymm3
        vmovd   %esi, %xmm1
        vpbroadcastw    %xmm1, %ymm1
        vpcmpeqw        %ymm4, %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
        vinserti64x4    $0x1, %ymm3, %zmm0, %zmm0
---

> the blend?  Not sure if important - I recall only knl didn't have bw?
>

Yes, after (and including) SKX, all AVX512 targets support AVX512BW.
And I don't think performance for V32HI/V64QI without AVX512BW is important.
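The split-into-halves strategy behind the assembly above can be sketched in GNU C. This is a scaled-down illustration, not the patch: the hypothetical `struct wide` models one wide register as two 128-bit halves of eight 16-bit lanes each (the real case splits a 512-bit vector into two 256-bit halves), and `set_half` and `set_wide` are names invented for the example.

```c
/* Eight 16-bit lanes; requires the GNU C vector extension (GCC or Clang).  */
typedef short v8hi __attribute__ ((vector_size (16)));

/* A "wide" vector modelled as two halves; lanes 0..7 live in LO and
   lanes 8..15 live in HI.  */
struct wide
{
  v8hi lo, hi;
};

/* Compare-and-blend on one half; an index outside 0..7 changes nothing.  */
static v8hi
set_half (v8hi v, int idx, short val)
{
  short i = (short) idx;
  v8hi idxv = {i, i, i, i, i, i, i, i};
  v8hi valv = {val, val, val, val, val, val, val, val};
  v8hi lanes = {0, 1, 2, 3, 4, 5, 6, 7};
  v8hi mask = (lanes == idxv);
  return (v & ~mask) | (valv & mask);
}

/* Apply the operation to both halves.  The high half sees the index
   rebased by the half width, so at most one half has a matching lane.  */
static struct wide
set_wide (struct wide w, int idx, short val)
{
  w.hi = set_half (w.hi, idx - 8, val);
  w.lo = set_half (w.lo, idx, val);
  return w;
}
```

Both halves are processed unconditionally, which matches the structure of the assembly listing: two broadcast, compare and blend sequences (with the index for the high half adjusted by `leal -16(%rsi), %eax`) followed by a recombination.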
On Tue, Oct 20, 2020 at 4:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Yes, after (and including) SKX, all AVX512 targets support AVX512BW.
> And I don't think performance for V32HI/V64QI without AVX512BW is important.

True.

I have no further comments on the patch then - it still needs i386 maintainer
approval though.

Thanks,
Richard.
ping^1

On Tue, Oct 20, 2020 at 3:36 PM Richard Biener <richard.guenther@gmail.com> wrote:
>
> True.
>
> I have no further comments on the patch then - it still needs i386 maintainer
> approval though.
>
> Thanks,
> Richard.
ping ^3

Rebase patch on latest trunk.

On Tue, Oct 27, 2020 at 3:51 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> ping^1
On 11/11/20 1:03 AM, Hongtao Liu via Gcc-patches wrote:
>
> vec_set_rebaserebase_onr11-4901.patch
>
> From c9d684c37b5f79f68f938f39eeb9e7989b10302d Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Mon, 19 Oct 2020 16:04:39 +0800
> Subject: [PATCH] Support variable index vec_set.

This is OK.  Sorry for the delays.

jeff
Thanks for the review.
BTW, the patch is already installed because Uros helped to review this
patch in another thread:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558682.html

On Thu, Nov 26, 2020 at 3:15 AM Jeff Law <law@redhat.com> wrote:
>
> This is OK.  Sorry for the delays.
>
> jeff
From d00b6ada0fc420da7cf1e91ccbf12ee0a923d8a3 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 19 Oct 2020 16:04:39 +0800
Subject: [PATCH] Support variable index vec_set.

gcc/ChangeLog:

        PR target/97194
        * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
        * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
        * config/i386/predicates.md (vec_setm_operand): New predicate,
        true for const_int_operand or register_operand under TARGET_AVX2.
        * config/i386/sse.md (vec_set<mode>): Support both constant
        and variable index vec_set.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx2-vec-set-1.c: New test.
        * gcc.target/i386/avx2-vec-set-2.c: New test.
        * gcc.target/i386/avx512bw-vec-set-1.c: New test.
        * gcc.target/i386/avx512bw-vec-set-2.c: New test.
        * gcc.target/i386/avx512f-vec-set-2.c: New test.
        * gcc.target/i386/avx512vl-vec-set-2.c: New test.
---
 gcc/config/i386/i386-expand.c                 | 102 ++++++++++++++++++
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/predicates.md                 |   6 ++
 gcc/config/i386/sse.md                        |   9 +-
 .../gcc.target/i386/avx2-vec-set-1.c          |  49 +++++++++
 .../gcc.target/i386/avx2-vec-set-2.c          |  50 +++++++++
 .../gcc.target/i386/avx512bw-vec-set-1.c      |  20 ++++
 .../gcc.target/i386/avx512bw-vec-set-2.c      |  44 ++++++++
 .../gcc.target/i386/avx512f-vec-set-2.c       |  42 ++++++++
 .../gcc.target/i386/avx512vl-vec-set-2.c      |  55 ++++++++++
 10 files changed, 375 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e6f8b314f18..63b11e5f945 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14201,6 +14201,108 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
     ix86_expand_vector_init_general (mmx_ok, mode, target, vals);
 }
 
+/* Implemented as
+   V setg (V v, int idx, T val)
+   {
+     V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
+     V valv = (V){val, val, val, val, val, val, val, val};
+     V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
+     v = (v & ~mask) | (valv & mask);
+     return v;
+   }.  */
+void
+ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  rtx vec[64];
+  machine_mode mode = GET_MODE (target);
+  machine_mode cmp_mode = mode;
+  int n_elts = GET_MODE_NUNITS (mode);
+  rtx valv,idxv,constv,idx_tmp;
+
+  /* 512-bits vector byte/word broadcast and comparison only available
+     under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
+     when without TARGET_AVX512BW.  */
+  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+    {
+      gcc_assert (TARGET_AVX512F);
+      rtx vhi, vlo, idx_hi;
+      machine_mode half_mode;
+      rtx (*extract_hi)(rtx, rtx);
+      rtx (*extract_lo)(rtx, rtx);
+
+      if (mode == V32HImode)
+        {
+          half_mode = V16HImode;
+          extract_hi = gen_vec_extract_hi_v32hi;
+          extract_lo = gen_vec_extract_lo_v32hi;
+        }
+      else
+        {
+          half_mode = V32QImode;
+          extract_hi = gen_vec_extract_hi_v64qi;
+          extract_lo = gen_vec_extract_lo_v64qi;
+        }
+
+      vhi = gen_reg_rtx (half_mode);
+      vlo = gen_reg_rtx (half_mode);
+      idx_hi = gen_reg_rtx (GET_MODE (idx));
+      emit_insn (extract_hi (vhi, target));
+      emit_insn (extract_lo (vlo, target));
+      vec[0] = idx_hi;
+      vec[1] = idx;
+      vec[2] = GEN_INT (n_elts/2);
+      ix86_expand_binary_operator (MINUS, GET_MODE (idx), vec);
+      ix86_expand_vector_set_var (vhi, val, idx_hi);
+      ix86_expand_vector_set_var (vlo, val, idx);
+      emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (mode, vlo, vhi)));
+      return;
+    }
+
+  if (FLOAT_MODE_P (GET_MODE_INNER (mode)))
+    {
+      switch (mode)
+        {
+        case E_V2DFmode:
+          cmp_mode = V2DImode;
+          break;
+        case E_V4DFmode:
+          cmp_mode = V4DImode;
+          break;
+        case E_V8DFmode:
+          cmp_mode = V8DImode;
+          break;
+        case E_V4SFmode:
+          cmp_mode = V4SImode;
+          break;
+        case E_V8SFmode:
+          cmp_mode = V8SImode;
+          break;
+        case E_V16SFmode:
+          cmp_mode = V16SImode;
+          break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+
+  for (int i = 0; i != n_elts; i++)
+    vec[i] = GEN_INT (i);
+  constv = gen_rtx_CONST_VECTOR (cmp_mode, gen_rtvec_v (n_elts, vec));
+  valv = gen_reg_rtx (mode);
+  idxv = gen_reg_rtx (cmp_mode);
+  idx_tmp = convert_to_mode (GET_MODE_INNER (cmp_mode), idx, 1);
+
+  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
+  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode, idxv, idx_tmp));
+  vec[0] = target;
+  vec[1] = valv;
+  vec[2] = target;
+  vec[3] = gen_rtx_EQ (mode, idxv, constv);
+  vec[4] = idxv;
+  vec[5] = constv;
+  ix86_expand_int_vcond (vec);
+}
+
 void
 ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c5b700efd0e..7a1dc3d4d64 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -242,6 +242,7 @@ extern rtx ix86_rewrite_tls_address (rtx);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
+extern void ix86_expand_vector_set_var (rtx, rtx, rtx);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index b03f9cd1c8c..5941a50d5d0 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1023,6 +1023,12 @@ (define_predicate "incdec_operand"
   return op == const1_rtx || op == constm1_rtx;
 })
 
+;; True for registers, or const_int_operand, used to vec_setm expander.
+(define_predicate "vec_setm_operand"
+  (ior (and (match_operand 0 "register_operand")
+            (match_test "TARGET_AVX2"))
+       (match_code "const_int")))
+
 ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
 (define_predicate "reg_or_pm1_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 934b60a288f..30e39b41260 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8310,11 +8310,14 @@ (define_insn "vec_setv2df_0"
 (define_expand "vec_set<mode>"
   [(match_operand:V 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "vec_setm_operand")]
   "TARGET_SSE"
 {
-  ix86_expand_vector_set (false, operands[0], operands[1],
-                          INTVAL (operands[2]));
+  if (CONST_INT_P (operands[2]))
+    ix86_expand_vector_set (false, operands[0], operands[1],
+                            INTVAL (operands[2]));
+  else
+    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
new file mode 100644
index 00000000000..4c16ec5dfc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpeq[bwdq]} 12 } } */
+/* { dg-final { scan-assembler-times {(?n)vp?blendv} 12 } } */
+
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+
+typedef long long v4di __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef
float v4sf __attribute__ ((vector_size (16))); + +typedef double v4df __attribute__ ((vector_size (32))); +typedef double v2df __attribute__ ((vector_size (16))); + +#define FOO(VTYPE, TYPE) \ + VTYPE \ + __attribute__ ((noipa)) \ + foo_##VTYPE (VTYPE a, TYPE b, unsigned int c) \ + { \ + a[c] = b; \ + return a; \ + } \ + +FOO (v16qi, char); +FOO (v32qi, char); + +FOO (v8hi, short); +FOO (v16hi, short); + +FOO (v4si, int); +FOO (v8si, int); + +FOO (v2di, long long); +FOO (v4di, long long); + +FOO (v4sf, float); +FOO (v8sf, float); + +FOO (v2df, double); +FOO (v4df, double); diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c new file mode 100644 index 00000000000..9086ef406f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx2 } */ +/* { dg-options "-O2 -mavx2" } */ + + +#ifndef CHECK +#define CHECK "avx2-check.h" +#endif + +#ifndef TEST +#define TEST avx2_test +#endif + +#include CHECK + +#include "avx2-vec-set-1.c" + +#define CALC_TEST(vtype, type, N, idx) \ +do \ + { \ + int i,val = idx * idx - idx * 3 + 16; \ + type res[N],exp[N]; \ + vtype resv; \ + for (i = 0; i < N; i++) \ + { \ + res[i] = i * i - i * 3 + 15; \ + exp[i] = res[i]; \ + } \ + exp[idx] = val; \ + resv = foo_##vtype (*(vtype *)&res[0], val, idx); \ + for (i = 0; i < N; i++) \ + { \ + if (resv[i] != exp[i]) \ + abort (); \ + } \ + } \ +while (0) + +static void +TEST (void) +{ + CALC_TEST (v32qi, char, 32, 17); + CALC_TEST (v16qi, char, 16, 5); + CALC_TEST (v16hi, short, 16, 9); + CALC_TEST (v8hi, short, 8, 6); + CALC_TEST (v8si, int, 8, 3); + CALC_TEST (v4si, int, 4, 2); + CALC_TEST (v4di, long long, 4, 1); + CALC_TEST (v2di, long long, 2, 0); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c new file mode 100644 index 00000000000..5cfbc85732e --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512bw -O2" } */ +/* { dg-final { scan-assembler-times {(?n)(?:vp?broadcast|vmovddup)} 36 } } */ +/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]+\$0} 18 } } */ + +typedef char v64qi __attribute__ ((vector_size (64))); +typedef short v32hi __attribute__ ((vector_size (64))); +typedef int v16si __attribute__ ((vector_size (64))); +typedef long long v8di __attribute__ ((vector_size (64))); +typedef float v16sf __attribute__ ((vector_size (64))); +typedef double v8df __attribute__ ((vector_size (64))); + +#include "avx2-vec-set-1.c" + +FOO (v64qi, char); +FOO (v32hi, short); +FOO (v16si, int); +FOO (v8di, long long); +FOO (v16sf, float); +FOO (v8df, double); diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c new file mode 100644 index 00000000000..22e64183ebd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512bw } */ +/* { dg-options "-O2 -mavx512bw" } */ + + +#ifndef CHECK +#define CHECK "avx512f-check.h" +#endif + +#define AVX512BW + +#include CHECK + +#include "avx512bw-vec-set-1.c" + +#define CALC_TEST(vtype, type, N, idx) \ +do \ + { \ + int i,val = idx * idx - idx * 3 + 16; \ + type res[N],exp[N]; \ + vtype resv; \ + for (i = 0; i < N; i++) \ + { \ + res[i] = i * i - i * 3 + 15; \ + exp[i] = res[i]; \ + } \ + exp[idx] = val; \ + resv = foo_##vtype (*(vtype *)&res[0], val, idx); \ + for (i = 0; i < N; i++) \ + { \ + if (resv[i] != exp[i]) \ + abort (); \ + } \ + } \ +while (0) + +static void +test_512 (void) +{ + CALC_TEST (v64qi, char, 64, 50); + CALC_TEST (v32hi, short, 32, 30); + CALC_TEST (v16si, int, 16, 15); + CALC_TEST (v8di, long long, 8, 7); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c 
new file mode 100644 index 00000000000..8f2aa03ec11 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O2 -mavx512f -mno-avx512bw" } */ + + +#ifndef CHECK +#define CHECK "avx512f-check.h" +#endif + +#define AVX512F + +#include CHECK + +#include "avx512bw-vec-set-1.c" + +#define CALC_TEST(vtype, type, N, idx) \ +do \ + { \ + int i,val = idx * idx - idx * 3 + 16; \ + type res[N],exp[N]; \ + vtype resv; \ + for (i = 0; i < N; i++) \ + { \ + res[i] = i * i - i * 3 + 15; \ + exp[i] = res[i]; \ + } \ + exp[idx] = val; \ + resv = foo_##vtype (*(vtype *)&res[0], val, idx); \ + for (i = 0; i < N; i++) \ + { \ + if (resv[i] != exp[i]) \ + abort (); \ + } \ + } \ +while (0) + +static void +test_512 (void) +{ + CALC_TEST (v64qi, char, 64, 50); + CALC_TEST (v32hi, short, 32, 30); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c new file mode 100644 index 00000000000..4f327427a64 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c @@ -0,0 +1,55 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512bw } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-options "-O2 -mavx512bw -mavx512vl" } */ + + +#ifndef CHECK +#define CHECK "avx512f-check.h" +#endif + +#define AVX512VL +#define AVX512BW + +#include CHECK + +#include "avx512bw-vec-set-1.c" + +#define CALC_TEST(vtype, type, N, idx) \ +do \ + { \ + int i,val = idx * idx - idx * 3 + 16; \ + type res[N],exp[N]; \ + vtype resv; \ + for (i = 0; i < N; i++) \ + { \ + res[i] = i * i - i * 3 + 15; \ + exp[i] = res[i]; \ + } \ + exp[idx] = val; \ + resv = foo_##vtype (*(vtype *)&res[0], val, idx); \ + for (i = 0; i < N; i++) \ + { \ + if (resv[i] != exp[i]) \ + abort (); \ + } \ + } \ +while (0) + +static void +test_256 (void) +{ + CALC_TEST (v32qi, char, 32, 17); + CALC_TEST (v16hi, short, 16, 9); + CALC_TEST 
(v8si, int, 8, 3); + CALC_TEST (v4di, long long, 4, 1); +} + +static void +test_128 (void) +{ + CALC_TEST (v16qi, char, 16, 5); + CALC_TEST (v8hi, short, 8, 6); + CALC_TEST (v4si, int, 4, 2); + CALC_TEST (v2di, long long, 2, 0); +} -- 2.18.1
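
For readers who want to try the idea outside the compiler, the strategy from the comment above ix86_expand_vector_set_var (broadcast the index and the value, compare against a constant vector of lane numbers, then blend) can be sketched in plain C with GCC vector extensions. This is only an illustration under assumptions, not code from the patch: the names setg_v4si, v4si_pair and setg_split are hypothetical, the element type is fixed to int, and the split helper merely mimics, on two 4-lane halves, how the patch handles V32HI/V64QI without AVX512BW by subtracting half the element count from the index for the high half.

```c
/* Sketch only; assumes GCC (or Clang) vector extensions.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: the setg scheme for a 4-lane int vector.  */
v4si
setg_v4si (v4si v, int idx, int val)
{
  v4si idxv = {idx, idx, idx, idx};   /* broadcast the index */
  v4si valv = {val, val, val, val};   /* broadcast the value */
  /* Lane-wise compare: all-ones where the lane number equals idx,
     zero everywhere else.  */
  v4si mask = ((v4si){0, 1, 2, 3} == idxv);
  /* Keep the old lanes outside the mask, take val inside it.  */
  return (v & ~mask) | (valv & mask);
}

/* Hypothetical 8-lane vector modelled as two 4-lane halves.  */
typedef struct { v4si lo, hi; } v4si_pair;

/* Hypothetical helper: update both halves unconditionally, using
   idx for the low half and idx - 4 for the high half.  */
v4si_pair
setg_split (v4si_pair v, int idx, int val)
{
  v.lo = setg_v4si (v.lo, idx, val);
  v.hi = setg_v4si (v.hi, idx - 4, val);
  return v;
}
```

An index that falls outside a half matches no lane there, so that half comes back unchanged; that is why the split variant does not need a branch on idx.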