Message ID | 00ce01d99e10$a7e04b20$f7a0e160$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86] Convert ptestz of pandn into ptestc. | expand |
On Tue, Jun 13, 2023 at 6:03 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch is the next instalment in a set of backend patches around > improvements to ptest/vptest. A previous patch optimized the sequence > t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the > property that ZF is set to (X&Y) == 0. This patch performs a similar > transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost) > equivalent ptestc(y,x), using the property that the CF flags is set to > (~X&Y) == 0. The tricky bit is that this sets the CF flag instead of > the ZF flag, so we can only perform this transformation when we can > also convert the flags' consumer, as well as the producer. > > For the test case: > > int foo (__m128i x, __m128i y) > { > __m128i a = x & ~y; > return __builtin_ia32_ptestz128 (a, a); > } > > With -O2 -msse4.1 we previously generated: > > foo: pandn %xmm0, %xmm1 > xorl %eax, %eax > ptest %xmm1, %xmm1 > sete %al > ret > > with this patch we now generate: > > foo: xorl %eax, %eax > ptest %xmm0, %xmm1 > setc %al > ret > > At the same time, this patch also provides alternative fixes for > PR target/109973 and PR target/110118, by recognizing that ptestc(x,x) > always sets the carry flag (X&~X is always zero). This is achieved > both by recognizing the special case in ix86_expand_sse_ptest and with > a splitter to convert an eligible ptest into an stc. > > The next piece is, of course, STV of "if (x & ~y)..." > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > 2023-06-13 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize > expansion of ptestc with equal operands as returning const1_rtx. > * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost > estimates of UNSPEC_PTEST, where the ptest performs the PAND > or PAND of its operands. > * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST > of reg_equal_p operands into an x86_stc instruction. > (define_split): Split pandn/ptestz/setne into ptestc/setnc. > (define_split): Split pandn/ptestz/sete into ptestc/setc. > (define_split): Split pandn/ptestz/je into ptestc/jc. > (define_split): Split pandn/ptestz/jne into ptestc/jnc. > > gcc/testsuite/ChangeLog > * gcc.target/i386/avx-vptest-4.c: New test case. > * gcc.target/i386/avx-vptest-5.c: Likewise. > * gcc.target/i386/avx-vptest-6.c: Likewise. > * gcc.target/i386/pr109973-1.c: Update test case. > * gcc.target/i386/pr109973-2.c: Likewise. > * gcc.target/i386/sse4_1-ptest-4.c: New test case. > * gcc.target/i386/sse4_1-ptest-5.c: Likewise. > * gcc.target/i386/sse4_1-ptest-6.c: Likewise. > > > Thanks in advance, > Roger + /* ptest reg, reg sets the carry flag. */ + if (comparison == LTU + && (d->code == IX86_BUILTIN_PTESTC + || d->code == IX86_BUILTIN_PTESTC256) + && rtx_equal_p (op0, op1)) + return const1_rtx; In this function, a RTX that sets a target reg should be emitted, and a target register returned. I don't think the above code is correct. +;; pandn/ptestz/setne -> ptestc/setnc +(define_split + [(set (match_operand:QI 0 "register_operand") + (ne:QI Please note that setcc is a bit tricky on x86. You can actually set a register in QI/HI/SI/DImode, and post-reload splitters will do the correct extension (see the patterns in i386.md, after "For all sCOND expanders ..." comment). But you have to account for all these modes in the pre-reload splitter. Maybe you should use the "int248_register_operand" predicate to avoid pattern explosion. + (unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (geu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) No need to set strict_low_part, just set a register with EQ/NE of CCCmode and post-reload splitters will do their magic. Please also note that you emit a QI subreg of a QI register here, which doesn't seem right. + +;; Changing the CCmode of FLAGS_REG requires updating both def and use. Does the above comment also apply to the above pattern? +;; pandn/ptestz/sete -> ptestc/setc +(define_split + [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0)) + (eq:QI + (unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) These two patterns can be merged into one using the (somehow unfortunately named) "bt_comparison_operator" (a new name is much welcome...) operator predicate. The setc/setnc can easily be emitted using eq/ne:QI (reg:CCC FLAGS_REG) (const_int 0). +;; pandn/ptestz/je -> ptestc/jc +(define_split + [(set (pc) + (if_then_else + (ne + (unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (geu (reg:CCC FLAGS_REG) (const_int 0)) + (match_dup 0) + (pc)))]) + +;; pandn/ptestz/jne -> ptestc/jnc +(define_split + [(set (pc) + (if_then_else + (eq + (unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (ltu (reg:CCC FLAGS_REG) (const_int 0)) + (match_dup 0) + (pc)))]) Also the above two can be merged using "bt_comparison_operator" operator predicate. +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "jn?c" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */ Please better use scan-assembler-times when checking asm of several functions. Otherwise, there could only be only one ptest (out of four) generated, and the test will still pass. +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "setnc" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ Also use scan-assembler-times when two functions are checked, same reason as above. Uros.
Hi Uros, Here's an updated version of this patch incorporating your comments. It uses emit_insn (target, const1_rtx), bt_comparison operator to combine the sete/setne to setc/setnc, and je/jne to jc/jnc patterns, uses scan-assembler-times in the test cases, and cleans up the silly cut'n'paste issue that mangled strict_low_part/subreg of a register that was already QImode. I tried, but the strict_low_part variant really is required (some of the new test cases fail without it), but things are much neater now, and have few patterns than the original. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-06-16 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize expansion of ptestc with equal operands as producing const1_rtx. * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost estimates of UNSPEC_PTEST, where the ptest performs the PAND or PAND of its operands. * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST of reg_equal_p operands into an x86_stc instruction. (define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c. (define_split): Similar to above for strict_low_part destinations. (define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c. gcc/testsuite/ChangeLog * gcc.target/i386/avx-vptest-4.c: New test case. * gcc.target/i386/avx-vptest-5.c: Likewise. * gcc.target/i386/avx-vptest-6.c: Likewise. * gcc.target/i386/pr109973-1.c: Update test case. * gcc.target/i386/pr109973-2.c: Likewise. * gcc.target/i386/sse4_1-ptest-4.c: New test case. * gcc.target/i386/sse4_1-ptest-5.c: Likewise. * gcc.target/i386/sse4_1-ptest-6.c: Likewise. Thanks, Roger -- > -----Original Message----- > From: Uros Bizjak <ubizjak@gmail.com> > Sent: 14 June 2023 09:31 > To: Roger Sayle <roger@nextmovesoftware.com> > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86 PATCH] Convert ptestz of pandn into ptestc. > > On Tue, Jun 13, 2023 at 6:03 PM Roger Sayle <roger@nextmovesoftware.com> > wrote: > > > > > > This patch is the next instalment in a set of backend patches around > > improvements to ptest/vptest. A previous patch optimized the sequence > > t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the > > property that ZF is set to (X&Y) == 0. This patch performs a similar > > transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost) > > equivalent ptestc(y,x), using the property that the CF flags is set to > > (~X&Y) == 0. The tricky bit is that this sets the CF flag instead of > > the ZF flag, so we can only perform this transformation when we can > > also convert the flags' consumer, as well as the producer. > > > > For the test case: > > > > int foo (__m128i x, __m128i y) > > { > > __m128i a = x & ~y; > > return __builtin_ia32_ptestz128 (a, a); } > > > > With -O2 -msse4.1 we previously generated: > > > > foo: pandn %xmm0, %xmm1 > > xorl %eax, %eax > > ptest %xmm1, %xmm1 > > sete %al > > ret > > > > with this patch we now generate: > > > > foo: xorl %eax, %eax > > ptest %xmm0, %xmm1 > > setc %al > > ret > > > > At the same time, this patch also provides alternative fixes for PR > > target/109973 and PR target/110118, by recognizing that ptestc(x,x) > > always sets the carry flag (X&~X is always zero). This is achieved > > both by recognizing the special case in ix86_expand_sse_ptest and with > > a splitter to convert an eligible ptest into an stc. > > > > The next piece is, of course, STV of "if (x & ~y)..." > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > > > > 2023-06-13 Roger Sayle <roger@nextmovesoftware.com> > > > > gcc/ChangeLog > > * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize > > expansion of ptestc with equal operands as returning const1_rtx. > > * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost > > estimates of UNSPEC_PTEST, where the ptest performs the PAND > > or PAND of its operands. > > * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST > > of reg_equal_p operands into an x86_stc instruction. > > (define_split): Split pandn/ptestz/setne into ptestc/setnc. > > (define_split): Split pandn/ptestz/sete into ptestc/setc. > > (define_split): Split pandn/ptestz/je into ptestc/jc. > > (define_split): Split pandn/ptestz/jne into ptestc/jnc. > > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/avx-vptest-4.c: New test case. > > * gcc.target/i386/avx-vptest-5.c: Likewise. > > * gcc.target/i386/avx-vptest-6.c: Likewise. > > * gcc.target/i386/pr109973-1.c: Update test case. > > * gcc.target/i386/pr109973-2.c: Likewise. > > * gcc.target/i386/sse4_1-ptest-4.c: New test case. > > * gcc.target/i386/sse4_1-ptest-5.c: Likewise. > > * gcc.target/i386/sse4_1-ptest-6.c: Likewise. > > > > > > Thanks in advance, > > Roger > > + /* ptest reg, reg sets the carry flag. */ if (comparison == LTU > + && (d->code == IX86_BUILTIN_PTESTC > + || d->code == IX86_BUILTIN_PTESTC256) > + && rtx_equal_p (op0, op1)) > + return const1_rtx; > > In this function, a RTX that sets a target reg should be emitted, and a target > register returned. I don't think the above code is correct. > > +;; pandn/ptestz/setne -> ptestc/setnc > +(define_split > + [(set (match_operand:QI 0 "register_operand") > + (ne:QI > > Please note that setcc is a bit tricky on x86. You can actually set a register in > QI/HI/SI/DImode, and post-reload splitters will do the correct extension (see the > patterns in i386.md, after "For all sCOND expanders ..." comment). But you have > to account for all these modes in the pre-reload splitter. Maybe you should use > the "int248_register_operand" predicate to avoid pattern explosion. > > + (unspec:CCZ [ > + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) > + (match_operand:V_AVX 2 "register_operand")) > + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] > + UNSPEC_PTEST) > + (const_int 0)))] > + "TARGET_SSE4_1" > + [(set (reg:CCC FLAGS_REG) > + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) > + (set (strict_low_part (subreg:QI (match_dup 0) 0)) > + (geu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) > > No need to set strict_low_part, just set a register with EQ/NE of CCCmode and > post-reload splitters will do their magic. Please also note that you emit a QI > subreg of a QI register here, which doesn't seem right. > > + > +;; Changing the CCmode of FLAGS_REG requires updating both def and use. > > Does the above comment also apply to the above pattern? > > +;; pandn/ptestz/sete -> ptestc/setc > +(define_split > + [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0)) > + (eq:QI > + (unspec:CCZ [ > + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) > + (match_operand:V_AVX 2 "register_operand")) > + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] > + UNSPEC_PTEST) > + (const_int 0)))] > + "TARGET_SSE4_1" > + [(set (reg:CCC FLAGS_REG) > + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) > + (set (strict_low_part (subreg:QI (match_dup 0) 0)) > + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) > > These two patterns can be merged into one using the (somehow unfortunately > named) "bt_comparison_operator" (a new name is much > welcome...) operator predicate. The setc/setnc can easily be emitted using > eq/ne:QI (reg:CCC FLAGS_REG) (const_int 0). > > +;; pandn/ptestz/je -> ptestc/jc > +(define_split > + [(set (pc) > + (if_then_else > + (ne > + (unspec:CCZ [ > + (and:V_AVX > + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) > + (match_operand:V_AVX 2 "register_operand")) > + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] > + UNSPEC_PTEST) > + (const_int 0)) > + (match_operand 0) > + (pc)))] > + "TARGET_SSE4_1" > + [(set (reg:CCC FLAGS_REG) > + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) > + (set (pc) (if_then_else (geu (reg:CCC FLAGS_REG) (const_int 0)) > + (match_dup 0) > + (pc)))]) > + > +;; pandn/ptestz/jne -> ptestc/jnc > +(define_split > + [(set (pc) > + (if_then_else > + (eq > + (unspec:CCZ [ > + (and:V_AVX > + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) > + (match_operand:V_AVX 2 "register_operand")) > + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] > + UNSPEC_PTEST) > + (const_int 0)) > + (match_operand 0) > + (pc)))] > + "TARGET_SSE4_1" > + [(set (reg:CCC FLAGS_REG) > + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) > + (set (pc) (if_then_else (ltu (reg:CCC FLAGS_REG) (const_int 0)) > + (match_dup 0) > + (pc)))]) > > Also the above two can be merged using "bt_comparison_operator" > operator predicate. > > +/* { dg-final { scan-assembler "ptest" } } */ > +/* { dg-final { scan-assembler "jn?c" } } */ > +/* { dg-final { scan-assembler-not "pandn" } } */ > +/* { dg-final { scan-assembler-not "jne" } } */ > +/* { dg-final { scan-assembler-not "je" } } */ > > Please better use scan-assembler-times when checking asm of several functions. > Otherwise, there could only be only one ptest (out of four) generated, and the test > will still pass. > > +/* { dg-final { scan-assembler "ptest" } } */ > +/* { dg-final { scan-assembler "setnc" } } */ > +/* { dg-final { scan-assembler-not "pandn" } } */ > +/* { dg-final { scan-assembler-not "setne" } } */ > > Also use scan-assembler-times when two functions are checked, same reason as > above. > > Uros. diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index def060a..e844467 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10222,6 +10222,18 @@ ix86_expand_sse_ptest (const struct builtin_description *d, tree exp, machine_mode mode1 = insn_data[d->icode].operand[1].mode; enum rtx_code comparison = d->comparison; + /* ptest reg, reg sets the carry flag. */ + if (comparison == LTU + && (d->code == IX86_BUILTIN_PTESTC + || d->code == IX86_BUILTIN_PTESTC256) + && rtx_equal_p (op0, op1)) + { + if (!target) + target = gen_reg_rtx (SImode); + emit_move_insn (target, const1_rtx); + return target; + } + if (VECTOR_MODE_P (mode0)) op0 = safe_vector_operand (op0, mode0); if (VECTOR_MODE_P (mode1)) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 3a1444d..3e99e23 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -21423,16 +21423,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, else if (XINT (x, 1) == UNSPEC_PTEST) { *total = cost->sse_op; - if (XVECLEN (x, 0) == 2 - && GET_CODE (XVECEXP (x, 0, 0)) == AND) + rtx test_op0 = XVECEXP (x, 0, 0); + if (!rtx_equal_p (test_op0, XVECEXP (x, 0, 1))) + return false; + if (GET_CODE (test_op0) == AND) { - rtx andop = XVECEXP (x, 0, 0); - *total += rtx_cost (XEXP (andop, 0), GET_MODE (andop), - AND, opno, speed) - + rtx_cost (XEXP (andop, 1), GET_MODE (andop), - AND, opno, speed); - return true; + rtx and_op0 = XEXP (test_op0, 0); + if (GET_CODE (and_op0) == NOT) + and_op0 = XEXP (and_op0, 0); + *total += rtx_cost (and_op0, GET_MODE (and_op0), + AND, 0, speed) + + rtx_cost (XEXP (test_op0, 1), GET_MODE (and_op0), + AND, 1, speed); } + else + *total = rtx_cost (test_op0, GET_MODE (test_op0), + UNSPEC, 0, speed); + return true; } return false; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9bec09d..ce90ae4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -23147,6 +23147,70 @@ [(set (reg:CCZ FLAGS_REG) (unspec:CCZ [(match_dup 0) (match_dup 1)] UNSPEC_PTEST))]) +;; ptest reg,reg sets the carry flag. +(define_split + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_operand:V_AVX 0 "register_operand") + (match_operand:V_AVX 1 "register_operand")] + UNSPEC_PTEST))] + "TARGET_SSE4_1 + && rtx_equal_p (operands[0], operands[1])" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(const_int 0)] UNSPEC_STC))]) + +;; Changing the CCmode of FLAGS_REG requires updating both def and use. +;; pandn/ptestz/set{n?}e -> ptestc/set{n?}c +(define_split + [(set (match_operand:SWI 0 "register_operand") + (match_operator:SWI 3 "bt_comparison_operator" + [(unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)]))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (match_dup 0) + (match_op_dup 3 [(reg:CCC FLAGS_REG) (const_int 0)]))]) + +(define_split + [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0)) + (match_operator:QI 3 "bt_comparison_operator" + [(unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)]))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (match_op_dup 3 [(reg:CCC FLAGS_REG) (const_int 0)]))]) + +;; pandn/ptestz/j{n?}e -> ptestc/j{n?}c +(define_split + [(set (pc) + (if_then_else + (match_operator 3 "bt_comparison_operator" + [(unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)]) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (match_op_dup 3 [(reg:CCC FLAGS_REG) (const_int 0)]) + (match_dup 0) + (pc)))]) + (define_expand "nearbyint<mode>2" [(set (match_operand:VFH 0 "register_operand") (unspec:VFH diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-4.c b/gcc/testsuite/gcc.target/i386/avx-vptest-4.c new file mode 100644 index 0000000..0a234e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-4.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +int foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + return __builtin_ia32_ptestz256 (a, a); +} + +int bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + return __builtin_ia32_ptestz256 (a, a); +} + +/* { dg-final { scan-assembler-times "vptest\[ \\t\]+%" 2 } } */ +/* { dg-final { scan-assembler-times "setc" 2 } } */ +/* { dg-final { scan-assembler-not "vpandn" } } */ +/* { dg-final { scan-assembler-not "sete" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-5.c b/gcc/testsuite/gcc.target/i386/avx-vptest-5.c new file mode 100644 index 0000000..fd0e5e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-5.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +int foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + return !__builtin_ia32_ptestz256 (a, a); +} + +int bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + return !__builtin_ia32_ptestz256 (a, a); +} + +/* { dg-final { scan-assembler-times "vptest\[ \\t\]+%" 2} } */ +/* { dg-final { scan-assembler-times "setnc" 2 } } */ +/* { dg-final { scan-assembler-not "vpandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-6.c b/gcc/testsuite/gcc.target/i386/avx-vptest-6.c new file mode 100644 index 0000000..5821a92 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-6.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +extern void ext (void); + +void foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void foo2 (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void bar2 (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +/* { dg-final { scan-assembler-times "ptest\[ \\t\]+%" 4 } } */ +/* { dg-final { scan-assembler-times "jn?c" 4 } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr109973-1.c b/gcc/testsuite/gcc.target/i386/pr109973-1.c index a1b6136b..1d812dd 100644 --- a/gcc/testsuite/gcc.target/i386/pr109973-1.c +++ b/gcc/testsuite/gcc.target/i386/pr109973-1.c @@ -10,4 +10,4 @@ foo (__m256i x, __m256i y) return __builtin_ia32_ptestc256 (a, a); } -/* { dg-final { scan-assembler "vpand" } } */ +/* { dg-final { scan-assembler "movl\[ \\t]*\\\$1, %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr109973-2.c b/gcc/testsuite/gcc.target/i386/pr109973-2.c index 167f6ee..1068c3e 100644 --- a/gcc/testsuite/gcc.target/i386/pr109973-2.c +++ b/gcc/testsuite/gcc.target/i386/pr109973-2.c @@ -10,4 +10,4 @@ foo (__m128i x, __m128i y) return __builtin_ia32_ptestc128 (a, a); } -/* { dg-final { scan-assembler "pand" } } */ +/* { dg-final { scan-assembler "movl\[ \\t]*\\\$1, %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c new file mode 100644 index 0000000..e74ddb3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +int foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + return __builtin_ia32_ptestz128 (a, a); +} + +int bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + return __builtin_ia32_ptestz128 (a, a); +} + +/* { dg-final { scan-assembler-times "ptest\[ \\t\]+%" 2 } } */ +/* { dg-final { scan-assembler-times "setc" 2 } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "sete" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c new file mode 100644 index 0000000..74b0a8c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +int foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + return !__builtin_ia32_ptestz128 (a, a); +} + +int bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + return !__builtin_ia32_ptestz128 (a, a); +} + +/* { dg-final { scan-assembler-times "ptest\[ \\t\]+%" 2 } } */ +/* { dg-final { scan-assembler-times "setnc" 2 } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c new file mode 100644 index 0000000..d9114bb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +extern void ext (void); + +void foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void foo2 (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void bar2 (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +/* { dg-final { scan-assembler-times "ptest\[ \\t\]+%" 4 } } */ +/* { dg-final { scan-assembler-times "jn?c" 4 } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */
On Fri, Jun 16, 2023 at 3:27 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > Hi Uros, > Here's an updated version of this patch incorporating your comments. > It uses emit_insn (target, const1_rtx), bt_comparison operator to > combine the sete/setne to setc/setnc, and je/jne to jc/jnc patterns, > uses scan-assembler-times in the test cases, and cleans up the silly > cut'n'paste issue that mangled strict_low_part/subreg of a register > that was already QImode. I tried, but the strict_low_part variant > really is required (some of the new test cases fail without it), but > things are much neater now, and have few patterns than the original. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2023-06-16 Roger Sayle <roger@nextmovesoftware.com> > Uros Bizjak <ubizjak@gmail.com> > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize > expansion of ptestc with equal operands as producing const1_rtx. > * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost > estimates of UNSPEC_PTEST, where the ptest performs the PAND > or PAND of its operands. > * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST > of reg_equal_p operands into an x86_stc instruction. > (define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c. > (define_split): Similar to above for strict_low_part destinations. > (define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c. > > gcc/testsuite/ChangeLog > * gcc.target/i386/avx-vptest-4.c: New test case. > * gcc.target/i386/avx-vptest-5.c: Likewise. > * gcc.target/i386/avx-vptest-6.c: Likewise. > * gcc.target/i386/pr109973-1.c: Update test case. > * gcc.target/i386/pr109973-2.c: Likewise. > * gcc.target/i386/sse4_1-ptest-4.c: New test case. > * gcc.target/i386/sse4_1-ptest-5.c: Likewise. > * gcc.target/i386/sse4_1-ptest-6.c: Likewise. +(define_split + [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0)) I think you should use (set (strict_low_part (match_operand:QI 0 "register_operand")) ... here and ... + (set (strict_low_part (subreg:QI (match_dup 0) 0)) corresponding (set (strict_low_part (match_dup 0))... without explicit SUBREG here. This will handle all subregs automatically, as they are also matched by "register_operand" predicate. OK with the above change. Thanks, Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index def060a..1d11af2 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10222,6 +10222,13 @@ ix86_expand_sse_ptest (const struct builtin_description *d, tree exp, machine_mode mode1 = insn_data[d->icode].operand[1].mode; enum rtx_code comparison = d->comparison; + /* ptest reg, reg sets the carry flag. */ + if (comparison == LTU + && (d->code == IX86_BUILTIN_PTESTC + || d->code == IX86_BUILTIN_PTESTC256) + && rtx_equal_p (op0, op1)) + return const1_rtx; + if (VECTOR_MODE_P (mode0)) op0 = safe_vector_operand (op0, mode0); if (VECTOR_MODE_P (mode1)) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 3a1444d..3e99e23 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -21423,16 +21423,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, else if (XINT (x, 1) == UNSPEC_PTEST) { *total = cost->sse_op; - if (XVECLEN (x, 0) == 2 - && GET_CODE (XVECEXP (x, 0, 0)) == AND) + rtx test_op0 = XVECEXP (x, 0, 0); + if (!rtx_equal_p (test_op0, XVECEXP (x, 0, 1))) + return false; + if (GET_CODE (test_op0) == AND) { - rtx andop = XVECEXP (x, 0, 0); - *total += rtx_cost (XEXP (andop, 0), GET_MODE (andop), - AND, opno, speed) - + rtx_cost (XEXP (andop, 1), GET_MODE (andop), - AND, opno, speed); - return true; + rtx and_op0 = XEXP (test_op0, 0); + if (GET_CODE (and_op0) == NOT) + and_op0 = XEXP (and_op0, 0); + *total += rtx_cost (and_op0, GET_MODE (and_op0), + AND, 0, speed) + + rtx_cost (XEXP (test_op0, 1), GET_MODE (and_op0), + AND, 1, speed); } + else + *total = rtx_cost (test_op0, GET_MODE (test_op0), + UNSPEC, 0, speed); + return true; } return false; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9bec09d..282bcbe 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -23147,6 +23147,92 @@ [(set (reg:CCZ FLAGS_REG) (unspec:CCZ [(match_dup 0) (match_dup 1)] UNSPEC_PTEST))]) +;; ptest reg,reg sets the carry flag. +(define_split + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_operand:V_AVX 0 "register_operand") + (match_operand:V_AVX 1 "register_operand")] + UNSPEC_PTEST))] + "TARGET_SSE4_1 + && rtx_equal_p (operands[0], operands[1])" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(const_int 0)] UNSPEC_STC))]) + +;; pandn/ptestz/setne -> ptestc/setnc +(define_split + [(set (match_operand:QI 0 "register_operand") + (ne:QI + (unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (geu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) + +;; Changing the CCmode of FLAGS_REG requires updating both def and use. +;; pandn/ptestz/sete -> ptestc/setc +(define_split + [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0)) + (eq:QI + (unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) + +;; pandn/ptestz/je -> ptestc/jc +(define_split + [(set (pc) + (if_then_else + (ne + (unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (geu (reg:CCC FLAGS_REG) (const_int 0)) + (match_dup 0) + (pc)))]) + +;; pandn/ptestz/jne -> ptestc/jnc +(define_split + [(set (pc) + (if_then_else + (eq + (unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (ltu (reg:CCC FLAGS_REG) (const_int 0)) + (match_dup 0) + (pc)))]) + (define_expand "nearbyint<mode>2" [(set (match_operand:VFH 0 "register_operand") (unspec:VFH diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-4.c b/gcc/testsuite/gcc.target/i386/avx-vptest-4.c new file mode 100644 index 0000000..4f16cc8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-4.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +int foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + return __builtin_ia32_ptestz256 (a, a); +} + +int bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + return __builtin_ia32_ptestz256 (a, a); +} + +/* { dg-final { scan-assembler "vptest" } } */ +/* { dg-final { scan-assembler "setc" } } */ +/* { dg-final { scan-assembler-not "vpandn" } } */ +/* { dg-final { scan-assembler-not "sete" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-5.c b/gcc/testsuite/gcc.target/i386/avx-vptest-5.c new file mode 100644 index 0000000..21b1872 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-5.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +int foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + return !__builtin_ia32_ptestz256 (a, a); +} + +int bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + return !__builtin_ia32_ptestz256 (a, a); +} + +/* { dg-final { scan-assembler "vptest" } } */ +/* { dg-final { scan-assembler "setnc" } } */ +/* { dg-final { scan-assembler-not "vpandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-6.c b/gcc/testsuite/gcc.target/i386/avx-vptest-6.c new file mode 100644 index 0000000..c99e65f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-6.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +extern void ext (void); + +void foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void foo2 (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void bar2 (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "jn?c" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr109973-1.c b/gcc/testsuite/gcc.target/i386/pr109973-1.c index a1b6136b..1d812dd 100644 --- a/gcc/testsuite/gcc.target/i386/pr109973-1.c +++ b/gcc/testsuite/gcc.target/i386/pr109973-1.c @@ -10,4 +10,4 @@ foo (__m256i x, __m256i y) return __builtin_ia32_ptestc256 (a, a); } -/* { dg-final { scan-assembler "vpand" } } */ +/* { dg-final { scan-assembler "movl\[ \\t]*\\\$1, %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr109973-2.c b/gcc/testsuite/gcc.target/i386/pr109973-2.c index 167f6ee..1068c3e 100644 --- a/gcc/testsuite/gcc.target/i386/pr109973-2.c +++ b/gcc/testsuite/gcc.target/i386/pr109973-2.c @@ -10,4 +10,4 @@ foo (__m128i x, __m128i y) return __builtin_ia32_ptestc128 (a, a); } -/* { dg-final { scan-assembler "pand" } } */ +/* { dg-final { scan-assembler "movl\[ \\t]*\\\$1, %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c new file mode 100644 index 0000000..999cff2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +int foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + return __builtin_ia32_ptestz128 (a, a); +} + +int bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + return __builtin_ia32_ptestz128 (a, a); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "setc" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "sete" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c new file mode 100644 index 0000000..c3a23da --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +int foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + return !__builtin_ia32_ptestz128 (a, a); +} + +int bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + return !__builtin_ia32_ptestz128 (a, a); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "setnc" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c new file mode 100644 index 0000000..d49c6bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +extern void ext (void); + +void foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void foo2 (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void bar2 (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "jn?c" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */