Message ID | 025701d88954$f281ca40$d7855ec0$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86] PR rtl-optimization/96692: ((A|B)^C)^A using andn with -mbmi. | expand |
On Sun, Jun 26, 2022 at 2:04 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch addresses PR rtl-optimization/96692 on x86_64, by providing > a define_split for combine to convert the three operation ((A|B)^C)^D > into a two operation sequence using andn when either A or B is the same > register as C or D. This is essentially a reassociation problem that's > only a win if the target supports an and-not instruction (as with -mbmi). > > Hence for the new test case: > > int f(int a, int b, int c) > { > return (a ^ b) ^ (a | c); > } > > GCC on x86_64-pc-linux-gnu wth -O2 -mbmi would previously generate: > > xorl %edi, %esi > orl %edx, %edi > movl %esi, %eax > xorl %edi, %eax > ret > > but with this patch now generates: > > andn %edx, %edi, %eax > xorl %esi, %eax > ret > > I'll investigate whether this optimization can also be implemented > more generically in simplify_rtx when the backend provides > accurate rtx_costs for "(and (not ..." (as there's no optab for > andn). > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? > > > 2022-06-26 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR rtl-optimization/96692 > * config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D > as (X & ~Y) ^ Z on target BMI when either C or D is A or B. > > gcc/testsuite/ChangeLog > PR rtl-optimization/96692 > * gcc.target/i386/bmi-andn-4.c: New test case. + "TARGET_BMI + && ix86_pre_reload_split () + && (rtx_equal_p (operands[1], operands[3]) + || rtx_equal_p (operands[1], operands[4]) + || (REG_P (operands[2]) + && (rtx_equal_p (operands[2], operands[3]) + || rtx_equal_p (operands[2], operands[4]))))" You don't need a ix86_pre_reload_split for combine splitter* OTOH, please split the pattern to two for each commutative operand and use (match_dup x) instead. Something similar to [1]. *combine splitter is described in the documentation as the splitter pattern that does *not* match any existing insn pattern. [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596804.html Uros.
Hi! On Sun, Jun 26, 2022 at 07:07:35PM +0200, Uros Bizjak via Gcc-patches wrote: > On Sun, Jun 26, 2022 at 2:04 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > I'll investigate whether this optimization can also be implemented > > more generically in simplify_rtx when the backend provides > > accurate rtx_costs for "(and (not ..." (as there's no optab for > > andn). "Accurate rtx_cost" is a fallacy. insn_cost can be seen as somewhat accurate (in some simplified model of reality), but rtx_cost always misses too much context to be useful as anything but a rough estimate. > *combine splitter is described in the documentation as the splitter > pattern that does *not* match any existing insn pattern. Yeah, that documentation is somewhat misleading, in both directions :-( combine can and will and does use any splitter, whether or not there is an insn with that insn pattern. If there is an insn that passes recog() combine will not even try a split, but this is not the same thing at all: if the insn conditions are not met, the insn does not recog(). It is quite common to have the same instruction pattern with different insn conditions. In the other direction, other passes can call split_insns as well, of course. Nothing currently does, but that does not change much :-) Segher
Hi Uros, Thanks for the review. This patch implements all of your suggestions, both removing ix86_pre_reload_split from the combine splitter(s), and dividing the original splitter up into four simpler variants, that use match_dup to handle the variants/permutations caused by operator commutativity. This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2022-07-04 Roger Sayle <roger@nextmovesoftware.com> Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog PR rtl-optimization/96692 * config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D as (X & ~Y) ^ Z on target BMI when either C or D is A or B. gcc/testsuite/ChangeLog PR rtl-optimization/96692 * gcc.target/i386/bmi-andn-4.c: New test case. Thanks again, Roger -- > -----Original Message----- > From: Uros Bizjak <ubizjak@gmail.com> > Sent: 26 June 2022 18:08 > To: Roger Sayle <roger@nextmovesoftware.com> > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86 PATCH] PR rtl-optimization/96692: ((A|B)^C)^A using andn with > -mbmi. > > On Sun, Jun 26, 2022 at 2:04 PM Roger Sayle <roger@nextmovesoftware.com> > wrote: > > > > > > This patch addresses PR rtl-optimization/96692 on x86_64, by providing > > a define_split for combine to convert the three operation ((A|B)^C)^D > > into a two operation sequence using andn when either A or B is the > > same register as C or D. This is essentially a reassociation problem > > that's only a win if the target supports an and-not instruction (as with -mbmi). > > > > Hence for the new test case: > > > > int f(int a, int b, int c) > > { > > return (a ^ b) ^ (a | c); > > } > > > > GCC on x86_64-pc-linux-gnu wth -O2 -mbmi would previously generate: > > > > xorl %edi, %esi > > orl %edx, %edi > > movl %esi, %eax > > xorl %edi, %eax > > ret > > > > but with this patch now generates: > > > > andn %edx, %edi, %eax > > xorl %esi, %eax > > ret > > > > I'll investigate whether this optimization can also be implemented > > more generically in simplify_rtx when the backend provides accurate > > rtx_costs for "(and (not ..." (as there's no optab for andn). > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32}, > > with no new failures. Ok for mainline? > > > > > > 2022-06-26 Roger Sayle <roger@nextmovesoftware.com> > > > > gcc/ChangeLog > > PR rtl-optimization/96692 > > * config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D > > as (X & ~Y) ^ Z on target BMI when either C or D is A or B. > > > > gcc/testsuite/ChangeLog > > PR rtl-optimization/96692 > > * gcc.target/i386/bmi-andn-4.c: New test case. > > + "TARGET_BMI > + && ix86_pre_reload_split () > + && (rtx_equal_p (operands[1], operands[3]) > + || rtx_equal_p (operands[1], operands[4]) > + || (REG_P (operands[2]) > + && (rtx_equal_p (operands[2], operands[3]) > + || rtx_equal_p (operands[2], operands[4]))))" > > You don't need a ix86_pre_reload_split for combine splitter* > > OTOH, please split the pattern to two for each commutative operand and use > (match_dup x) instead. Something similar to [1]. > > *combine splitter is described in the documentation as the splitter pattern that > does *not* match any existing insn pattern. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596804.html > > Uros. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 20c3b9a..d114754 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10522,6 +10522,82 @@ (set (match_dup 0) (match_op_dup 1 [(and:SI (match_dup 3) (match_dup 2)) (const_int 0)]))]) + +;; Variant 1 of 4: Split ((A | B) ^ A) ^ C as (B & ~A) ^ C. +(define_split + [(set (match_operand:SWI48 0 "register_operand") + (xor:SWI48 + (xor:SWI48 + (ior:SWI48 (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "nonimmediate_operand")) + (match_dup 1)) + (match_operand:SWI48 3 "nonimmediate_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_BMI" + [(parallel + [(set (match_dup 4) (and:SWI48 (not:SWI48 (match_dup 1)) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) (xor:SWI48 (match_dup 4) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))])] + "operands[4] = gen_reg_rtx (<MODE>mode);") + +;; Variant 2 of 4: Split ((A | B) ^ B) ^ C as (A & ~B) ^ C. +(define_split + [(set (match_operand:SWI48 0 "register_operand") + (xor:SWI48 + (xor:SWI48 + (ior:SWI48 (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "register_operand")) + (match_dup 2)) + (match_operand:SWI48 3 "nonimmediate_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_BMI" + [(parallel + [(set (match_dup 4) (and:SWI48 (not:SWI48 (match_dup 2)) (match_dup 1))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) (xor:SWI48 (match_dup 4) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))])] + "operands[4] = gen_reg_rtx (<MODE>mode);") + +;; Variant 3 of 4: Split ((A | B) ^ C) ^ A as (B & ~A) ^ C. +(define_split + [(set (match_operand:SWI48 0 "register_operand") + (xor:SWI48 + (xor:SWI48 + (ior:SWI48 (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "nonimmediate_operand")) + (match_operand:SWI48 3 "nonimmediate_operand")) + (match_dup 1))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_BMI" + [(parallel + [(set (match_dup 4) (and:SWI48 (not:SWI48 (match_dup 1)) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) (xor:SWI48 (match_dup 4) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))])] + "operands[4] = gen_reg_rtx (<MODE>mode);") + +;; Variant 4 of 4: Split ((A | B) ^ C) ^ B as (A & ~B) ^ C. +(define_split + [(set (match_operand:SWI48 0 "register_operand") + (xor:SWI48 + (xor:SWI48 + (ior:SWI48 (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "register_operand")) + (match_operand:SWI48 3 "nonimmediate_operand")) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_BMI" + [(parallel + [(set (match_dup 4) (and:SWI48 (not:SWI48 (match_dup 2)) (match_dup 1))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) (xor:SWI48 (match_dup 4) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))])] + "operands[4] = gen_reg_rtx (<MODE>mode);") ;; Logical inclusive and exclusive OR instructions diff --git a/gcc/testsuite/gcc.target/i386/bmi-andn-4.c b/gcc/testsuite/gcc.target/i386/bmi-andn-4.c new file mode 100644 index 0000000..fb89529 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/bmi-andn-4.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mbmi" } */ + +int f(int a, int b, int c) +{ + return (a ^ b) ^ (a | c); +} + +/* { dg-final { scan-assembler "andn\[ \\t\]+" } } */
On Mon, Jul 4, 2022 at 7:27 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > Hi Uros, > Thanks for the review. This patch implements all of your suggestions, both > removing ix86_pre_reload_split from the combine splitter(s), and dividing > the original splitter up into four simpler variants, that use match_dup to > handle the variants/permutations caused by operator commutativity. > > This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} with no > new failures. Ok for mainline? > > 2022-07-04 Roger Sayle <roger@nextmovesoftware.com> > Uroš Bizjak <ubizjak@gmail.com> > > gcc/ChangeLog > PR rtl-optimization/96692 > * config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D > as (X & ~Y) ^ Z on target BMI when either C or D is A or B. > > gcc/testsuite/ChangeLog > PR rtl-optimization/96692 > * gcc.target/i386/bmi-andn-4.c: New test case. OK. Thanks, Uros. > > Thanks again, > Roger > -- > > > -----Original Message----- > > From: Uros Bizjak <ubizjak@gmail.com> > > Sent: 26 June 2022 18:08 > > To: Roger Sayle <roger@nextmovesoftware.com> > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [x86 PATCH] PR rtl-optimization/96692: ((A|B)^C)^A using andn with > > -mbmi. > > > > On Sun, Jun 26, 2022 at 2:04 PM Roger Sayle <roger@nextmovesoftware.com> > > wrote: > > > > > > > > > This patch addresses PR rtl-optimization/96692 on x86_64, by providing > > > a define_split for combine to convert the three operation ((A|B)^C)^D > > > into a two operation sequence using andn when either A or B is the > > > same register as C or D. This is essentially a reassociation problem > > > that's only a win if the target supports an and-not instruction (as with -mbmi). > > > > > > Hence for the new test case: > > > > > > int f(int a, int b, int c) > > > { > > > return (a ^ b) ^ (a | c); > > > } > > > > > > GCC on x86_64-pc-linux-gnu wth -O2 -mbmi would previously generate: > > > > > > xorl %edi, %esi > > > orl %edx, %edi > > > movl %esi, %eax > > > xorl %edi, %eax > > > ret > > > > > > but with this patch now generates: > > > > > > andn %edx, %edi, %eax > > > xorl %esi, %eax > > > ret > > > > > > I'll investigate whether this optimization can also be implemented > > > more generically in simplify_rtx when the backend provides accurate > > > rtx_costs for "(and (not ..." (as there's no optab for andn). > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32}, > > > with no new failures. Ok for mainline? > > > > > > > > > 2022-06-26 Roger Sayle <roger@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > PR rtl-optimization/96692 > > > * config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D > > > as (X & ~Y) ^ Z on target BMI when either C or D is A or B. > > > > > > gcc/testsuite/ChangeLog > > > PR rtl-optimization/96692 > > > * gcc.target/i386/bmi-andn-4.c: New test case. > > > > + "TARGET_BMI > > + && ix86_pre_reload_split () > > + && (rtx_equal_p (operands[1], operands[3]) > > + || rtx_equal_p (operands[1], operands[4]) > > + || (REG_P (operands[2]) > > + && (rtx_equal_p (operands[2], operands[3]) > > + || rtx_equal_p (operands[2], operands[4]))))" > > > > You don't need a ix86_pre_reload_split for combine splitter* > > > > OTOH, please split the pattern to two for each commutative operand and use > > (match_dup x) instead. Something similar to [1]. > > > > *combine splitter is described in the documentation as the splitter pattern that > > does *not* match any existing insn pattern. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596804.html > > > > Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 3093cb5..27dddaf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10525,6 +10525,57 @@ (set (match_dup 0) (match_op_dup 1 [(and:SI (match_dup 3) (match_dup 2)) (const_int 0)]))]) + +;; Split ((A | B) ^ C) ^ D as (X & ~Y) ^ Z. +(define_split + [(set (match_operand:SWI48 0 "register_operand") + (xor:SWI48 + (xor:SWI48 + (ior:SWI48 (match_operand:SWI48 1 "register_operand") + (match_operand:SWI48 2 "nonimmediate_operand")) + (match_operand:SWI48 3 "nonimmediate_operand")) + (match_operand:SWI48 4 "nonimmediate_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_BMI + && ix86_pre_reload_split () + && (rtx_equal_p (operands[1], operands[3]) + || rtx_equal_p (operands[1], operands[4]) + || (REG_P (operands[2]) + && (rtx_equal_p (operands[2], operands[3]) + || rtx_equal_p (operands[2], operands[4]))))" + [(parallel + [(set (match_dup 5) (and:SWI48 (not:SWI48 (match_dup 6)) (match_dup 7))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) (xor:SWI48 (match_dup 5) (match_dup 8))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[5] = gen_reg_rtx (<MODE>mode); + if (rtx_equal_p (operands[1], operands[3])) + { + operands[6] = operands[1]; + operands[7] = operands[2]; + operands[8] = operands[4]; + } + else if (rtx_equal_p (operands[1], operands[4])) + { + operands[6] = operands[1]; + operands[7] = operands[2]; + operands[8] = operands[3]; + } + else if (rtx_equal_p (operands[2], operands[3])) + { + operands[6] = operands[2]; + operands[7] = operands[1]; + operands[8] = operands[4]; + } + else + { + operands[6] = operands[2]; + operands[7] = operands[1]; + operands[8] = operands[3]; + } +}) ;; Logical inclusive and exclusive OR instructions diff --git a/gcc/testsuite/gcc.target/i386/bmi-andn-4.c b/gcc/testsuite/gcc.target/i386/bmi-andn-4.c new file mode 100644 index 0000000..fb89529 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/bmi-andn-4.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mbmi" } */ + +int f(int a, int b, int c) +{ + return (a ^ b) ^ (a | c); +} + +/* { dg-final { scan-assembler "andn\[ \\t\]+" } } */