Message ID | CAFULd4Z5tQunoxdjn8D9UT2EN31PU-Ajbeg0UgCRUBDaUZ+SMg@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477] | expand |
On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > have no bits in common, to IOR expression, convert IOR or XOR of said two > ANDed values to PLUS expression. I think this only helps targets which have leal like instruction. Also I think it is the same issue as I recorded as PR 111763 . I suspect BIT_IOR is more of a Canonical form for GIMPLE while we should handle this in expand to decide if we want to use PLUS or IOR. Thanks, Andrew Pinski > > If we consider the following testcase: > > --cut here-- > unsigned int foo (unsigned int a, unsigned int b) > { > unsigned int r = a & 0x1; > unsigned int p = b & ~0x3; > > return r + p + 2; > } > > unsigned int bar (unsigned int a, unsigned int b) > { > unsigned int r = a & 0x1; > unsigned int p = b & ~0x3; > > return r | p | 2; > } > --cut here-- > > the above testcase compiles (x86_64 -O2) to: > > foo: > andl $1, %edi > andl $-4, %esi > orl %esi, %edi > leal 2(%rdi), %eax > ret > > bar: > andl $1, %edi > andl $-4, %esi > orl %esi, %edi > movl %edi, %eax > orl $2, %eax > ret > > There is no further simplification possible in any case, we can't combine > OR with a PLUS in the first case, and we don't have OR instruction with > multiple inputs in the second case. > > If we switch around the logic in the conversion and convert from IOR/XOR > to PLUS, then the resulting assembly reads: > > foo: > andl $-4, %esi > andl $1, %edi > leal 2(%rsi,%rdi), %eax > ret > > bar: > andl $1, %edi > andl $-4, %esi > leal (%rdi,%rsi), %eax > orl $2, %eax > ret > > On x86, the conversion can now use LEA instruction, which is much more > usable than OR instruction. In the first case, LEA implements three input > ADD instruction, while in the second case, even though the instruction > can't be combined with a follow-up OR, the non-destructive LEA avoids a move. > > PR target/108477 > > gcc/ChangeLog: > > * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2): > Do not convert PLUS of two values, ANDed with two constants > that have no bits in common to IOR exporession, convert > IOR or XOR of said two ANDed values to PLUS expression. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr108477.c: New test. > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > OK for mainline? > > Uros.
On 1/8/24 09:57, Andrew Pinski wrote: > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote: >> >> Instead of converting XOR or PLUS of two values, ANDed with two constants that >> have no bits in common, to IOR expression, convert IOR or XOR of said two >> ANDed values to PLUS expression. > > I think this only helps targets which have leal like instruction. Also > I think it is the same issue as I recorded as PR 111763 . I suspect > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > this in expand to decide if we want to use PLUS or IOR. Actually there's benefit on RISC-V to using PLUS over IOR/XOR when there's no bits in common. In fact, I've been asked to do that by Andrew W. for a case where we know ahead of time there's no bits in common in a sequence that currently uses IOR. Specifically it can allow more use of the compact instructions as the compact PLUS allows the full set of hard registers while compact IOR/XOR only allow a subset of registers. jeff
On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote: > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > ANDed values to PLUS expression. > > I think this only helps targets which have leal like instruction. Also > I think it is the same issue as I recorded as PR 111763 . I suspect > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > this in expand to decide if we want to use PLUS or IOR. For the pr108477.c testcase, expand pass expands: r_3 = a_2(D) & 1; p_5 = b_4(D) & 4294967292; _1 = r_3 | p_5; _6 = _1 + 2; return _6; The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so we need to determine values of constants. Is this information available in the expand pass? IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in the shown testcase would be beneficial when constructing control register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD sequence in this case. Uros.
On Mon, 8 Jan 2024, Jeff Law wrote: > > > On 1/8/24 09:57, Andrew Pinski wrote: > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > >> > >> Instead of converting XOR or PLUS of two values, ANDed with two constants > >> that > >> have no bits in common, to IOR expression, convert IOR or XOR of said two > >> ANDed values to PLUS expression. > > > > I think this only helps targets which have leal like instruction. Also > > I think it is the same issue as I recorded as PR 111763 . I suspect > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > this in expand to decide if we want to use PLUS or IOR. > Actually there's benefit on RISC-V to using PLUS over IOR/XOR when there's no > bits in common. In fact, I've been asked to do that by Andrew W. for a case > where we know ahead of time there's no bits in common in a sequence that > currently uses IOR. > > Specifically it can allow more use of the compact instructions as the compact > PLUS allows the full set of hard registers while compact IOR/XOR only allow a > subset of registers. Still the benefit of IOR is no undefined overflow issues and thus better re-association (on GIMPLE). Of course in the end what is better "depends", but when "depends" depends on what the target can do better it's indeed better to leave that to either RTL expansion or instruction selection. Richard.
On Mon, 8 Jan 2024, Uros Bizjak wrote: > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > ANDed values to PLUS expression. > > > > I think this only helps targets which have leal like instruction. Also > > I think it is the same issue as I recorded as PR 111763 . I suspect > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > this in expand to decide if we want to use PLUS or IOR. > > For the pr108477.c testcase, expand pass expands: > > r_3 = a_2(D) & 1; > p_5 = b_4(D) & 4294967292; > _1 = r_3 | p_5; > _6 = _1 + 2; > return _6; > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > we need to determine values of constants. Is this information > available in the expand pass? If there's single-uses then TER makes this info available. > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > the shown testcase would be beneficial when constructing control > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > sequence in this case. The other possibility is to expose LEA as optab and making GIMPLE instruction selection generate a direct internal function for that (that would be the "better" way). There is LEA-like &TARGET_MEM_REF but that has constraints on the addends mode (ptr_mode) which might not fit what the target can do? Otherwise that would be an existing way to do this computation as well. Richard.
On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote: > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > > ANDed values to PLUS expression. > > > > > > I think this only helps targets which have leal like instruction. Also > > > I think it is the same issue as I recorded as PR 111763 . I suspect > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > > this in expand to decide if we want to use PLUS or IOR. > > > > For the pr108477.c testcase, expand pass expands: > > > > r_3 = a_2(D) & 1; > > p_5 = b_4(D) & 4294967292; > > _1 = r_3 | p_5; > > _6 = _1 + 2; > > return _6; > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > > we need to determine values of constants. Is this information > > available in the expand pass? > > If there's single-uses then TER makes this info available. > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > > the shown testcase would be beneficial when constructing control > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > > sequence in this case. > > The other possibility is to expose LEA as optab and making GIMPLE > instruction selection generate a direct internal function for that > (that would be the "better" way). There is LEA-like &TARGET_MEM_REF > but that has constraints on the addends mode (ptr_mode) which might > not fit what the target can do? Otherwise that would be an existing > way to do this computation as well. I think there is no need for a new optab. If we can determine at expand time that ANDed values are fed to the IOR/XOR expressions, then we can check the constants and emit PLUS RTX instead. RTL combine pass will then create LEA instruction from separate PLUS instructions. So, we can emit: op0 = and (a, CST1) op1 = and (b, CST2) op2 = plus (op0, op1) RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0 and op0 = and (a, CST1) op1 = plus (op0, CST2) RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0 The above transformation is valid for IOR and XOR. x86 can't combine IOR/XOR in any meaningful way, but can combine the sequence of PLUS (together with MULT) RTXes to LEA. (BTW: I am not versed in the expand stuff, so a disclaimer is at hand ;) ) Uros.
On Tue, 9 Jan 2024, Uros Bizjak wrote: > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote: > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > > > ANDed values to PLUS expression. > > > > > > > > I think this only helps targets which have leal like instruction. Also > > > > I think it is the same issue as I recorded as PR 111763 . I suspect > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > > > this in expand to decide if we want to use PLUS or IOR. > > > > > > For the pr108477.c testcase, expand pass expands: > > > > > > r_3 = a_2(D) & 1; > > > p_5 = b_4(D) & 4294967292; > > > _1 = r_3 | p_5; > > > _6 = _1 + 2; > > > return _6; > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > > > we need to determine values of constants. Is this information > > > available in the expand pass? > > > > If there's single-uses then TER makes this info available. > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > > > the shown testcase would be beneficial when constructing control > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > > > sequence in this case. > > > > The other possibility is to expose LEA as optab and making GIMPLE > > instruction selection generate a direct internal function for that > > (that would be the "better" way). There is LEA-like &TARGET_MEM_REF > > but that has constraints on the addends mode (ptr_mode) which might > > not fit what the target can do? Otherwise that would be an existing > > way to do this computation as well. > > I think there is no need for a new optab. If we can determine at > expand time that ANDed values are fed to the IOR/XOR expressions, then > we can check the constants and emit PLUS RTX instead. RTL combine pass > will then create LEA instruction from separate PLUS instructions. > > So, we can emit: > > op0 = and (a, CST1) > op1 = and (b, CST2) > op2 = plus (op0, op1) > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0 > > and > > op0 = and (a, CST1) > op1 = plus (op0, CST2) > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0 > > The above transformation is valid for IOR and XOR. > > x86 can't combine IOR/XOR in any meaningful way, but can combine the > sequence of PLUS (together with MULT) RTXes to LEA. Btw, this looks like a three-insn combination even with IOR so a pattern for this case would work as well? Deciding whether to use PLUS or IOR at RTL expansion time would need to somehow figure out what's better for the target in question. I'm not sure how to do that? Richard.
On Tue, Jan 09, 2024 at 10:39:50AM +0100, Richard Biener wrote: > > x86 can't combine IOR/XOR in any meaningful way, but can combine the > > sequence of PLUS (together with MULT) RTXes to LEA. > > Btw, this looks like a three-insn combination even with IOR so a > pattern for this case would work as well? > > Deciding whether to use PLUS or IOR at RTL expansion time would > need to somehow figure out what's better for the target in question. > I'm not sure how to do that? Maybe a new optab which would be used when expanding BIT_IOR_EXPR/BIT_XOR_EXPR with operands which have no common bits (say in (get_nonzero_bits (arg1) & get_nonzero_bits (arg2)) == 0 sense)? x86 and riscv could just emit a PLUS in those cases... Jakub
On Tue, 9 Jan 2024, Uros Bizjak wrote: > On Tue, Jan 9, 2024 at 10:44?AM Richard Biener <rguenther@suse.de> wrote: > > > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote: > > > > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > > > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > > > > > ANDed values to PLUS expression. > > > > > > > > > > > > I think this only helps targets which have leal like instruction. Also > > > > > > I think it is the same issue as I recorded as PR 111763 . I suspect > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > > > > > this in expand to decide if we want to use PLUS or IOR. > > > > > > > > > > For the pr108477.c testcase, expand pass expands: > > > > > > > > > > r_3 = a_2(D) & 1; > > > > > p_5 = b_4(D) & 4294967292; > > > > > _1 = r_3 | p_5; > > > > > _6 = _1 + 2; > > > > > return _6; > > > > > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > > > > > we need to determine values of constants. Is this information > > > > > available in the expand pass? > > > > > > > > If there's single-uses then TER makes this info available. > > > > > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > > > > > the shown testcase would be beneficial when constructing control > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > > > > > sequence in this case. > > > > > > > > The other possibility is to expose LEA as optab and making GIMPLE > > > > instruction selection generate a direct internal function for that > > > > (that would be the "better" way). There is LEA-like &TARGET_MEM_REF > > > > but that has constraints on the addends mode (ptr_mode) which might > > > > not fit what the target can do? Otherwise that would be an existing > > > > way to do this computation as well. > > > > > > I think there is no need for a new optab. If we can determine at > > > expand time that ANDed values are fed to the IOR/XOR expressions, then > > > we can check the constants and emit PLUS RTX instead. RTL combine pass > > > will then create LEA instruction from separate PLUS instructions. > > > > > > So, we can emit: > > > > > > op0 = and (a, CST1) > > > op1 = and (b, CST2) > > > op2 = plus (op0, op1) > > > > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0 > > > > > > and > > > > > > op0 = and (a, CST1) > > > op1 = plus (op0, CST2) > > > > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0 > > > > > > The above transformation is valid for IOR and XOR. > > > > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the > > > sequence of PLUS (together with MULT) RTXes to LEA. > > > > Btw, this looks like a three-insn combination even with IOR so a > > pattern for this case would work as well? > > IIUC the question: x86 does not have three-input IOR, but we want to > emulate it with LEA (three-input PLUS, but one of the arguments has to > be constant). But couldn't you have a define_insn matching LEA but with IOR instead of PLUS (and with the appropriate constraints?). Maybe it could also be combine trying PLUS instead of IOR if that's possible (looking at the constants). > We can do the conversion only when masking constants do > not intersect. *However*, the original problem is that the compiler is > converting PLUS to IOR, preventing the generation of LEA. As shown in > the original PR, we would like to *prevent* the conversion from PLUS > to IOR on x86, because it interferes with (a & CST1 + b & CST2 + CST3) > - we would really want to generate LEA in this case. I'm throwing in the argument that the user could have written a & CST1 | b & CST2 + CST3 as well. > > Deciding whether to use PLUS or IOR at RTL expansion time would > > need to somehow figure out what's better for the target in question. > > I'm not sure how to do that? > > I think that a target hook would be needed. In addition to x86, Jeff > presented a very specialized use case for RISC-V that can't be > described otherwise. > > Uros. >
On Tue, Jan 9, 2024 at 10:44 AM Richard Biener <rguenther@suse.de> wrote: > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote: > > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > > > > ANDed values to PLUS expression. > > > > > > > > > > I think this only helps targets which have leal like instruction. Also > > > > > I think it is the same issue as I recorded as PR 111763 . I suspect > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > > > > this in expand to decide if we want to use PLUS or IOR. > > > > > > > > For the pr108477.c testcase, expand pass expands: > > > > > > > > r_3 = a_2(D) & 1; > > > > p_5 = b_4(D) & 4294967292; > > > > _1 = r_3 | p_5; > > > > _6 = _1 + 2; > > > > return _6; > > > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > > > > we need to determine values of constants. Is this information > > > > available in the expand pass? > > > > > > If there's single-uses then TER makes this info available. > > > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > > > > the shown testcase would be beneficial when constructing control > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > > > > sequence in this case. > > > > > > The other possibility is to expose LEA as optab and making GIMPLE > > > instruction selection generate a direct internal function for that > > > (that would be the "better" way). There is LEA-like &TARGET_MEM_REF > > > but that has constraints on the addends mode (ptr_mode) which might > > > not fit what the target can do? Otherwise that would be an existing > > > way to do this computation as well. > > > > I think there is no need for a new optab. If we can determine at > > expand time that ANDed values are fed to the IOR/XOR expressions, then > > we can check the constants and emit PLUS RTX instead. RTL combine pass > > will then create LEA instruction from separate PLUS instructions. > > > > So, we can emit: > > > > op0 = and (a, CST1) > > op1 = and (b, CST2) > > op2 = plus (op0, op1) > > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0 > > > > and > > > > op0 = and (a, CST1) > > op1 = plus (op0, CST2) > > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0 > > > > The above transformation is valid for IOR and XOR. > > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the > > sequence of PLUS (together with MULT) RTXes to LEA. > > Btw, this looks like a three-insn combination even with IOR so a > pattern for this case would work as well? IIUC the question: x86 does not have three-input IOR, but we want to emulate it with LEA (three-input PLUS, but one of the arguments has to be constant). We can do the conversion only when masking constants do not intersect. *However*, the original problem is that the compiler is converting PLUS to IOR, preventing the generation of LEA. As shown in the original PR, we would like to *prevent* the conversion from PLUS to IOR on x86, because it interferes with (a & CST1 + b & CST2 + CST3) - we would really want to generate LEA in this case. > Deciding whether to use PLUS or IOR at RTL expansion time would > need to somehow figure out what's better for the target in question. > I'm not sure how to do that? I think that a target hook would be needed. In addition to x86, Jeff presented a very specialized use case for RISC-V that can't be described otherwise. Uros.
On Tue, Jan 9, 2024 at 11:06 AM Richard Biener <rguenther@suse.de> wrote: > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > On Tue, Jan 9, 2024 at 10:44?AM Richard Biener <rguenther@suse.de> wrote: > > > > > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > > > > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote: > > > > > > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > > > > > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > > > > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > > > > > > ANDed values to PLUS expression. > > > > > > > > > > > > > > I think this only helps targets which have leal like instruction. Also > > > > > > > I think it is the same issue as I recorded as PR 111763 . I suspect > > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > > > > > > this in expand to decide if we want to use PLUS or IOR. > > > > > > > > > > > > For the pr108477.c testcase, expand pass expands: > > > > > > > > > > > > r_3 = a_2(D) & 1; > > > > > > p_5 = b_4(D) & 4294967292; > > > > > > _1 = r_3 | p_5; > > > > > > _6 = _1 + 2; > > > > > > return _6; > > > > > > > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > > > > > > we need to determine values of constants. Is this information > > > > > > available in the expand pass? > > > > > > > > > > If there's single-uses then TER makes this info available. > > > > > > > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > > > > > > the shown testcase would be beneficial when constructing control > > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > > > > > > sequence in this case. > > > > > > > > > > The other possibility is to expose LEA as optab and making GIMPLE > > > > > instruction selection generate a direct internal function for that > > > > > (that would be the "better" way). There is LEA-like &TARGET_MEM_REF > > > > > but that has constraints on the addends mode (ptr_mode) which might > > > > > not fit what the target can do? Otherwise that would be an existing > > > > > way to do this computation as well. > > > > > > > > I think there is no need for a new optab. If we can determine at > > > > expand time that ANDed values are fed to the IOR/XOR expressions, then > > > > we can check the constants and emit PLUS RTX instead. RTL combine pass > > > > will then create LEA instruction from separate PLUS instructions. > > > > > > > > So, we can emit: > > > > > > > > op0 = and (a, CST1) > > > > op1 = and (b, CST2) > > > > op2 = plus (op0, op1) > > > > > > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0 > > > > > > > > and > > > > > > > > op0 = and (a, CST1) > > > > op1 = plus (op0, CST2) > > > > > > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0 > > > > > > > > The above transformation is valid for IOR and XOR. > > > > > > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the > > > > sequence of PLUS (together with MULT) RTXes to LEA. > > > > > > Btw, this looks like a three-insn combination even with IOR so a > > > pattern for this case would work as well? > > > > IIUC the question: x86 does not have three-input IOR, but we want to > > emulate it with LEA (three-input PLUS, but one of the arguments has to > > be constant). > > But couldn't you have a define_insn matching LEA but with IOR instead > of PLUS (and with the appropriate constraints?). Maybe it could > also be combine trying PLUS instead of IOR if that's possible > (looking at the constants). we would have to include masking ANDs in the define_insn and add additional conditions regarding mask constants in the insn constraint. So, this define_insn would look something like: (ior (ior (and (op1, CST1), and (op2, CST2)), CST3)) with (CST1 & CST2 & CST3) == 0 condition. and combinations with PLUS / MULT RTXes including all the variants with two arguments. Uros.
On Tue, Jan 9, 2024 at 11:19 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Tue, Jan 9, 2024 at 11:06 AM Richard Biener <rguenther@suse.de> wrote: > > > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > > > On Tue, Jan 9, 2024 at 10:44?AM Richard Biener <rguenther@suse.de> wrote: > > > > > > > > On Tue, 9 Jan 2024, Uros Bizjak wrote: > > > > > > > > > On Tue, Jan 9, 2024 at 9:58?AM Richard Biener <rguenther@suse.de> wrote: > > > > > > > > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote: > > > > > > > > > > > > > On Mon, Jan 8, 2024 at 5:57?PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > > > > > > > > > > > On Mon, Jan 8, 2024 at 6:44?AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that > > > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > > > > > > > > ANDed values to PLUS expression. > > > > > > > > > > > > > > > > I think this only helps targets which have leal like instruction. Also > > > > > > > > I think it is the same issue as I recorded as PR 111763 . I suspect > > > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > > > > > > > > this in expand to decide if we want to use PLUS or IOR. > > > > > > > > > > > > > > For the pr108477.c testcase, expand pass expands: > > > > > > > > > > > > > > r_3 = a_2(D) & 1; > > > > > > > p_5 = b_4(D) & 4294967292; > > > > > > > _1 = r_3 | p_5; > > > > > > > _6 = _1 + 2; > > > > > > > return _6; > > > > > > > > > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so > > > > > > > we need to determine values of constants. Is this information > > > > > > > available in the expand pass? > > > > > > > > > > > > If there's single-uses then TER makes this info available. > > > > > > > > > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in > > > > > > > the shown testcase would be beneficial when constructing control > > > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD > > > > > > > sequence in this case. > > > > > > > > > > > > The other possibility is to expose LEA as optab and making GIMPLE > > > > > > instruction selection generate a direct internal function for that > > > > > > (that would be the "better" way). There is LEA-like &TARGET_MEM_REF > > > > > > but that has constraints on the addends mode (ptr_mode) which might > > > > > > not fit what the target can do? Otherwise that would be an existing > > > > > > way to do this computation as well. > > > > > > > > > > I think there is no need for a new optab. If we can determine at > > > > > expand time that ANDed values are fed to the IOR/XOR expressions, then > > > > > we can check the constants and emit PLUS RTX instead. RTL combine pass > > > > > will then create LEA instruction from separate PLUS instructions. > > > > > > > > > > So, we can emit: > > > > > > > > > > op0 = and (a, CST1) > > > > > op1 = and (b, CST2) > > > > > op2 = plus (op0, op1) > > > > > > > > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0 > > > > > > > > > > and > > > > > > > > > > op0 = and (a, CST1) > > > > > op1 = plus (op0, CST2) > > > > > > > > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0 > > > > > > > > > > The above transformation is valid for IOR and XOR. > > > > > > > > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the > > > > > sequence of PLUS (together with MULT) RTXes to LEA. > > > > > > > > Btw, this looks like a three-insn combination even with IOR so a > > > > pattern for this case would work as well? > > > > > > IIUC the question: x86 does not have three-input IOR, but we want to > > > emulate it with LEA (three-input PLUS, but one of the arguments has to > > > be constant). > > > > But couldn't you have a define_insn matching LEA but with IOR instead > > of PLUS (and with the appropriate constraints?). Maybe it could > > also be combine trying PLUS instead of IOR if that's possible > > (looking at the constants). > > we would have to include masking ANDs in the define_insn and add > additional conditions regarding mask constants in the insn constraint. > > So, this define_insn would look something like: > > (ior (ior (and (op1, CST1), and (op2, CST2)), CST3)) > > with (CST1 & CST2 & CST3) == 0 condition. > > and combinations with PLUS / MULT RTXes including all the variants > with two arguments. Oh, and then we would have to split masking ANDs out of the above instruction. We can't clobber the inputs, so this should be made with a temporary registers before reload... Uros.
diff --git a/gcc/match.pd b/gcc/match.pd index 7b4b15acc41..deac18a7635 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1830,18 +1830,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && element_precision (type) <= element_precision (TREE_TYPE (@1))) (bit_not (rop (convert @0) (convert @1)))))) -/* If we are XORing or adding two BIT_AND_EXPR's, both of which are and'ing +/* If we are ORing or XORing two BIT_AND_EXPR's, both of which are and'ing with a constant, and the two constants have no bits in common, - we should treat this as a BIT_IOR_EXPR since this may produce more + we should treat this as a PLUS_EXPR since this may produce more simplifications. */ -(for op (bit_xor plus) +(for op (bit_ior bit_xor) (simplify (op (convert1? (bit_and@4 @0 INTEGER_CST@1)) (convert2? (bit_and@5 @2 INTEGER_CST@3))) (if (tree_nop_conversion_p (type, TREE_TYPE (@0)) && tree_nop_conversion_p (type, TREE_TYPE (@2)) && (wi::to_wide (@1) & wi::to_wide (@3)) == 0) - (bit_ior (convert @4) (convert @5))))) + (plus (convert @4) (convert @5))))) /* (X | Y) ^ X -> Y & ~ X*/ (simplify diff --git a/gcc/testsuite/gcc.target/i386/pr108477.c b/gcc/testsuite/gcc.target/i386/pr108477.c new file mode 100644 index 00000000000..fb320a84c6d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr108477.c @@ -0,0 +1,13 @@ +/* PR target/108477 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ + +unsigned int foo (unsigned int a, unsigned int b) +{ + unsigned int r = a & 0x1; + unsigned int p = b & ~0x3; + + return r + p + 2; +} + +/* { dg-final { scan-assembler-not "orl" } } */