Message ID: 004b01d9c901$f5bb66b0$e1323410$@nextmovesoftware.com
State: New
Series: PR target/107671: Make more use of btl/btq on x86_64.
On Mon, Aug 7, 2023 at 9:37 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is a partial solution to PR target/107671, updating Uros'
> patch from comment #4, to catch both bit set (setc) and bit not set
> (setnc) cases from the code in comment #2, when compiled on x86_64.
> Unfortunately, this is a partial solution, as the pointer variants
> in comment #1 aren't yet all optimized, and my attempts to check
> whether the 32-bit versions are optimized with -m32 revealed they
> also need further improvement.  (Some of) These remaining issues
> might best be fixed in the middle-end, in either match.pd or the
> RTL optimizers, so I thought it reasonable to submit this independent
> backend piece, and gain/bank the improvements on x86_64.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2023-08-07  Roger Sayle  <roger@nextmovesoftware.com>
>             Uros Bizjak  <ubizjak@gmail.com>
>
> gcc/ChangeLog
>         PR target/107671
>         * config/i386/i386.md (*bt<mode>_setc<mode>_mask): Allow the
>         shift count to have a different mode (using SWI248) from either
>         the bit-test or the result.
>         (*bt<mode>_setnc<mode>_mask): New define_insn_and_split for the
>         setnc (bit not set) case of the above pattern.
>         (*btdi_setncsi_mask): New define_insn_and_split to handle the
>         SImode result from a DImode bit-test variant of the above patterns.
>         (*bt<mode>_setncqi_mask_2): New define_insn_and_split for the
>         setnc (bit not set) version of *bt<mode>_setcqi_mask_2.
>
> gcc/testsuite/ChangeLog
>         PR target/107671
>         * gcc.target/i386/pr107671-1.c: New test case.
>         * gcc.target/i386/pr107671-2.c: Likewise.

I am worried about the number of existing and new patterns that are
introduced to satisfy the creativity of the combine pass.

The following can be handled via zero-extract RTXes:

  return ((v & (1 << (bitnum & 31)))) != 0;
  return ((v & (1L << (bitnum & 63)))) != 0;
  return (v >> (bitnum & 31)) & 1;
  return (v >> (bitnum & 63)) & 1;

but there is no canonicalization for the negative forms of the above
constructs.

For the above, the combine pass tries:

(set (reg:SI 95)
     (zero_extract:SI (reg:SI 97)
                      (const_int 1 [0x1])
                      (and:SI (reg:SI 98)
                              (const_int 31 [0x1f]))))

which is necessary to handle the change of the compare mode from CCZ to CCC.

However, the negative forms try:

(set (reg:QI 96)
     (eq:QI (zero_extract:SI (reg:SI 97)
                             (const_int 1 [0x1])
                             (and:SI (reg:SI 98)
                                     (const_int 31 [0x1f])))
            (const_int 0 [0])))

and:

(set (reg:SI 95)
     (xor:SI (zero_extract:SI (reg:SI 97)
                              (const_int 1 [0x1])
                              (and:SI (reg:SI 98)
                                      (const_int 31 [0x1f])))
             (const_int 1 [0x1])))

and these are further different for SImode and DImode.

Ideally, we would simplify all forms to:

(set (reg:QI 96)
     (eq:QI (zero_extract:SI (reg:SI 97)
                             (const_int 1 [0x1])
                             (and:SI (reg:SI 98)
                                     (const_int 31 [0x1f])))
            (const_int 0 [0])))

where the inverted/non-inverted forms would emit ne/eq:QI (....) (const_int 0).
The result would then be zero-extended to DImode or SImode, depending on the
target mode.

You can already see the problem with the missing canonicalization in i386.md,
in the define_insn_and_split patterns with the comments:

;; Help combine recognize bt followed by setc
;; Help combine recognize bt followed by setnc

where totally different patterns are needed to match what combine produces.

The above problem is specific to setcc patterns, where the output value is
derived from the input operand.  jcc and cmov look OK.
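For concreteness, the "negative forms" discussed above correspond to C source
like the following; the function names mirror the bt32v_setae and bt32v_setae2
tests added by the patch further down, and are shown here only to connect the
RTL above back to source code:

int bt32v_setae (const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
{
  /* Bit not set, written as mask-and-compare: bt + setnc candidate.  */
  return ((v & (1 << (bitnum & 31)))) == 0;
}

int bt32v_setae2 (const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
{
  /* Bit not set, written as shift-and-logical-not: also a bt + setnc
     candidate, but combine presents it with a different RTL shape.  */
  return !((v >> (bitnum & 31)) & 1);
}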
If the following pattern were tried by combine, then we would handle all the
testcases in the PR (plus some more, where the output is in a different mode
than the input):

(set (reg:QI 96)
     ({eq,ne}:QI (zero_extract:SI (reg:SI 97)
                                  (const_int 1 [0x1])
                                  (and:SI (reg:SI 98)
                                          (const_int 31 [0x1f])))
                 (const_int 0 [0])))

where the QImode register is later zero-extended to the target width.  In
this case, one define_insn_and_split pattern would handle all testcases from
PR107671.

Please also note that we can implement this transformation via a combine
splitter.  The benefit of the combine splitter is that its results are
immediately "recognized", and the new RTXes can be propagated into subsequent
combinations.

I have made some measurements with my proposed patch (as posted in the PR),
and the transformation never triggered (neither for a GCC build nor when
building the Linux kernel).  So I wonder whether the added number of patterns
outweighs the benefits at all.

IMO, the correct way is to teach the combine pass some more about bit-test
functionality (to also pass the negative forms via zero-extract RTXes, see
above).  This would canonicalize the transformation and prevent pattern
explosion.  I consider bt to be quite an important instruction, where one
instruction can substitute for a bunch of shift/and/move insns.  Perhaps a
separate tree code or RTX code would enable optimizations other than those
that combine can offer.

[The above is an excerpt from some private communication I had with Roger,
now published for archival purposes.]

Uros.

>
> Roger
> --
>
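To illustrate the remark above about the output being in a different mode
than the input, here are two hypothetical variants (the names bt64v_setae_qi
and bt32v_setb_di are not among the committed tests): a QImode result from a
DImode bit test, and a DImode result from an SImode bit test.  With the single
canonical {eq,ne}:QI form described above, both would reduce to the setcc byte
plus an optional zero extension to the destination mode:

/* Hypothetical: bit test done in DImode, result only needed as a byte.  */
unsigned char bt64v_setae_qi (const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
{
  return !((v >> (bitnum & 63)) & 1);
}

/* Hypothetical: bit test done in SImode, result widened to 64 bits.  */
__UINT64_TYPE__ bt32v_setb_di (const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
{
  return (v >> (bitnum & 31)) & 1;
}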
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ba376f8..aa8946a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16405,19 +16405,19 @@
         (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))])
 
 ;; Help combine recognize bt followed by setc
-(define_insn_and_split "*bt<mode>_setc<mode>_mask"
+(define_insn_and_split "*bt<SWI48:mode>_setc<SWI48:mode>_mask"
   [(set (match_operand:SWI48 0 "register_operand")
         (zero_extract:SWI48
           (match_operand:SWI48 1 "register_operand")
           (const_int 1)
           (subreg:QI
-            (and:SWI48
-              (match_operand:SWI48 2 "register_operand")
+            (and:SWI248
+              (match_operand:SWI248 2 "register_operand")
              (match_operand 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_BT
-   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
-      == GET_MODE_BITSIZE (<MODE>mode)-1
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -16432,6 +16432,90 @@
   operands[2] = gen_lowpart (QImode, operands[2]);
   operands[3] = gen_reg_rtx (QImode);
 })
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*bt<SWI48:mode>_setnc<SWI48:mode>_mask"
+  [(set (match_operand:SWI48 0 "register_operand")
+        (and:SWI48
+          (not:SWI48
+            (lshiftrt:SWI48
+              (match_operand:SWI48 1 "register_operand")
+              (subreg:QI
+                (and:SWI248 (match_operand:SWI248 2 "register_operand")
+                            (match_operand 3 "const_int_operand")) 0)))
+          (const_int 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+          (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+          (const_int 0)))
+   (set (match_dup 3)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 0) (zero_extend:SWI48 (match_dup 3)))]
+{
+  operands[2] = gen_lowpart (QImode, operands[2]);
+  operands[3] = gen_reg_rtx (QImode);
+})
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*btdi_setncsi_mask"
+  [(set (match_operand:SI 0 "register_operand")
+        (and:SI
+          (not:SI
+            (subreg:SI
+              (lshiftrt:DI (match_operand:DI 1 "register_operand")
+                           (subreg:QI
+                             (and:SWI248
+                               (match_operand:SWI248 2 "register_operand")
+                               (const_int 63)) 0)) 0))
+          (const_int 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+          (zero_extract:DI (match_dup 1) (const_int 1) (match_dup 2))
+          (const_int 0)))
+   (set (match_dup 3)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 0) (zero_extend:SI (match_dup 3)))]
+{
+  operands[2] = gen_lowpart (QImode, operands[2]);
+  operands[3] = gen_reg_rtx (QImode);
+})
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*bt<SWI48:mode>_setncqi_mask_2"
+  [(set (match_operand:QI 0 "register_operand")
+        (eq:QI
+          (zero_extract:SWI48
+            (match_operand:SWI48 1 "register_operand")
+            (const_int 1)
+            (subreg:QI
+              (and:SWI248 (match_operand:SWI248 2 "register_operand")
+                          (match_operand 3 "const_int_operand")) 0))
+          (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+          (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+          (const_int 0)))
+   (set (match_dup 0)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);")
 
 ;; Store-flag instructions.
diff --git a/gcc/testsuite/gcc.target/i386/pr107671-1.c b/gcc/testsuite/gcc.target/i386/pr107671-1.c
new file mode 100644
index 0000000..d05b178
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107671-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+int bt32v_setb(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+  return ((v & (1 << (bitnum & 31)))) != 0;
+}
+
+int bt32v_setb2(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+  return (v >> (bitnum & 31)) & 1;
+}
+
+int bt32v_setae(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+  return ((v & (1 << (bitnum & 31)))) == 0;
+}
+
+int bt32v_setae2(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+  return !((v >> (bitnum & 31)) & 1);
+}
+
+/* { dg-final { scan-assembler-times "btl" 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr107671-2.c b/gcc/testsuite/gcc.target/i386/pr107671-2.c
new file mode 100644
index 0000000..c90faea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107671-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+int bt64v_setb(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+  return ((v & (1LL << (bitnum & 63)))) != 0;
+}
+
+int bt64v_setb2(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+  return (v >> (bitnum & 63)) & 1;
+}
+
+int bt64v_setae(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+  return ((v & (1LL << (bitnum & 63)))) == 0;
+}
+
+int bt64v_setae2(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+  return !((v >> (bitnum & 63)) & 1);
+}
+
+/* { dg-final { scan-assembler-times "btq" 4 } } */