diff mbox series

PR target/107671: Make more use of btl/btq on x86_64.

Message ID 004b01d9c901$f5bb66b0$e1323410$@nextmovesoftware.com
State New
Headers show
Series PR target/107671: Make more use of btl/btq on x86_64. | expand

Commit Message

Roger Sayle Aug. 7, 2023, 7:37 a.m. UTC
This patch is a partial solution to PR target/107671, updating Uros'
patch from comment #4, to catch both bit set (setc) and bit not set
(setnc) cases from the code in comment #2, when compiled on x86_64.
Unfortunately, this is a partial solution, as the pointer variants
in comment #1, aren't yet all optimized, and my attempts to check
whether the 32-bit versions are optimized with -m32 revealed they
also need further improvement.  (Some of) These remaining issues
might best be fixed in the middle-end, in either match.pd or the
RTL optimizers, so I thought it reasonable to submit this independent
backend piece, and gain/bank the improvements on x86_64.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-08-07  Roger Sayle  <roger@nextmovesoftware.com>
            Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
        PR target/107671
        * config/i386/i386.md (*bt<mode>_setc<mode>_mask): Allow the
        shift count to have a different mode (using SWI248) from either
        the bit-test or the result.
        (*bt<mode>_setnc<mode>_mask): New define_insn_and_split for the
        setnc (bit not set) case of the above pattern.
        (*btdi_setncsi_mask): New define_insn_and_split to handle the
        SImode result from a DImode bit-test variant of the above patterns.
        (*bt<mode>_setncqi_mask_2): New define_insn_and_split for the
        setnc (bit not set) version of *bt<mode>_setcqi_mask_2.

gcc/testsuite/ChangeLog
        PR target/107671
        * gcc.target/i386/pr107671-1.c: New test case.
        * gcc.target/i386/pr107671-2.c: Likewise.


Roger
--

Comments

Uros Bizjak Aug. 7, 2023, 9:45 a.m. UTC | #1
On Mon, Aug 7, 2023 at 9:37 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is a partial solution to PR target/107671, updating Uros'
> patch from comment #4, to catch both bit set (setc) and bit not set
> (setnc) cases from the code in comment #2, when compiled on x86_64.
> Unfortunately, this is a partial solution, as the pointer variants
> in comment #1, aren't yet all optimized, and my attempts to check
> whether the 32-bit versions are optimized with -m32 revealed they
> also need further improvement.  (Some of) These remaining issues
> might best be fixed in the middle-end, in either match.pd or the
> RTL optimizers, so I thought it reasonable to submit this independent
> backend piece, and gain/bank the improvements on x86_64.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-08-07  Roger Sayle  <roger@nextmovesoftware.com>
>             Uros Bizjak  <ubizjak@gmail.com>
>
> gcc/ChangeLog
>         PR target/107671
>         * config/i386/i386.md (*bt<mode>_setc<mode>_mask): Allow the
>         shift count to have a different mode (using SWI248) from either
>         the bit-test or the result.
>         (*bt<mode>_setnc<mode>_mask): New define_insn_and_split for the
>         setnc (bit not set) case of the above pattern.
>         (*btdi_setncsi_mask): New define_insn_and_split to handle the
>         SImode result from a DImode bit-test variant of the above patterns.
>         (*bt<mode>_setncqi_mask_2): New define_insn_and_split for the
>         setnc (bit not set) version of *bt<mode>_setcqi_mask_2.
>
> gcc/testsuite/ChangeLog
>         PR target/107671
>         * gcc.target/i386/pr107671-1.c: New test case.
>         * gcc.target/i386/pr107671-2.c: Likewise.

I am worried about the number of existing and new patterns that are
introduced to satisfy creativity of the combine pass. The following
can be handled via zero-extract RTXes:

    return ((v & (1  << (bitnum & 31)))) != 0;
    return ((v & (1L << (bitnum & 63)))) != 0;
    return (v >> (bitnum & 31)) & 1;
    return (v >> (bitnum & 63)) & 1;

but there is no canonicalization for negative forms of the above constructs.

For the above, the combine pass tries:

(set (reg:SI 95)
    (zero_extract:SI (reg:SI 97)
        (const_int 1 [0x1])
        (and:SI (reg:SI 98)
            (const_int 31 [0x1f]))))

that is necessary to handle the change of compare mode from CCZ to
CCC. However, negative forms try:

(set (reg:QI 96)
    (eq:QI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 0 [0])))

and:

(set (reg:SI 95)
    (xor:SI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 1 [0x1])))

and these are further different for SImode and DImode.

Ideally, we would simplify all forms to:

(set (reg:QI 96)
    (eq:QI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 0 [0])))

where inverted/non-inverted forms would emit ne/eq:QI (....)
(const_int 0). The result would be zero-extended to DI or SImode,
depending on the target mode.

You can already see the problem with missing canonicalization in
i386.md with define_insn_and_split pattern with comments:

;; Help combine recognize bt followed by setc
;; Help combine recognize bt followed by setnc

where totally different patterns are needed to match what combine produces.

The above problem is specific to setcc patterns, where output value is
derived from the input operand. jcc and cmov look OK.

If the following pattern would be tried by combine, then we would
handle all the testcases in the PR (plus some more, where output is in
different mode than input):

(set (reg:QI 96)
    ({eq,ne}:QI (zero_extract:SI (reg:SI 97)
            (const_int 1 [0x1])
            (and:SI (reg:SI 98)
                (const_int 31 [0x1f])))
        (const_int 0 [0])))

where QIreg is later zero-extended to the target width. In this case,
one define_insn_and_split pattern would handle all testcases from
PR107671.

Please also note that we can implement this transformation via a
combine splitter. The benefit of the combine splitter is that its
results are immediately "recognized", and new RTXes can be propagated
into subsequent combinations.

I have made some measurements with my proposed patch (as posted in the
PR), and the transformation never triggered (neither for gcc build,
nor when building the linux kernel). So, I wonder if the added number
of patterns outweigh the benefits at all.

IMO, the correct way is to teach the combine pass some more about
bit-test functionality (to also pass negative forms via zero-extract
RTXes, see above). This would canonicalize the transformation and
prevent pattern explosion.

I consider bt to be quite an important instruction, where one
instruction can substitute a bunch of shift/and/move insns. Perhaps a
separate tree or RTX code would enable optimizations, other than those
that combine can offer.

[The above is excerpt of some private communication I had with Roger,
now published for archival purposes]

Uros.

>
> Roger
> --
>
diff mbox series

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ba376f8..aa8946a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16405,19 +16405,19 @@ 
         (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))])
 
 ;; Help combine recognize bt followed by setc
-(define_insn_and_split "*bt<mode>_setc<mode>_mask"
+(define_insn_and_split "*bt<SWI48:mode>_setc<SWI48:mode>_mask"
   [(set (match_operand:SWI48 0 "register_operand")
 	(zero_extract:SWI48
  	  (match_operand:SWI48 1 "register_operand")
 	  (const_int 1)
 	  (subreg:QI
-	    (and:SWI48
-	      (match_operand:SWI48 2 "register_operand")
+	    (and:SWI248
+	      (match_operand:SWI248 2 "register_operand")
 	      (match_operand 3 "const_int_operand")) 0)))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_BT
-   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
-      == GET_MODE_BITSIZE (<MODE>mode)-1
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -16432,6 +16432,90 @@ 
   operands[2] = gen_lowpart (QImode, operands[2]);
   operands[3] = gen_reg_rtx (QImode);
 })
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*bt<SWI48:mode>_setnc<SWI48:mode>_mask"
+  [(set (match_operand:SWI48 0 "register_operand")
+	(and:SWI48
+	  (not:SWI48
+	    (lshiftrt:SWI48
+	      (match_operand:SWI48 1 "register_operand")
+	      (subreg:QI
+		(and:SWI248 (match_operand:SWI248 2 "register_operand")
+			    (match_operand 3 "const_int_operand")) 0)))
+	  (const_int 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 3)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 0) (zero_extend:SWI48 (match_dup 3)))]
+{
+  operands[2] = gen_lowpart (QImode, operands[2]);
+  operands[3] = gen_reg_rtx (QImode);
+})
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*btdi_setncsi_mask"
+  [(set (match_operand:SI 0 "register_operand")
+	(and:SI
+	  (not:SI
+	    (subreg:SI
+	      (lshiftrt:DI (match_operand:DI 1 "register_operand")
+			   (subreg:QI
+			     (and:SWI248
+				(match_operand:SWI248 2 "register_operand")
+				(const_int 63)) 0)) 0))
+	  (const_int 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:DI (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 3)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))
+   (set (match_dup 0) (zero_extend:SI (match_dup 3)))]
+{
+  operands[2] = gen_lowpart (QImode, operands[2]);
+  operands[3] = gen_reg_rtx (QImode);
+})
+
+;; Help combine recognize bt followed by setnc
+(define_insn_and_split "*bt<SWI48:mode>_setncqi_mask_2"
+  [(set (match_operand:QI 0 "register_operand")
+	(eq:QI
+	  (zero_extract:SWI48
+ 	    (match_operand:SWI48 1 "register_operand")
+	    (const_int 1)
+	    (subreg:QI
+	      (and:SWI248 (match_operand:SWI248 2 "register_operand")
+			  (match_operand 3 "const_int_operand")) 0))
+	  (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<SWI48:MODE>mode)-1))
+      == GET_MODE_BITSIZE (<SWI48:MODE>mode)-1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 0)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);")
 
 ;; Store-flag instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr107671-1.c b/gcc/testsuite/gcc.target/i386/pr107671-1.c
new file mode 100644
index 0000000..d05b178
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107671-1.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+int bt32v_setb(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+    return ((v & (1  << (bitnum & 31)))) != 0;
+}
+
+int bt32v_setb2(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+    return (v >> (bitnum & 31)) & 1;
+}
+
+int bt32v_setae(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+    return ((v & (1  << (bitnum & 31)))) == 0;
+}
+
+int bt32v_setae2(const __UINT32_TYPE__ v, __UINT32_TYPE__ bitnum)
+{
+    return !((v >> (bitnum & 31)) & 1);
+}
+
+/* { dg-final { scan-assembler-times "btl" 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr107671-2.c b/gcc/testsuite/gcc.target/i386/pr107671-2.c
new file mode 100644
index 0000000..c90faea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107671-2.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+int bt64v_setb(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+    return ((v & (1LL << (bitnum & 63)))) != 0;
+}
+
+int bt64v_setb2(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+    return (v >> (bitnum & 63)) & 1;
+}
+
+int bt64v_setae(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+    return ((v & (1LL << (bitnum & 63)))) == 0;
+}
+
+int bt64v_setae2(const __UINT64_TYPE__ v, __UINT64_TYPE__ bitnum)
+{
+    return !((v >> (bitnum & 63)) & 1);
+}
+
+/* { dg-final { scan-assembler-times "btq" 4 } } */