diff mbox

arc/nps400: New peephole2 pattern allow more cmem loads

Message ID 1479382467-25790-1-git-send-email-andrew.burgess@embecosm.com
State New
Headers show

Commit Message

Andrew Burgess Nov. 17, 2016, 11:34 a.m. UTC
In the case where we access a single bit from a value and use this in a
EQ/NE comparison, GCC will convert this into a sign-extend and GE/LT
comparison.

Normally this would be fine, however, if the value is in CMEM memory,
then we don't have a sign-extending load available (using the special
short CMEM load instructions), and instead we end up using a long form
load with LIMM, which is less efficient.

This peephole optimisation looks for the sign-extend followed by GE/LT
pattern and converts this back into a load and EQ/NE comparison.

gcc/ChangeLog:

	* config/arc/arc.md (cmem bit/sign-extend peephole2): New peephole
	to make better use of cmem loads in the case where a single bit is
	being accessed.
	* config/arc/predicates.md (ge_lt_comparison_operator): New
	predicate.

gcc/testsuite/ChangeLog:

	* gcc.target/arc/cmem-bit-1.c: New file.
	* gcc.target/arc/cmem-bit-2.c: New file.
	* gcc.target/arc/cmem-bit-3.c: New file.
	* gcc.target/arc/cmem-bit-4.c: New file.
---
 gcc/ChangeLog                             |  8 ++++++
 gcc/config/arc/arc.md                     | 43 +++++++++++++++++++++++++++++++
 gcc/config/arc/predicates.md              |  3 +++
 gcc/testsuite/ChangeLog                   |  7 +++++
 gcc/testsuite/gcc.target/arc/cmem-bit-1.c | 20 ++++++++++++++
 gcc/testsuite/gcc.target/arc/cmem-bit-2.c | 20 ++++++++++++++
 gcc/testsuite/gcc.target/arc/cmem-bit-3.c | 20 ++++++++++++++
 gcc/testsuite/gcc.target/arc/cmem-bit-4.c | 20 ++++++++++++++
 8 files changed, 141 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-bit-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-bit-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-bit-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-bit-4.c

Comments

Claudiu Zissulescu Nov. 17, 2016, 1:02 p.m. UTC | #1
Hi Andrew,

Approved, please apply, but ...

> +(define_peephole2
> +  [(set (match_operand:SI 0 "register_operand" "")
> +        (sign_extend:SI
> +          (match_operand:QI 1 "any_mem_operand" "")))
> +   (set (reg:CC_ZN CC_REG)
> +	(compare:CC_ZN (match_dup 0)
> +                       (const_int 0)))
> +   (set (pc)
> +        (if_then_else (match_operator 2 "ge_lt_comparison_operator"
> +                        [(reg:CC_ZN CC_REG) (const_int 0)])
> +                      (match_operand 3 "" "")
> +                      (match_operand 4 "" "")))]
> +  "TARGET_NPS_CMEM
> +   && cmem_address (XEXP (operands[1], 0), SImode)
> +   && peep2_reg_dead_p (2, operands[0])
> +   && peep2_regno_dead_p (3, CC_REG)"
> +  [(set (match_dup 0)
> +        (zero_extend:SI
> +          (match_dup 1)))
> +   (set (reg:CC_ZN CC_REG)
> +	(compare:CC_ZN (zero_extract:SI
> +                         (match_dup 0)
> +                         (const_int 1)
> +                         (const_int 7))
> +                       (const_int 0)))
> +   (set (pc)
> +        (if_then_else (match_dup 2)
> +                      (match_dup 3)
> +                      (match_dup 4)))]
> +  "if (GET_CODE (operands[2]) == GE)
> +     operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> const0_rtx);
> +   else
> +     operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> const0_rtx);")
> +

It seems to me you use spaces instead of tabs.

Note on tests: It will be nice to add a test where the added peephole kicks in. If you consider to add this test to the current patch, please resubmit it.

Best,
Claudiu
Andrew Burgess Nov. 17, 2016, 1:20 p.m. UTC | #2
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2016-11-17 13:02:02 +0000]:

> Hi Andrew,
> 
> Approved, please apply, but ...
> 
> > +(define_peephole2
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +        (sign_extend:SI
> > +          (match_operand:QI 1 "any_mem_operand" "")))
> > +   (set (reg:CC_ZN CC_REG)
> > +	(compare:CC_ZN (match_dup 0)
> > +                       (const_int 0)))
> > +   (set (pc)
> > +        (if_then_else (match_operator 2 "ge_lt_comparison_operator"
> > +                        [(reg:CC_ZN CC_REG) (const_int 0)])
> > +                      (match_operand 3 "" "")
> > +                      (match_operand 4 "" "")))]
> > +  "TARGET_NPS_CMEM
> > +   && cmem_address (XEXP (operands[1], 0), SImode)
> > +   && peep2_reg_dead_p (2, operands[0])
> > +   && peep2_regno_dead_p (3, CC_REG)"
> > +  [(set (match_dup 0)
> > +        (zero_extend:SI
> > +          (match_dup 1)))
> > +   (set (reg:CC_ZN CC_REG)
> > +	(compare:CC_ZN (zero_extract:SI
> > +                         (match_dup 0)
> > +                         (const_int 1)
> > +                         (const_int 7))
> > +                       (const_int 0)))
> > +   (set (pc)
> > +        (if_then_else (match_dup 2)
> > +                      (match_dup 3)
> > +                      (match_dup 4)))]
> > +  "if (GET_CODE (operands[2]) == GE)
> > +     operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> > const0_rtx);
> > +   else
> > +     operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> > const0_rtx);")
> > +
> 
> It seems to me you use spaces instead of tabs.

Ooops.  I'll fix.

> 
> Note on tests: It will be nice to add a test where the added
> peephole kicks in. If you consider to add this test to the current
> patch, please resubmit it.

There were cmem-bit-{1,2,3,4}.c added in that patch.  All of which
fail for me without the peephole, and work with the peephole.

The code generated for L/E ARC is slightly different than the code
generated for B/E ARC due to how the structures are laid out in
memory, so, for now I've made parts of the test B/E only.

In order to get code that is as efficient for L/E as B/E I'd end up
adding a whole new peeophole, I'd rather not do that - it would be
extra code to maintain for a combination CMEM+L/E that is not used.  I
figure we can come back to that if/when that combination ever becomes
interesting.  I'm hoping you'll be fine with that.

Thanks,
Andrew
Claudiu Zissulescu Nov. 17, 2016, 1:23 p.m. UTC | #3
> > Note on tests: It will be nice to add a test where the added
> > peephole kicks in. If you consider to add this test to the current
> > patch, please resubmit it.
> 
> There were cmem-bit-{1,2,3,4}.c added in that patch.  All of which
> fail for me without the peephole, and work with the peephole.
> 
> The code generated for L/E ARC is slightly different than the code
> generated for B/E ARC due to how the structures are laid out in
> memory, so, for now I've made parts of the test B/E only.
> 
> In order to get code that is as efficient for L/E as B/E I'd end up
> adding a whole new peeophole, I'd rather not do that - it would be
> extra code to maintain for a combination CMEM+L/E that is not used.  I
> figure we can come back to that if/when that combination ever becomes
> interesting.  I'm hoping you'll be fine with that.
> 

Sure, please go ahead and apply your patch after having the spaces converted ;) 
Claudiu
Andrew Burgess Nov. 17, 2016, 10:41 p.m. UTC | #4
* Claudiu Zissulescu <Claudiu.Zissulescu@synopsys.com> [2016-11-17 13:23:44 +0000]:

> 
> > > Note on tests: It will be nice to add a test where the added
> > > peephole kicks in. If you consider to add this test to the current
> > > patch, please resubmit it.
> > 
> > There were cmem-bit-{1,2,3,4}.c added in that patch.  All of which
> > fail for me without the peephole, and work with the peephole.
> > 
> > The code generated for L/E ARC is slightly different than the code
> > generated for B/E ARC due to how the structures are laid out in
> > memory, so, for now I've made parts of the test B/E only.
> > 
> > In order to get code that is as efficient for L/E as B/E I'd end up
> > adding a whole new peeophole, I'd rather not do that - it would be
> > extra code to maintain for a combination CMEM+L/E that is not used.  I
> > figure we can come back to that if/when that combination ever becomes
> > interesting.  I'm hoping you'll be fine with that.
> > 
> 
> Sure, please go ahead and apply your patch after having the spaces converted ;) 
> Claudiu

Thanks, committed as r242572.

Andrew
diff mbox

Patch

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index c494ca5..2d745cf 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1465,6 +1465,49 @@ 
   [(set_attr "type" "cmove,cmove")
    (set_attr "length" "4,8")])
 
+;; When there's a mask of a single bit, and then a compare to 0 or 1,
+;; if the single bit is the sign bit, then GCC likes to convert this
+;; into a sign extend and a compare less than, or greater to zero.
+;; This is usually fine, except for the NXP400 where we have access to
+;; a bit test instruction, along with a special short load instruction
+;; (from CMEM), that doesn't support sign-extension on load.
+;;
+;; This peephole optimisation attempts to restore the use of bit-test
+;; in those cases where it is useful to do so.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand" "")
+        (sign_extend:SI
+          (match_operand:QI 1 "any_mem_operand" "")))
+   (set (reg:CC_ZN CC_REG)
+	(compare:CC_ZN (match_dup 0)
+                       (const_int 0)))
+   (set (pc)
+        (if_then_else (match_operator 2 "ge_lt_comparison_operator"
+                        [(reg:CC_ZN CC_REG) (const_int 0)])
+                      (match_operand 3 "" "")
+                      (match_operand 4 "" "")))]
+  "TARGET_NPS_CMEM
+   && cmem_address (XEXP (operands[1], 0), SImode)
+   && peep2_reg_dead_p (2, operands[0])
+   && peep2_regno_dead_p (3, CC_REG)"
+  [(set (match_dup 0)
+        (zero_extend:SI
+          (match_dup 1)))
+   (set (reg:CC_ZN CC_REG)
+	(compare:CC_ZN (zero_extract:SI
+                         (match_dup 0)
+                         (const_int 1)
+                         (const_int 7))
+                       (const_int 0)))
+   (set (pc)
+        (if_then_else (match_dup 2)
+                      (match_dup 3)
+                      (match_dup 4)))]
+  "if (GET_CODE (operands[2]) == GE)
+     operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61), const0_rtx);
+   else
+     operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61), const0_rtx);")
+
 ; Try to generate more short moves, and/or less limms, by substituting a
 ; conditional move with a conditional sub.
 (define_peephole2
diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md
index cb75dbc..d8c9281 100644
--- a/gcc/config/arc/predicates.md
+++ b/gcc/config/arc/predicates.md
@@ -452,6 +452,9 @@ 
 (define_predicate "equality_comparison_operator"
   (match_code "eq, ne"))
 
+(define_predicate "ge_lt_comparison_operator"
+  (match_code "ge, lt"))
+
 (define_predicate "brcc_nolimm_operator"
   (ior (match_test "REG_P (XEXP (op, 1))")
        (and (match_code "eq, ne, lt, ge, ltu, geu")
diff --git a/gcc/testsuite/gcc.target/arc/cmem-bit-1.c b/gcc/testsuite/gcc.target/arc/cmem-bit-1.c
new file mode 100644
index 0000000..d49ab5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/cmem-bit-1.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mcpu=nps400 -mcmem -O2" } */
+
+struct strange_bool
+{
+  unsigned char bool_bit   :1;
+  unsigned char other_bits :7;
+};
+
+struct strange_bool a_strange_bool __attribute__((section(".cmem")));
+
+extern void bar();
+
+void foo() {
+  if (a_strange_bool.bool_bit)
+    bar();
+}
+
+/* { dg-final { scan-assembler "xldb r\[0-9\]+,\\\[@a_strange_bool\\\]" } } */
+/* { dg-final { scan-assembler "btst_s r\[0-9\]+,7" { target arceb-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arc/cmem-bit-2.c b/gcc/testsuite/gcc.target/arc/cmem-bit-2.c
new file mode 100644
index 0000000..45b49c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/cmem-bit-2.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mcpu=nps400 -mcmem -O2" } */
+
+struct strange_bool
+{
+  unsigned short bool_bit   :1;
+  unsigned short other_bits :15;
+};
+
+struct strange_bool a_strange_bool __attribute__((section(".cmem")));
+
+extern void bar();
+
+void foo() {
+  if (a_strange_bool.bool_bit)
+    bar();
+}
+
+/* { dg-final { scan-assembler "xldb r\[0-9\]+,\\\[@a_strange_bool\\\]" } } */
+/* { dg-final { scan-assembler "btst_s r\[0-9\]+,7" { target arceb-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arc/cmem-bit-3.c b/gcc/testsuite/gcc.target/arc/cmem-bit-3.c
new file mode 100644
index 0000000..371ff2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/cmem-bit-3.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mcpu=nps400 -mcmem -O2" } */
+
+struct strange_bool
+{
+  unsigned int bool_bit   :1;
+  unsigned int other_bits :31;
+};
+
+struct strange_bool a_strange_bool __attribute__((section(".cmem")));
+
+extern void bar();
+
+void foo() {
+  if (a_strange_bool.bool_bit)
+    bar();
+}
+
+/* { dg-final { scan-assembler "xldb r\[0-9\]+,\\\[@a_strange_bool\\\]" } } */
+/* { dg-final { scan-assembler "btst_s r\[0-9\]+,7" { target arceb-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arc/cmem-bit-4.c b/gcc/testsuite/gcc.target/arc/cmem-bit-4.c
new file mode 100644
index 0000000..a95c6ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/cmem-bit-4.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mcpu=nps400 -mcmem -O2" } */
+
+struct strange_bool
+{
+  unsigned long long bool_bit   :1;
+  unsigned long long other_bits :61;
+};
+
+struct strange_bool a_strange_bool __attribute__((section(".cmem")));
+
+extern void bar();
+
+void foo() {
+  if (a_strange_bool.bool_bit)
+    bar();
+}
+
+/* { dg-final { scan-assembler "xldb r\[0-9\]+,\\\[@a_strange_bool\\\]" } } */
+/* { dg-final { scan-assembler "btst_s r\[0-9\]+,7" { target arceb-*-* } } } */