diff mbox

[ARM] Fix two IT issues

Message ID 000001cf1107$e7429f20$b5c7dd60$@arm.com
State New
Headers show

Commit Message

Zhenqiang Chen Jan. 14, 2014, 9:06 a.m. UTC
Hi,

The patch fixes two IT issues:
1) IT block is splitted for A15 (and Cortex-M).

For A15 tune, max cond insns (max_insns_skipped) is 2, which is set as the
maximum allowed insns in an IT block (see thumb2_final_prescan_insn). So IT
which has 3 or 4 instructions, will be splitted. Take the first if-then-else
in the test case of the patch as example, it will generate ITs like:
	cmp	r0, #10
	ite	gt
	subgt	r0, r0, r1
	suble	r0, r1, r0
	ite	gt
	addgt	r0, r0, #10
	suble	r0, r0, #7
It does not make sense to split the IT. For cortex-m, the IT can not be
folded if the previous insn is 4 BYTES. 

2) For arm_v7m_tune, max cond insns is not aligned with its branch cost. In
ifcvt.c, the relation between MAX_CONDITIONAL_EXECUTE (max cond insns) and
branch cost is:

#ifndef MAX_CONDITIONAL_EXECUTE
#define MAX_CONDITIONAL_EXECUTE \
  (BRANCH_COST (optimize_function_for_speed_p (cfun), false) \
   + 1)
#endif

So the best value of max cond insns for v7m_tune should be 2.

Bootstrap and no make check regression on ARM Chrome book. No make check
regression for Cortex-M3.

Cortex-M4 O2 performance changes on coremark, dhrystone and eembc-v1:
coremark: -0.11%
dhrystone: 1.26%.
a2time01_lite: 2.63%
canrdr01_lite: 4.27%
iirflt01_lite: 6.51%
rspeed01_lite: 6.51%
dither01_lite: 7.36%

The biggest regression in eembc-v1 is pntrch01_lite: -0.51%
All other regressions < 0.1%

Cortex-M4 O3 performance changes are similar with O2, except one regression
due to loop alignment change.

OK for trunk?

Thanks!
-Zhenqiang

2014-01-14  Zhenqiang Chen  <zhenqiang.chen@arm.com>

	* config/arm/arm.c (arm_v7m_tune): Set max_insns_skipped to 2.
	(thumb2_final_prescan_insn): Set max to MAX_INSN_PER_IT_BLOCK.

testsuite/ChangeLog:
2014-01-14  Zhenqiang Chen  <zhenqiang.chen@arm.com>

	* gcc.target/arm/its.c: New test.

Comments

Richard Earnshaw Jan. 14, 2014, 9:34 a.m. UTC | #1
On 14/01/14 09:06, Zhenqiang Chen wrote:
> Hi,
> 
> The patch fixes two IT issues:
> 1) IT block is splitted for A15 (and Cortex-M).
> 
> For A15 tune, max cond insns (max_insns_skipped) is 2, which is set as the
> maximum allowed insns in an IT block (see thumb2_final_prescan_insn). So IT
> which has 3 or 4 instructions, will be splitted. Take the first if-then-else
> in the test case of the patch as example, it will generate ITs like:
> 	cmp	r0, #10
> 	ite	gt
> 	subgt	r0, r0, r1
> 	suble	r0, r1, r0
> 	ite	gt
> 	addgt	r0, r0, #10
> 	suble	r0, r0, #7
> It does not make sense to split the IT. For cortex-m, the IT can not be
> folded if the previous insn is 4 BYTES. 
> 
> 2) For arm_v7m_tune, max cond insns is not aligned with its branch cost. In
> ifcvt.c, the relation between MAX_CONDITIONAL_EXECUTE (max cond insns) and
> branch cost is:
> 
> #ifndef MAX_CONDITIONAL_EXECUTE
> #define MAX_CONDITIONAL_EXECUTE \
>   (BRANCH_COST (optimize_function_for_speed_p (cfun), false) \
>    + 1)
> #endif
> 
> So the best value of max cond insns for v7m_tune should be 2.
> 
> Bootstrap and no make check regression on ARM Chrome book. No make check
> regression for Cortex-M3.
> 
> Cortex-M4 O2 performance changes on coremark, dhrystone and eembc-v1:
> coremark: -0.11%
> dhrystone: 1.26%.
> a2time01_lite: 2.63%
> canrdr01_lite: 4.27%
> iirflt01_lite: 6.51%
> rspeed01_lite: 6.51%
> dither01_lite: 7.36%
> 
> The biggest regression in eembc-v1 is pntrch01_lite: -0.51%
> All other regressions < 0.1%
> 
> Cortex-M4 O3 performance changes are similar with O2, except one regression
> due to loop alignment change.
> 
> OK for trunk?
> 
> Thanks!
> -Zhenqiang
> 
> 2014-01-14  Zhenqiang Chen  <zhenqiang.chen@arm.com>
> 
> 	* config/arm/arm.c (arm_v7m_tune): Set max_insns_skipped to 2.
> 	(thumb2_final_prescan_insn): Set max to MAX_INSN_PER_IT_BLOCK.
> 
> testsuite/ChangeLog:
> 2014-01-14  Zhenqiang Chen  <zhenqiang.chen@arm.com>
> 
> 	* gcc.target/arm/its.c: New test.
> 

OK.

R.
diff mbox

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 39d23cc..8751675 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1696,7 +1696,7 @@  const struct tune_params arm_v7m_tune =
   &v7m_extra_costs,
   NULL,						/* Sched adj cost.
*/
   1,						/* Constant limit.  */
-  5,						/* Max cond insns.  */
+  2,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant
pool.  */
   arm_cortex_m_branch_cost,
@@ -22131,11 +22131,11 @@  thumb2_final_prescan_insn (rtx insn)
   int mask;
   int max;
 
-  /* Maximum number of conditionally executed instructions in a block
-     is minimum of the two max values: maximum allowed in an IT block
-     and maximum that is beneficial according to the cost model and tune.
*/
-  max = (max_insns_skipped < MAX_INSN_PER_IT_BLOCK) ?
-    max_insns_skipped : MAX_INSN_PER_IT_BLOCK;
+  /* max_insns_skipped in the tune was already taken into account in the
+     cost model of ifcvt pass when generating COND_EXEC insns.  At this
stage
+     just emit the IT blocks as we can.  It does not make sense to split
+     the IT blocks.  */
+  max = MAX_INSN_PER_IT_BLOCK;
 
   /* Remove the previous insn from the count of insns to be output.  */
   if (arm_condexec_count)
diff --git a/gcc/testsuite/gcc.target/arm/its.c
b/gcc/testsuite/gcc.target/arm/its.c
new file mode 100644
index 0000000..8eecdf4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/its.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" }  */
+int test (int a, int b)
+{
+  int r;
+  if (a > 10)
+    {
+      r = a - b;
+      r += 10;
+    }
+  else
+    {
+      r = b - a;
+      r -= 7;
+    }
+  if (r > 0)
+    r -= 3;
+  return r;
+}
+/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */