diff mbox

[RFC,PR66873] Use graphite for parloops

Message ID 55AE5340.2010700@mentor.com
State New
Headers show

Commit Message

Tom de Vries July 21, 2015, 2:12 p.m. UTC
On 21/07/15 02:21, Tom de Vries wrote:
> On 20/07/15 20:31, Sebastian Pop wrote:
>> Tom de Vries wrote:
>>> So I wondered, why not always use the graphite dependency analysis
>>> in parloops. (Of course you could use -floop-parallelize-all, but
>>> that also changes the heuristic). So I wrote a patch for parloops to
>>> use graphite dependency analysis by default (so without
>>> -floop-parallelize-all), but while testing found out that all the
>>> reduction test-cases started failing because the modifications
>>> graphite makes to the code messes up the parloops reduction
>>> analysis.
>>>
>>> Then I came up with this patch, which:
>>> - first runs a parloops pass, restricted to reduction loops only,
>>
>> I would prefer to fix graphite to catch the reduction loop and avoid
>> running an
>> extra pass before graphite for that case.
>
>> Can you please specify which file is
>> failing to be parallelized?  Are they all those testcases that you
>> update the flags?
>
> Yep, f.i. autopar/reduc-1.c.
>
>> Also it seems to me that you are missing -ffast-math to parallelize
>> all these
>> loops: without that flag graphite would not mark reductions as
>> associative/commutative operations and they would not be recognized as
>> parallel.
>
> For an unsigned int reduction, we need don't need -ffast-math, so we
> don't have to specify it for parloops. It seems graphite is too strict
> in that, since it won't do any reductions without -fassociate-math.
>
> But indeed, with -ffast-math -ftree-parallelize-loops=2
> -floop-parallelize-all we are able to parallelize the 3 reduction loops
> in autopar/reduc-1.c
>
>> Is that something the current parloops detection is not too strict about?
>
> Parloops uses vect_is_simple_reduction_1, which has some extensive
> testing to see if reordering of operations is allowed.

Hmm, vect_is_simple_reduction_1 seems to miss the check for 
TYPE_OVERFLOW_WRAPS.

> The testing of
> graphite seems to be limited to testing fassociative-math, which makes
> me suspect that tests are missing there, f.i. TYPE_OVERFLOW_TRAPS.

I could not enable both fassociative-math and -ftrapv, so I guess that 
case was covered implicitly.

Attached patch:
- adds the missing TYPE_OVERFLOW_WRAPS check in
   vect_is_simple_reduction_1
- enabled reductions in graphite, when safe
- introduces a macro FIXED_POINT_TYPE_OVERFLOW_WRAPS_P

Currently bootstrapping and reg-testing on x86_64.

Thanks,
- Tom

Comments

Sebastian Pop July 21, 2015, 6:42 p.m. UTC | #1
Tom de Vries wrote:
> Fix reduction safety checks
> 
> 	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
> 	flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
> 	TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
> 	Only allow wrapping fixed-point otherwise.
> 	(build_poly_scop): Always call
> 	rewrite_commutative_reductions_out_of_ssa.

The changes to graphite look good to me.

Thanks,
Sebastian
diff mbox

Patch

Fix reduction safety checks

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* tree.h (FIXED_POINT_TYPE_OVERFLOW_WRAPS_P): Define.
	* tree-ssa-reassoc.c (can_reassociate_p): Rewrite using
	FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
	flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
	TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
	Only allow wrapping fixed-point otherwise.
	(build_poly_scop): Always call
	rewrite_commutative_reductions_out_of_ssa.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Honour
	TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P. Rewrite using
	FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.

	* gcc.dg/autopar/outer-4.c: Change reduction type to unsigned.
	* gcc.dg/autopar/outer-5.c: Same.
	* gcc.dg/autopar/outer-6.c: Same.
	* gcc.dg/autopar/reduc-2.c: Add -fwrapv to dg-options.
	* gcc.dg/autopar/reduc-8.c: Same.
	* gcc.dg/autopar/reduc-2char.c: Add -fwrapv to dg-options. Update
	scan-tree-dumps.
	* gcc.dg/autopar/reduc-2short.c: Same.
---
 gcc/graphite-sese-to-poly.c                 | 21 +++++++++++++++-----
 gcc/testsuite/gcc.dg/autopar/outer-4.c      |  6 +++---
 gcc/testsuite/gcc.dg/autopar/outer-5.c      |  8 ++++----
 gcc/testsuite/gcc.dg/autopar/outer-6.c      |  8 ++++----
 gcc/testsuite/gcc.dg/autopar/reduc-2.c      |  2 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c  |  6 +++---
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c |  6 +++---
 gcc/testsuite/gcc.dg/autopar/reduc-8.c      |  2 +-
 gcc/tree-ssa-reassoc.c                      |  3 ++-
 gcc/tree-vect-loop.c                        | 30 +++++++++++++++++++++--------
 gcc/tree.h                                  | 12 ++++++++++++
 11 files changed, 71 insertions(+), 33 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 8960c3f..cb2204e 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2604,9 +2604,21 @@  is_reduction_operation_p (gimple stmt)
   gcc_assert (is_gimple_assign (stmt));
   code = gimple_assign_rhs_code (stmt);
 
-  return flag_associative_math
-    && commutative_tree_code (code)
-    && associative_tree_code (code);
+  if (!commutative_tree_code (code)
+      || !associative_tree_code (code))
+    return false;
+
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  if (SCALAR_FLOAT_TYPE_P (type))
+    return flag_associative_math;
+
+  if (INTEGRAL_TYPE_P (type))
+    return (!TYPE_OVERFLOW_TRAPS (type)
+	    && TYPE_OVERFLOW_WRAPS (type));
+
+  return (FIXED_POINT_TYPE_P (type)
+	  && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
 }
 
 /* Returns true when PHI contains an argument ARG.  */
@@ -3147,8 +3159,7 @@  build_poly_scop (scop_p scop)
   if (!scop_ivs_can_be_represented (scop))
     return;
 
-  if (flag_associative_math)
-    rewrite_commutative_reductions_out_of_ssa (scop);
+  rewrite_commutative_reductions_out_of_ssa (scop);
 
   build_sese_loop_nests (region);
   /* Record all conditions in REGION.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 6fd37c5..a57a0e4 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -3,14 +3,14 @@ 
 
 void abort (void);
 
-int g_sum=0;
-int x[500][500];
+unsigned int g_sum=0;
+unsigned int x[500][500];
 
 __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Double reduction is currently not supported, outer loop is not 
      parallelized.  Inner reduction is detected, inner loop is 
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-5.c b/gcc/testsuite/gcc.dg/autopar/outer-5.c
index 6a0ae91..c1bda6a 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-5.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-5.c
@@ -3,9 +3,9 @@ 
 
 void abort (void);
 
-int x[500][500];
-int y[500];
-int g_sum=0;
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
 
 __attribute__((noinline))
 void init (int i, int j)
@@ -17,7 +17,7 @@  __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Inner cycle is currently not supported, outer loop is not 
      parallelized.  Inner reduction is detected, inner loop is 
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
index 6bef7cc..9f90ba6 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
@@ -3,9 +3,9 @@ 
 
 void abort (void);
 
-int x[500][500];
-int y[500];
-int g_sum=0;
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
 
 __attribute__((noinline))
 void init (int i, int j)
@@ -17,7 +17,7 @@  __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Outer loop reduction, outerloop is parallelized.  */ 
   sum=0;
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2.c b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
index 3ad16e4..ad78241 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 072489f..857a3cf 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -60,7 +60,7 @@  int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 4dbbc8a..dcf537e 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -59,6 +59,6 @@  int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 16fb954..05f1126 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdlib.h>
 
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index efb813c..b48ae1e 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4229,7 +4229,8 @@  can_reassociate_p (tree op)
 {
   tree type = TREE_TYPE (op);
   if ((INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type))
-      || NON_SAT_FIXED_POINT_TYPE_P (type)
+      || (FIXED_POINT_TYPE_P (type)
+	  && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type))
       || (flag_associative_math && FLOAT_TYPE_P (type)))
     return true;
   return false;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..e014be2 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,16 +2613,30 @@  vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			"reduction: unsafe fp math optimization: ");
       return NULL;
     }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-	   && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe int math optimization: ");
-      return NULL;
+      if (TYPE_OVERFLOW_TRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow traps): ");
+	  return NULL;
+	}
+      if (!TYPE_OVERFLOW_WRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow doesn't wrap): ");
+	  return NULL;
+	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
+  else if (FIXED_POINT_TYPE_P (type)
+	   && !FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type)
+	   && check_reduction)
     {
       /* Changing the order of operations changes the semantics.  */
       if (dump_enabled_p ())
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..f2ae669 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -501,6 +501,18 @@  extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define FIXED_POINT_TYPE_P(TYPE)	(TREE_CODE (TYPE) == FIXED_POINT_TYPE)
 
+/* Nonzero if fixed-point type TYPE wraps at overflow.
+
+   GCC support of fixed-point types as specified by the draft technical report
+   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to control overflow
+   and rounding behaviors are not implemented.
+
+   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
+   overflow).  */
+
+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
+
 /* Nonzero if TYPE represents a scalar floating-point type.  */
 
 #define SCALAR_FLOAT_TYPE_P(TYPE) (TREE_CODE (TYPE) == REAL_TYPE)
-- 
1.9.1