[Ping^3] gomp: Various fixes for SVE types [PR101018]

Message ID mptzg9hyhce.fsf_-_@arm.com
State New
Series [Ping^3] gomp: Various fixes for SVE types [PR101018]

Commit Message

Richard Sandiford Feb. 13, 2023, 10:45 a.m. UTC
Ping^3 [https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html]

----

Various parts of the omp code checked whether the size of a decl
was an INTEGER_CST in order to determine whether the decl was
variable-sized or not.  If it was variable-sized, it was expected
to have a DECL_VALUE_EXPR replacement, as for VLAs.

This patch uses poly_int_tree_p instead, so that variable-length
SVE vectors are treated like constant-length vectors.  This means
that some structures become poly_int-sized, with some fields at
poly_int offsets, but we already have code to handle that.

An alternative would have been to handle the data via indirection
instead.  However, that's likely to be more complicated, and it
would contradict is_variable_sized, which already uses a check
for TREE_CONSTANT rather than INTEGER_CST.

gimple_add_tmp_var should probably not add a safelen of 1
for SVE vectors, but that's really a separate thing and might
be hard to test.
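The BIT_AND_EXPR folding relies on a new can_and_p predicate in poly-int.h.  As a rough standalone sketch of the condition it checks (plain C++ with a hypothetical two-coefficient poly2 type standing in for GCC's poly_int templates, not the real classes):

```cpp
#include <cstdint>

// Hypothetical 2-coefficient poly_int: value = coeffs[0] + coeffs[1] * x,
// where x is a runtime quantity (e.g. the SVE vector-length multiplier).
struct poly2 { uint64_t coeffs[2]; };

// Sketch of the can_and_p condition: A & B is computable at compile time
// if every bit from the lowest set bit of the variable coefficient upwards
// survives the mask B; then only the constant coefficient is affected.
bool can_and_p (const poly2 &a, uint64_t b, poly2 *result)
{
  uint64_t lowbit = a.coeffs[1] & -a.coeffs[1];
  if ((-lowbit & ~b) != 0)
    return false;		// masking could disturb the variable part
  *result = a;
  result->coeffs[0] &= b;	// only the constant term changes
  return true;
}
```

For example, aligning the poly-int size 8 + 16x down with the mask ~15 succeeds and yields 0 + 16x, while (0 + 16x) & 8 is conservatively rejected because the mask overlaps bits the variable coefficient can produce.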

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
	PR middle-end/101018
	* poly-int.h (can_and_p): New function.
	* fold-const.cc (poly_int_binop): Use it to optimize BIT_AND_EXPRs
	involving POLY_INT_CSTs.
	* expr.cc (get_inner_reference): Fold poly_uint64 size_trees
	into the constant bitsize.
	* gimplify.cc (gimplify_bind_expr): Use poly_int_tree_p instead
	of INTEGER_CST when checking for constant-sized omp data.
	(omp_add_variable): Likewise.
	(omp_notice_variable): Likewise.
	(gimplify_adjust_omp_clauses_1): Likewise.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.cc (scan_sharing_clauses): Likewise.
	(lower_omp_target): Likewise.

gcc/testsuite/
	PR middle-end/101018
	* gcc.target/aarch64/sve/acle/general/pr101018-1.c: New test.
	* gcc.target/aarch64/sve/acle/general/pr101018-2.c: Likewise.
---
 gcc/expr.cc                                   |  4 +--
 gcc/fold-const.cc                             |  7 +++++
 gcc/gimplify.cc                               | 23 ++++++++--------
 gcc/omp-low.cc                                | 10 +++----
 gcc/poly-int.h                                | 19 +++++++++++++
 .../aarch64/sve/acle/general/pr101018-1.c     | 27 +++++++++++++++++++
 .../aarch64/sve/acle/general/pr101018-2.c     | 23 ++++++++++++++++
 7 files changed, 94 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-2.c

Comments

Jakub Jelinek Feb. 13, 2023, 12:10 p.m. UTC | #1
On Mon, Feb 13, 2023 at 10:45:05AM +0000, Richard Sandiford wrote:
> Ping^3 [https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html]
> 
> ----
> 
> Various parts of the omp code checked whether the size of a decl
> was an INTEGER_CST in order to determine whether the decl was
> variable-sized or not.  If it was variable-sized, it was expected
> to have a DECL_VALUE_EXPR replacement, as for VLAs.
> 
> This patch uses poly_int_tree_p instead, so that variable-length
> SVE vectors are treated like constant-length vectors.  This means
> that some structures become poly_int-sized, with some fields at
> poly_int offsets, but we already have code to handle that.
> 
> An alternative would have been to handle the data via indirection
> instead.  However, that's likely to be more complicated, and it
> would contradict is_variable_sized, which already uses a check
> for TREE_CONSTANT rather than INTEGER_CST.
> 
> gimple_add_tmp_var should probably not add a safelen of 1
> for SVE vectors, but that's really a separate thing and might
> be hard to test.

Generally, OpenMP has lots of clauses on lots of different constructs,
and for SVE we need to decide what to do with each of them.  It would be
better to cover them all in the testsuite, so something orders of magnitude
larger than what this patch provides, and then there is OpenACC too.
Can one add these non-constant poly_int sized types as members of
aggregates?  If yes, they need to be tested in addition to the plain
vectors.

From data sharing clauses and others:
1) shared (implicit or explicit) - I'd say the non-constant poly_int sized
   stuff should be shared by passing around an address, rather than by
   copying it around by value which can be large (so say similar to
   aggregates rather than scalars), though feel free to argue otherwise
2) for the offloading stuff, I'd say we want to error or sorry at
   gimplification time, both for explicit or implicit map clause on
   target/target data/target {enter,exit} data and on explicit/implicit
   private and firstprivate clauses on target; while it could work fine
   with host fallback, generally the intention is to offload to a different
   device, and neither PTX nor AMDGCN has anything similar to SVE; even
   for, say, potential ssh-based offloading to aarch64 there is the
   possibility that the two devices don't agree on the vector sizes
3) private clause just creates another private variable of the same type,
   except for target I think it should generally work fine, but should be
   nevertheless test covered, say on parallel, task and some worksharing
   construct (e.g. for) and simd and also with allocate clause specifying
   allocators
4) firstprivate clause is similar to private, but we need to copy the
   initial value to it; e.g. in case of parallel, host teams or task
   such values are copied through compiler generated artificial struct
   that contains all the stuff needed to be propagated around (and except
   for task/taskloop then propagated back).  For the SVE stuff I think it
   might be nice to put the non-constant sized stuff late in the artificial
   structure so that one can access the constant sized stuff using constant
   offsets
5) lastprivate similar to private with copying value back from one
   particular thread/lane (e.g. should be tested also on simd)
6) reduction/in_reduction/task_reduction - reductions are partly
   privatization clauses, for SVE only user defined reductions apply
   (declare reduction), but need to be tested in lots of constructs,
   parallel, for, simd, taskloop, task reductions and inscan reductions
   (explicit/implicit)
7) copyin - can the SVE vars be threadprivate (thread_local in C++ etc.)?
   If not, at least needs testing
8) linear clause hopefully will reject SVE stuff, but needs testing
9) affinity clause - we just parse/ignore, but still needs testing
10) aligned clause on simd - again, needs testing
11) allocate clause - as I said, for most of the data sharing clauses
    coverage needs to be without and with corresponding allocate clauses
12) depend clause - this one again needs testing; under the hood it just
    takes the address of the passed-in variable, so it should probably work
    out of the box
13) nontemporal clause on simd - probably works fine, but needs testing
14) uniform clause - this is on declare simd, one needs to test declare
    simd with the various cases (vector without clause, uniform, linear)
15) enter/link clauses - as I said in 2), these IMHO should be rejected
16) detach clause - the clause requires specific type, so I think should be
    ok
17) use_device_ptr/use_device_addr/is_device_ptr - offloading stuff, so
    like 2)
18) inclusive/exclusive clauses - see 6) above for inscan reductions
19) map/has_device_addr - see 2)
20) doacross - requires specific format with scalars, so just check it is
    properly rejected
Rest of clauses don't have arguments or have integral/logical expression
operands, so those should be ok.
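The layout idea in 4) can be illustrated with a plain C analogy (hypothetical field names; GCC's actual artificial struct is generated internally): placing the only variable-sized member last keeps every constant-sized field at a compile-time-constant offset, much like a flexible array member.

```c
#include <stddef.h>

/* Hypothetical shape for the compiler-generated firstprivate block:
   constant-sized fields first, the variable-sized (poly_int-sized)
   member last, so everything before it sits at a constant offset.  */
struct omp_data_s {
  int n;                /* constant offset (0)            */
  double scalar;        /* constant offset                */
  unsigned char vec[];  /* variable-sized tail, laid last */
};
```

With this ordering, accesses to `n` and `scalar` need no runtime offset computation; only `vec` involves the non-constant size.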

Now, if the above is too much work for GCC 13 (likely it is), I think
it would be best to just make sure the explicit/implicit shared clause works
fine (i.e. 1) ), perhaps a few other easy ones like 12), and simply sorry
on everything else for now, then handle the rest incrementally later.

	Jakub

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..a304c583d16 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7941,10 +7941,10 @@  get_inner_reference (tree exp, poly_int64_pod *pbitsize,
 
   if (size_tree != 0)
     {
-      if (! tree_fits_uhwi_p (size_tree))
+      if (! tree_fits_poly_uint64_p (size_tree))
 	mode = BLKmode, *pbitsize = -1;
       else
-	*pbitsize = tree_to_uhwi (size_tree);
+	*pbitsize = tree_to_poly_uint64 (size_tree);
     }
 
   *preversep = reverse_storage_order_for_component_p (exp);
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index b89cac91cae..000600017e2 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1183,6 +1183,13 @@  poly_int_binop (poly_wide_int &res, enum tree_code code,
 	return false;
       break;
 
+    case BIT_AND_EXPR:
+      if (TREE_CODE (arg2) != INTEGER_CST
+	  || !can_and_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
+			 &res))
+	return false;
+      break;
+
     case BIT_IOR_EXPR:
       if (TREE_CODE (arg2) != INTEGER_CST
 	  || !can_ior_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f06ce3cc77a..096738c8ed4 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7352,7 +7352,7 @@  omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   /* When adding a variable-sized variable, we have to handle all sorts
      of additional bits of data: the pointer replacement variable, and
      the parameters of the type.  */
-  if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+  if (DECL_SIZE (decl) && !poly_int_tree_p (DECL_SIZE (decl)))
     {
       /* Add the pointer replacement variable as PRIVATE if the variable
 	 replacement is private, else FIRSTPRIVATE since we'll need the
@@ -8002,7 +8002,8 @@  omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
       && (flags & (GOVD_SEEN | GOVD_LOCAL)) == GOVD_SEEN
       && DECL_SIZE (decl))
     {
-      if (TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+      tree size;
+      if (!poly_int_tree_p (DECL_SIZE (decl)))
 	{
 	  splay_tree_node n2;
 	  tree t = DECL_VALUE_EXPR (decl);
@@ -8013,16 +8014,14 @@  omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 	  n2->value |= GOVD_SEEN;
 	}
       else if (omp_privatize_by_reference (decl)
-	       && TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl)))
-	       && (TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))))
-		   != INTEGER_CST))
+	       && (size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))))
+	       && !poly_int_tree_p (size))
 	{
 	  splay_tree_node n2;
-	  tree t = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl)));
-	  gcc_assert (DECL_P (t));
-	  n2 = splay_tree_lookup (ctx->variables, (splay_tree_key) t);
+	  gcc_assert (DECL_P (size));
+	  n2 = splay_tree_lookup (ctx->variables, (splay_tree_key) size);
 	  if (n2)
-	    omp_notice_variable (ctx, t, true);
+	    omp_notice_variable (ctx, size, true);
 	}
     }
 
@@ -12417,7 +12416,7 @@  gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
       if ((gimplify_omp_ctxp->region_type & ORT_ACC) == 0)
 	OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1;
       if (DECL_SIZE (decl)
-	  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+	  && !poly_int_tree_p (DECL_SIZE (decl)))
 	{
 	  tree decl2 = DECL_VALUE_EXPR (decl);
 	  gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
@@ -12826,7 +12825,7 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 		}
 	    }
 	  else if (DECL_SIZE (decl)
-		   && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST
+		   && !poly_int_tree_p (DECL_SIZE (decl))
 		   && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_POINTER
 		   && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
 		   && (OMP_CLAUSE_MAP_KIND (c)
@@ -12886,7 +12885,7 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  if (!DECL_P (decl))
 	    break;
 	  if (DECL_SIZE (decl)
-	      && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+	      && !poly_int_tree_p (DECL_SIZE (decl)))
 	    {
 	      tree decl2 = DECL_VALUE_EXPR (decl);
 	      gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 82a93d00f67..6203fab5096 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1454,7 +1454,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  else
 	    install_var_field (decl, false, 11, ctx);
 	  if (DECL_SIZE (decl)
-	      && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+	      && !poly_int_tree_p (DECL_SIZE (decl)))
 	    {
 	      tree decl2 = DECL_VALUE_EXPR (decl);
 	      gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
@@ -1657,7 +1657,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  if (DECL_P (decl))
 	    {
 	      if (DECL_SIZE (decl)
-		  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+		  && !poly_int_tree_p (DECL_SIZE (decl)))
 		{
 		  tree decl2 = DECL_VALUE_EXPR (decl);
 		  gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
@@ -1899,7 +1899,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 		    = remap_type (TREE_TYPE (decl), &ctx->cb);
 		}
 	      else if (DECL_SIZE (decl)
-		       && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+		       && !poly_int_tree_p (DECL_SIZE (decl)))
 		{
 		  tree decl2 = DECL_VALUE_EXPR (decl);
 		  gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
@@ -12838,7 +12838,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  }
 
 	if (DECL_SIZE (var)
-	    && TREE_CODE (DECL_SIZE (var)) != INTEGER_CST)
+	    && !poly_int_tree_p (DECL_SIZE (var)))
 	  {
 	    tree var2 = DECL_VALUE_EXPR (var);
 	    gcc_assert (TREE_CODE (var2) == INDIRECT_REF);
@@ -13165,7 +13165,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	    else
 	      {
 		if (DECL_SIZE (ovar)
-		    && TREE_CODE (DECL_SIZE (ovar)) != INTEGER_CST)
+		    && !poly_int_tree_p (DECL_SIZE (ovar)))
 		  {
 		    tree ovar2 = DECL_VALUE_EXPR (ovar);
 		    gcc_assert (TREE_CODE (ovar2) == INDIRECT_REF);
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index d085544a57e..672db698dcc 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -1977,6 +1977,25 @@  known_alignment (const poly_int_pod<N, Ca> &a)
   return r & -r;
 }
 
+/* Return true if we can compute A & B at compile time, storing the
+   result in RES if so.  */
+
+template<unsigned int N, typename Ca, typename Cb, typename Cr>
+inline typename if_nonpoly<Cb, bool>::type
+can_and_p (const poly_int_pod<N, Ca> &a, Cb b, Cr *result)
+{
+  /* Coefficients 1 and above must be a multiple of something greater
+     than ~B.  */
+  typedef POLY_INT_TYPE (Ca) int_type;
+  if (N >= 2)
+    for (unsigned int i = 1; i < N; i++)
+      if ((-(a.coeffs[i] & -a.coeffs[i]) & ~b) != int_type (0))
+	return false;
+  *result = a;
+  result->coeffs[0] &= b;
+  return true;
+}
+
 /* Return true if we can compute A | B at compile time, storing the
    result in RES if so.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-1.c
new file mode 100644
index 00000000000..7592ad4c12e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-1.c
@@ -0,0 +1,27 @@ 
+/* { dg-options "-O -fopenmp" } */
+
+#include <stdint.h>
+#include <arm_sve.h>
+
+extern long N;
+extern double *a, *b, *c;
+
+void tuned_STREAM_Triad(double scalar)
+{
+  const uint64_t el = svcntd();
+  const svfloat64_t vscalar = svdup_n_f64(scalar);
+  const int Nadj = N/el;
+  int j;
+
+#pragma omp parallel for
+  for (j = 0; j < Nadj; j ++) {
+	  svfloat64_t va, vb, vc;
+	  vb = svld1_vnum_f64(svptrue_b64(), b, j);
+	  vc = svld1_vnum_f64(svptrue_b64(), c, j);
+	  va = svmla_f64_z(svptrue_b64(), vb , vscalar, vc );
+	  svst1_vnum_f64(svptrue_b64(), a, j+0, va);
+  }
+
+  for (j = Nadj*el ; j < N ; j++)
+	  a[j] = b[j] + scalar * c[j];
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-2.c
new file mode 100644
index 00000000000..eee9c9f7e2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-2.c
@@ -0,0 +1,23 @@ 
+/* { dg-options "-O -fopenmp" } */
+
+#include <arm_sve.h>
+
+void ext(void *);
+svfloat32_t ext2();
+
+void
+foo (float32_t *ptr)
+{
+  svfloat32_t vec;
+  /* These directives are mostly nonsense, but they shouldn't ICE.  */
+  #pragma omp target data use_device_addr(vec)
+  ext(&vec);
+  #pragma omp target map(to:vec)
+  ext(&vec);
+  #pragma omp target defaultmap(none) firstprivate(vec)
+  ext(&vec);
+  #pragma omp target
+  ext(&vec);
+  #pragma omp target update to(vec)
+  vec = ext2();
+}