diff mbox series

[v2] Enhance if-conversion for automatic arrays

Message ID 20240617140501.32CF5139AB@imap1.dmz-prg2.suse.org
State New
Series [v2] Enhance if-conversion for automatic arrays

Commit Message

Richard Biener June 17, 2024, 2:05 p.m. UTC
Automatic arrays that are not address-taken should not be subject to
store data races.  This applies to the result arrays of OMP SIMD
in-branch lowered functions; for the testcase these otherwise prevent
vectorization with SSE and, for AVX and AVX512, leave spurious
.MASK_STORE to the stack behind.

This inefficiency was noted in PR111793.

I've introduced ref_can_have_store_data_races, commonizing uses
of flag_store_data_races in if-conversion, cselim and store motion.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

	PR tree-optimization/111793
	* tree-ssa-alias.h (ref_can_have_store_data_races): Declare.
	* tree-ssa-alias.cc (ref_can_have_store_data_races): New
	function.
	* tree-if-conv.cc (ifcvt_memrefs_wont_trap): Use
	ref_can_have_store_data_races to allow more unconditional
	stores.
	* tree-ssa-loop-im.cc (execute_sm): Likewise.
	* tree-ssa-phiopt.cc (cond_store_replacement): Likewise.

	* gcc.dg/vect/vect-simd-clone-21.c: New testcase.
---
 .../gcc.dg/vect/vect-simd-clone-21.c          | 16 ++++++++++++++++
 gcc/tree-if-conv.cc                           | 11 +++++------
 gcc/tree-ssa-alias.cc                         | 19 +++++++++++++++++++
 gcc/tree-ssa-alias.h                          |  2 ++
 gcc/tree-ssa-loop-im.cc                       |  2 +-
 gcc/tree-ssa-phiopt.cc                        |  4 +---
 6 files changed, 44 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c

Comments

Toon Moene June 19, 2024, 6:25 p.m. UTC | #1
On 6/17/24 16:05, Richard Biener wrote:

> Automatic arrays that are not address-taken should not be subject to
> store data races.  This applies to OMP SIMD in-branch lowered
> functions result array which for the testcase otherwise prevents
> vectorization with SSE and for AVX and AVX512 ends up with spurious
> .MASK_STORE to the stack surviving.

Does this also apply to "automatic arrays" as defined by the Fortran 
Standard (see https://j3-fortran.org/doc/year/23/23-007r1.pdf, page 
104), i.e., outside of the OMP_SIMD construct?

In gfortran, when using the option -fstack-arrays, they are assigned 
memory space on the stack.

Kind regards,
Richard Biener June 19, 2024, 7:06 p.m. UTC | #2
> Am 19.06.2024 um 20:25 schrieb Toon Moene <toon@moene.org>:
> 
> On 6/17/24 16:05, Richard Biener wrote:
> 
>> Automatic arrays that are not address-taken should not be subject to
>> store data races.  This applies to OMP SIMD in-branch lowered
>> functions result array which for the testcase otherwise prevents
>> vectorization with SSE and for AVX and AVX512 ends up with spurious
>> .MASK_STORE to the stack surviving.
> 
> Does this also apply for "automatic arrays" as defined by the Fortran Standard (see https://j3-fortran.org/doc/year/23/23-007r1.pdf, page 104), i.e., outside of the OMP_SIMD construct ?
> 
> In gfortran, when using the option -fstack-arrays, they are assigned memory space on the stack.

I’d say yes though the likelihood those are address taken and thus not considered is high.  The main target were the arrays created as part of the SIMD lowering.

Richard 

> Kind regards,
> 
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
>
Toon Moene June 19, 2024, 7:20 p.m. UTC | #3
On 6/19/24 21:06, Richard Biener wrote:
> 
> 
>> Am 19.06.2024 um 20:25 schrieb Toon Moene <toon@moene.org>:
>>
>> On 6/17/24 16:05, Richard Biener wrote:
>>
>>> Automatic arrays that are not address-taken should not be subject to
>>> store data races.  This applies to OMP SIMD in-branch lowered
>>> functions result array which for the testcase otherwise prevents
>>> vectorization with SSE and for AVX and AVX512 ends up with spurious
>>> .MASK_STORE to the stack surviving.
>>
>> Does this also apply for "automatic arrays" as defined by the Fortran Standard (see https://j3-fortran.org/doc/year/23/23-007r1.pdf, page 104), i.e., outside of the OMP_SIMD construct ?
>>
>> In gfortran, when using the option -fstack-arrays, they are assigned memory space on the stack.
> 
> I’d say yes though the likelihood those are address taken and thus not considered is high.  The main target were the arrays created as part of the SIMD lowering.

Isn't there a "not" missing before "high" ?

So it mostly helps after the call to subroutine 'sub' in the following:

SUBROUTINE AAP(A, B, N)
INTEGER N
REAL A(N), B(N), R(N)
CALL SUB(R, N) ! Address of R passed to SUB
R = ABS(A)
B = R
B = SQRT(A)
END

?

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
new file mode 100644
index 00000000000..49c52fb59bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#pragma omp declare simd simdlen(4) inbranch
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target i?86-*-* x86_64-*-* } } } */
+/* if-conversion shouldn't need to resort to masked stores for the result
+   array created by OMP lowering since that's automatic and does not have
+   its address taken.  */
+/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index c4c3ed41a44..57992b6deca 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -936,12 +936,11 @@  ifcvt_memrefs_wont_trap (gimple *stmt, vec<data_reference_p> drs)
 
       /* an unconditionaly write won't trap if the base is written
          to unconditionally.  */
-      if (base_master_dr
-	  && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
-	return flag_store_data_races;
-      /* or the base is known to be not readonly.  */
-      else if (base_object_writable (DR_REF (a)))
-	return flag_store_data_races;
+      if ((base_master_dr
+	   && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
+	  /* or the base is known to be not readonly.  */
+	  || base_object_writable (DR_REF (a)))
+	return !ref_can_have_store_data_races (base);
     }
 
   return false;
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 1a91d63a31e..fab048b0b59 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -3704,6 +3704,25 @@  stmt_kills_ref_p (gimple *stmt, tree ref)
   return stmt_kills_ref_p (stmt, &r);
 }
 
+/* Return whether REF can be subject to store data races.  */
+
+bool
+ref_can_have_store_data_races (tree ref)
+{
+  /* With -fallow-store-data-races do not care about them.  */
+  if (flag_store_data_races)
+    return false;
+
+  tree base = get_base_address (ref);
+  if (auto_var_p (base)
+      && ! may_be_aliased (base))
+    /* Automatic variables not aliased are not subject to
+       data races.  */
+    return false;
+
+  return true;
+}
+
 
 /* Walk the virtual use-def chain of VUSE until hitting the virtual operand
    TARGET or a statement clobbering the memory reference REF in which
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 5cd64e72295..5834533ae9c 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -144,6 +144,8 @@  extern bool call_may_clobber_ref_p (gcall *, tree, bool = true);
 extern bool call_may_clobber_ref_p_1 (gcall *, ao_ref *, bool = true);
 extern bool stmt_kills_ref_p (gimple *, tree);
 extern bool stmt_kills_ref_p (gimple *, ao_ref *);
+extern bool ref_can_have_store_data_races (tree);
+
 enum translate_flags
   { TR_TRANSLATE, TR_VALUEIZE_AND_DISAMBIGUATE, TR_DISAMBIGUATE };
 extern tree get_continuation_for_phi (gimple *, ao_ref *, bool,
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index f3fda2bd7ce..3acbd886a0d 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2298,7 +2298,7 @@  execute_sm (class loop *loop, im_mem_ref *ref,
   bool always_stored = ref_always_accessed_p (loop, ref, true);
   if (maybe_mt
       && (bb_in_transaction (loop_preheader_edge (loop)->src)
-	  || (! flag_store_data_races && ! always_stored)))
+	  || (ref_can_have_store_data_races (ref->mem.ref) && ! always_stored)))
     multi_threaded_model_p = true;
 
   if (multi_threaded_model_p && !use_other_flag_var)
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 0ef42a1031a..f05ca727503 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3343,9 +3343,7 @@  cond_store_replacement (basic_block middle_bb, basic_block join_bb,
       /* If LHS is an access to a local variable without address-taken
 	 (or when we allow data races) and known not to trap, we could
 	 always safely move down the store.  */
-      tree base = get_base_address (lhs);
-      if (!auto_var_p (base)
-	  || (TREE_ADDRESSABLE (base) && !flag_store_data_races)
+      if (ref_can_have_store_data_races (lhs)
 	  || tree_could_trap_p (lhs))
 	return false;
     }