
[RFC] openmp: don't add artificial const decl to offload table (PRs 94848 + 95551)

Message ID 59015878-3460-d570-1cb5-debab51a5089@codesourcery.com
State New

Commit Message

Tobias Burnus June 8, 2020, 10:23 a.m. UTC
In the latest PR95551, the issue came up with a Fortran array
constructor ("if ((any (array /= [(-i, i=1, 10)])") used
in a target section within a host procedure. The constructor
is converted into a static local variable
   A.10 = [-1, -2, ...]
which ends up in omp-offload.c's offload_vars.

With -O3 optimization, the variable is optimized away on the
host side but it is still in the .gnu.offload_vars (→ host_table,
target_data), causing link errors.

  * * *

I think there is a wider issue (→PR 95551 and others) regarding the
consistency between host and target variables and optimizations, but
I wonder why such a variable has to appear at all in the offload_vars
table.

Hence, I am thinking of the attached patch, which does not add TREE_READONLY
variables to .gnu.offload_vars; I additionally require DECL_ARTIFICIAL as I
am thinking of cases such as a Fortran parameter, to which one can create a
pointer and which might therefore require the mapping information
(is_device_{addr,ptr}).
[The question is whether the exclusion needs to be narrowed further, e.g. to
keep virtual tables out of the exclusion? For instance by requiring
   TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL ?]
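To make the proposed rule concrete, here is a minimal sketch of the filter as a plain predicate. It is an illustration only: struct toy_decl and skip_in_offload_table are hypothetical names, and the three flags merely stand in for TREE_READONLY, DECL_ARTIFICIAL and a FUNCTION_DECL context; none of this is GCC's tree API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the decl properties discussed above;
   this is NOT GCC's tree representation.  */
struct toy_decl
{
  const char *name;
  bool readonly;        /* models TREE_READONLY */
  bool artificial;      /* models DECL_ARTIFICIAL */
  bool function_scope;  /* models a FUNCTION_DECL DECL_CONTEXT */
};

/* Return true if the variable would be left out of the offload table
   under the proposed rule, including the optional function-scope check
   raised in the bracketed question.  */
static bool
skip_in_offload_table (const struct toy_decl *d)
{
  return d->readonly && d->artificial && d->function_scope;
}
```

Under these assumptions, a compiler-generated constant such as A.10 (all three flags set) would be skipped, while a user-visible Fortran parameter (not artificial) or a virtual table (not function-local) would stay in the table.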

For the test case, with -O3, the variable (A.10.3) is optimized away and
seemingly also not streamed out as it also does not appear with -foffload="-O0".

Thoughts?

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

Tobias Burnus June 8, 2020, 10:44 a.m. UTC | #1
As a side remark or follow-up: I have also experimented
with the attached patch.

On the host side, the omp_finish_file call in toplev.c comes
late enough that the variable is gone and one no longer
writes it to the var table.

However, the write_lto() → output_offload_tables() call comes
so early that both the offload table and the variable are
still written. Hence, this patch fails at run time as
the two tables host_table & target_data have different sizes.
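The timing problem can be modelled with a small sketch. It assumes a toy list of variables in which one entry has already been dropped by the host optimizer; struct toy_var, count_table_entries and tables_match are hypothetical names for illustration and do not correspond to GCC or libgomp functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an offload variable; not a GCC data structure.  */
struct toy_var
{
  const char *name;
  bool optimized_away;  /* set once the host optimizer drops the variable */
};

/* Count the entries a table writer would emit.  A writer that runs before
   the removal is visible (see_removals == false) still emits every
   variable; a late writer skips the removed ones.  */
static size_t
count_table_entries (const struct toy_var *vars, size_t n, bool see_removals)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (!see_removals || !vars[i].optimized_away)
      count++;
  return count;
}

/* Host and device entries can only be paired up if both tables
   contain the same number of entries.  */
static bool
tables_match (size_t host_entries, size_t device_entries)
{
  return host_entries == device_entries;
}
```

With one removed variable out of three, the early writer counts three entries and the late writer counts two, so tables_match reports a mismatch, which is the situation described above.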

Tobias

Jakub Jelinek June 8, 2020, 11:11 a.m. UTC | #2
On Mon, Jun 08, 2020 at 12:44:31PM +0200, Tobias Burnus wrote:
> As side-remark or follow up: I have also experimented
> with the attached patch.
> 
> On the host side, the omp_finish_file call in toplev.c comes
> late enough that the the variables is gone and one no longer
> writes it to the var table.
> 
> However, the write_lto() → output_offload_tables() call is
> that early that both the offload table and the variable is
> still written. – Hence, this patch fails at run time as
> the two tables host_table & target_data have a different size.

I think this patch is the right thing to do, just needs to be slightly
extended.
If we do the decision at output_offload_tables, then for the
vars we choose to keep in the tables, we should set
node->force_output, so that from that point on we don't try to optimize it
away.  Similarly with functions.
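
A rough sketch of that idea, under stated assumptions: the table decision is taken once at stream-out time, and every kept symbol is pinned so a later removal pass leaves it alone. Only the force_output name comes from the discussion; struct toy_node and both functions are hypothetical and far simpler than GCC's symbol table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol-table node; not GCC's varpool_node or cgraph_node.  */
struct toy_node
{
  const char *name;
  bool referenced;        /* still used somewhere after optimization */
  bool force_output;      /* pinned: must not be removed */
  bool in_offload_table;  /* chosen for the offload table */
};

/* Decide table membership once and pin every kept symbol.
   Returns the number of entries written to the table.  */
static size_t
stream_out_offload_table (struct toy_node *nodes, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (nodes[i].in_offload_table)
      {
        nodes[i].force_output = true;
        kept++;
      }
  return kept;
}

/* Count the table entries that survive a toy removal pass, which keeps
   a symbol only if it is referenced or pinned.  */
static size_t
count_surviving_table_entries (const struct toy_node *nodes, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (nodes[i].in_offload_table
        && (nodes[i].referenced || nodes[i].force_output))
      count++;
  return count;
}
```

In this model, the number of surviving entries after pinning equals the number written at stream-out, so the early and late views of the table cannot diverge.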

	Jakub
Tobias Burnus June 8, 2020, 3:20 p.m. UTC | #3
Hi Jakub,

how about the following patch, which is kind of a combination of the
two? Namely, it avoids outputting artificial, read-only, non-global
variables and marks all remaining variables and all functions with
node->force_output. As the LTO writing happens earlier, I only do it
there.

Tobias

On 6/8/20 1:11 PM, Jakub Jelinek wrote:

> On Mon, Jun 08, 2020 at 12:44:31PM +0200, Tobias Burnus wrote:
>> As side-remark or follow up: I have also experimented
>> with the attached patch.
>>
>> On the host side, the omp_finish_file call in toplev.c comes
>> late enough that the the variables is gone and one no longer
>> writes it to the var table.
>>
>> However, the write_lto() → output_offload_tables() call is
>> that early that both the offload table and the variable is
>> still written. – Hence, this patch fails at run time as
>> the two tables host_table & target_data have a different size.
> I think this patch is the right thing to do, just needs to be slightly
> extended.
> If we do the decision at output_offload_tables, then for the
> vars we choose to keep in the tables, we should set
> node->force_output, so that from that point on we don't try to optimize it
> away.  Similarly with functions.
>
>       Jakub
>
Jakub Jelinek June 8, 2020, 3:30 p.m. UTC | #4
On Mon, Jun 08, 2020 at 05:20:16PM +0200, Tobias Burnus wrote:
> how about the following patch, which is kind of a combination of the
> two? Namely, avoiding of the output of artificial,read-only nonglobal
> variables – and marking all remaining variables and all functions with
> node->force_output. As the LTO writing happens earlier, I only do it
> there. Tobias
> On 6/8/20 1:11 PM, Jakub Jelinek wrote:

I really don't see what exactly is special about TREE_READONLY DECL_ARTIFICIAL
function-scope vars and why they should be treated that way; that is not
really a property that should imply special behavior.
There are dozens of reasons why a variable can be DECL_ARTIFICIAL and dozens
of reasons why it can be TREE_READONLY, and FUNCTION_DECL context can have
both automatic variables and static variables etc.

	Jakub

Patch

openmp: don't add artificial const decl to offload table (PRs 94848 + 95551)

gcc/ChangeLog:

	PR lto/94848
	PR middle-end/95551
	* lto-cgraph.c (output_offload_tables): Skip readonly
	artificial variables.
	* omp-offload.c (add_decls_addresses_to_decl_constructor,
	omp_finish_file): Likewise.

libgomp/ChangeLog:

	PR lto/94848
	PR middle-end/95551
	* testsuite/libgomp.fortran/target-var.f90: New test.

 gcc/lto-cgraph.c                                 |  4 +++
 gcc/omp-offload.c                                |  7 +++++-
 libgomp/testsuite/libgomp.fortran/target-var.f90 | 32 ++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index a671c671fa7..747d44c9f84 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1077,6 +1077,10 @@  output_offload_tables (void)
 
   for (unsigned i = 0; i < vec_safe_length (offload_vars); i++)
     {
+      if (TREE_READONLY ((*offload_vars)[i])
+	  && DECL_ARTIFICIAL ((*offload_vars)[i]))
+	continue;
+
       streamer_write_enum (ob->main_stream, LTO_symtab_tags,
 			   LTO_symtab_last_tag, LTO_symtab_variable);
       lto_output_var_decl_ref (ob->decl_state, ob->main_stream,
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index b2df91a5724..2f6c55d8667 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -125,6 +125,9 @@  add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
 #endif
 	  && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (it));
 
+      if (is_var && TREE_READONLY (it) && DECL_ARTIFICIAL (it))
+	continue;
+
       tree size = NULL_TREE;
       if (is_var)
 	size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (it));
@@ -341,7 +344,7 @@  omp_finish_file (void)
       add_decls_addresses_to_decl_constructor (offload_vars, v_v);
 
       tree vars_decl_type = build_array_type_nelts (pointer_sized_int_node,
-						    num_vars * 2);
+						    vec_safe_length (v_v));
       tree funcs_decl_type = build_array_type_nelts (pointer_sized_int_node,
 						     num_funcs);
       SET_TYPE_ALIGN (vars_decl_type, TYPE_ALIGN (pointer_sized_int_node));
@@ -381,6 +384,8 @@  omp_finish_file (void)
       for (unsigned i = 0; i < num_vars; i++)
 	{
 	  tree it = (*offload_vars)[i];
+	  if (TREE_READONLY (it) && DECL_ARTIFICIAL (it))
+	    continue;
 #ifdef ACCEL_COMPILER
 	  if (DECL_HAS_VALUE_EXPR_P (it)
 	      && lookup_attribute ("omp declare target link",
diff --git a/libgomp/testsuite/libgomp.fortran/target-var.f90 b/libgomp/testsuite/libgomp.fortran/target-var.f90
new file mode 100644
index 00000000000..5e5ccd47c96
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-var.f90
@@ -0,0 +1,32 @@ 
+! { dg-additional-options "-O3" }
+!
+! With -O3 the static local variable A.10 generated for
+! the array constructor [-2, -4, ..., -20] is optimized
+! away - which has to be handled in the offload_vars table.
+!
+program main
+  implicit none (type, external)
+  integer :: j
+  integer, allocatable :: A(:)
+
+  A = [(3*j, j=1, 10)]
+  call bar (A)
+  deallocate (A)
+contains
+  subroutine bar (array)
+    integer :: i
+    integer :: array(:)
+
+    !$omp target map(from:array)
+    !$acc parallel copyout(array)
+    array = [(-2*i, i = 1, size(array))]
+    !$omp do private(array)
+    !$acc loop gang private(array)
+    do i = 1, 10
+      array(i) = 9*i
+    end do
+    if (any (array /= [(-2*i, i = 1, 10)])) error stop 2
+    !$omp end target
+    !$acc end parallel
+  end subroutine bar
+end