diff mbox series

[optc-save-gen.awk] Fix streaming of command line options for offloading

Message ID DS0PR12MB9037628E6E95AEEA9313F62FCE862@DS0PR12MB9037.namprd12.prod.outlook.com
State New
Headers show
Series [optc-save-gen.awk] Fix streaming of command line options for offloading | expand

Commit Message

Prathamesh Kulkarni Aug. 13, 2024, 5:36 a.m. UTC
Hi,
As mentioned in:
https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html

AArch64 cl_optimization_stream_out streams out target-specific optimization options like flag_aarch64_early_ldp_fusion, aarch64_early_ra etc, which breaks AArch64/nvptx offloading,
since nvptx cl_optimization_stream_in doesn't have corresponding stream-in for these options and ends up setting invalid values for ptr->explicit_mask (and subsequent data structures).

This makes even a trivial test like the following to cause ICE in lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:

int main()
{
  int x;
  #pragma omp target map(x)
    x;
}

The attached patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p) check before streaming out target-specific opt in cl_optimization_stream_out, which
fixes the issue. cl_optimization_stream_out after patch (last few entries):

  bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
  bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
  for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
    bp_pack_value (bp, ptr->explicit_mask[i], 64);

For target-specific options, streaming out is gated on !lto_stream_offload_p check.

The patch also fixes failures due to same issue with x86_64->nvptx offloading for target-print-1.f90 (and couple more).
Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Thanks,
Prathamesh
[optc-save-gen.awk] Fix streaming of command line options for offloading.

The patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p)
check before streaming out target-specific opt in cl_optimization_stream_out,
when offloading is enabled.

gcc/ChangeLog:
	* gcc/optc-save-gen.awk: New array var_target_opt. Use it to generate
	if (!lto_stream_offload_p) check in cl_optimization_stream_out.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Comments

Andrew Pinski Aug. 13, 2024, 6:36 a.m. UTC | #1
On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
<prathameshk@nvidia.com> wrote:
>
> Hi,
> As mentioned in:
> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
>
> AArch64 cl_optimization_stream_out streams out target-specific optimization options like flag_aarch64_early_ldp_fusion, aarch64_early_ra etc, which breaks AArch64/nvptx offloading,
> since nvptx cl_optimization_stream_in doesn't have corresponding stream-in for these options and ends up setting invalid values for ptr->explicit_mask (and subsequent data structures).
>
> This makes even a trivial test like the following to cause ICE in lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
>
> int main()
> {
>   int x;
>   #pragma omp target map(x)
>     x;
> }
>
> The attached patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p) check before streaming out target-specific opt in cl_optimization_stream_out, which
> fixes the issue. cl_optimization_stream_out after patch (last few entries):
>
>   bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
>   bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
>   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
>     bp_pack_value (bp, ptr->explicit_mask[i], 64);
>
> For target-specific options, streaming out is gated on !lto_stream_offload_p check.
>
> The patch also fixes failures due to same issue with x86_64->nvptx offloading for target-print-1.f90 (and couple more).
> Does the patch look OK ?

I think it seems to be on the right track. One thing that is also
going to be an issue is streaming in, there could be a target option
on the offload side that is marked as Optimization that would might
also cause issues. We should check to make sure that also gets fixed
here too. Or error out for offloading targets can't have target
options with Optimization on them during the build.

Thanks,
Andrew Pinski

>
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
>
> Thanks,
> Prathamesh
Richard Biener Aug. 13, 2024, 7:21 a.m. UTC | #2
> Am 13.08.2024 um 08:37 schrieb Andrew Pinski <pinskia@gmail.com>:
> 
> On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
> <prathameshk@nvidia.com> wrote:
>> 
>> Hi,
>> As mentioned in:
>> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
>> 
>> AArch64 cl_optimization_stream_out streams out target-specific optimization options like flag_aarch64_early_ldp_fusion, aarch64_early_ra etc, which breaks AArch64/nvptx offloading,
>> since nvptx cl_optimization_stream_in doesn't have corresponding stream-in for these options and ends up setting invalid values for ptr->explicit_mask (and subsequent data structures).
>> 
>> This makes even a trivial test like the following to cause ICE in lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
>> 
>> int main()
>> {
>>  int x;
>>  #pragma omp target map(x)
>>    x;
>> }
>> 
>> The attached patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p) check before streaming out target-specific opt in cl_optimization_stream_out, which
>> fixes the issue. cl_optimization_stream_out after patch (last few entries):
>> 
>>  bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
>>  bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
>>  if (!lto_stream_offload_p)
>>  bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
>>  if (!lto_stream_offload_p)
>>  bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
>>  if (!lto_stream_offload_p)
>>  bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
>>  if (!lto_stream_offload_p)
>>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
>>  if (!lto_stream_offload_p)
>>  bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
>>  if (!lto_stream_offload_p)
>>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
>>  for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
>>    bp_pack_value (bp, ptr->explicit_mask[i], 64);
>> 
>> For target-specific options, streaming out is gated on !lto_stream_offload_p check.
>> 
>> The patch also fixes failures due to same issue with x86_64->nvptx offloading for target-print-1.f90 (and couple more).
>> Does the patch look OK ?
> 
> I think it seems to be on the right track. One thing that is also
> going to be an issue is streaming in, there could be a target option
> on the offload side that is marked as Optimization that would might
> also cause issues. We should check to make sure that also gets fixed
> here too. Or error out for offloading targets can't have target
> options with Optimization on them during the build.

It may have been misguided to mark target specific flags as Optimization.  It might be required to merge those (from all targets) into the common optimize enum, like we do for tree codes.  Language specific options marked as Optimization possibly have the same issue when mixing with other languages and LTO.  Can you assess the situation a bit more?

I think the proposed fix looks reasonable but the problem might be more widespread and warrant a more global solution or at least revisiting documentation?

Thanks,
Richard 

> Thanks,
> Andrew Pinski
> 
>> 
>> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
>> 
>> Thanks,
>> Prathamesh
Prathamesh Kulkarni Aug. 19, 2024, 8:09 a.m. UTC | #3
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, August 13, 2024 12:52 PM
> To: Andrew Pinski <pinskia@gmail.com>
> Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> patches@gcc.gnu.org; Thomas Schwinge <tschwinge@baylibre.com>
> Subject: Re: [optc-save-gen.awk] Fix streaming of command line options
> for offloading
> 
> External email: Use caution opening links or attachments
> 
> 
> > Am 13.08.2024 um 08:37 schrieb Andrew Pinski <pinskia@gmail.com>:
> >
> > On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
> > <prathameshk@nvidia.com> wrote:
> >>
> >> Hi,
> >> As mentioned in:
> >> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
> >>
> >> AArch64 cl_optimization_stream_out streams out target-specific
> >> optimization options like flag_aarch64_early_ldp_fusion,
> aarch64_early_ra etc, which breaks AArch64/nvptx offloading, since
> nvptx cl_optimization_stream_in doesn't have corresponding stream-in
> for these options and ends up setting invalid values for ptr-
> >explicit_mask (and subsequent data structures).
> >>
> >> This makes even a trivial test like the following to cause ICE in
> lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
> >>
> >> int main()
> >> {
> >>  int x;
> >>  #pragma omp target map(x)
> >>    x;
> >> }
> >>
> >> The attached patch modifies optc-save-gen.awk to generate if
> >> (!lto_stream_offload_p) check before streaming out target-specific
> opt in cl_optimization_stream_out, which fixes the issue.
> cl_optimization_stream_out after patch (last few entries):
> >>
> >>  bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> >> bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);  if
> >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> >> ptr->x_flag_aarch64_early_ldp_fusion);
> >>  if (!lto_stream_offload_p)
> >>  bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);  if
> >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> >> ptr->x_flag_aarch64_late_ldp_fusion);
> >>  if (!lto_stream_offload_p)
> >>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);  if
> >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> >> ptr->x_flag_mrecip_low_precision_sqrt);
> >>  if (!lto_stream_offload_p)
> >>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);  for
> >> (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> >>    bp_pack_value (bp, ptr->explicit_mask[i], 64);
> >>
> >> For target-specific options, streaming out is gated on
> !lto_stream_offload_p check.
> >>
> >> The patch also fixes failures due to same issue with x86_64->nvptx
> offloading for target-print-1.f90 (and couple more).
> >> Does the patch look OK ?
> >
> > I think it seems to be on the right track. One thing that is also
> > going to be an issue is streaming in, there could be a target option
> > on the offload side that is marked as Optimization that would might
> > also cause issues. We should check to make sure that also gets fixed
> > here too. Or error out for offloading targets can't have target
> > options with Optimization on them during the build.
Thanks for the suggestions. The attached patch modifies optc-save-gen.awk
to emit an error if accel backend marks target specific option with Optimization.
AFAIU, currently neither nvptx nor gcn have target-specific options marked with Optimization,
so this is mostly a safeguard against future additions.

cl_optimization_stream_in after patch for target-specifc optimization options:

#ifdef ACCEL_COMPILER
#error accel compiler cannot define Optimization attribute for target-specific option x_flag_aarch64_early_ldp_fusion
#else
  ptr->x_flag_aarch64_early_ldp_fusion = (signed char ) bp_unpack_var_len_int (bp);
#endif

To test if this works, I added -mfoo to nvptx.opt and marked it with both Target and Optimization,
which resulted in the following build error for nvptx:

options-save.cc:13548:2: error: #error accel compiler cannot define Optimization attribute for target-specifc option x_flag_nvptx_foo
13548 | #error accel compiler cannot define Optimization attribute for target-specific option x_flag_nvptx_foo
      |  ^~~~~
> 
> It may have been misguided to mark target specific flags as
> Optimization.  It might be required to merge those (from all targets)
> into the common optimize enum, like we do for tree codes.  Language
> specific options marked as Optimization possibly have the same issue
> when mixing with other languages and LTO.  Can you assess the
> situation a bit more?
AFAIK, only c-family/c.opt marks few options with Optimization flag. I tried marking
fortran's -ffrontend-optimize with Optimization and verified that Optimization options
are combined for c-family languages and fortran in cl_optimization_stream_{out,in}.

cl_optimization_stream_out shows:
  ...
  bp_pack_var_len_int (bp, ptr->x_flag_frontend_optimize);
  ...
  bp_pack_var_len_int (bp, ptr->x_flag_nothrow_opt);

and likewise has corresponding entries for cl_optimization_stream_in.

So I guess this shouldn't be an issue with lang specific Optimization opts ?

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Thanks,
Prathamesh
> 
> I think the proposed fix looks reasonable but the problem might be
> more widespread and warrant a more global solution or at least
> revisiting documentation?
> 
> Thanks,
> Richard
> 
> > Thanks,
> > Andrew Pinski
> >
> >>
> >> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> >>
> >> Thanks,
> >> Prathamesh
[optc-save-gen.awk] Fix streaming of command line options for offloading.

The patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p)
check before streaming out target-specific opt in cl_optimization_stream_out,
when offloading is enabled.

gcc/ChangeLog:
	* gcc/optc-save-gen.awk: New array var_target_opt. Use it to generate
	if (!lto_stream_offload_p) check in cl_optimization_stream_out,
	and generate a diagnostic with #error if accelerator backend uses
	Optimization for target-specifc options in cl_optimization_stream_in.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index a3af88e3776..b1289c281e7 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1307,6 +1307,11 @@ for (i = 0; i < n_opts; i++) {
 			var_opt_optimize_init[n_opt_val] = init;
 		}
 
+		# Mark options that are annotated with both Optimization and
+		# Target so we can avoid streaming out target-specific opts when
+		# offloading is enabled.
+		if (flag_set_p("Target", flags[i]))
+			var_target_opt[n_opt_val] = 1;
 		n_opt_val++;
 	}
 }
@@ -1384,6 +1389,10 @@ for (i = 0; i < n_opt_val; i++) {
 		} else {
 			sgn = "int";
 		}
+		# Do not stream out target-specific opts if offloading is
+		# enabled.
+		if (var_target_opt[i])
+			print "  if (!lto_stream_offload_p)"
 		# If applicable, encode the streamed value.
 		if (var_opt_optimize_init[i]) {
 			print "  if (" var_opt_optimize_init[i] " > (" var_opt_val_type[i] ") 10)";
@@ -1408,6 +1417,11 @@ print "                           struct cl_optimization *ptr ATTRIBUTE_UNUSED)"
 print "{";
 for (i = 0; i < n_opt_val; i++) {
 	name = var_opt_val[i]
+        if (var_target_opt[i]) {
+		print "#ifdef ACCEL_COMPILER"
+		print "#error accel compiler cannot define Optimization attribute for target-specific option " name;
+		print "#else"
+	}
 	otype = var_opt_val_type[i];
 	if (otype ~ "^const char \\**$") {
 		print "  ptr->" name" = bp_unpack_string (data_in, bp);";
@@ -1427,6 +1441,8 @@ for (i = 0; i < n_opt_val; i++) {
 			print "    ptr->" name" ^= " var_opt_optimize_init[i] ";";
 		}
 	}
+	if (var_target_opt[i])
+		print "#endif"
 }
 print "  for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)";
 print "    ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";
Richard Biener Aug. 19, 2024, 1:20 p.m. UTC | #4
On Mon, 19 Aug 2024, Prathamesh Kulkarni wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, August 13, 2024 12:52 PM
> > To: Andrew Pinski <pinskia@gmail.com>
> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> > patches@gcc.gnu.org; Thomas Schwinge <tschwinge@baylibre.com>
> > Subject: Re: [optc-save-gen.awk] Fix streaming of command line options
> > for offloading
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > > Am 13.08.2024 um 08:37 schrieb Andrew Pinski <pinskia@gmail.com>:
> > >
> > > On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
> > > <prathameshk@nvidia.com> wrote:
> > >>
> > >> Hi,
> > >> As mentioned in:
> > >> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
> > >>
> > >> AArch64 cl_optimization_stream_out streams out target-specific
> > >> optimization options like flag_aarch64_early_ldp_fusion,
> > aarch64_early_ra etc, which breaks AArch64/nvptx offloading, since
> > nvptx cl_optimization_stream_in doesn't have corresponding stream-in
> > for these options and ends up setting invalid values for ptr-
> > >explicit_mask (and subsequent data structures).
> > >>
> > >> This makes even a trivial test like the following to cause ICE in
> > lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
> > >>
> > >> int main()
> > >> {
> > >>  int x;
> > >>  #pragma omp target map(x)
> > >>    x;
> > >> }
> > >>
> > >> The attached patch modifies optc-save-gen.awk to generate if
> > >> (!lto_stream_offload_p) check before streaming out target-specific
> > opt in cl_optimization_stream_out, which fixes the issue.
> > cl_optimization_stream_out after patch (last few entries):
> > >>
> > >>  bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> > >> bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);  if
> > >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> > >> ptr->x_flag_aarch64_early_ldp_fusion);
> > >>  if (!lto_stream_offload_p)
> > >>  bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);  if
> > >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> > >> ptr->x_flag_aarch64_late_ldp_fusion);
> > >>  if (!lto_stream_offload_p)
> > >>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);  if
> > >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> > >> ptr->x_flag_mrecip_low_precision_sqrt);
> > >>  if (!lto_stream_offload_p)
> > >>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);  for
> > >> (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> > >>    bp_pack_value (bp, ptr->explicit_mask[i], 64);
> > >>
> > >> For target-specific options, streaming out is gated on
> > !lto_stream_offload_p check.
> > >>
> > >> The patch also fixes failures due to same issue with x86_64->nvptx
> > offloading for target-print-1.f90 (and couple more).
> > >> Does the patch look OK ?
> > >
> > > I think it seems to be on the right track. One thing that is also
> > > going to be an issue is streaming in, there could be a target option
> > > on the offload side that is marked as Optimization that would might
> > > also cause issues. We should check to make sure that also gets fixed
> > > here too. Or error out for offloading targets can't have target
> > > options with Optimization on them during the build.
> Thanks for the suggestions. The attached patch modifies optc-save-gen.awk
> to emit an error if accel backend marks target specific option with Optimization.
> AFAIU, currently neither nvptx nor gcn have target-specific options marked with Optimization,
> so this is mostly a safeguard against future additions.
> 
> cl_optimization_stream_in after patch for target-specifc optimization options:
> 
> #ifdef ACCEL_COMPILER
> #error accel compiler cannot define Optimization attribute for target-specific option x_flag_aarch64_early_ldp_fusion
> #else
>   ptr->x_flag_aarch64_early_ldp_fusion = (signed char ) bp_unpack_var_len_int (bp);
> #endif
> 
> To test if this works, I added -mfoo to nvptx.opt and marked it with both Target and Optimization,
> which resulted in the following build error for nvptx:
> 
> options-save.cc:13548:2: error: #error accel compiler cannot define Optimization attribute for target-specifc option x_flag_nvptx_foo
> 13548 | #error accel compiler cannot define Optimization attribute for target-specific option x_flag_nvptx_foo
>       |  ^~~~~
> > 
> > It may have been misguided to mark target specific flags as
> > Optimization.  It might be required to merge those (from all targets)
> > into the common optimize enum, like we do for tree codes.  Language
> > specific options marked as Optimization possibly have the same issue
> > when mixing with other languages and LTO.  Can you assess the
> > situation a bit more?
> AFAIK, only c-family/c.opt marks few options with Optimization flag. I tried marking
> fortran's -ffrontend-optimize with Optimization and verified that Optimization options
> are combined for c-family languages and fortran in cl_optimization_stream_{out,in}.
> 
> cl_optimization_stream_out shows:
>   ...
>   bp_pack_var_len_int (bp, ptr->x_flag_frontend_optimize);
>   ...
>   bp_pack_var_len_int (bp, ptr->x_flag_nothrow_opt);
> 
> and likewise has corresponding entries for cl_optimization_stream_in.
> 
> So I guess this shouldn't be an issue with lang specific Optimization opts ?

Yes.  So as said if we now support multiple targets in one image
by means of offloading we possibly should see to merge options like
we do for frontends.

That said, I think the patch you posted is OK, the above is just
something that we should remember when more similar cases pop up.

Richard.

> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> 
> Thanks,
> Prathamesh
> > 
> > I think the proposed fix looks reasonable but the problem might be
> > more widespread and warrant a more global solution or at least
> > revisiting documentation?
> > 
> > Thanks,
> > Richard
> > 
> > > Thanks,
> > > Andrew Pinski
> > >
> > >>
> > >> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > >>
> > >> Thanks,
> > >> Prathamesh
>
Prathamesh Kulkarni Aug. 20, 2024, 7:35 a.m. UTC | #5
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, August 19, 2024 6:51 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Andrew Pinski <pinskia@gmail.com>; gcc-patches@gcc.gnu.org; Thomas
> Schwinge <tschwinge@baylibre.com>
> Subject: RE: [optc-save-gen.awk] Fix streaming of command line options
> for offloading
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, 19 Aug 2024, Prathamesh Kulkarni wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Tuesday, August 13, 2024 12:52 PM
> > > To: Andrew Pinski <pinskia@gmail.com>
> > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> > > patches@gcc.gnu.org; Thomas Schwinge <tschwinge@baylibre.com>
> > > Subject: Re: [optc-save-gen.awk] Fix streaming of command line
> > > options for offloading
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > > Am 13.08.2024 um 08:37 schrieb Andrew Pinski
> <pinskia@gmail.com>:
> > > >
> > > > On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
> > > > <prathameshk@nvidia.com> wrote:
> > > >>
> > > >> Hi,
> > > >> As mentioned in:
> > > >> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
> > > >>
> > > >> AArch64 cl_optimization_stream_out streams out target-specific
> > > >> optimization options like flag_aarch64_early_ldp_fusion,
> > > aarch64_early_ra etc, which breaks AArch64/nvptx offloading, since
> > > nvptx cl_optimization_stream_in doesn't have corresponding stream-
> in
> > > for these options and ends up setting invalid values for ptr-
> > > >explicit_mask (and subsequent data structures).
> > > >>
> > > >> This makes even a trivial test like the following to cause ICE
> in
> > > lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
> > > >>
> > > >> int main()
> > > >> {
> > > >>  int x;
> > > >>  #pragma omp target map(x)
> > > >>    x;
> > > >> }
> > > >>
> > > >> The attached patch modifies optc-save-gen.awk to generate if
> > > >> (!lto_stream_offload_p) check before streaming out
> > > >> target-specific
> > > opt in cl_optimization_stream_out, which fixes the issue.
> > > cl_optimization_stream_out after patch (last few entries):
> > > >>
> > > >>  bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> > > >> bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);  if
> > > >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> > > >> ptr->x_flag_aarch64_early_ldp_fusion);
> > > >>  if (!lto_stream_offload_p)
> > > >>  bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);  if
> > > >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> > > >> ptr->x_flag_aarch64_late_ldp_fusion);
> > > >>  if (!lto_stream_offload_p)
> > > >>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);  if
> > > >> (!lto_stream_offload_p)  bp_pack_var_len_int (bp,
> > > >> ptr->x_flag_mrecip_low_precision_sqrt);
> > > >>  if (!lto_stream_offload_p)
> > > >>  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
> for
> > > >> (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> > > >>    bp_pack_value (bp, ptr->explicit_mask[i], 64);
> > > >>
> > > >> For target-specific options, streaming out is gated on
> > > !lto_stream_offload_p check.
> > > >>
> > > >> The patch also fixes failures due to same issue with
> > > >> x86_64->nvptx
> > > offloading for target-print-1.f90 (and couple more).
> > > >> Does the patch look OK ?
> > > >
> > > > I think it seems to be on the right track. One thing that is
> also
> > > > going to be an issue is streaming in, there could be a target
> > > > option on the offload side that is marked as Optimization that
> > > > would might also cause issues. We should check to make sure that
> > > > also gets fixed here too. Or error out for offloading targets
> > > > can't have target options with Optimization on them during the
> build.
> > Thanks for the suggestions. The attached patch modifies
> > optc-save-gen.awk to emit an error if accel backend marks target
> specific option with Optimization.
> > AFAIU, currently neither nvptx nor gcn have target-specific options
> > marked with Optimization, so this is mostly a safeguard against
> future additions.
> >
> > cl_optimization_stream_in after patch for target-specifc
> optimization options:
> >
> > #ifdef ACCEL_COMPILER
> > #error accel compiler cannot define Optimization attribute for
> > target-specific option x_flag_aarch64_early_ldp_fusion #else
> >   ptr->x_flag_aarch64_early_ldp_fusion = (signed char )
> > bp_unpack_var_len_int (bp); #endif
> >
> > To test if this works, I added -mfoo to nvptx.opt and marked it with
> > both Target and Optimization, which resulted in the following build
> error for nvptx:
> >
> > options-save.cc:13548:2: error: #error accel compiler cannot define
> > Optimization attribute for target-specifc option x_flag_nvptx_foo
> > 13548 | #error accel compiler cannot define Optimization attribute
> for target-specific option x_flag_nvptx_foo
> >       |  ^~~~~
> > >
> > > It may have been misguided to mark target specific flags as
> > > Optimization.  It might be required to merge those (from all
> > > targets) into the common optimize enum, like we do for tree codes.
> > > Language specific options marked as Optimization possibly have the
> > > same issue when mixing with other languages and LTO.  Can you
> assess
> > > the situation a bit more?
> > AFAIK, only c-family/c.opt marks few options with Optimization flag.
> I
> > tried marking fortran's -ffrontend-optimize with Optimization and
> > verified that Optimization options are combined for c-family
> languages and fortran in cl_optimization_stream_{out,in}.
> >
> > cl_optimization_stream_out shows:
> >   ...
> >   bp_pack_var_len_int (bp, ptr->x_flag_frontend_optimize);
> >   ...
> >   bp_pack_var_len_int (bp, ptr->x_flag_nothrow_opt);
> >
> > and likewise has corresponding entries for
> cl_optimization_stream_in.
> >
> > So I guess this shouldn't be an issue with lang specific
> Optimization opts ?
> 
> Yes.  So as said if we now support multiple targets in one image by
> means of offloading we possibly should see to merge options like we do
> for frontends.
> 
> That said, I think the patch you posted is OK, the above is just
> something that we should remember when more similar cases pop up.
Thanks, I committed the patch in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=db2e9a2a46f64b037494e8300c46f2d90a9fa55c
after LTO bootstrap+test on aarch64-linux-gnu and verifying it survives libgomp testing
for AArch64/nvptx offloading.

Thanks,
Prathamesh
> 
> Richard.
> 
> > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> >
> > Thanks,
> > Prathamesh
> > >
> > > I think the proposed fix looks reasonable but the problem might be
> > > more widespread and warrant a more global solution or at least
> > > revisiting documentation?
> > >
> > > Thanks,
> > > Richard
> > >
> > > > Thanks,
> > > > Andrew Pinski
> > > >
> > > >>
> > > >> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > >>
> > > >> Thanks,
> > > >> Prathamesh
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
diff mbox series

Patch

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index a3af88e3776..228efe2accd 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1307,6 +1307,11 @@  for (i = 0; i < n_opts; i++) {
 			var_opt_optimize_init[n_opt_val] = init;
 		}
 
+		# Mark options that are annotated with both Optimization and
+		# Target so we can avoid streaming out target-specifc opts when
+		# offloading is enabled.
+		if (flag_set_p("Target", flags[i]))
+			var_target_opt[n_opt_val] = 1;
 		n_opt_val++;
 	}
 }
@@ -1384,6 +1389,10 @@  for (i = 0; i < n_opt_val; i++) {
 		} else {
 			sgn = "int";
 		}
+		# Do not stream out target-specifc opts if offloading is
+		# enabled.
+		if (var_target_opt[i])
+			print "  if (!lto_stream_offload_p)"
 		# If applicable, encode the streamed value.
 		if (var_opt_optimize_init[i]) {
 			print "  if (" var_opt_optimize_init[i] " > (" var_opt_val_type[i] ") 10)";