Message ID | da832c1c-fd40-c73e-b0b7-a5f003ad1e52@codesourcery.com |
---|---|
State | New |
Headers | show |
Series | + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and || | expand |
On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote: > OpenMP: Fix SIMT for complex/float reduction with && and || > > gcc/ChangeLog: > > * omp-low.c (lower_rec_input_clauses): Also handle SIMT part > for complex/float recution with && and ||. > > libgomp/ChangeLog: > > * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing > complex/floating-point || + && recduction with 'omp target'. As the float/complex ||/&& reductions are IMHO just conformance issues, not something anyone would actually use in meaningful code - floats or complex aren't the most obvious or efficient holders of boolean values - I think punting SIMT on those isn't a workaround, but the right solution. Jakub
On 5/6/21 12:30 PM, Jakub Jelinek wrote: > On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote: >> OpenMP: Fix SIMT for complex/float reduction with && and || >> >> gcc/ChangeLog: >> >> * omp-low.c (lower_rec_input_clauses): Also handle SIMT part >> for complex/float recution with && and ||. >> >> libgomp/ChangeLog: >> >> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing >> complex/floating-point || + && recduction with 'omp target'. > > As the float/complex ||/&& reductions are IMHO just conformance issues, not > something anyone would actually use in meaningful code - floats or complex > aren't the most obvious or efficient holders of boolean values - I think > punting SIMT on those isn't a workaround, but the right solution. > Ack. WIP patch below tries that approach and fixes the ICE, but this simple example still doesn't work: ... int main () { float andf = 1; #pragma omp target parallel reduction(&&: andf) for (int i=0; i < 1024; ++i) andf = andf && 0.0; if ((int)andf != 0) __builtin_abort (); return 0; } ... Thanks, - Tom
On Thu, May 06, 2021 at 03:12:59PM +0200, Tom de Vries wrote: > + if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c)) > + && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE) I would use && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)) Especially in C code using || or && with int or other non-_Bool types will pretty frequent. Of course, if that doesn't work for SIMT either, it needs further work and punting on those could be a temporary workaround. But it would be a preexisting issue, not something introduced with accepting &&/|| for floating/complex types - we've accepted &&/|| for integral types forever. Jakub
On 5/6/21 3:12 PM, Tom de Vries wrote: > On 5/6/21 12:30 PM, Jakub Jelinek wrote: >> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote: >>> OpenMP: Fix SIMT for complex/float reduction with && and || >>> >>> gcc/ChangeLog: >>> >>> * omp-low.c (lower_rec_input_clauses): Also handle SIMT part >>> for complex/float recution with && and ||. >>> >>> libgomp/ChangeLog: >>> >>> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing >>> complex/floating-point || + && recduction with 'omp target'. >> >> As the float/complex ||/&& reductions are IMHO just conformance issues, not >> something anyone would actually use in meaningful code - floats or complex >> aren't the most obvious or efficient holders of boolean values - I think >> punting SIMT on those isn't a workaround, but the right solution. >> > > Ack. > > WIP patch below tries that approach and fixes the ICE, but this simple > example still doesn't work: > ... > int > main () > { > float andf = 1; > > #pragma omp target parallel reduction(&&: andf) > for (int i=0; i < 1024; ++i) > andf = andf && 0.0; > > if ((int)andf != 0) > __builtin_abort (); > > return 0; > } > ... Hm, after rewriting things like this: ... #pragma omp target map (tofrom: andf) #pragma omp parallel reduction(&&: andf) for (int i=0; i < 1024; ++i) andf = andf && 0.0; ... it does work. My limited openmp knowledge is not enough to decide whether the fail of the first variant is a test-case issue, or a gcc issue. Thanks, - Tom
On 06.05.21 15:12, Tom de Vries wrote: > WIP patch below tries that approach and fixes the ICE, Thanks! > but this simple example still doesn't work: > ... > #pragma omp target parallel reduction(&&: andf) Try: map(andf). [Cf. PR99928 with pending patch at https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567838.html ] I have now added your WIP patch to my patch, honoring the comment by Jakub. I also copied the _Complex int example to -6.c to have also a target version for this. Lightly tested for now w/ and w/o offloading, will run the testsuite now. Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
On Thu, May 06, 2021 at 04:21:40PM +0200, Tobias Burnus wrote: > * omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if > a truth_value_p reduction variable is nonintegral. > (lower_rec_input_clauses): Also handle SIMT part > for complex/float recution with && and ||. s/recution/reduction/ > --- a/gcc/omp-low.c > +++ b/gcc/omp-low.c > @@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, > { > for (tree c = gimple_omp_for_clauses (ctx->stmt); c; > c = OMP_CLAUSE_CHAIN (c)) > - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION > - && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) > - { > - /* UDR reductions are not supported yet for SIMT, disable > - SIMT. */ > - sctx->max_vf = 1; > - break; > + { > + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION) > + continue; > + > + if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) > + { > + /* UDR reductions are not supported yet for SIMT, disable > + SIMT. */ > + sctx->max_vf = 1; > + break; > + } > + > + if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c)) > + && !INTEGRAL_TYPE_P (TREE_TYPE (new_var))) > + { > + /* Doing boolean operations on non-boolean types is > + for conformance only, it's not worth supporting this > + for SIMT. */ This comment needs to be adjusted to talk about non-integral types. > + sctx->max_vf = 1; > + break; > } > + } > } > if (maybe_gt (sctx->max_vf, 1U)) > { > @@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, > > gimplify_assign (unshare_expr (ivar), x, &llist[0]); > > - if (sctx.is_simt) > - { > - if (!simt_lane) > - simt_lane = create_tmp_var (unsigned_type_node); > - x = build_call_expr_internal_loc > - (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, > - TREE_TYPE (ivar), 2, ivar, simt_lane); > - x = build2 (code, TREE_TYPE (ivar), ivar, x); > - gimplify_assign (ivar, x, &llist[2]); > - } > tree ivar2 = ivar; > tree ref2 = ref; > + tree zero = NULL_TREE; > if (is_fp_and_or) > { > - tree zero = build_zero_cst (TREE_TYPE (ivar)); > + zero = build_zero_cst (TREE_TYPE (ivar)); > ivar2 = fold_build2_loc (clause_loc, NE_EXPR, > integer_type_node, ivar, > zero); > ref2 = fold_build2_loc (clause_loc, NE_EXPR, > integer_type_node, ref, zero); > } > - x = build2 (code, TREE_TYPE (ref), ref2, ivar2); > + if (sctx.is_simt) > + { > + if (!simt_lane) > + simt_lane = create_tmp_var (unsigned_type_node); > + x = build_call_expr_internal_loc > + (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, > + TREE_TYPE (ivar), 2, ivar, simt_lane); > + if (is_fp_and_or) > + x = fold_build2_loc (clause_loc, NE_EXPR, > + integer_type_node, x, zero); > + x = build2 (code, TREE_TYPE (ivar2), ivar2, x); > + if (is_fp_and_or) > + x = fold_convert (TREE_TYPE (ivar), x); > + gimplify_assign (ivar, x, &llist[2]); > + } > + x = build2 (code, TREE_TYPE (ref2), ref2, ivar2); > if (is_fp_and_or) > x = fold_convert (TREE_TYPE (ref), x); > ref = build_outer_var_ref (var, ctx); Is this hunk still needed when the first hunk is in? I mean, this is in code guarded with is_simd && lower_rec_simd_input_clauses (...) and that function will return false for if (known_eq (sctx->max_vf, 1U)) which the first hunk ensures. So sctx.is_simt && is_fp_and_or shouldn't be true in that code. Jakub
On 06.05.21 16:32, Jakub Jelinek wrote: > s/recution/reduction/ Fixed. > This comment needs to be adjusted to talk about non-integral types. Fixed. > Is this hunk still needed when the first hunk is in? No - and now removed. Updated code attached. Tobias ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
On Fri, May 07, 2021 at 12:05:11PM +0200, Tobias Burnus wrote: > 2021-05-07 Tobias Burnus <tobias@codesourcery.com> > Tom de Vries <tdevries@suse.de> > > gcc/ChangeLog: > > * omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if > a truth_value_p reduction variable is nonintegral. > > libgomp/ChangeLog: > > * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing > complex/floating-point || + && reduction with 'omp target'. > * testsuite/libgomp.c-c++-common/reduction-5.c: Likewise. > > gcc/omp-low.c | 28 ++- > .../testsuite/libgomp.c-c++-common/reduction-5.c | 193 ++++++++++++++++++++ > .../testsuite/libgomp.c-c++-common/reduction-6.c | 196 +++++++++++++++++++++ > 3 files changed, 410 insertions(+), 7 deletions(-) Ok, thanks. Jakub
On 5/7/21 12:05 PM, Tobias Burnus wrote: > On 06.05.21 16:32, Jakub Jelinek wrote: > >> s/recution/reduction/ > Fixed. >> This comment needs to be adjusted to talk about non-integral types. > Fixed. >> Is this hunk still needed when the first hunk is in? > > No - and now removed. > > Updated code attached. > > libgomp/ChangeLog: > > * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing > complex/floating-point || + && reduction with 'omp target'. > * testsuite/libgomp.c-c++-common/reduction-5.c: Likewise. 5 -> 6. Otherwise, LGTM. Thanks, - Tom
Hi! On 2021-05-07T12:05:11+0200, Tobias Burnus <tobias@codesourcery.com> wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c > @@ -0,0 +1,193 @@ > +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */ > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c > @@ -0,0 +1,196 @@ > +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */ Causes issues if more than nvptx offloading compilation is enabled. Thus pushed "'libgomp.c-c++-common/reduction-{5,6}.c': Restrict '-latomic' to nvptx offloading compilation" to master branch in commit 937fa5fb7840c19c96b1fdf1ce678699649a6c5e, see attached. Grüße Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
OpenMP: Fix SIMT for complex/float reduction with && and || gcc/ChangeLog: * omp-low.c (lower_rec_input_clauses): Also handle SIMT part for complex/float recution with && and ||. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing complex/floating-point || + && recduction with 'omp target'. gcc/omp-low.c | 30 ++-- .../testsuite/libgomp.c-c++-common/reduction-5.c | 192 +++++++++++++++++++++ 2 files changed, 210 insertions(+), 12 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 26ceaf7..46220c5 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6432,28 +6432,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, gimplify_assign (unshare_expr (ivar), x, &llist[0]); - if (sctx.is_simt) - { - if (!simt_lane) - simt_lane = create_tmp_var (unsigned_type_node); - x = build_call_expr_internal_loc - (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, - TREE_TYPE (ivar), 2, ivar, simt_lane); - x = build2 (code, TREE_TYPE (ivar), ivar, x); - gimplify_assign (ivar, x, &llist[2]); - } tree ivar2 = ivar; tree ref2 = ref; + tree zero = NULL_TREE; if (is_fp_and_or) { - tree zero = build_zero_cst (TREE_TYPE (ivar)); + zero = build_zero_cst (TREE_TYPE (ivar)); ivar2 = fold_build2_loc (clause_loc, NE_EXPR, integer_type_node, ivar, zero); ref2 = fold_build2_loc (clause_loc, NE_EXPR, integer_type_node, ref, zero); } - x = build2 (code, TREE_TYPE (ref), ref2, ivar2); + if (sctx.is_simt) + { + if (!simt_lane) + simt_lane = create_tmp_var (unsigned_type_node); + x = build_call_expr_internal_loc + (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, + TREE_TYPE (ivar), 2, ivar, simt_lane); + if (is_fp_and_or) + x = fold_build2_loc (clause_loc, NE_EXPR, + integer_type_node, x, zero); + x = build2 (code, TREE_TYPE (ivar2), ivar2, x); + if (is_fp_and_or) + x = fold_convert (TREE_TYPE (ivar), x); + gimplify_assign (ivar, x, &llist[2]); + } + x = build2 (code, TREE_TYPE (ref2), ref2, ivar2); if (is_fp_and_or) x = fold_convert (TREE_TYPE (ref), x); ref = build_outer_var_ref (var, ctx); diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c new file mode 100644 index 0000000..346c882 --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c @@ -0,0 +1,192 @@ +/* C / C++'s logical AND and OR operators take any scalar argument + which compares (un)equal to 0 - the result 1 or 0 and of type int. + + In this testcase, the int result is again converted to a floating-poing + or complex type. + + While having a floating-point/complex array element with || and && can make + sense, having a non-integer/non-bool reduction variable is odd but valid. + + Test: FP reduction variable + FP array. */ + +#define N 1024 +_Complex float rcf[N]; +_Complex double rcd[N]; +float rf[N]; +double rd[N]; + +int +reduction_or () +{ + float orf = 0; + double ord = 0; + _Complex float orfc = 0; + _Complex double ordc = 0; + + #pragma omp target parallel reduction(||: orf) + for (int i=0; i < N; ++i) + orf = orf || rf[i]; + + #pragma omp target parallel for reduction(||: ord) + for (int i=0; i < N; ++i) + ord = ord || rcd[i]; + + #pragma omp target parallel for simd reduction(||: orfc) + for (int i=0; i < N; ++i) + orfc = orfc || rcf[i]; + + #pragma omp target parallel loop reduction(||: ordc) + for (int i=0; i < N; ++i) + ordc = ordc || rcd[i]; + + return orf + ord + __real__ orfc + __real__ ordc; +} + +int +reduction_or_teams () +{ + float orf = 0; + double ord = 0; + _Complex float orfc = 0; + _Complex double ordc = 0; + + #pragma omp target teams distribute parallel for reduction(||: orf) + for (int i=0; i < N; ++i) + orf = orf || rf[i]; + + #pragma omp target teams distribute parallel for simd reduction(||: ord) + for (int i=0; i < N; ++i) + ord = ord || rcd[i]; + + #pragma omp target teams distribute parallel for reduction(||: orfc) + for (int i=0; i < N; ++i) + orfc = orfc || rcf[i]; + + #pragma omp target teams distribute parallel for simd reduction(||: ordc) + for (int i=0; i < N; ++i) + ordc = ordc || rcd[i]; + + return orf + ord + __real__ orfc + __real__ ordc; +} + +int +reduction_and () +{ + float andf = 1; + double andd = 1; + _Complex float andfc = 1; + _Complex double anddc = 1; + + #pragma omp target parallel reduction(&&: andf) + for (int i=0; i < N; ++i) + andf = andf && rf[i]; + + #pragma omp target parallel for reduction(&&: andd) + for (int i=0; i < N; ++i) + andd = andd && rcd[i]; + + #pragma omp target parallel for simd reduction(&&: andfc) + for (int i=0; i < N; ++i) + andfc = andfc && rcf[i]; + + #pragma omp target parallel loop reduction(&&: anddc) + for (int i=0; i < N; ++i) + anddc = anddc && rcd[i]; + + return andf + andd + __real__ andfc + __real__ anddc; +} + +int +reduction_and_teams () +{ + float andf = 1; + double andd = 1; + _Complex float andfc = 1; + _Complex double anddc = 1; + + #pragma omp target teams distribute parallel for reduction(&&: andf) + for (int i=0; i < N; ++i) + andf = andf && rf[i]; + + #pragma omp target teams distribute parallel for simd reduction(&&: andd) + for (int i=0; i < N; ++i) + andd = andd && rcd[i]; + + #pragma omp target teams distribute parallel for reduction(&&: andfc) + for (int i=0; i < N; ++i) + andfc = andfc && rcf[i]; + + #pragma omp target teams distribute parallel for simd reduction(&&: anddc) + for (int i=0; i < N; ++i) + anddc = anddc && rcd[i]; + + return andf + andd + __real__ andfc + __real__ anddc; +} + +int +main () +{ + for (int i = 0; i < N; ++i) + { + rf[i] = 0; + rd[i] = 0; + rcf[i] = 0; + rcd[i] = 0; + } + + if (reduction_or () != 0) + __builtin_abort (); + if (reduction_or_teams () != 0) + __builtin_abort (); + if (reduction_and () != 0) + __builtin_abort (); + if (reduction_and_teams () != 0) + __builtin_abort (); + + rf[10] = 1.0; + rd[15] = 1.0; + rcf[10] = 1.0; + rcd[15] = 1.0i; + + if (reduction_or () != 4) + __builtin_abort (); + if (reduction_or_teams () != 4) + __builtin_abort (); + if (reduction_and () != 0) + __builtin_abort (); + if (reduction_and_teams () != 0) + __builtin_abort (); + + for (int i = 0; i < N; ++i) + { + rf[i] = 1; + rd[i] = 1; + rcf[i] = 1; + rcd[i] = 1; + } + + if (reduction_or () != 4) + __builtin_abort (); + if (reduction_or_teams () != 4) + __builtin_abort (); + if (reduction_and () != 4) + __builtin_abort (); + if (reduction_and_teams () != 4) + __builtin_abort (); + + rf[10] = 0.0; + rd[15] = 0.0; + rcf[10] = 0.0; + rcd[15] = 0.0; + + if (reduction_or () != 4) + __builtin_abort (); + if (reduction_or_teams () != 4) + __builtin_abort (); + if (reduction_and () != 0) + __builtin_abort (); + if (reduction_and_teams () != 0) + __builtin_abort (); + + return 0; +}