Message ID | 565DADE6.8020908@mentor.com |
---|---|
State | New |
Headers | show |
On Tue, 1 Dec 2015, Tom de Vries wrote: > [ was: Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta ] > > On 30/11/15 17:36, Tom de Vries wrote: > > On 30/11/15 14:24, Richard Biener wrote: > > > On Mon, 30 Nov 2015, Tom de Vries wrote: > > > > > > > On 30/11/15 10:16, Richard Biener wrote: > > > > > On Mon, 30 Nov 2015, Tom de Vries wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > this patch fixes PR46032. > > > > > > > > > > > > It handles a call: > > > > > > ... > > > > > > __builtin_GOMP_parallel (fn, data, num_threads, flags) > > > > > > ... > > > > > > as: > > > > > > ... > > > > > > fn (data) > > > > > > ... > > > > > > in ipa-pta. > > > > > > > > > > > > This improves ipa-pta alias analysis in the parallelized function > > > > > > fn, > > This follow-up patch does the same for BUILT_IN_GOACC_PARALLEL. > > Bootstrapped and reg-tested on x86_64. > > OK for stage3 trunk? Ok. Richard.
On Tue, Dec 01, 2015 at 03:25:42PM +0100, Tom de Vries wrote: > Handle BUILT_IN_GOACC_PARALLEL in ipa-pta > > 2015-12-01 Tom de Vries <tom@codesourcery.com> > > * tree-ssa-structalias.c (find_func_aliases_for_builtin_call) > (find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL. Isn't this cheating though? The kernel will be called with those addresses only if doing host fallback (and for GOMP_target_ext even not for that always - firstprivate vars will have the addresses replaced by addresses of alloca-ed copies of those objects). I haven't studied in detail what exactly IPA-PTA does, so maybe it is good enough to pretend that. Jakub
On 01/12/15 15:44, Jakub Jelinek wrote: > On Tue, Dec 01, 2015 at 03:25:42PM +0100, Tom de Vries wrote: >> Handle BUILT_IN_GOACC_PARALLEL in ipa-pta >> >> 2015-12-01 Tom de Vries <tom@codesourcery.com> >> >> * tree-ssa-structalias.c (find_func_aliases_for_builtin_call) >> (find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL. > > Isn't this cheating though? The kernel will be called with those addresses > only if doing host fallback Let's take a look at goacc/kernels-alias-ipa-pta.c: ... unsigned int a[N]; unsigned int b[N]; unsigned int c[N]; #pragma acc kernels pcopyout (a, b, c) { a[0] = 0; b[0] = 1; c[0] = a[0]; } ... If we execute on the host, the a, b and c used in the kernels region will be the a, b and c declared outside the region. If we execute on a non-shared mem accelerator, the a, b and c used in the kernels region will be copies of a, b and c in the accelerator memory: a.1, b.1 and c.1. This patch tells ipa-pta (which has no notion of a.1, b.1 and c.1) that we're using declared a, b, and c, in the kernels region, while on the accelerator we're really using a.1, b.1 and c.1, so in that sense it's cheating. However, given that declared a, b and c are disjunct, we know that their copies will be disjunct, so by pretending that declared a, b and c are used in the kernels region, we get conclusions which are also valid when we use a.1, b.1 and c.1 instead in the kernels region. So, for this patch to be incorrect we have to find an example where ipa-pta finds that two memory references are not aliasing, while on the accelerator those memory references are really aliasing. AFAICT there are no such examples. > (and for GOMP_target_ext even not for that > always - firstprivate vars will have the addresses replaced by addresses of > alloca-ed copies of those objects). I don't think firstprivate vars is a problem, I think the opposite would a problem: merging vars on the accelerator which are disjunct on the host. > I haven't studied in detail what exactly IPA-PTA does, so maybe it is good > enough to pretend that. AFAIU, it's good enough, because the points-to information is only used to prove non-aliases. Does this explanation address your concern? Thanks, - Tom
On Wed, Dec 02, 2015 at 12:46:47AM +0100, Tom de Vries wrote:
> Does this explanation address your concern?
Yeah, for now it is fine I hope.
Jakub
Hi! On Tue, 1 Dec 2015 15:25:42 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote: > Handle BUILT_IN_GOACC_PARALLEL in ipa-pta > * c-c++-common/goacc/kernels-alias-ipa-pta-2.c: New test. > * c-c++-common/goacc/kernels-alias-ipa-pta-3.c: New test. > * c-c++-common/goacc/kernels-alias-ipa-pta.c: New test. I see: PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c (test for excess errors) FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times optimized "(?n)= 0;$" 2 PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times optimized "(?n)= 1;$" 1 FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times optimized "(?n)= \\*a" 0 PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c (test for excess errors) PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times optimized "(?n)= 0;$" 1 PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times optimized "(?n)= 1;$" 1 PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times optimized "(?n)= \\*a" 1 PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c (test for excess errors) FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times optimized "(?n)= 0;$" 2 PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times optimized "(?n)= 1;$" 1 FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times optimized "(?n)= \\*_[0-9]\\[0\\];$" 0 ..., and similar for C++. Looking at c-c++-common/goacc/kernels-alias-ipa-pta.c: > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c > @@ -0,0 +1,23 @@ > +/* { dg-additional-options "-O2" } */ > +/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */ > + > +#define N 2 > + > +void > +foo (void) > +{ > + unsigned int a[N]; > + unsigned int b[N]; > + unsigned int c[N]; > + > +#pragma acc kernels pcopyout (a, b, c) > + { > + a[0] = 0; > + b[0] = 1; > + c[0] = a[0]; > + } > +} > + > +/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times "(?n)= \\*_\[0-9\]\\\[0\\\];$" 0 "optimized" } } */ ..., manually running that one for C, I get: ;; Function foo._omp_fn.0 (foo._omp_fn.0, funcdef_no=1, decl_uid=1874, cgraph_uid=1, symbol_order=1) __attribute__((oacc function (1, 1, 1), omp target entrypoint)) foo._omp_fn.0 (const struct .omp_data_t.0 & restrict .omp_data_i) { unsigned int[2] * _3; unsigned int[2] * _5; unsigned int _7; unsigned int[2] * _8; <bb 2>: _3 = *.omp_data_i_2(D).a; *_3[0] = 0; _5 = *.omp_data_i_2(D).b; *_5[0] = 1; _7 = *_3[0]; _8 = *.omp_data_i_2(D).c; *_8[0] = _7; return; } ;; Function foo (foo, funcdef_no=0, decl_uid=1866, cgraph_uid=0, symbol_order=0) foo () { unsigned int c[2]; unsigned int b[2]; unsigned int a[2]; struct .omp_data_t.0 .omp_data_arr.1; static long unsigned int .omp_data_sizes.2[3] = {8, 8, 8}; static short unsigned int .omp_data_kinds.3[3] = {514, 514, 514}; <bb 2>: .omp_data_arr.1.c = &c; .omp_data_arr.1.b = &b; .omp_data_arr.1.a = &a; GOACC_parallel_keyed (-1, foo._omp_fn.0, 3, &.omp_data_arr.1, &.omp_data_sizes.2, &.omp_data_kinds.3, 0); .omp_data_arr.1 ={v} {CLOBBER}; a ={v} {CLOBBER}; b ={v} {CLOBBER}; c ={v} {CLOBBER}; return; } Grüße Thomas
On 02/12/15 18:58, Thomas Schwinge wrote: > Hi! > > On Tue, 1 Dec 2015 15:25:42 +0100, Tom de Vries<Tom_deVries@mentor.com> wrote: >> >Handle BUILT_IN_GOACC_PARALLEL in ipa-pta >> > * c-c++-common/goacc/kernels-alias-ipa-pta-2.c: New test. >> > * c-c++-common/goacc/kernels-alias-ipa-pta-3.c: New test. >> > * c-c++-common/goacc/kernels-alias-ipa-pta.c: New test. > I see: > > PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c (test for excess errors) > FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times optimized "(?n)= 0;$" 2 > PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times optimized "(?n)= 1;$" 1 > FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times optimized "(?n)= \\*a" 0 > PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c (test for excess errors) > PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times optimized "(?n)= 0;$" 1 > PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times optimized "(?n)= 1;$" 1 > PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times optimized "(?n)= \\*a" 1 > PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c (test for excess errors) > FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times optimized "(?n)= 0;$" 2 > PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times optimized "(?n)= 1;$" 1 > FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times optimized "(?n)= \\*_[0-9]\\[0\\];$" 0 > > ..., and similar for C++. Curious, I get all passes for both C and C++ (at r231192). > Looking at > c-c++-common/goacc/kernels-alias-ipa-pta.c: > >> >--- /dev/null >> >+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c >> >@@ -0,0 +1,23 @@ >> >+/* { dg-additional-options "-O2" } */ >> >+/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */ >> >+ >> >+#define N 2 >> >+ >> >+void >> >+foo (void) >> >+{ >> >+ unsigned int a[N]; >> >+ unsigned int b[N]; >> >+ unsigned int c[N]; >> >+ >> >+#pragma acc kernels pcopyout (a, b, c) >> >+ { >> >+ a[0] = 0; >> >+ b[0] = 1; >> >+ c[0] = a[0]; >> >+ } >> >+} >> >+ >> >+/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */ >> >+/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */ >> >+/* { dg-final { scan-tree-dump-times "(?n)= \\*_\[0-9\]\\\[0\\\];$" 0 "optimized" } } */ > ..., manually running that one for C, I get: > > ;; Function foo._omp_fn.0 (foo._omp_fn.0, funcdef_no=1, decl_uid=1874, cgraph_uid=1, symbol_order=1) > > __attribute__((oacc function (1, 1, 1), omp target entrypoint)) > foo._omp_fn.0 (const struct .omp_data_t.0 & restrict .omp_data_i) > { > unsigned int[2] * _3; > unsigned int[2] * _5; > unsigned int _7; > unsigned int[2] * _8; > > <bb 2>: > _3 = *.omp_data_i_2(D).a; > *_3[0] = 0; > _5 = *.omp_data_i_2(D).b; > *_5[0] = 1; > _7 = *_3[0]; > _8 = *.omp_data_i_2(D).c; > *_8[0] = _7; > return; > > } Indeed, the optimization hasn't taken place here so it's correct that the scan fails. I've attached the kernels-alias-ipa-pta.c.201t.optimized which I get, and it's clear the optimization is happening there (which explains why I get all PASSES). The question is, why is the optimization not happening for you. Thanks, - Tom ;; Function foo._omp_fn.0 (foo._omp_fn.0, funcdef_no=1, decl_uid=1874, cgraph_uid=1, symbol_order=1) __attribute__((oacc function (1, 1, 1), omp target entrypoint)) foo._omp_fn.0 (const struct .omp_data_t.0 & restrict .omp_data_i) { unsigned int[2] * _3; unsigned int[2] * _5; unsigned int[2] * _8; <bb 2>: _3 = *.omp_data_i_2(D).a; *_3[0] = 0; _5 = *.omp_data_i_2(D).b; *_5[0] = 1; _8 = *.omp_data_i_2(D).c; *_8[0] = 0; return; } ;; Function foo (foo, funcdef_no=0, decl_uid=1866, cgraph_uid=0, symbol_order=0) foo () { unsigned int c[2]; unsigned int b[2]; unsigned int a[2]; struct .omp_data_t.0 .omp_data_arr.1; static long unsigned int .omp_data_sizes.2[3] = {8, 8, 8}; static short unsigned int .omp_data_kinds.3[3] = {514, 514, 514}; <bb 2>: .omp_data_arr.1.c = &c; .omp_data_arr.1.b = &b; .omp_data_arr.1.a = &a; __builtin_GOACC_parallel_keyed (-1, foo._omp_fn.0, 3, &.omp_data_arr.1, &.omp_data_sizes.2, &.omp_data_kinds.3, 0); .omp_data_arr.1 ={v} {CLOBBER}; a ={v} {CLOBBER}; b ={v} {CLOBBER}; c ={v} {CLOBBER}; return; }
On 03/12/15 00:31, Tom de Vries wrote: > On 02/12/15 18:58, Thomas Schwinge wrote: >> Hi! >> >> On Tue, 1 Dec 2015 15:25:42 +0100, Tom de >> Vries<Tom_deVries@mentor.com> wrote: >>> >Handle BUILT_IN_GOACC_PARALLEL in ipa-pta >>> > * c-c++-common/goacc/kernels-alias-ipa-pta-2.c: New test. >>> > * c-c++-common/goacc/kernels-alias-ipa-pta-3.c: New test. >>> > * c-c++-common/goacc/kernels-alias-ipa-pta.c: New test. >> I see: >> >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c (test for >> excess errors) >> FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c >> scan-tree-dump-times optimized "(?n)= 0;$" 2 >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c >> scan-tree-dump-times optimized "(?n)= 1;$" 1 >> FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c >> scan-tree-dump-times optimized "(?n)= \\*a" 0 >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c (test for >> excess errors) >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c >> scan-tree-dump-times optimized "(?n)= 0;$" 1 >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c >> scan-tree-dump-times optimized "(?n)= 1;$" 1 >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c >> scan-tree-dump-times optimized "(?n)= \\*a" 1 >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c (test for excess >> errors) >> FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c >> scan-tree-dump-times optimized "(?n)= 0;$" 2 >> PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c >> scan-tree-dump-times optimized "(?n)= 1;$" 1 >> FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c >> scan-tree-dump-times optimized "(?n)= \\*_[0-9]\\[0\\];$" 0 >> >> ..., and similar for C++. > > Curious, I get all passes for both C and C++ (at r231192). > I've managed to reproduce it. The difference between pass and fail is whether the compiler is configured with or without accelerator. I'll look into it. Thanks, - Tom
On 03/12/15 01:10, Tom de Vries wrote: > > I've managed to reproduce it. The difference between pass and fail is > whether the compiler is configured with or without accelerator. > > I'll look into it. In the configuration with accelerator, the flag node->force_output is on for foo._omp.fn. This causes nonlocal_p to be true in ipa_pta_execute, which causes the optimization to fail. The flag is decribed as: ... /* The symbol will be assumed to be used in an invisible way (like by an toplevel asm statement). */ ... Looks like I have to ignore the force_output flag as well in ipa_pta_execute for this sort of node. Thanks, - Tom
On Thu, 3 Dec 2015, Tom de Vries wrote: > On 03/12/15 01:10, Tom de Vries wrote: > > > > I've managed to reproduce it. The difference between pass and fail is > > whether the compiler is configured with or without accelerator. > > > > I'll look into it. > > In the configuration with accelerator, the flag node->force_output is on for > foo._omp.fn. > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the > optimization to fail. > > The flag is decribed as: > ... > /* The symbol will be assumed to be used in an invisible way (like > by an toplevel asm statement). */ > ... > > Looks like I have to ignore the force_output flag as well in ipa_pta_execute > for this sort of node. It rather looks like the flag shouldn't be set. The fn after all has its address taken!(?) Richard.
On 03/12/15 09:59, Richard Biener wrote: > On Thu, 3 Dec 2015, Tom de Vries wrote: > >> On 03/12/15 01:10, Tom de Vries wrote: >>> >>> I've managed to reproduce it. The difference between pass and fail is >>> whether the compiler is configured with or without accelerator. >>> >>> I'll look into it. >> >> In the configuration with accelerator, the flag node->force_output is on for >> foo._omp.fn. >> >> This causes nonlocal_p to be true in ipa_pta_execute, which causes the >> optimization to fail. >> >> The flag is decribed as: >> ... >> /* The symbol will be assumed to be used in an invisible way (like >> by an toplevel asm statement). */ >> ... >> >> Looks like I have to ignore the force_output flag as well in ipa_pta_execute >> for this sort of node. > > It rather looks like the flag shouldn't be set. The fn after all has > its address taken!(?) > The flag is set here in expand_omp_target: ... 12682 /* Prevent IPA from removing child_fn as unreachable, since there are no 12683 refs from the parent function to child_fn in offload LTO mode. */ 12684 if (ENABLE_OFFLOADING) 12685 cgraph_node::get (child_fn)->mark_force_output (); ... I guess setting forced_by_abi instead would also mean child_fn is not removed as unreachable, while still allowing optimizations: ... /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol to be exported. Unlike FORCE_OUTPUT this flag gets cleared to symbols promoted to static and it does not inhibit optimization. */ unsigned forced_by_abi : 1; ... But I suspect that other optimizations (than ipa-pta) might break things. Essentially we have two situations: - in the host compiler, there is no need for the forced_output flag, and it inhibits optimization - in the accelerator compiler, it (or some equivalent) is needed I wonder if setting the force_output flag only when streaming the bytecode for offloading would work. That way, it wouldn't be set in the host compiler, while being set in the accelerator compiler. Thanks, - Tom
On Thu, 3 Dec 2015, Tom de Vries wrote: > On 03/12/15 09:59, Richard Biener wrote: > > On Thu, 3 Dec 2015, Tom de Vries wrote: > > > > > On 03/12/15 01:10, Tom de Vries wrote: > > > > > > > > I've managed to reproduce it. The difference between pass and fail is > > > > whether the compiler is configured with or without accelerator. > > > > > > > > I'll look into it. > > > > > > In the configuration with accelerator, the flag node->force_output is on > > > for > > > foo._omp.fn. > > > > > > This causes nonlocal_p to be true in ipa_pta_execute, which causes the > > > optimization to fail. > > > > > > The flag is decribed as: > > > ... > > > /* The symbol will be assumed to be used in an invisible way (like > > > by an toplevel asm statement). */ > > > ... > > > > > > Looks like I have to ignore the force_output flag as well in > > > ipa_pta_execute > > > for this sort of node. > > > > It rather looks like the flag shouldn't be set. The fn after all has > > its address taken!(?) > > > > The flag is set here in expand_omp_target: > ... > 12682 /* Prevent IPA from removing child_fn as unreachable, > since there are no > 12683 refs from the parent function to child_fn in offload > LTO mode. */ > 12684 if (ENABLE_OFFLOADING) > 12685 cgraph_node::get (child_fn)->mark_force_output (); > ... > How are there no refs from the "parent"? Are there not refs from some kind of descriptor that maps fallback CPU and offloaded variants? I think the above needs sorting out in somw way, making the refs explicit rather than implicit via force_output. > I guess setting forced_by_abi instead would also mean child_fn is not removed > as unreachable, while still allowing optimizations: > ... > /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol > to be exported. Unlike FORCE_OUTPUT this flag gets cleared to > symbols promoted to static and it does not inhibit > optimization. */ > unsigned forced_by_abi : 1; > ... > > But I suspect that other optimizations (than ipa-pta) might break things. How so? > Essentially we have two situations: > - in the host compiler, there is no need for the forced_output flag, > and it inhibits optimization > - in the accelerator compiler, it (or some equivalent) is needed > > I wonder if setting the force_output flag only when streaming the bytecode for > offloading would work. That way, it wouldn't be set in the host compiler, > while being set in the accelerator compiler. Yeah, that was my original thinking btw. Richard.
On Thu, Dec 03, 2015 at 12:09:04PM +0100, Tom de Vries wrote: > The flag is set here in expand_omp_target: > ... > 12682 /* Prevent IPA from removing child_fn as unreachable, > since there are no > 12683 refs from the parent function to child_fn in offload > LTO mode. */ > 12684 if (ENABLE_OFFLOADING) > 12685 cgraph_node::get (child_fn)->mark_force_output (); > ... > > I guess setting forced_by_abi instead would also mean child_fn is not > removed as unreachable, while still allowing optimizations: > ... > /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol > to be exported. Unlike FORCE_OUTPUT this flag gets cleared to > symbols promoted to static and it does not inhibit > optimization. */ > unsigned forced_by_abi : 1; > ... > > But I suspect that other optimizations (than ipa-pta) might break things. > > Essentially we have two situations: > - in the host compiler, there is no need for the forced_output flag, > and it inhibits optimization > - in the accelerator compiler, it (or some equivalent) is needed > > I wonder if setting the force_output flag only when streaming the bytecode > for offloading would work. That way, it wouldn't be set in the host > compiler, while being set in the accelerator compiler. I believe that the host and offload func (and var) tables need to be in sync, so there needs to be something both in the host and accel compilers that prevents the functions and variables that have their accel or host counterpart in the tables from being optimized away, or say replaced by a clone with different arguments etc. Jakub
Handle BUILT_IN_GOACC_PARALLEL in ipa-pta 2015-12-01 Tom de Vries <tom@codesourcery.com> * tree-ssa-structalias.c (find_func_aliases_for_builtin_call) (find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL. * c-c++-common/goacc/kernels-alias-ipa-pta-2.c: New test. * c-c++-common/goacc/kernels-alias-ipa-pta-3.c: New test. * c-c++-common/goacc/kernels-alias-ipa-pta.c: New test. --- .../c-c++-common/goacc/kernels-alias-ipa-pta-2.c | 37 ++++++++++++++++++++++ .../c-c++-common/goacc/kernels-alias-ipa-pta-3.c | 36 +++++++++++++++++++++ .../c-c++-common/goacc/kernels-alias-ipa-pta.c | 23 ++++++++++++++ gcc/tree-ssa-structalias.c | 28 +++++++++++++--- .../kernels-alias-ipa-pta-2.c | 27 ++++++++++++++++ .../kernels-alias-ipa-pta-3.c | 26 +++++++++++++++ .../kernels-alias-ipa-pta.c | 26 +++++++++++++++ 7 files changed, 199 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c new file mode 100644 index 0000000..f16d698 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c @@ -0,0 +1,37 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */ + +#ifdef __cplusplus +extern "C" { +#endif +typedef __SIZE_TYPE__ size_t; +void *malloc (size_t); +void free (void *); +#ifdef __cplusplus +} +#endif + +#define N 2 + +void +foo (void) +{ + unsigned int *a = (unsigned int *)malloc (N * sizeof (unsigned int)); + unsigned int *b = (unsigned int *)malloc (N * sizeof (unsigned int)); + unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int)); + +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) + { + a[0] = 0; + b[0] = 1; + c[0] = a[0]; + } + + free (a); + free (b); + free (c); +} + +/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "(?n)= \\*a" 0 "optimized" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c new file mode 100644 index 0000000..1eb56eb --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c @@ -0,0 +1,36 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */ + +#ifdef __cplusplus +extern "C" { +#endif +typedef __SIZE_TYPE__ size_t; +void *malloc (size_t); +void free (void *); +#ifdef __cplusplus +} +#endif + +#define N 2 + +void +foo (void) +{ + unsigned int *a = (unsigned int *)malloc (N * sizeof (unsigned int)); + unsigned int *b = a; + unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int)); + +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) + { + a[0] = 0; + b[0] = 1; + c[0] = a[0]; + } + + free (a); + free (c); +} + +/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "(?n)= \\*a" 1 "optimized" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c new file mode 100644 index 0000000..969b466 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c @@ -0,0 +1,23 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */ + +#define N 2 + +void +foo (void) +{ + unsigned int a[N]; + unsigned int b[N]; + unsigned int c[N]; + +#pragma acc kernels pcopyout (a, b, c) + { + a[0] = 0; + b[0] = 1; + c[0] = a[0]; + } +} + +/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "(?n)= \\*_\[0-9\]\\\[0\\\];$" 0 "optimized" } } */ diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index 7f4a8ad..060ff3e 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -4507,15 +4507,32 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t) return true; } case BUILT_IN_GOMP_PARALLEL: + case BUILT_IN_GOACC_PARALLEL: { - /* Handle __builtin_GOMP_parallel (fn, data, num_threads, flags) as - fn (data). */ if (in_ipa_mode) { - tree fnarg = gimple_call_arg (t, 0); + unsigned int fnpos, argpos; + switch (DECL_FUNCTION_CODE (fndecl)) + { + case BUILT_IN_GOMP_PARALLEL: + /* __builtin_GOMP_parallel (fn, data, num_threads, flags). */ + fnpos = 0; + argpos = 1; + break; + case BUILT_IN_GOACC_PARALLEL: + /* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs, + sizes, kinds, ...). */ + fnpos = 1; + argpos = 3; + break; + default: + gcc_unreachable (); + } + + tree fnarg = gimple_call_arg (t, fnpos); gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); tree fndecl = TREE_OPERAND (fnarg, 0); - tree arg = gimple_call_arg (t, 1); + tree arg = gimple_call_arg (t, argpos); gcc_assert (TREE_CODE (arg) == ADDR_EXPR); varinfo_t fi = get_vi_for_tree (fndecl); @@ -5064,6 +5081,7 @@ find_func_clobbers (struct function *fn, gimple *origt) case BUILT_IN_VA_END: return; case BUILT_IN_GOMP_PARALLEL: + case BUILT_IN_GOACC_PARALLEL: return; /* printf-style functions may have hooks to set pointers to point to somewhere into the generated string. Leave them @@ -7547,6 +7565,8 @@ ipa_pta_execute (void) /* Handle direct calls to functions with body. */ if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL)) decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0); + else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL)) + decl = TREE_OPERAND (gimple_call_arg (stmt, 1), 0); else decl = gimple_call_fndecl (stmt); diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c new file mode 100644 index 0000000..0f323c8 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-O2 -fipa-pta" } */ + +#include <stdlib.h> + +#define N 2 + +int +main (void) +{ + unsigned int *a = (unsigned int *)malloc (N * sizeof (unsigned int)); + unsigned int *b = (unsigned int *)malloc (N * sizeof (unsigned int)); + unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int)); + +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) + { + a[0] = 0; + b[0] = 1; + c[0] = a[0]; + } + + if (a[0] != 0 || b[0] != 1 || c[0] != 0) + abort (); + + free (a); + free (b); + free (c); +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c new file mode 100644 index 0000000..654e750 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-O2 -fipa-pta" } */ + +#include <stdlib.h> + +#define N 2 + +int +main (void) +{ + unsigned int *a = (unsigned int *)malloc (N * sizeof (unsigned int)); + unsigned int *b = a; + unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int)); + +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) + { + a[0] = 0; + b[0] = 1; + c[0] = a[0]; + } + + if (a[0] != 1 || b[0] != 1 || c[0] != 1) + abort (); + + free (a); + free (c); +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c new file mode 100644 index 0000000..44d4fd2 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-O2 -fipa-pta" } */ + +#include <stdlib.h> + +#define N 2 + +int +main (void) +{ + unsigned int a[N]; + unsigned int b[N]; + unsigned int c[N]; + +#pragma acc kernels pcopyout (a, b, c) + { + a[0] = 0; + b[0] = 1; + c[0] = a[0]; + } + + if (a[0] != 0 || b[0] != 1 || c[0] != 0) + abort (); + + return 0; +} +