Message ID | 20190112222131.29519-7-tdevries@suse.de |
---|---|
State | New |
Series | Add support for warp-multiple openacc vector length |
Hi Tom!

While working on something completely different, I had to dig deeper, and
noticed a thing there, and deeper, and noticed another thing, and deeper,
and noticed this other thing here...  (So, business as usual...)  ;-)

On 2019-01-12T23:21:28+0100, Tom de Vries <tdevries@suse.de> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c

> +#pragma acc routine vector
> +void __attribute__((noinline, noclone))
> +Vector (int *ptr, int n, const int inc)
> +{

> +#pragma acc parallel copy (ary) vector_length (128) /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */
> +  {
> +    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));

This works as diagnosed/expected.

On 2019-01-12T23:21:31+0100, Tom de Vries <tdevries@suse.de> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
> @@ -0,0 +1,52 @@
> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
> +/* { dg-additional-options "-fopenacc-dim=::128" } */

Via '-fopenacc-dim', we here request a default 'vector_length(128)'.

> +#pragma acc parallel copy (ary)
> +  {
> +    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));

As above, 'vector_length(128)' must be demoted to 'vector_length(32)'
(and in fact, it is) -- but we're not getting a diagnostic for that.  Is
this expected?

On 2019-01-12T23:21:28+0100, Tom de Vries <tdevries@suse.de> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
> @@ -0,0 +1,54 @@
> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
> +/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */

This testcase needs 'dg-additional-options "-fopenacc-dim=::-"' (or
similar), but support for that is still missing in master branch (I'm
working on porting over the corresponding patch), so this currently
defaults to 'vector_length(32)', and...

> +#pragma acc parallel copy (ary)
> +  {
> +    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));

... thus no diagnostic here, and...

> +/* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */

... we're in fact not seeing this diagnostic.


In addition to the (presumably unexpected) missing diagnostic for
'-fopenacc-dim=::128' mentioned above -- OK to simplify and enhance the
testcases as attached, "Simplify and enhance
'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]"?


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
On 10/30/20 5:16 PM, Thomas Schwinge wrote:
> Hi Tom!
>
> While working on something completely different, I had to dig deeper, and
> noticed a thing there, and deeper, and noticed another thing, and deeper,
> and noticed this other thing here...  (So, business as usual...)  ;-)
>
> On 2019-01-12T23:21:28+0100, Tom de Vries <tdevries@suse.de> wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
>
>> +#pragma acc routine vector
>> +void __attribute__((noinline, noclone))
>> +Vector (int *ptr, int n, const int inc)
>> +{
>
>> +#pragma acc parallel copy (ary) vector_length (128) /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */
>> +  {
>> +    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
>
> This works as diagnosed/expected.
>
> On 2019-01-12T23:21:31+0100, Tom de Vries <tdevries@suse.de> wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
>> @@ -0,0 +1,52 @@
>> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
>> +/* { dg-additional-options "-fopenacc-dim=::128" } */
>
> Via '-fopenacc-dim', we here request a default 'vector_length(128)'.
>
>> +#pragma acc parallel copy (ary)
>> +  {
>> +    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
>
> As above, 'vector_length(128)' must be demoted to 'vector_length(32)'
> (and in fact, it is) -- but we're not getting a diagnostic for that.  Is
> this expected?
>

I think it would be good to have.  I don't know whether it's implemented.

> On 2019-01-12T23:21:28+0100, Tom de Vries <tdevries@suse.de> wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
>> +/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */
>
> This testcase needs 'dg-additional-options "-fopenacc-dim=::-"' (or
> similar), but support for that is still missing in master branch (I'm
> working on porting over the corresponding patch), so this currently
> defaults to 'vector_length(32)', and...
>
>> +#pragma acc parallel copy (ary)
>> +  {
>> +    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
>
> ... thus no diagnostic here, and...
>
>> +/* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
>
> ... we're in fact not seeing this diagnostic.
>
>
> In addition to the (presumably unexpected) missing diagnostic for
> '-fopenacc-dim=::128' mentioned above -- OK to simplify and enhance the
> testcases as attached, "Simplify and enhance
> 'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]"?
>

Yep, looks good.

Thanks,
- Tom
Hi Tom!

On 2020-10-30T17:32:56+0100, Tom de Vries <tdevries@suse.de> wrote:
> On 10/30/20 5:16 PM, Thomas Schwinge wrote:
>> OK to simplify and enhance the
>> testcases as attached, "Simplify and enhance
>> 'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]"?
>
> Yep, looks good.

As posted, pushed "Simplify and enhance
'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]" to master branch in
commit 79680c1d5cd3d89c2e7423e20dc8a6e1d6dc8151, and backported to
releases/gcc-10 branch in commit
28aaad48d5aafde3e5f269864ba934c602011328, releases/gcc-9 branch in commit
8860822a91e2e90a5eae726a478cd5ffc0d1fbfa.


Grüße
 Thomas
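Both '-fopenacc-dim=::128' and 'GOMP_OPENACC_DIM=::128' in this thread use a colon-separated gang:worker:vector triple in which omitted fields fall back to a default. The following is a minimal sketch of how such a string could be decoded; `parse_dims`, `DIM_DEFAULT`, and `DIM_RUNTIME` are hypothetical names introduced here, this is not the parser used by GCC or libgomp, and reading '-' as "defer to the runtime" is an assumption based only on the '-fopenacc-dim=::-' spelling mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Sentinel values chosen for this sketch only.  */
#define DIM_DEFAULT 0    /* field omitted: use the target default */
#define DIM_RUNTIME (-1) /* field is "-": assumed to mean "decide at run time" */

/* Hypothetical helper: parse a "gang:worker:vector" triple such as
   "::128" into DIMS[0..2].  Returns 0 on success, -1 on malformed input.  */
static int
parse_dims (const char *spec, int dims[3])
{
  assert (spec != NULL);
  const char *p = spec;

  for (int i = 0; i < 3; i++)
    {
      dims[i] = DIM_DEFAULT;

      if (*p == '-')
        {
          dims[i] = DIM_RUNTIME;
          p++;
        }
      else if (*p >= '0' && *p <= '9')
        {
          char *end;
          long val = strtol (p, &end, 10);
          /* Reject zero and implausibly large sizes.  */
          if (val <= 0 || val > (1 << 20))
            return -1;
          dims[i] = (int) val;
          p = end;
        }

      /* Fields are separated by ':'; trailing fields may be left out.  */
      if (i < 2)
        {
          if (*p == ':')
            p++;
          else if (*p != '\0')
            return -1;
        }
    }

  /* Anything left over after the third field is an error.  */
  return *p == '\0' ? 0 : -1;
}
```

For "::128" the sketch leaves the gang and worker entries at `DIM_DEFAULT` and sets the vector entry to 128, which is the request that the pr85486-2.c and pr85486-3.c testcases make.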
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
new file mode 100644
index 00000000000..a959b90c29a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
@@ -0,0 +1,54 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */
+
+/* Minimized from ref-1.C.  */
+
+#include <stdio.h>
+
+#pragma acc routine vector
+void __attribute__((noinline, noclone))
+Vector (int *ptr, int n, const int inc)
+{
+  #pragma acc loop vector
+  for (unsigned ix = 0; ix < n; ix++)
+    ptr[ix] += inc;
+}
+
+int
+main (void)
+{
+  const int n = 32, m=32;
+
+  int ary[m][n];
+  unsigned ix,  iy;
+
+  for (ix = m; ix--;)
+    for (iy = n; iy--;)
+      ary[ix][iy] = (1 << 16) + (ix << 8) + iy;
+
+  int err = 0;
+
+#pragma acc parallel copy (ary)
+  {
+    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
+  }
+
+  for (ix = m; ix--;)
+    for (iy = n; iy--;)
+      if (ary[ix][iy] != ((1 << 24) + (ix << 8) + iy))
+        {
+          printf ("ary[%u][%u] = %x expected %x\n",
+                  ix, iy, ary[ix][iy], ((1 << 24) + (ix << 8) + iy));
+          err++;
+        }
+
+  if (err)
+    {
+      printf ("%d failed\n", err);
+      return 1;
+    }
+
+  return 0;
+}
+
+/* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
new file mode 100644
index 00000000000..99c08059d37
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
@@ -0,0 +1,51 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+
+/* Minimized from ref-1.C.  */
+
+#include <stdio.h>
+
+#pragma acc routine vector
+void __attribute__((noinline, noclone))
+Vector (int *ptr, int n, const int inc)
+{
+  #pragma acc loop vector
+  for (unsigned ix = 0; ix < n; ix++)
+    ptr[ix] += inc;
+}
+
+int
+main (void)
+{
+  const int n = 32, m=32;
+
+  int ary[m][n];
+  unsigned ix,  iy;
+
+  for (ix = m; ix--;)
+    for (iy = n; iy--;)
+      ary[ix][iy] = (1 << 16) + (ix << 8) + iy;
+
+  int err = 0;
+
+#pragma acc parallel copy (ary) vector_length (128) /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */
+  {
+    Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
+  }
+
+  for (ix = m; ix--;)
+    for (iy = n; iy--;)
+      if (ary[ix][iy] != ((1 << 24) + (ix << 8) + iy))
+        {
+          printf ("ary[%u][%u] = %x expected %x\n",
+                  ix, iy, ary[ix][iy], ((1 << 24) + (ix << 8) + iy));
+          err++;
+        }
+
+  if (err)
+    {
+      printf ("%d failed\n", err);
+      return 1;
+    }
+
+  return 0;
+}