Message ID | 877dyxnfa0.fsf@euler.schwinge.homeip.net |
---|---|
State | New |
Headers | show |
Series | Revert "[nvptx, libgomp] Update pr85381-{2, 4}.c test-cases" [PR89713, PR94392] (was: [PATCH][RFC] c/94392 - only enable -ffinite-loops for C++) | expand |
On Fri, 3 Apr 2020, Thomas Schwinge wrote: > Hi! > > On 2020-04-02T11:12:48+0200, Richard Biener <rguenther@suse.de> wrote: > > On Wed, 1 Apr 2020, Jason Merrill wrote: > > > >> On 4/1/20 9:36 AM, Richard Biener wrote: > >> > This does away with enabling -ffinite-loops at -O2+ for all languages > >> > and instead enables it selectively for C++ only. > > > I'm retesting the following [...] > > ..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae > "c/94392 - only enable -ffinite-loops for C++". > > I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b > 'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713, > PR94392]'. As can be observed in two nvptx offloading test cases > regressing, 'apparently now again "empty oacc loops are" no longer > "removed before expand"' (quoting myself from the commit log, adapting > Tom's commit log snippet from the reverted commit). > > It's not obvious to me how the "finite loop" property discussed/changed > in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 - > only enable -ffinite-loops for C++" relates to the previously observed > optimization of removing "empty oacc loops [...] before expand" (after > PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c "PR > tree-optimization/89713 - Assume loop with an exit is finite"), but > examining that in detail is for another day. For C we no longer have -ffinite-loops in effect but for C++ we still do. But since the testcase is c/c++ common I'd have expected it now fails "split" ... so an explicit -fno-finite-loops or -ffinite-loops with an explanation would be easier. Note there's now also the opportunity to set the loop flag for OpenACC/OpenMP annotated loops if any of that guarantees finiteness. (for GCC11 only please) Richard.
On Fri, Apr 03, 2020 at 11:36:29AM +0200, Richard Biener wrote: > Note there's now also the opportunity to set the loop flag for > OpenACC/OpenMP annotated loops if any of that guarantees finiteness. > (for GCC11 only please) Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of iterations must be computable before the loop and must fit into the type in which that count is computed without overflows. Jakub
Hi! Frederik stumbled over a related thing. On 2020-04-03T12:36:29+0200, Richard Biener <rguenther@suse.de> wrote: > On Fri, 3 Apr 2020, Thomas Schwinge wrote: >> On 2020-04-02T11:12:48+0200, Richard Biener <rguenther@suse.de> wrote: >> > On Wed, 1 Apr 2020, Jason Merrill wrote: >> > >> >> On 4/1/20 9:36 AM, Richard Biener wrote: >> >> > This does away with enabling -ffinite-loops at -O2+ for all languages >> >> > and instead enables it selectively for C++ only. >> >> > I'm retesting the following [...] >> >> ..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae >> "c/94392 - only enable -ffinite-loops for C++". >> >> I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b >> 'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713, >> PR94392]'. As can be observed in two nvptx offloading test cases >> regressing, 'apparently now again "empty oacc loops are" no longer >> "removed before expand"' (quoting myself from the commit log, adapting >> Tom's commit log snippet from the reverted commit). >> >> It's not obvious to me how the "finite loop" property discussed/changed >> in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 - >> only enable -ffinite-loops for C++" relates to the previously observed >> optimization of removing "empty oacc loops [...] before expand" (after >> PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c "PR >> tree-optimization/89713 - Assume loop with an exit is finite"), but >> examining that in detail is for another day. > > For C we no longer have -ffinite-loops in effect but for C++ we still > do. But since the testcase is c/c++ common I'd have expected it > now fails "split" ... so an explicit -fno-finite-loops or > -ffinite-loops with an explanation would be easier. (Thanks, and ACK; still have to look into that.) > Note there's now also the opportunity to set the loop flag for > OpenACC/OpenMP annotated loops if any of that guarantees finiteness. > (for GCC11 only please) On 2020-04-03T13:34:18+0200, Jakub Jelinek <jakub@redhat.com> wrote: > Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of > iterations must be computable before the loop and must fit into the type in > which that count is computed without overflows. Specifically, is that computable at run-time or compile-time? Similar for OpenACC. For example, OpenACC 3.0, 2.9. "Loop Construct", "Restrictions": "A loop associated with a 'loop' construct that does not have a 'seq' clause must be written such that the loop iteration count is computable when entering the loop construct". (This can only viewed by members of the OpenACC GitHub organization, but I wanted to share the pointer anyway, and can relay discussion as necessary.) For the next version of OpenACC (soon!), this is being further clarified as per the current discussion in <https://github.com/OpenACC/openacc-spec/pull/320> "Proposed changes for range-based for loops and != operator", which should be relevant here: | - A loop associated with a 'loop' construct that does not have a 'seq' | clause must be written to meet all of the following conditions: | | - The loop variable must be of integer, C/C++ pointer, or C++ | random-access iterator type. | | - The loop variable must monotonically increase or decrease in the | direction of its termination condition. | | - The loop iteration count must be computable in constant time when | entering the 'loop' construct. | | For a C++ range-based 'for' loop, the loop variable identified by the | above conditions is the internal iterator, such as a pointer, that | the compiler generates to iterate the range. It is not the variable | declared by the 'for' loop. (Notice: "computable in constant time" (which means: not computable only by actually iterating the whole loop structure), and "computable [...] when entering" (which, if I got that right, means: at run-time, not necessarily already compile-time?). Grüße Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
On Fri, Oct 30, 2020 at 03:09:31PM +0100, Thomas Schwinge wrote: > On 2020-04-03T13:34:18+0200, Jakub Jelinek <jakub@redhat.com> wrote: > > Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of > > iterations must be computable before the loop and must fit into the type in > > which that count is computed without overflows. > > Specifically, is that computable at run-time or compile-time? At run-time. OpenMP certainly doesn't disallow loops with non-constant steps or lower/upper bound expressions (ok, there is one exception, for != loop condition the step must be a constant expression, the step has to be just 1 or -1). Jakub
From 4f6a0888de52a2e523a6fd4235fe7f8193819c3b Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Fri, 3 Apr 2020 10:07:16 +0200 Subject: [PATCH] Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713, PR94392] In response to PR94392 commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 - only enable -ffinite-loops for C++", this reverts PR89713 commit 00908992f2a78f213d227aea8dbab014a1361df0, as apparently now again "empty oacc loops are" no longer "removed before expand". libgomp/ PR tree-optimization/89713 PR c/94392 * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect 'bar.sync'. * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise. --- libgomp/ChangeLog | 8 ++++++++ .../libgomp.oacc-c-c++-common/pr85381-2.c | 20 ++++++++++++++++++- .../libgomp.oacc-c-c++-common/pr85381-4.c | 5 ++++- 3 files changed, 31 insertions(+), 2 deletions(-) diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog index 6c437930b02f..3716f559aa1c 100644 --- a/libgomp/ChangeLog +++ b/libgomp/ChangeLog @@ -1,3 +1,11 @@ +2020-04-03 Thomas Schwinge <thomas@codesourcery.com> + + PR tree-optimization/89713 + PR c/94392 + * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect + 'bar.sync'. + * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise. + 2020-03-31 Tobias Burnus <tobias@codesourcery.com> * target.c (GOMP_target_enter_exit_data): Handle PSET/MAP_POINTER. diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c index 2cb5b95949de..6570c64afff5 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c @@ -15,4 +15,22 @@ main (void) return 0; } -/* { dg-final { scan-assembler-times "bar.sync" 0 } } */ +/* Todo: Boths bar.syncs can be removed. + Atm we generate this dead code inbetween forked and joining: + + mov.u32 %r28, %ntid.y; + mov.u32 %r29, %tid.y; + add.u32 %r30, %r29, %r29; + setp.gt.s32 %r31, %r30, 19; + @%r31 bra $L2; + add.u32 %r25, %r28, %r28; + mov.u32 %r24, %r30; + $L3: + add.u32 %r24, %r24, %r25; + setp.le.s32 %r33, %r24, 19; + @%r33 bra $L3; + $L2: + + so the loop is not recognized as empty loop (which we detect by seeing if + joining immediately follows forked). */ +/* { dg-final { scan-assembler-times "bar.sync" 2 } } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c index e8a433ffc0a5..d955d79718df 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c @@ -21,4 +21,7 @@ main (void) return 0; } -/* { dg-final { scan-assembler-times "bar.sync" 0 } } */ +/* Atm, %ntid.y is broadcast from one loop to the next, so there are 2 bar.syncs + for that (the other two are there for the same reason as in pr85381-2.c). + Todo: Recompute %ntid.y instead of broadcasting it. */ +/* { dg-final { scan-assembler-times "bar.sync" 4 } } */ -- 2.25.1