Message ID | 87y1taencs.fsf@dem-tschwing-1.ger.mentorg.com |
---|---|
State | New |
Headers | show |
Series | Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195] (was: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)) | expand |
> I understand 'r & 3' to be logically equivalent to '(r & 2) && (r & 1)', > right? For r == 2, r & 3 == 2, whereas (r & 2) && (r & 1) == 0, so no? Aldy
Hi! On 2022-10-20T14:23:33+0200, Aldy Hernandez <aldyh@redhat.com> wrote: >> I understand 'r & 3' to be logically equivalent to '(r & 2) && (r & 1)', >> right? > > For r == 2, r & 3 == 2, whereas (r & 2) && (r & 1) == 0, so no? Thanks, and now please let me crawl back under my stone, embarassing... That'd rather be '(r & 2) || (r & 1)'. Well, with that now clarified, how about the again updated "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached? Have I done something stupid again re 'f4b', XFAILed? Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge <thomas@codesourcery.com> wrote: > > Hi! > > On 2022-10-20T14:23:33+0200, Aldy Hernandez <aldyh@redhat.com> wrote: > >> I understand 'r & 3' to be logically equivalent to '(r & 2) && (r & 1)', > >> right? > > > > For r == 2, r & 3 == 2, whereas (r & 2) && (r & 1) == 0, so no? > > Thanks, and now please let me crawl back under my stone, embarassing... > That'd rather be '(r & 2) || (r & 1)'. No worries. If there was a tally of how many times a GCC hacker has to crawl under a stone, I'd have the record ;-). > > Well, with that now clarified, how about the again updated > "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached? I see 7 different tests in this patch. Did the 6 that pass, fail before my patch for PR107195 and are now working? Cause unless that's the case, they shouldn't be in a test named pr107195-3.c, but somewhere else. I see there's one XFAILed test in your patch, and this certainly doesn't look like something that has anything to do with the patch I submitted. Perhaps you could open a PR with an enhancement request for this one? That being said... /* { dg-additional-options -O1 } */ extern int __attribute__((const)) foo4b (int); int f4b (unsigned int r) { if (foo4b (r)) r *= 8U; if ((r / 2U) & 2U) r += foo4b (r); return r; } /* { dg-final { scan-tree-dump-times {gimple_call <foo4b,} 1 dom3 { xfail *-*-* } } } */ At -O2, this is something PRE is doing, so GCC already handles this. However, you are suggesting this isn't handled at -O1 and should be?? None of the VRPs run at -O1 so ranger-vrp won't even get a chance. However, DOM runs at -O1 and it uses ranger to do simple copy propagation and some jump threading...so technically we could do something... DOM should be able to thread from the r *= 8U to the return because the nonzero mask (known zeros) after the multiplication is 0xfffffff8, which it could use to solve the second conditional as false. This would leave us with: if (foo4b (r)) { r *= 8U; return r; } else { if ((r / 2U) & 2U) r += foo4b (r); } ...which exposes the fact that the second call to foo4b() has the same "r" as the first one, so it could be folded. I don't know whose job it is to notice that two const calls have the same arguments, but ISTM that if we thread the above correctly, someone should be able to clean this up. No clue whether this happens at -O1. However... we're not threading this. It looks like we're not keeping track of nonzero bits (known zeros) through the division. The multiplication gives us 0xfffffff8 and we should be able to divide that by 2 and get 0x7ffffffc which solves the second conditional to 0. So...maybe DOM+ranger could set things up for another pass to clean this up? Either way, you could open an enhancement request, if anything to keep the nonzero mask up to date through the division. Aldy
Hi! On 2022-10-21T00:44:30+0200, Aldy Hernandez <aldyh@redhat.com> wrote: > On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge <thomas@codesourcery.com> wrote: >> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached? > > I see 7 different tests in this patch. Did the 6 that pass, fail > before my patch for PR107195 and are now working? Cause unless > that's the case, they shouldn't be in a test named pr107195-3.c, but > somewhere else. That's correct; I should've mentioned that I had verified this. With the code changes of commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0" reverted, we get: PASS: gcc.dg/tree-ssa/pr107195-3.c (test for excess errors) FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo1," 1 FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo2," 1 FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo3," 1 FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo4," 1 FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo5," 1 FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo6," 1 ..., and in 'pr107195-3.c.196t.dom3' instead see two calls of each 'foo[...]' function. That's with this... > I see there's one XFAILed test in your patch ... XFAILed test case removed, see the attached "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]"; OK now to push that version? > and this certainly > doesn't look like something that has anything to do with the patch I > submitted. Perhaps you could open a PR with an enhancement request > for this one? > > That being said... > > /* { dg-additional-options -O1 } */ > extern int > __attribute__((const)) > foo4b (int); > > int f4b (unsigned int r) > { > if (foo4b (r)) > r *= 8U; > > if ((r / 2U) & 2U) > r += foo4b (r); > > return r; > } > /* { dg-final { scan-tree-dump-times {gimple_call <foo4b,} 1 dom3 { > xfail *-*-* } } } */ > > At -O2, this is something PRE is doing, so GCC already handles this. > However, you are suggesting this isn't handled at -O1 and should be?? My thinking was that this optimization does work for 'r >> 1', but it doesn't work for 'r / 2'. > None of the VRPs run at -O1 so ranger-vrp won't even get a chance. > However, DOM runs at -O1 and it uses ranger to do simple copy > propagation and some jump threading...so technically we could do > something... > > DOM should be able to thread from the r *= 8U to the return because > the nonzero mask (known zeros) after the multiplication is 0xfffffff8, > which it could use to solve the second conditional as false. This > would leave us with: > > if (foo4b (r)) > { > r *= 8U; > return r; > } > else > { > if ((r / 2U) & 2U) > r += foo4b (r); > } > > ...which exposes the fact that the second call to foo4b() has the same > "r" as the first one, so it could be folded. I don't know whose job > it is to notice that two const calls have the same arguments, but ISTM > that if we thread the above correctly, someone should be able to clean > this up. No clue whether this happens at -O1. > > However... we're not threading this. It looks like we're not keeping > track of nonzero bits (known zeros) through the division. The > multiplication gives us 0xfffffff8 and we should be able to divide > that by 2 and get 0x7ffffffc which solves the second conditional to 0. > > So...maybe DOM+ranger could set things up for another pass to clean this up? > > Either way, you could open an enhancement request, if anything to keep > the nonzero mask up to date through the division. I've thus filed <https://gcc.gnu.org/PR107342> "Optimization opportunity where integer '/' corresponds to '>>'" for continuing that investigation. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
On Fri, Oct 21, 2022 at 10:38 AM Thomas Schwinge <thomas@codesourcery.com> wrote: > > Hi! > > On 2022-10-21T00:44:30+0200, Aldy Hernandez <aldyh@redhat.com> wrote: > > On Thu, Oct 20, 2022 at 9:22 PM Thomas Schwinge <thomas@codesourcery.com> wrote: > >> "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" attached? > > > > I see 7 different tests in this patch. Did the 6 that pass, fail > > before my patch for PR107195 and are now working? Cause unless > > that's the case, they shouldn't be in a test named pr107195-3.c, but > > somewhere else. > > That's correct; I should've mentioned that I had verified this. With the > code changes of commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 > "[PR107195] Set range to zero when nonzero mask is 0" reverted, we get: > > PASS: gcc.dg/tree-ssa/pr107195-3.c (test for excess errors) > FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo1," 1 > FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo2," 1 > FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo3," 1 > FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo4," 1 > FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo5," 1 > FAIL: gcc.dg/tree-ssa/pr107195-3.c scan-tree-dump-times dom3 "gimple_call <foo6," 1 > > ..., and in 'pr107195-3.c.196t.dom3' instead see two calls of each > 'foo[...]' function. > > That's with this... > > > I see there's one XFAILed test in your patch > > ... XFAILed test case removed, see the attached > "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]"; > OK now to push that version? OK, thanks.
From 1e7943646059c13713b0b3f1e667be9de2c03d0f Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Mon, 17 Oct 2022 09:10:03 +0200 Subject: [PATCH] Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195] ... to display optimization performed as of recent commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0". PR tree-optimization/107195 gcc/testsuite/ * gcc.dg/tree-ssa/pr107195-3.c: New. --- gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c | 73 ++++++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c new file mode 100644 index 00000000000..d3c5a31a904 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c @@ -0,0 +1,73 @@ +/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'. */ + +/* { dg-additional-options -O1 } */ +/* { dg-additional-options -fdump-tree-dom3-raw } */ + + +extern int +__attribute__((const)) +foo1 (int); + +int f1 (int r) +{ + if (foo1 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if (r & 1) /* ..., so this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo1 (r) == 0'), the + second 'if' may hold, but we know ('foo1' being 'const') that + 'foo1 (r) == 0', so don't have to re-evaluate it here: */ + r += foo1 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo1'. + { dg-final { scan-tree-dump-times {gimple_call <foo1,} 1 dom3 } } */ + } + + return r; +} + + +extern int +__attribute__((const)) +foo2 (int); + +int f2 (int r) +{ + if (foo2 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if ((r & 2) && (r & 1)) /* ..., so 'r & 1' in this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo2 (r) == 0'), the + second 'if' may hold, but we know ('foo2' being 'const') that + 'foo2 (r) == 0', so don't have to re-evaluate it here: */ + r += foo2 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo2'. + { dg-final { scan-tree-dump-times {gimple_call <foo2,} 1 dom3 } } */ + } + + return r; +} + + +extern int +__attribute__((const)) +foo3 (int); + +int f3 (int r) +{ + if (foo3 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if (r & 3) /* ..., so this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo3 (r) == 0'), the + second 'if' may hold, but we know ('foo3' being 'const') that + 'foo3 (r) == 0', so don't have to re-evaluate it here: */ + r += foo3 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo3'. + { dg-final { scan-tree-dump-times {gimple_call <foo3,} 1 dom3 { xfail *-*-* } } } */ + } + + return r; +} -- 2.25.1