Message ID | ZI8owVx4aDXxwB8Y@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Series | Optimize std::max early | expand |
On Sun, Jun 18, 2023 at 5:55 PM Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi, > we currently produce very bad code on loops using std::vector as a stack, since > we fail to inline push_back which in turn prevents SRA and we fail to optimize > out some store-to-load pairs (PR109849). > > I looked into why this function is not inlined and it is inlined by clang. We > currently estimate it to 66 instructions and inline limits are 15 at -O2 and 30 > at -O3. Clang has similar estimate, but still decides to inline at -O2. > > I looked into reason why the body is so large and one problem I spotted is the > way std::max is implemented by taking and returning reference to the values. > > const T& max( const T& a, const T& b ); > > This makes it necessary to store the values to memory and load them later > (Max is used by code computing new size of vector on resize.) > Two stores, conditional and load accounts as 8 instructions, while > MAX_EXPR as 1 and has a lot better chance to fold with the surrounding > code. > > We optimize this to MAX_EXPR, but only during late optimizations. I think this > is a common enough coding pattern and we ought to make this transparent to > early opts and IPA. The following is easist fix that simply adds phiprop pass > that turns the PHI of address values into PHI of values so later FRE can > propagate values across memory, phiopt discover the MAX_EXPR pattern and DSE > remove the memory stores. > > Bootstrapped/regtested x86_64-linux, does this look resonable thing to do? OK. Thanks, Richard. > Looking into how expensive the pass is, I think it is very cheap, except > that it computes postdominator and updates ssa even if no patterns > are matched. I will send patch to avoid that. > > gcc/ChangeLog: > > PR tree-optimization/109811 > PR tree-optimization/109849 > * passes.def: Add phiprop to early optimization passes. > * tree-ssa-phiprop.cc: Allow clonning. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/109811 > PR tree-optimization/109849 > * gcc.dg/tree-ssa/phiprop-1.c: New test. > > diff --git a/gcc/passes.def b/gcc/passes.def > index c9a8f19747b..faa5208b26b 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -88,6 +88,8 @@ along with GCC; see the file COPYING3. If not see > /* pass_build_ealias is a dummy pass that ensures that we > execute TODO_rebuild_alias at this point. */ > NEXT_PASS (pass_build_ealias); > + /* Do phiprop before FRE so we optimize std::min and std::max well. */ > + NEXT_PASS (pass_phiprop); > NEXT_PASS (pass_fre, true /* may_iterate */); > NEXT_PASS (pass_early_vrp); > NEXT_PASS (pass_merge_phi); > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c > new file mode 100644 > index 00000000000..9f52c2a7298 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */ > +int max(int a, int b) > +{ > + int *ptr; > + if (a > b) > + ptr = &a; > + else > + ptr = &b; > + return *ptr; > +} > + > +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */ > +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */ > diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc > index 3cb4900b6be..5dc505df420 100644 > --- a/gcc/tree-ssa-phiprop.cc > +++ b/gcc/tree-ssa-phiprop.cc > @@ -476,6 +476,7 @@ public: > {} > > /* opt_pass methods: */ > + opt_pass * clone () final override { return new pass_phiprop (m_ctxt); } > bool gate (function *) final override { return flag_tree_phiprop; } > unsigned int execute (function *) final override; >
diff --git a/gcc/passes.def b/gcc/passes.def index c9a8f19747b..faa5208b26b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -88,6 +88,8 @@ along with GCC; see the file COPYING3. If not see /* pass_build_ealias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_ealias); + /* Do phiprop before FRE so we optimize std::min and std::max well. */ + NEXT_PASS (pass_phiprop); NEXT_PASS (pass_fre, true /* may_iterate */); NEXT_PASS (pass_early_vrp); NEXT_PASS (pass_merge_phi); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c new file mode 100644 index 00000000000..9f52c2a7298 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */ +int max(int a, int b) +{ + int *ptr; + if (a > b) + ptr = &a; + else + ptr = &b; + return *ptr; +} + +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */ +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */ diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc index 3cb4900b6be..5dc505df420 100644 --- a/gcc/tree-ssa-phiprop.cc +++ b/gcc/tree-ssa-phiprop.cc @@ -476,6 +476,7 @@ public: {} /* opt_pass methods: */ + opt_pass * clone () final override { return new pass_phiprop (m_ctxt); } bool gate (function *) final override { return flag_tree_phiprop; } unsigned int execute (function *) final override;