Message ID | 1278543168-11395-1-git-send-email-sebpop@gmail.com |
---|---|
State | New |
On Wed, 7 Jul 2010, Sebastian Pop wrote:

> * common.opt (ftree-loop-if-convert): New flag.
> * doc/invoke.texi (ftree-loop-if-convert): Documented.
> * tree-if-conv.c (gate_tree_if_conversion): Enable if-conversion
> when flag_tree_loop_if_convert is set.
> ---
>  gcc/common.opt | 4 ++++
>  gcc/doc/invoke.texi | 14 ++++++++++----
>  gcc/tree-if-conv.c | 6 +++++-
>  3 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 6ca787a..111d7b7 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -653,6 +653,10 @@ fif-conversion2
>  Common Report Var(flag_if_conversion2) Optimization
>  Perform conversion of conditional jumps to conditional execution
>
> +ftree-loop-if-convert
> +Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
> +Convert conditional jumps in innermost loops to branchless equivalents
> +
>  ; -finhibit-size-directive inhibits output of .size for ELF.
>  ; This is used only for compiling crtstuff.c,
>  ; and it may be extended to other effects
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d70f130..0847e01 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -342,7 +342,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
>  -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
>  -fforward-propagate -ffunction-sections @gol
> --fgcse -fgcse-after-reload -fgcse-las -fgcse-lm @gol
> +-fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>  -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
>  -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
>  -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-matrix-reorg -fipa-pta @gol
> @@ -352,7 +352,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fira-loop-pressure -fno-ira-share-save-slots @gol
>  -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
>  -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
> --floop-block -floop-interchange -floop-strip-mine -fgraphite-identity @gol
> +-floop-block -floop-interchange -floop-strip-mine @gol
>  -floop-parallelize-all -flto -flto-compression-level -flto-report -fltrans @gol
>  -fltrans-output-list -fmerge-all-constants -fmerge-constants -fmodulo-sched @gol
>  -fmodulo-sched-allow-regmoves -fmove-loop-invariants -fmudflap @gol
> @@ -382,8 +382,8 @@ Objective-C and Objective-C++ Dialects}.
>  -fsplit-wide-types -fstack-protector -fstack-protector-all @gol
>  -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
>  -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
> --ftree-copyrename -ftree-dce @gol
> --ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im @gol
> +-ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
> +-ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
> @@ -6883,6 +6883,12 @@ profitable to parallelize the loops.
>  Compare the results of several data dependence analyzers. This option
>  is used for debugging the data dependence analyzers.
>
> +@item -ftree-loop-if-convert
> +Attempt to transform conditional jumps in the innermost loops to
> +branch-less equivalents. The intent is to remove control-flow from
> +the innermost loops in order to improve the ability of the
> +auto-vectorization pass to handle these loops.
> +

Please state that this is enabled by default if vectorization is enabled.

>  @item -ftree-loop-distribution
>  Perform loop distribution. This flag can improve cache performance on
>  big loop bodies and allow further loop optimizations, like
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 8d5d226..873cd89 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -1242,7 +1242,11 @@ main_tree_if_conversion (void)
>  static bool
>  gate_tree_if_conversion (void)
>  {
> -  return flag_tree_vectorize != 0;
> +  if (flag_tree_vectorize
> +      && flag_tree_loop_if_convert < 0)
> +    flag_tree_loop_if_convert = 1;

Err, no. This should be

  return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
          || flag_tree_loop_if_convert == 1);

not set flag_tree_loop_if_convert here.

But on a 2nd thought please follow what -ftree-cselim does, do
Init(2) (ISTR -1 is now problematic for some reason), and in
process_options () set flag_tree_loop_if_convert if it is
equal to AUTODETECT_VALUE (2) to the setting of flag_tree_vectorize.

The gate function then simply can return flag_tree_loop_if_convert.

Ok with that change.

Thanks,
Richard.
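
As a reading aid for the review comment above, here is a minimal, self-contained sketch of the tri-state scheme Richard describes. It is not the committed change and it is not GCC source: the two flags are plain `int` stand-ins for the real globals, and `resolve_if_convert_default` and `gate_for` are hypothetical helper names invented for this illustration. Only `AUTODETECT_VALUE`, `process_options`, `flag_tree_vectorize`, `flag_tree_loop_if_convert`, and `gate_tree_if_conversion` are names taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel meaning "the user gave neither -ftree-loop-if-convert nor
   -fno-tree-loop-if-convert".  The value 2 is the one named in the review.  */
#define AUTODETECT_VALUE 2

/* Plain stand-ins for the GCC option variables (assumption of this sketch).  */
static int flag_tree_vectorize = 0;
static int flag_tree_loop_if_convert = AUTODETECT_VALUE; /* models Init(2) */

/* Hypothetical helper modelling the step the review asks to put in
   process_options (): resolve the sentinel once, before any pass gate runs.  */
static void
resolve_if_convert_default (void)
{
  if (flag_tree_loop_if_convert == AUTODETECT_VALUE)
    flag_tree_loop_if_convert = flag_tree_vectorize;

  /* After resolution only "off" (0) or "on" (1) should remain.  */
  assert (flag_tree_loop_if_convert == 0 || flag_tree_loop_if_convert == 1);
}

/* With the sentinel already resolved, the gate has no side effects and
   simply reports the flag.  */
static bool
gate_tree_if_conversion (void)
{
  return flag_tree_loop_if_convert != 0;
}

/* Hypothetical driver: set both flags, resolve, and query the gate.
   VECTORIZE is 0 or 1; IF_CONVERT is 0, 1, or AUTODETECT_VALUE.  */
static bool
gate_for (int vectorize, int if_convert)
{
  flag_tree_vectorize = vectorize;
  flag_tree_loop_if_convert = if_convert;
  resolve_if_convert_default ();
  return gate_tree_if_conversion ();
}
```

In this model an explicit setting always wins, and the vectorizer setting is consulted only when the flag is still at the sentinel. That gives the same truth table as the `return` expression suggested first in the review, without the gate function writing to the flag.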
On Thu, Jul 8, 2010 at 04:01, Richard Guenther <rguenther@suse.de> wrote:
> On Wed, 7 Jul 2010, Sebastian Pop wrote:
> [...]
> Ok with that change.

Committed r161963.
diff --git a/gcc/common.opt b/gcc/common.opt
index 6ca787a..111d7b7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -653,6 +653,10 @@ fif-conversion2
 Common Report Var(flag_if_conversion2) Optimization
 Perform conversion of conditional jumps to conditional execution
 
+ftree-loop-if-convert
+Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
+Convert conditional jumps in innermost loops to branchless equivalents
+
 ; -finhibit-size-directive inhibits output of .size for ELF.
 ; This is used only for compiling crtstuff.c,
 ; and it may be extended to other effects
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d70f130..0847e01 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -342,7 +342,7 @@ Objective-C and Objective-C++ Dialects}.
 -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
 -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
 -fforward-propagate -ffunction-sections @gol
--fgcse -fgcse-after-reload -fgcse-las -fgcse-lm @gol
+-fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
 -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
 -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
 -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-matrix-reorg -fipa-pta @gol
@@ -352,7 +352,7 @@ Objective-C and Objective-C++ Dialects}.
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
--floop-block -floop-interchange -floop-strip-mine -fgraphite-identity @gol
+-floop-block -floop-interchange -floop-strip-mine @gol
 -floop-parallelize-all -flto -flto-compression-level -flto-report -fltrans @gol
 -fltrans-output-list -fmerge-all-constants -fmerge-constants -fmodulo-sched @gol
 -fmodulo-sched-allow-regmoves -fmove-loop-invariants -fmudflap @gol
@@ -382,8 +382,8 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-wide-types -fstack-protector -fstack-protector-all @gol
 -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
--ftree-copyrename -ftree-dce @gol
--ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im @gol
+-ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
+-ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
@@ -6883,6 +6883,12 @@ profitable to parallelize the loops.
 Compare the results of several data dependence analyzers. This option
 is used for debugging the data dependence analyzers.
 
+@item -ftree-loop-if-convert
+Attempt to transform conditional jumps in the innermost loops to
+branch-less equivalents. The intent is to remove control-flow from
+the innermost loops in order to improve the ability of the
+auto-vectorization pass to handle these loops.
+
 @item -ftree-loop-distribution
 Perform loop distribution. This flag can improve cache performance on
 big loop bodies and allow further loop optimizations, like
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 8d5d226..873cd89 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1242,7 +1242,11 @@ main_tree_if_conversion (void)
 static bool
 gate_tree_if_conversion (void)
 {
-  return flag_tree_vectorize != 0;
+  if (flag_tree_vectorize
+      && flag_tree_loop_if_convert < 0)
+    flag_tree_loop_if_convert = 1;
+
+  return flag_tree_loop_if_convert > 0;
 }
 
 struct gimple_opt_pass pass_if_conversion =
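
To make the invoke.texi wording in the patch ("transform conditional jumps in the innermost loops to branch-less equivalents") more concrete, the following hand-written pair shows the general idea at the source level. This is only an illustration under stated assumptions: the function names `clamp_branchy` and `clamp_branchless` are made up for this note, and nothing here claims to be what the pass in tree-if-conv.c emits for any particular input.

```c
#include <stddef.h>

/* Loop body with control flow: the store is guarded by a conditional jump.  */
static void
clamp_branchy (int *a, size_t n, int limit)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] > limit)
      a[i] = limit;
}

/* Hand-written equivalent whose body is straight-line code: the condition
   selects a value instead of selecting a path.  Note that it stores to a[i]
   on every iteration, so it is only a valid replacement when every element
   is writable (an assumption of this sketch).  */
static void
clamp_branchless (int *a, size_t n, int limit)
{
  for (size_t i = 0; i < n; i++)
    {
      int v = a[i];
      a[i] = v > limit ? limit : v;
    }
}
```

A loop body in the second shape is a single basic block, which is the property the documentation text says is meant to help the auto-vectorization pass.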