Message ID | D4C76825A6780047854A11E93CDE84D004C17683E8@SAUSEXMBP01.amd.com |
---|---|
State | New |
Headers | show |
On Mon, Dec 6, 2010 at 12:04 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote: > Hi, > > Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at > -O2 or higher (for speed-runs only) for x86-64 systems. Enablement of these two flags could improve > the performance of floating point programs (For CFP2006, 2.29% under -O2, and 1.34% under -O3). > There is no apparent performance impact on integer programs (as expected). > > We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy > the performance gains. > > The patch passed bootstrappings under -O2 and -O3 on x86_64-unknown-linux-gnu, and passed > the gcc regression tests. > > Is it OK to commit the patch? "-fschedule-insns -fsched-pressure" isn't well tested: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829 Have you run "-fschedule-insns -fsched-pressure" on gcc testsuite?
> "-fschedule-insns -fsched-pressure" isn't well tested: We have tested the gcc testsuite and SPEC cpu benchmarks. > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829 Bug 46829 does not appear in 4.6 trunk, and the patched compiler does not not has this bug either. >Have you run "-fschedule-insns -fsched-pressure" on gcc testsuite? Yes, gcc testsuite testing passes for both -O2 and -O3 bootstrapped compilers. Thanks, Changpeng >-- >H.J.
On Mon, Dec 6, 2010 at 1:37 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote: > >> "-fschedule-insns -fsched-pressure" isn't well tested: > We have tested the gcc testsuite and SPEC cpu benchmarks. > > >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829 > Bug 46829 does not appear in 4.6 trunk, and the patched compiler does not > not has this bug either. Which revision are are you using? I verified it with trunk revision 167517.
Sorry, the bug exists in gcc 4.6 trunk. Thanks, Changpeng
On Mon, Dec 06, 2010 at 02:04:14PM -0600, Fang, Changpeng wrote: > Hi, > > Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at > -O2 or higher (for speed-runs only) for x86-64 systems. Enablement of these two flags could improve > the performance of floating point programs (For CFP2006, 2.29% under -O2, and 1.34% under -O3). > There is no apparent performance impact on integer programs (as expected). > > We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy > the performance gains. > > The patch passed bootstrappings under -O2 and -O3 on x86_64-unknown-linux-gnu, and passed > the gcc regression tests. > > Is it OK to commit the patch? > > Thanks, > > Changpeng According to the Polyhedron 2005 benchmarks, -fschedule-insns and -fsched-pressure result in a net performance loss (at least for -mtune=core2) at both -m32 and -m64 on x86_64-apple-darwin10. The induct benchmark in particular suffers a severe regression with those options. x86_64-apple-darwin10 gfortran -m64 -mtune=core2 -ffast-math -funroll-loops -O3 %n.f90 -o %n benchmark stock -fschedule-insns %change -fsched-pressure ac 8.81 8.75 -0.7 aermod 17.21 17.21 0.0 air 5.53 5.53 0.0 capacita 32.39 32.89 1.5 channel 1.84 1.84 0.0 doduc 26.40 26.55 0.6 fatigue 8.14 8.22 1.0 gas_dyn 4.31 4.29 -0.5 induct 12.85 15.89 19.1 linpk 15.47 15.47 0.0 mdbx 11.21 11.19 -0.2 nf 29.98 29.96 -0.1 protein 34.03 33.44 -1.8 rnflow 23.13 22.91 -1.0 test_fpu 8.03 8.11 1.0 tfft 1.87 1.87 0.0 Geometric Mean 10.84 10.99 1.4 gfortran -m32 -mtune=core2 -ffast-math -funroll-loops -O3 %n.f90 -o %n benchmark stock -fschedule-insns %change -fsched-pressure ac 10.76 11.14 3.4 aermod 19.55 19.25 -1.6 air 6.08 6.54 7.0 capacita 46.13 44.82 -2.9 channel 1.98 1.98 0.0 doduc 31.23 31.27 0.1 fatigue 9.83 10.35 5.0 gas_dyn 4.71 4.74 0.6 induct 13.93 18.74 25.7 linpk 15.49 15.54 0.3 mdbx 11.28 11.51 2.0 nf 27.51 27.53 0.1 protein 38.55 38.90 0.9 rnflow 24.69 24.55 -0.6 test_fpu 10.12 10.88 7.0 tfft 1.92 1.95 1.5 Geometric Mean 12.09 12.50 3.3 Content-Description: 0001-Turn-on-fschedule-insns-and-fsched-pressure-by-defau.patch > From 60859804422c1e0093769eaa47b4cc2d95657f86 Mon Sep 17 00:00:00 2001 > From: Changpeng Fang <chfang@houghton.(none)> > Date: Mon, 6 Dec 2010 11:03:07 -0800 > Subject: [PATCH] Turn on -fschedule-insns and -fsched-pressure by default at -O2 or higher for x86-64. > > * common.opt (fschedule-insns): Add Init(0). > > * opts.c (default_options_table): Set default of -fschedule-insns to > 2 to differentiate it with the user setting of 1 (if any). > > * config/i386/i386.c (ix86_option_optimization_table): Turn on > -fsched-pressure by default at -O2 or higher (for speed run only). > (ix86_option_override_internal): Turn off -fschedule-insns by default > for 32-bit compilation. Also, turn off -fsched-pressure by default if > -fschedule-insns was not turned on. > --- > gcc/common.opt | 2 +- > gcc/config/i386/i386.c | 20 +++++++++++++++++--- > gcc/opts.c | 2 +- > 3 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/gcc/common.opt b/gcc/common.opt > index 57f5b0a..72be118 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -1570,7 +1570,7 @@ Common Ignore > Does nothing. Preserved for backward compatibility. > > fschedule-insns > -Common Report Var(flag_schedule_insns) Optimization > +Common Report Var(flag_schedule_insns) Init(0) Optimization > Reschedule instructions before register allocation > > fschedule-insns2 > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index a66a0c4..226e4fc 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -4079,6 +4079,19 @@ ix86_option_override_internal (bool main_args_p) > if (!TARGET_SCHEDULE) > flag_schedule_insns_after_reload = flag_schedule_insns = 0; > > +#ifdef INSN_SCHEDULING > + /* For 32-bit compilation, turn off -fschedule-insns by default to hide > + the problem with not enough registers. */ > + if (!TARGET_64BIT) > + flag_schedule_insns = (flag_schedule_insns == 1); > + > + /* If -fschedule-insns was not turned on, the default of -fsched-pressure > + should be off. On the other hand, if user turned on -fschedule-insns > + for 32-bit compilation, the default of -fsched-presssure is kept on. */ > + if (!flag_schedule_insns) > + flag_sched_pressure = (flag_sched_pressure == 1); > +#endif > + > maybe_set_param_value (PARAM_SIMULTANEOUS_PREFETCHES, > ix86_cost->simultaneous_prefetches, > global_options.x_param_values, > @@ -5031,10 +5044,11 @@ x86_output_aligned_bss (FILE *file, tree decl ATTRIBUTE_UNUSED, > > static const struct default_options ix86_option_optimization_table[] = > { > - /* Turn off -fschedule-insns by default. It tends to make the > - problem with not enough registers even worse. */ > + /* Turn on -fsched-pressure by default at -O2 or higher. This could > + alleviate the problem with not enough registers caused by turning > + on -fschedule-pressure (in opts.c). */ > #ifdef INSN_SCHEDULING > - { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fsched_pressure, NULL, 2 }, > #endif > > #ifdef SUBTARGET_OPTIMIZATION_OPTIONS > diff --git a/gcc/opts.c b/gcc/opts.c > index cd41c2a..7700f0a 100644 > --- a/gcc/opts.c > +++ b/gcc/opts.c > @@ -483,7 +483,7 @@ static const struct default_options default_options_table[] = > { OPT_LEVELS_2_PLUS, OPT_fpeephole2, NULL, 1 }, > #ifdef INSN_SCHEDULING > /* Only run the pre-regalloc scheduling pass if optimizing for speed. */ > - { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 }, > + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 2 }, > { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 }, > #endif > { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 }, > -- > 1.6.3.3 > Content-Description: schedule-improvement.txt > Percentage improvement by turning on -fschedule-insns and > -fsched-pressure on an x86-64 system > > -O2 -O3 > ============================ > 410.bwaves 1.63 2.45 > 416.gamess 1.79 2.38 > 433.milc 1.39 2.52 > 434.zeusmp 1.79 0.8 > 435.gromacs 3.48 4.02 > 436.cactusADM 0.49 11.26 > 437.leslie3d 0 1.02 > 444.namd 3.73 3.6 > 447.dealII 1.09 -0.5 > 450.soplex 0 1.26 > 453.povray 0 0 > 454.calculix 23.81 0.24 > 459.GemsFDTD 2.16 -1.97 > 465.tonto 0 -0.75 > 470.lbm 1.49 -3.28 > 481.wrf 0.94 0.69 > 482.sphinx3 0.5 0 > 400.perlbench 1.07 0 > 401.bzip2 -1.48 -2.27 > 403.gcc 0.68 0.67 > 429.mcf 2.01 1.86 > 445.gobmk 1.19 -0.6 > 456.hmmer 3.45 0 > 458.sjeng 1.37 0.64 > 462.libquantum 0 -0.32 > 464.h264ref 0.92 -2.15 > 471.omnetpp 0.86 0.84 > 473.astar -0.97 -0.91 > 483.xalancbmk 1.82 3.55 > ============================= > CFP GEOMEAN 2.29 1.34 > CINT GEOMEAN 0.65 0 >
On 12/06/2010 09:04 PM, Fang, Changpeng wrote: > Hi, > > Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at > -O2 or higher (for speed-runs only) for x86-64 systems. Enablement of these two flags could improve > the performance of floating point programs (For CFP2006, 2.29% under -O2, and 1.34% under -O3). > There is no apparent performance impact on integer programs (as expected). > > We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy > the performance gains. I think enabling rename registers whenever schedule_insns2 is enabled would achieve almost the same freedom without risking bugs like 46829. There are just too many singleton register classes in i386 for schedule-insns to work. :( Paolo
On Tue, Dec 07, 2010 at 08:23:22PM +0100, Paolo Bonzini wrote: > On 12/06/2010 09:04 PM, Fang, Changpeng wrote: >> Hi, >> >> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at >> -O2 or higher (for speed-runs only) for x86-64 systems. Enablement of these two flags could improve >> the performance of floating point programs (For CFP2006, 2.29% under -O2, and 1.34% under -O3). >> There is no apparent performance impact on integer programs (as expected). >> >> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy >> the performance gains. > > I think enabling rename registers whenever schedule_insns2 is enabled > would achieve almost the same freedom without risking bugs like 46829. > There are just too many singleton register classes in i386 for > schedule-insns to work. :( Paolo, I noticed that the manpage claims this option can degrade debugging on some targets. Which debug formats suffer from this problem and wouldn't we want to avoid enabling -frename-registers on those targets for -O2, -O3 and -Os? Jack > > Paolo
On 12/07/2010 08:55 PM, Jack Howarth wrote: > I noticed that the manpage claims this option can degrade debugging on some targets. > Which debug formats suffer from this problem and wouldn't we want to avoid enabling > -frename-registers on those targets for -O2, -O3 and -Os? I think it is false since -fvariable-tracking was committed (3.4 maybe?) Paolo
On Tue, Dec 07, 2010 at 08:23:22PM +0100, Paolo Bonzini wrote: > On 12/06/2010 09:04 PM, Fang, Changpeng wrote: >> Hi, >> >> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at >> -O2 or higher (for speed-runs only) for x86-64 systems. Enablement of these two flags could improve >> the performance of floating point programs (For CFP2006, 2.29% under -O2, and 1.34% under -O3). >> There is no apparent performance impact on integer programs (as expected). >> >> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy >> the performance gains. > > I think enabling rename registers whenever schedule_insns2 is enabled > would achieve almost the same freedom without risking bugs like 46829. > There are just too many singleton register classes in i386 for > schedule-insns to work. :( > > Paolo Paolo, The improvements in the Polyhedron benchmark from -frename-registers on x86_64-apple-darwin10 appear marginal. gfortran -mtune=core2 -ffast-math -funroll-loops -O3 %n.f90 -o %n benchmark stock -frename-registers %change ac 8.81 8.80 -0.1 aermod 17.21 17.21 0.0 air 5.53 5.56 0.5 capacita 32.39 32.37 -0.1 channel 1.84 1.84 0.0 doduc 26.40 26.39 0.0 fatigue 8.14 8.11 -0.4 gas_dyn 4.31 4.32 0.2 induct 12.85 12.84 -0.1 linpk 15.47 15.48 0.1 mdbx 11.21 11.22 0.1 nf 29.98 29.94 -0.1 protein 34.03 32.04 -6.2 rnflow 23.13 23.12 -0.0 test_fpu 8.03 8.03 0.0 tfft 1.87 1.86 -0.5 Geometric Mean 10.84 10.80 -0.4 Jack
> On 12/07/2010 08:55 PM, Jack Howarth wrote: >> I noticed that the manpage claims this option can degrade debugging on some targets. >> Which debug formats suffer from this problem and wouldn't we want to avoid enabling >> -frename-registers on those targets for -O2, -O3 and -Os? > > I think it is false since -fvariable-tracking was committed (3.4 maybe?) Yes, variable-tracking was introduced to solve this problem. The manpage probably reffers to non-dwarf2 formats (only dwarf2out use location notes) that are much less of concern now than back then. > > Paolo
Hi, note that -frename-registers was enabled by default for a while and then disabled again. Honza
On 12/11/2010 02:41 PM, Jan Hubicka wrote: > Hi, > note that -frename-registers was enabled by default for a while and then disabled again. It was enabled by me in fact :) but a lot of changes have been made in the meanwhile. Paolo
From 60859804422c1e0093769eaa47b4cc2d95657f86 Mon Sep 17 00:00:00 2001 From: Changpeng Fang <chfang@houghton.(none)> Date: Mon, 6 Dec 2010 11:03:07 -0800 Subject: [PATCH] Turn on -fschedule-insns and -fsched-pressure by default at -O2 or higher for x86-64. * common.opt (fschedule-insns): Add Init(0). * opts.c (default_options_table): Set default of -fschedule-insns to 2 to differentiate it with the user setting of 1 (if any). * config/i386/i386.c (ix86_option_optimization_table): Turn on -fsched-pressure by default at -O2 or higher (for speed run only). (ix86_option_override_internal): Turn off -fschedule-insns by default for 32-bit compilation. Also, turn off -fsched-pressure by default if -fschedule-insns was not turned on. --- gcc/common.opt | 2 +- gcc/config/i386/i386.c | 20 +++++++++++++++++--- gcc/opts.c | 2 +- 3 files changed, 19 insertions(+), 5 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index 57f5b0a..72be118 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1570,7 +1570,7 @@ Common Ignore Does nothing. Preserved for backward compatibility. fschedule-insns -Common Report Var(flag_schedule_insns) Optimization +Common Report Var(flag_schedule_insns) Init(0) Optimization Reschedule instructions before register allocation fschedule-insns2 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a66a0c4..226e4fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4079,6 +4079,19 @@ ix86_option_override_internal (bool main_args_p) if (!TARGET_SCHEDULE) flag_schedule_insns_after_reload = flag_schedule_insns = 0; +#ifdef INSN_SCHEDULING + /* For 32-bit compilation, turn off -fschedule-insns by default to hide + the problem with not enough registers. */ + if (!TARGET_64BIT) + flag_schedule_insns = (flag_schedule_insns == 1); + + /* If -fschedule-insns was not turned on, the default of -fsched-pressure + should be off. On the other hand, if user turned on -fschedule-insns + for 32-bit compilation, the default of -fsched-presssure is kept on. */ + if (!flag_schedule_insns) + flag_sched_pressure = (flag_sched_pressure == 1); +#endif + maybe_set_param_value (PARAM_SIMULTANEOUS_PREFETCHES, ix86_cost->simultaneous_prefetches, global_options.x_param_values, @@ -5031,10 +5044,11 @@ x86_output_aligned_bss (FILE *file, tree decl ATTRIBUTE_UNUSED, static const struct default_options ix86_option_optimization_table[] = { - /* Turn off -fschedule-insns by default. It tends to make the - problem with not enough registers even worse. */ + /* Turn on -fsched-pressure by default at -O2 or higher. This could + alleviate the problem with not enough registers caused by turning + on -fschedule-pressure (in opts.c). */ #ifdef INSN_SCHEDULING - { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fsched_pressure, NULL, 2 }, #endif #ifdef SUBTARGET_OPTIMIZATION_OPTIONS diff --git a/gcc/opts.c b/gcc/opts.c index cd41c2a..7700f0a 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -483,7 +483,7 @@ static const struct default_options default_options_table[] = { OPT_LEVELS_2_PLUS, OPT_fpeephole2, NULL, 1 }, #ifdef INSN_SCHEDULING /* Only run the pre-regalloc scheduling pass if optimizing for speed. */ - { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 2 }, { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 }, #endif { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 }, -- 1.6.3.3