Message ID | D4C76825A6780047854A11E93CDE84D02F7750@SAUSEXMBP01.amd.com |
---|---|
State | New |
Headers | show |
On Thu, 24 Jun 2010, Fang, Changpeng wrote: > Hi, > > Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus > only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays. > If this flag is not explicitly set, (for -O3) we turn it on in gcc/config/i386/i386.c > (override_options). > > Is this OK to commit now? Ok. Thanks, Richard. > > Thanks, > > Changpeng > > ________________________________________ > From: Mark Mitchell [mark@codesourcery.com] > Sent: Saturday, June 19, 2010 3:05 PM > To: Christian Borntraeger > Cc: gcc-patches@gcc.gnu.org; H.J. Lu; Fang, Changpeng; rguenther@suse.de; sebpop@gmail.com; Zdenek Dvorak; Maxim Kuvyrkov > Subject: Re: [PATCH] Enabling Software Prefetching by Default at -O3 > > Christian Borntraeger wrote: > > > It also might be worth to investigate if overriding the parameters per > > -mtune=XXX results in an overall win for -fprefetch-loop-arrays. We did > > that on s390 since the default values were not ideal > > Yes, that might be a good idea for i7. > > But, in the meantime, I think we should get a version of the patch that > turns on prefetching on AMD CPUs with -O3. There's no reason to demand > consistency for all CPUs and it clearly benefits the AMD CPUs. > Changpeng, would you please submit a patch that activates this > optimization only with tuning for AMD CPUs? > > Thanks, > > -- > Mark Mitchell > CodeSourcery > mark@codesourcery.com > (650) 331-3385 x713 > >
On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote: > On Thu, 24 Jun 2010, Fang, Changpeng wrote: > >> Hi, >> >> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus >> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays. >> If this flag is not explicitly set, (for -O3) we turn it on in gcc/config/i386/i386.c >> (override_options). >> >> Is this OK to commit now? > > Ok. > Committed r161391.
On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote: > On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote: >> On Thu, 24 Jun 2010, Fang, Changpeng wrote: >> >>> Hi, >>> >>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus >>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays. >>> If this flag is not explicitly set, (for -O3) we turn it on in gcc/config/i386/i386.c >>> (override_options). >>> >>> Is this OK to commit now? >> >> Ok. >> > > Committed r161391. This may have regressed scimark sparse matmult by 20% on AMD Fam8 at -O3 -ffast-math -funroll-loops -march=native. Richard.
On Sat, Jun 26, 2010 at 4:21 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote: >> On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote: >>> On Thu, 24 Jun 2010, Fang, Changpeng wrote: >>> >>>> Hi, >>>> >>>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus >>>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays. >>>> If this flag is not explicitly set, (for -O3) we turn it on in gcc/config/i386/i386.c >>>> (override_options). >>>> >>>> Is this OK to commit now? >>> >>> Ok. >>> >> >> Committed r161391. > > This may have regressed scimark sparse matmult by 20% on AMD Fam8 > at -O3 -ffast-math -funroll-loops -march=native. It also looks like prefetching has a very negative impact on SPEC CPU 2000 FP in 32bit mode: http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/recent.html (that's Fam10). Richard.
> On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote: > > On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote: > >> On Thu, 24 Jun 2010, Fang, Changpeng wrote: > >> > >>> Hi, > >>> > >>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus > >>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays. > >>> If this flag is not explicitly set, (for -O3) we turn it on in gcc/config/i386/i386.c > >>> (override_options). > >>> > >>> Is this OK to commit now? > >> > >> Ok. > >> > > > > Committed r161391. > > This may have regressed scimark sparse matmult by 20% on AMD Fam8 > at -O3 -ffast-math -funroll-loops -march=native. ... and also about 2.6% SPECFP regression along with 8% code size growth in 32bit mode http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/recent.html http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/size.html There was other changes today that might've caused the trouble. In particular my partial inlinig, Sebastians's ifcvt, Martin's ipa-cp improvements and Bernd's RA changes. Frescobaldi and Vangelish both test scimark and Frescobaldi does not show the regression yet, tought it picked changes including my ipa-split code. We might conclude that it must be later patch that would leave Sebastian's and Martin's changes. Those two seem however unlikely candidates for such a flat differences everywhere IMO. I tested partial inlining in the form comitted to mainline on Frescobaldi on SPECint2000 and C++ benchmarks couple days ago and it dod not show the regressions, however I did not test 32bit. The change is quite target independent so I would be surprised if it made 32bit code a lot worse and did not affect 64bit. Some changes are also visible at our Itanium testers. Hopefully we can resolve the slowdowns. I also think that the change should be documented in changes.html as well as we should enable prefetching with -fprofile-use (and ensure that prefetcher uses profile info to figure out if loop has resonable trip count for prefetching and is hot) Honza > > Richard.
> > On Fri, Jun 25, 2010 at 8:25 PM, Sebastian Pop <sebpop@gmail.com> wrote: > > > On Fri, Jun 25, 2010 at 04:37, Richard Guenther <rguenther@suse.de> wrote: > > >> On Thu, 24 Jun 2010, Fang, Changpeng wrote: > > >> > > >>> Hi, > > >>> > > >>> Attached is the version of the patch that turns prefetching on at -O3 for AMD cpus > > >>> only. As discussed elsewhere in this thread, we use tri-state for -fprefetch-loop-arrays. > > >>> If this flag is not explicitly set, (for -O3) we turn it on in gcc/config/i386/i386.c > > >>> (override_options). > > >>> > > >>> Is this OK to commit now? > > >> > > >> Ok. > > >> > > > > > > Committed r161391. > > > > This may have regressed scimark sparse matmult by 20% on AMD Fam8 > > at -O3 -ffast-math -funroll-loops -march=native. > ... and also about 2.6% SPECFP regression along with 8% code size growth in 32bit mode > http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/recent.html > http://gcc.opensuse.org/SPEC/CFP/sb-frescobaldi.suse.de-head-64-32o-32bit/size.html I think the reason why we see regression in 32bit run only is that 32bit run use -march=native while 64bit run uses default arch. So it indeed looks like prefetching change. Honza
From 7f48bc625b0e451dd8c05a3a3cc20f68dcaa695c Mon Sep 17 00:00:00 2001 From: Changpeng Fang <chfang@pathscale.(none)> Date: Wed, 23 Jun 2010 17:05:59 -0700 Subject: [PATCH 3/3] Enable prefetching at -O3 for AMD cpus * gcc/common.opt (fprefetch-loop-arrays): Re-define -fprefetch-loop-arrays as a tri-state option with the initial value of -1. * gcc/tree-ssa-loop.c (gate_tree_ssa_loop_prefetch): Invoke prefetch pass only when flag_prefetch_loop_arrays > 0. * gcc/toplev.c (process_options): Note that, with tri-states, flag_prefetch_loop_arrays>0 means prefetching is enabled. * gcc/config/i386/i386.c (override_options): Enable prefetching at -O3 for a set of CPUs that sw prefetching is helpful. (software_prefetching_beneficial_p): New. Return TRUE if software prefetching is beneficial for the given CPU. --- gcc/common.opt | 2 +- gcc/config/i386/i386.c | 27 +++++++++++++++++++++++++++ gcc/toplev.c | 6 +++--- gcc/tree-ssa-loop.c | 2 +- 4 files changed, 32 insertions(+), 5 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index 4904481..74fbd1d 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -937,7 +937,7 @@ Common Report Var(flag_predictive_commoning) Optimization Run predictive commoning optimization. fprefetch-loop-arrays -Common Report Var(flag_prefetch_loop_arrays) Optimization +Common Report Var(flag_prefetch_loop_arrays) Init(-1) Optimization Generate prefetch instructions, if available, for arrays in loops fprofile diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2a46f89..605e57b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2691,6 +2691,26 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune, return ret; } +/* Return TRUE if software prefetching is beneficial for the + given CPU. */ + +static bool +software_prefetching_beneficial_p (void) +{ + switch (ix86_tune) + { + case PROCESSOR_GEODE: + case PROCESSOR_K6: + case PROCESSOR_ATHLON: + case PROCESSOR_K8: + case PROCESSOR_AMDFAM10: + return true; + + default: + return false; + } +} + /* Function that is callable from the debugger to print the current options. */ void @@ -3531,6 +3551,13 @@ override_options (bool main_args_p) if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE)) set_param_value ("l2-cache-size", ix86_cost->l2_cache_size); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ + if (flag_prefetch_loop_arrays < 0 + && HAVE_prefetch + && optimize >= 3 + && software_prefetching_beneficial_p()) + flag_prefetch_loop_arrays = 1; + /* If using typedef char *va_list, signal that __builtin_va_start (&ap, 0) can be optimized to ap = __builtin_next_arg (0). */ if (!TARGET_64BIT) diff --git a/gcc/toplev.c b/gcc/toplev.c index ff4c850..369820b 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -2078,13 +2078,13 @@ process_options (void) } #ifndef HAVE_prefetch - if (flag_prefetch_loop_arrays) + if (flag_prefetch_loop_arrays > 0) { warning (0, "-fprefetch-loop-arrays not supported for this target"); flag_prefetch_loop_arrays = 0; } #else - if (flag_prefetch_loop_arrays && !HAVE_prefetch) + if (flag_prefetch_loop_arrays > 0 && !HAVE_prefetch) { warning (0, "-fprefetch-loop-arrays not supported for this target (try -march switches)"); flag_prefetch_loop_arrays = 0; @@ -2093,7 +2093,7 @@ process_options (void) /* This combination of options isn't handled for i386 targets and doesn't make much sense anyway, so don't allow it. */ - if (flag_prefetch_loop_arrays && optimize_size) + if (flag_prefetch_loop_arrays > 0 && optimize_size) { warning (0, "-fprefetch-loop-arrays is not supported with -Os"); flag_prefetch_loop_arrays = 0; diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 344cfa8..c9c5bbd 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -600,7 +600,7 @@ tree_ssa_loop_prefetch (void) static bool gate_tree_ssa_loop_prefetch (void) { - return flag_prefetch_loop_arrays != 0; + return flag_prefetch_loop_arrays > 0; } struct gimple_opt_pass pass_loop_prefetch = -- 1.6.3.3