Message ID | CAFULd4Zd0=NwVWZwOUvsD9AWWsGjEzjXsRezTL-Pe-_MDvM46w@mail.gmail.com |
---|---|
State | New |
Hi,

On Thu, 20 Oct 2011, Uros Bizjak wrote:

> This patch builds on recent patch by Michael (that implemented
> fine-grained control on -mrecip option) and with -ffast-math emits
> reciprocal sequences with additional NR step for vectorized SFmode
> division and vectorized sqrtf(x).

FWIW, I didn't yet come to do the same for cpu2006, but here are the two
results of polyhedron (sandybridge, with baseflags "-Ofast -funroll-loops
-fpeel-loops -march=corei7-avx -mveclibabi=svml -flto -fwhole-program",
i.e. without increasing the inline limits, and linking against libimf and
libsvml).

With the above flags:

Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            4.68     4086864     6.16        2  0.0211
aermod       68.22     5603956    13.40        5  0.1864
air          10.46     4961134     3.78        5  0.2888
capacita      3.74     4213850    19.24        3  0.0998
channel       1.44     4808524     1.22        5  0.2898
doduc        12.64     4288238    19.91        5  0.1128
fatigue       4.47     4217301     3.71        5  0.0989
gas_dyn       6.92     4211997     3.43        5  2.8640
induct        7.44     4385543    10.33        5  0.2719
linpk         1.28     4053798     5.88        2  0.0647
mdbx          3.97     4114107     7.63        5  0.1365
nf            4.89     4147809     7.90        2  0.0380
protein      15.07     5049415    20.70        5  0.7615
rnflow       11.89     4260434    16.05        5  0.1359
test_fpu      8.11     4207868     3.69        5  0.6687
tfft          0.99     4110713     0.84        5  0.3024

Geometric Mean Execution Time = 6.35 seconds

With the above flags plus "-mrecip=vec-sqrt,vec-div":

Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            3.85     4086864     6.17        2  0.0227
aermod       68.31     5603956    13.38        2  0.0019
air          10.92     4961134     3.77        5  0.1367
capacita      3.71     4213850    18.68        2  0.0391
channel       1.41     4808524     1.22        5  0.3327
doduc        12.66     4288238    19.93        5  0.2391
fatigue       4.36     4217301     3.70        2  0.0567
gas_dyn       6.91     4211997     2.31        2  0.0867
induct        7.46     4385543    10.31        5  0.1201
linpk         1.70     4053798     5.88        2  0.0383
mdbx          3.98     4114107     7.68        5  0.4000
nf            4.89     4147809     7.89        2  0.0348
protein      14.00     5049415    20.51        2  0.0478
rnflow       11.89     4260434    16.05        4  0.0837
test_fpu      8.09     4207868     3.71        5  0.7097
tfft          1.13     4110713     0.83        5  0.2290

Geometric Mean Execution Time = 6.18 seconds

I.e. gas_dyn improves quite a bit (as expected), and the rest still works.
I know that cpu2006 also works, but as said have no recent measurements
for that, which I'm going to take now.

Ciao,
Michael.
On Thu, 20 Oct 2011, Uros Bizjak wrote:

> The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph
> to check if I didn't mess something with options handling.

I have no comments on the option handling in this patch.

> +for vectorized single float division and vectorized sqrtf(x) already with

@code{sqrtf (@var{x})}
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 180176)
+++ config/i386/i386.h	(working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT	0x08
 #define RECIP_MASK_ALL	(RECIP_MASK_DIV | RECIP_MASK_SQRT \
 			 | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV	((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT	((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 180176)
+++ config/i386/i386.opt	(working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 180176)
+++ doc/invoke.texi	(working copy)
@@ -12927,6 +12927,11 @@
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.
 
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized sqrtf(x) already with
+@option{-ffast-math} (or the above option combination), and doesn't need
+@option{-mrecip}.
+
 @item -mrecip=@var{opt}
 @opindex mrecip=opt
 This option allows to control which reciprocal estimate instructions
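
As a reading aid for the i386.h and i386.opt hunks, here is a minimal
standalone sketch of how the new default mask interacts with the TARGET_RECIP_*
tests. Only RECIP_MASK_VEC_SQRT (0x08) and the two scalar TARGET_RECIP_*
macros are visible in the hunk context; the other three bit values and the two
vector test macros are assumptions written by analogy, and recip_mask is an
ordinary global here rather than an option-machinery variable.

```c
/* Mask bits.  Only RECIP_MASK_VEC_SQRT = 0x08 appears in the hunk context;
   the other three values are assumed for this sketch. */
#define RECIP_MASK_DIV       0x01  /* assumed */
#define RECIP_MASK_SQRT      0x02  /* assumed */
#define RECIP_MASK_VEC_DIV   0x04  /* assumed */
#define RECIP_MASK_VEC_SQRT  0x08

/* New in the patch: only the vectorized variants are on by default. */
#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)

/* Mirrors "int recip_mask = RECIP_MASK_DEFAULT" from i386.opt. */
int recip_mask = RECIP_MASK_DEFAULT;

/* The two scalar tests are taken from the hunk context. */
#define TARGET_RECIP_DIV      ((recip_mask & RECIP_MASK_DIV) != 0)
#define TARGET_RECIP_SQRT     ((recip_mask & RECIP_MASK_SQRT) != 0)
/* The two vector tests are assumed to follow the same pattern. */
#define TARGET_RECIP_VEC_DIV  ((recip_mask & RECIP_MASK_VEC_DIV) != 0)
#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0)
```

With this initial value the scalar tests evaluate to false and the vector
tests to true, which matches the documentation hunk: the vectorized sequences
are used without an explicit -mrecip, while scalar ones are not enabled by
the default mask.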