Message ID | 627BFF31-95EE-4FAD-9E70-B5EA21BCFB70@sandoe-acoustics.co.uk |
---|---|
State | New |
On 08/13/2010 09:39 AM, IainS wrote:
> OK for trunk and 4.5?
Ok, with appropriate changelog.
r~
On Fri, Aug 13, 2010 at 9:39 AM, IainS <developer@sandoe-acoustics.co.uk> wrote:
> Hi,
>
> This brings the default arch on darwin to core2 - as per the OSX 4.2.1
> system compiler.
> (which makes our codegen much neater and takes a big chunk off the size of
> cc1*).
>
> ---
>
> In order to do that it was necessary to get -mtune=core2 to work ..
> ... this had a couple of residual cases where the use of movq is
> incompatible with the darwin assembler.

Do you know -mtune=core generates slower code than -mtune=generic?

> The use of movd for r<->x and r<->Yi seems to be consistent with the
> decisions made elsewhere in PRs (i.e. I am not disturbing the status quo,
> merely making it consistent in a couple of missed places.)

You should post a separate patch for this. Also there is no ChangeLog.

> I have bootstrapped this on x86_64 (Core 2 Duo), i686-darwin8 (Xeon) and
> i686-darwin8 (Core Duo).
> Note that the Darwin8 default for i686 from apple is nocona/gcc-4.0.1, but
> the use of core2 here does not appear to create any problems
> (the processor cannot execute m64 code anyway).
>
> OK for trunk and 4.5?
> Iain
Hi HJ,

On 13 Aug 2010, at 18:22, H.J. Lu wrote:
> On Fri, Aug 13, 2010 at 9:39 AM, IainS <developer@sandoe-acoustics.co.uk> wrote:
>> In order to do that it was necessary to get -mtune=core2 to work ..
>> ... this had a couple of residual cases where the use of movq is
>> incompatible with the darwin assembler.
>
> Do you know -mtune=core generates slower code than -mtune=generic?

I have not benchmarked - unfortunately, we volunteers do not have access to SPEC &c.

... but I observe that the code is circa 7% smaller with -mtune=core2 c.f. generic.
... and we replace all the _pc_thunk calls with a local call and pop - which makes quite an impact on our asm.

If you can provide me with a realistic way to perform some suitable benchmarks, I will happily do so.
.. otherwise, for the platform, I'm simply making our default the same as the vendor's.

It does not, of course, prevent someone from bootstrapping with --with-cpu=generic if they wish to.

thanks
Iain
On Fri, Aug 13, 2010 at 06:36:12PM +0100, IainS wrote:
> Hi HJ,
>
> On 13 Aug 2010, at 18:22, H.J. Lu wrote:
>
>> On Fri, Aug 13, 2010 at 9:39 AM, IainS
>> <developer@sandoe-acoustics.co.uk> wrote:
>>> In order to do that it was necessary to get -mtune=core2 to work ..
>>> ... this had a couple of residual cases where the use of movq is
>>> incompatible with the darwin assembler.
>>
>> Do you know -mtune=core generates slower code than -mtune=generic?
>
> I have not benchmarked - unfortunately, we volunteers do not have access
> to SPEC &c.

Iain,
I can repeat the benchmarks this weekend but the last time I looked at the performance of the Polyhedron 2005 benchmarks on x86_64-apple-darwin10, I found -mtune=core2 was slower than -mtune=generic.

http://gcc.gnu.org/ml/gcc-patches/2010-02/msg01272.html

There were some earlier messages suggesting that improved cost models would be under development this summer.

http://gcc.gnu.org/ml/gcc/2010-05/msg00279.html
http://gcc.gnu.org/ml/gcc/2010-05/msg00370.html
http://gcc.gnu.org/ml/gcc/2010-05/msg00427.html

If these do materialize and make it into gcc 4.6, it would make sense to revisit this issue.
Jack

>
> ... but I observe that the code is circa 7% smaller with -mtune=core2
> c.f. generic.
> ... and we replace all the _pc_thunk calls with a local call and pop -
> which makes quite an impact on our asm.
>
> If you can provide me with a realistic way to perform some suitable
> benchmarks, I will happily do so.
> .. otherwise, for the platform, I'm simply making our default the same
> as the vendor's.
>
> It does not, of course, prevent someone from bootstrapping with
> --with-cpu=generic if they wish to.
>
> thanks
> Iain
On 13 Aug 2010, at 20:25, Jack Howarth wrote:
> On Fri, Aug 13, 2010 at 06:36:12PM +0100, IainS wrote:
>> Hi HJ,
>>
>> On 13 Aug 2010, at 18:22, H.J. Lu wrote:
>>
>>> On Fri, Aug 13, 2010 at 9:39 AM, IainS
>>> <developer@sandoe-acoustics.co.uk> wrote:
>>>> In order to do that it was necessary to get -mtune=core2 to work ..
>>>> ... this had a couple of residual cases where the use of movq is
>>>> incompatible with the darwin assembler.
>>>
>>> Do you know -mtune=core generates slower code than -mtune=generic?
>>
>> I have not benchmarked - unfortunately, we volunteers do not have access
>> to SPEC &c.
>
> I can repeat the benchmarks this weekend but the last time I looked
> at the performance of the Polyhedron 2005 benchmarks on
> x86_64-apple-darwin10, I found -mtune=core2 was slower than
> -mtune=generic.

Do you have any reasonable body of c/c++ benchmarks? I have the Polyhedron fortran ones.

Odd that the vendor would choose that default if it's genuinely less performance...
I guess I could look and see if there are changed tuning params...

... anyway I'm not going to nail colors to the mast over this one - it's easily changed.
... remember the old BYTE article "Lies, Damned Lies & Benchmarks" ?

cheers,
Iain
Index: gcc/config/i386/mmx.md
===================================================================
--- gcc/config/i386/mmx.md (revision 163221)
+++ gcc/config/i386/mmx.md (working copy)
@@ -81,8 +81,8 @@
    %vpxor\t%0, %d0
    %vmovq\t{%1, %0|%0, %1}
    %vmovq\t{%1, %0|%0, %1}
-   %vmovq\t{%1, %0|%0, %1}
-   %vmovq\t{%1, %0|%0, %1}"
+   %vmovd\t{%1, %0|%0, %1}
+   %vmovd\t{%1, %0|%0, %1}"
   [(set_attr "type" "imov ,imov ,mmx,mmxmov,mmxmov,ssecvt,ssecvt,sselog1,ssemov,ssemov,ssemov,ssemov")
    (set_attr "unit" "*,*,*,*,*,mmx,mmx,*,*,*,*,*")
    (set_attr "prefix_rep" "*,*,*,*,*,1,1,*,1,*,*,*")
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md (revision 163221)
+++ gcc/config/i386/sse.md (working copy)
@@ -7709,7 +7709,7 @@
   "@
    pinsrq\t{$0x1, %2, %0|%0, %2, 0x1}
    movq\t{%1, %0|%0, %1}
-   movq\t{%1, %0|%0, %1}
+   movd\t{%1, %0|%0, %1}
    movq2dq\t{%1, %0|%0, %1}
    punpcklqdq\t{%2, %0|%0, %2}
    movlhps\t{%2, %0|%0, %2}
@@ -7728,7 +7728,7 @@
   "TARGET_64BIT && TARGET_SSE"
   "@
    movq\t{%1, %0|%0, %1}
-   movq\t{%1, %0|%0, %1}
+   movd\t{%1, %0|%0, %1}
    movq2dq\t{%1, %0|%0, %1}
    punpcklqdq\t{%2, %0|%0, %2}
    movlhps\t{%2, %0|%0, %2}
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc (revision 163221)
+++ gcc/config.gcc (working copy)
@@ -1127,17 +1127,13 @@ hppa[12]*-*-hpux11*)
 i[34567]86-*-darwin*)
 	need_64bit_hwint=yes
 	need_64bit_isa=yes
-
-	# This is so that '.../configure && make' doesn't fail due to
-	# config.guess deciding that the configuration is i386-*-darwin* and
-	# then this file using that to set --with-cpu=i386 which has no -m64
-	# support.
-	with_cpu=${with_cpu:-generic}
+	# Baseline choice for a machine that allows m64 support.
+	with_cpu=${with_cpu:-core2}
 	tmake_file="${tmake_file} t-slibgcc-darwin i386/t-crtpc i386/t-crtfm"
 	lto_binary_reader=lto-macho
 	;;
 x86_64-*-darwin*)
-	with_cpu=${with_cpu:-generic}
+	with_cpu=${with_cpu:-core2}
 	tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc-darwin i386/t-crtpc i386/t-crtfm"
 	tm_file="${tm_file} ${cpu_type}/darwin64.h"
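
A note on the config.gcc hunk: the with_cpu=${with_cpu:-core2} line relies on the POSIX shell default-value expansion, so core2 only applies when with_cpu is unset or empty, and a value given by the user (for example via --with-cpu=generic, as mentioned in the thread) is kept. The following is a minimal sketch of that behaviour, not code from the patch; pick_cpu is a hypothetical helper introduced purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of config.gcc): mimics the idiom used in
# the patch, with_cpu=${with_cpu:-core2}.
pick_cpu() {
    with_cpu=$1                    # stand-in for a value set by configure
    with_cpu=${with_cpu:-core2}    # use core2 only if unset or empty
    printf '%s\n' "$with_cpu"
}

pick_cpu ""         # nothing requested: prints core2
pick_cpu "generic"  # explicit request is preserved: prints generic
```

Under these assumptions the patch changes only the fallback, which is consistent with the remark in the thread that the default is easily overridden at configure time.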