Message ID | 55145D22.9050208@redhat.com |
---|---|
State | New |
> + <li>In GCC-4.9 and earlier releases PIC hard register was fixed
> + and was not used for other purposes when PIC code was
> + generated. Reuse of PIC hard register was implemented in RA
> + for GCC-5.0. It improves generated PIC code performance as
> + more hard registers can be used. As an example, shared
> + libraries and OS Android would significantly benefit from
> + such optimization. Currently it is switched on only for
> + x86/x86-64 targets. As RA infrastructure is already
> + implemented for PIC register reuse, other targets might
> + follow this in the future.</li>

PIC was always a huge performance disaster on x86. I suppose that with this
patch in, -fPIC benchmarks should improve quite noticeably ;)

Thanks!
Honza
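For readers who want to see the effect being discussed, the following is a minimal sketch of a probe file, not code from GCC or from the patch. All names in it (`event_count`, `event_weights`, `record_events`) are made up for illustration. The assumption is that on 32-bit x86 the PIC register is `%ebx`, so compiling the file with something like `gcc -m32 -O2 -fPIC -S` under GCC 4.9 and under GCC 5 and comparing how `%ebx` is used in the loop may show the difference. Whether a visible difference appears depends on the compiler version and options.

```c
/* Hypothetical probe for inspecting PIC code generation.
   Names are illustrative only and do not come from GCC. */

int event_count;                      /* global, accessed via the GOT in PIC code */
int event_weights[4] = {1, 2, 3, 4};  /* second global, to add address traffic */

/* The loop keeps a pointer, a counter, a bound and an accumulator live
   while also touching two globals, which puts pressure on the small
   32-bit x86 register file. */
int record_events(const int *ids, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += event_weights[ids[i] & 3];
        event_count++;
    }
    return total;
}
```

The file deliberately has no `main`, since the point is to read the generated assembly rather than to run anything.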
On 03/26/2015 01:25 PM, Vladimir Makarov wrote:
> Hi, I neglected to write about RA changes for the previous releases
> and people asked me to write about RA changes for GCC-5. So here is what
> I'd like to add to gcc-5/changes.html. I'll do it tomorrow. So any
> comments will be appreciated.
>
> Index: changes.html
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
> retrieving revision 1.91
> diff -U 5 -r1.91 changes.html
> --- changes.html 23 Mar 2015 10:12:23 -0000 1.91
> +++ changes.html 26 Mar 2015 19:24:32 -0000
> @@ -95,10 +95,40 @@
> <li>The new <code>gcov-tool</code> utility allows manipulating
> profiles.</li>
> <li>Profiles are now more tolerant to source file changes (this can be
> controlled by <code>--param profile-func-internal-id</code>).</li>
> </ul></li>
> + <li>Register allocation improvements:
> + <ul>
> + <li>A new local register allocator (LRA) sub-pass was added.
> + The sub-pass implements control-flow sensitive global
> + register rematerialization (controlled via
> + <code>-flra-remat</code>). Instead of spilling and

How about rewriting the first two sentences as:

A new local register allocator (LRA) sub-pass, controlled by
<code>-flra-remat</code>, implements control-flow sensitive global
register rematerialization.

> + restoring register value, it is recalculated if it is

s/register value/a register value/

> + profitable. The sub-pass improved SPEC2000 generated code
> + by 1% and 0.5% correspondingly on ARM and x86-64.</li>
> + <li>In GCC-4.9 and earlier releases PIC hard register was fixed
> + and was not used for other purposes when PIC code was
> + generated. Reuse of PIC hard register was implemented in RA
> + for GCC-5.0. It improves generated PIC code performance as
> + more hard registers can be used. As an example, shared
> + libraries and OS Android would significantly benefit from
> + such optimization. Currently it is switched on only for
> + x86/x86-64 targets. As RA infrastructure is already
> + implemented for PIC register reuse, other targets might
> + follow this in the future.</li>

How about making this less verbose and repetitive:

Reuse of the PIC hard register, instead of using a fixed register, was
implemented on x86/x86-64 targets. This improves generated PIC code
performance as more hard registers can be used. Shared libraries can
significantly benefit from this optimization.

> + <li>A simple form of inter-procedural RA was implemented. When
> + it is known that a called function does not use caller saved

s/caller saved/caller-saved/

> + registers, save/restore code is not generated around the
> + call for such registers. This optimization can be controlled
> + by <code>-fipa-ra</code></li>
> + <li>On some architectures (e.g. modern Intel processors),
> + spilling general registers into vector registers can be more
> + profitable than spilling into memory. LRA had already such
> + optimization. It was significantly improved for GCC-5.0,
> + permitting more 85% such spills than in GCC-4.9.</li>

I don't understand the last sentence. How about just dropping that 85%
bit and making the whole thing less wordy and more focused on the actual
improvement:

LRA is now much more effective at generating spills of general
registers into vector registers instead of memory on architectures
(e.g., modern Intel processors) where this is profitable.

> + </ul></li>
> <li>UndefinedBehaviorSanitizer gained a few new sanitization options:
> <ul>
> <li><code>-fsanitize=float-divide-by-zero</code>: detect floating-point
> division by zero;</li>
> <li><code>-fsanitize=float-cast-overflow</code>: check that the result

-Sandra
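The inter-procedural RA item in the patch can be illustrated with a small sketch. This is an assumption-laden example rather than anything taken from GCC: the function names `clamp_to_byte` and `sum_clamped` are hypothetical. The idea is that the callee is tiny and touches very few registers, while the caller keeps several values live across the call. Comparing the assembly from `gcc -O2 -S -fipa-ra` with that from `gcc -O2 -S -fno-ipa-ra` may show fewer saves and restores around the call in the first case, though the result depends on the target and the compiler version.

```c
/* Hypothetical example for inspecting the effect of -fipa-ra.
   Names are illustrative only and do not come from GCC. */

/* A small leaf function that needs very few registers. The noinline
   attribute keeps the call in place so there is a call site for the
   register allocator to reason about. */
__attribute__((noinline))
static int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* The pointer, the loop counter, the bound and the running sum are all
   live across each call to clamp_to_byte(). */
int sum_clamped(const int *values, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += clamp_to_byte(values[i]);
    return sum;
}
```

Both functions sit in one translation unit on purpose, on the assumption that the compiler needs to see the callee's body to know which registers it actually uses.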
On 03/26/2015 04:38 PM, Sandra Loosemore wrote:
> On 03/26/2015 01:25 PM, Vladimir Makarov wrote:
>> Hi, I neglected to write about RA changes for the previous releases
>> and people asked me to write about RA changes for GCC-5. So here is what
>> I'd like to add to gcc-5/changes.html. I'll do it tomorrow. So any
>> comments will be appreciated.
>>
>> Index: changes.html
>> ===================================================================
>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
>> retrieving revision 1.91
>> diff -U 5 -r1.91 changes.html
>> --- changes.html 23 Mar 2015 10:12:23 -0000 1.91
>> +++ changes.html 26 Mar 2015 19:24:32 -0000
>> @@ -95,10 +95,40 @@
>> <li>The new <code>gcov-tool</code> utility allows manipulating
>> profiles.</li>
>> <li>Profiles are now more tolerant to source file changes
>> (this can be
>> controlled by <code>--param
>> profile-func-internal-id</code>).</li>
>> </ul></li>
>> + <li>Register allocation improvements:
>> + <ul>
>> + <li>A new local register allocator (LRA) sub-pass was added.
>> + The sub-pass implements control-flow sensitive global
>> + register rematerialization (controlled via
>> + <code>-flra-remat</code>). Instead of spilling and
>
> How about rewriting the first two sentences as:
>
> A new local register allocator (LRA) sub-pass, controlled by
> <code>-flra-remat</code>, implements control-flow sensitive global
> register rematerialization.
>
That is better. Thanks.

>> + restoring register value, it is recalculated if it is
>
> s/register value/a register value/
>
Fixed.

>> + profitable. The sub-pass improved SPEC2000 generated code
>> + by 1% and 0.5% correspondingly on ARM and x86-64.</li>
>> + <li>In GCC-4.9 and earlier releases PIC hard register was fixed
>> + and was not used for other purposes when PIC code was
>> + generated. Reuse of PIC hard register was implemented in RA
>> + for GCC-5.0. It improves generated PIC code performance as
>> + more hard registers can be used. As an example, shared
>> + libraries and OS Android would significantly benefit from
>> + such optimization. Currently it is switched on only for
>> + x86/x86-64 targets. As RA infrastructure is already
>> + implemented for PIC register reuse, other targets might
>> + follow this in the future.</li>
>
> How about making this less verbose and repetitive:
>
> Reuse of the PIC hard register, instead of using a fixed register, was
> implemented on x86/x86-64 targets. This improves generated PIC code
> performance as more hard registers can be used. Shared libraries can
> significantly benefit from this optimization.
Fixed. Thanks.

>
>> + <li>A simple form of inter-procedural RA was implemented. When
>> + it is known that a called function does not use caller saved
>
> s/caller saved/caller-saved/
>
Done.

>> + registers, save/restore code is not generated around the
>> + call for such registers. This optimization can be controlled
>> + by <code>-fipa-ra</code></li>
>> + <li>On some architectures (e.g. modern Intel processors),
>> + spilling general registers into vector registers can be more
>> + profitable than spilling into memory. LRA had already such
>> + optimization. It was significantly improved for GCC-5.0,
>> + permitting more 85% such spills than in GCC-4.9.</li>
>
> I don't understand the last sentence. How about just dropping that
> 85% bit and making the whole thing less wordy and more focused on the
> actual improvement:
>
> LRA is now much more effective at generating spills of general
> registers into vector registers instead of memory on architectures
> (e.g., modern Intel processors) where this is profitable.
>
Ok. Fixed.

Thanks, Sandra. That was very helpful, as English is not my native
language.

>> + </ul></li>
>> <li>UndefinedBehaviorSanitizer gained a few new sanitization
>> options:
>> <ul>
>> <li><code>-fsanitize=float-divide-by-zero</code>: detect floating-point
>> division by zero;</li>
>> <li><code>-fsanitize=float-cast-overflow</code>: check that the result
>
Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.91
diff -U 5 -r1.91 changes.html
--- changes.html 23 Mar 2015 10:12:23 -0000 1.91
+++ changes.html 26 Mar 2015 19:24:32 -0000
@@ -95,10 +95,40 @@
 <li>The new <code>gcov-tool</code> utility allows manipulating
 profiles.</li>
 <li>Profiles are now more tolerant to source file changes (this can be
 controlled by <code>--param profile-func-internal-id</code>).</li>
 </ul></li>
+ <li>Register allocation improvements:
+ <ul>
+ <li>A new local register allocator (LRA) sub-pass was added.
+ The sub-pass implements control-flow sensitive global
+ register rematerialization (controlled via
+ <code>-flra-remat</code>). Instead of spilling and
+ restoring register value, it is recalculated if it is
+ profitable. The sub-pass improved SPEC2000 generated code
+ by 1% and 0.5% correspondingly on ARM and x86-64.</li>
+ <li>In GCC-4.9 and earlier releases PIC hard register was fixed
+ and was not used for other purposes when PIC code was
+ generated. Reuse of PIC hard register was implemented in RA
+ for GCC-5.0. It improves generated PIC code performance as
+ more hard registers can be used. As an example, shared
+ libraries and OS Android would significantly benefit from
+ such optimization. Currently it is switched on only for
+ x86/x86-64 targets. As RA infrastructure is already
+ implemented for PIC register reuse, other targets might
+ follow this in the future.</li>
+ <li>A simple form of inter-procedural RA was implemented. When
+ it is known that a called function does not use caller saved
+ registers, save/restore code is not generated around the
+ call for such registers. This optimization can be controlled
+ by <code>-fipa-ra</code></li>
+ <li>On some architectures (e.g. modern Intel processors),
+ spilling general registers into vector registers can be more
+ profitable than spilling into memory. LRA had already such
+ optimization. It was significantly improved for GCC-5.0,
+ permitting more 85% such spills than in GCC-4.9.</li>
+ </ul></li>
 <li>UndefinedBehaviorSanitizer gained a few new sanitization options:
 <ul>
 <li><code>-fsanitize=float-divide-by-zero</code>: detect floating-point
 division by zero;</li>
 <li><code>-fsanitize=float-cast-overflow</code>: check that the result