Message ID | CAAs8Hmw+fC9=1VDTGcOnJgPXFmMo3qtC=HqdNUvnyWrHbm51Mw@mail.gmail.com |
---|---|
State | New |
Hi Uros,

Could you please review this patch?

Thanks
Sri

On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Patch Updated.
>
> Sri
>
> On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Ping.
>>
>> On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Ping.
>>>
>>> On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Optimize access to globals with -fpie, x86_64 only:
>>>>
>>>> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
>>>> using the GOT. This takes two instructions, one to get the address of the global
>>>> from the GOT and the other to get the value. If it turns out that the global
>>>> gets defined in the executable at link-time, it still needs to go through the
>>>> GOT as it is too late then to generate a direct access.
>>>>
>>>> Examples:
>>>>
>>>> foo.cc
>>>> ------
>>>> int a_glob;
>>>> int main () {
>>>>   return a_glob; // defined in this file
>>>> }
>>>>
>>>> With -O2 -fpie -pie, the generated code directly accesses the global via a
>>>> PC-relative insn:
>>>>
>>>> 5e0 <main>:
>>>>   mov 0x165a(%rip),%eax # 1c40 <a_glob>
>>>>
>>>> foo.cc
>>>> ------
>>>>
>>>> extern int a_glob;
>>>> int main () {
>>>>   return a_glob; // declared extern; defined elsewhere
>>>> }
>>>>
>>>> With -O2 -fpie -pie, the generated code accesses the global via the GOT using
>>>> two memory loads:
>>>>
>>>> 6f0 <main>:
>>>>   mov 0x1609(%rip),%rax # 1d00 <_DYNAMIC+0x230>
>>>>   mov (%rax),%eax
>>>>
>>>> This is true even if in the latter case the global was defined in the
>>>> executable through a different file.
>>>>
>>>> Some experiments on Google benchmarks show that the extra memory loads affect
>>>> performance by 1% to 5%.
>>>>
>>>>
>>>> Solution - Copy Relocations:
>>>>
>>>> When the linker supports copy relocations, GCC can always assume that the
>>>> global will be defined in the executable. For globals that are truly extern
>>>> (come from shared objects), the linker will create copy relocations and have
>>>> them defined in the executable. The result is that no global access needs to
>>>> go through the GOT, which improves performance.
>>>>
>>>> This patch to the gold linker:
>>>> https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>> submitted recently allows gold to generate copy relocations for -pie mode when
>>>> necessary.
>>>>
>>>> I have added option -mld-pie-copyrelocs which when combined with -fpie would do
>>>> this. Note that the BFD linker does not support pie copyrelocs yet and this
>>>> option cannot be used there.
>>>>
>>>> Please review.
>>>>
>>>>
>>>> ChangeLog:
>>>>
>>>> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
>>>> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
>>>> address is still legitimate in the presence of copy relocations
>>>> and -fpie.
>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>>>>
>>>>
>>>>
>>>> Patch attached.
>>>> Thanks
>>>> Sri
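The two access patterns described in the message above can be modelled in portable C. This is only an illustrative sketch: `a_glob_storage`, `got_slot`, `load_direct`, and `load_via_got` are hypothetical names, and a real GOT entry is filled in by the linker and the dynamic loader rather than by a C initializer.

```c
/* Illustrative model only: real GOT slots are set up by the linker and
   the dynamic loader, not by C initializers.  All names are hypothetical. */

/* The global variable itself, standing in for a_glob. */
static int a_glob_storage = 42;

/* A pointer slot standing in for the GOT entry that holds a_glob's address. */
static int *const got_slot = &a_glob_storage;

/* Direct (PC-relative style) access: a single load of the value. */
int load_direct(void)
{
    return a_glob_storage;
}

/* GOT-style access: load the address from the slot, then load the value. */
int load_via_got(void)
{
    int *addr = got_slot;   /* first load: address from the slot */
    return *addr;           /* second load: the value itself */
}
```

Both functions return the same value. Conceptually, the difference is the extra dependent load in `load_via_got`, which is the overhead the proposal aims to remove for globals that end up defined in the executable.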
Ping.

On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Uros,
>
> Could you please review this patch?
>
> Thanks
> Sri
Ping.

On Fri, Jul 11, 2014 at 10:42 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
On 06/20/2014 05:17 PM, Sriraman Tallam wrote:
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c (revision 211826)
> +++ config/i386/i386.c (working copy)
> @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
>              return true;
>          }
>        else if (!SYMBOL_REF_FAR_ADDR_P (op0)
> -               && SYMBOL_REF_LOCAL_P (op0)
> +               && (SYMBOL_REF_LOCAL_P (op0)
> +                   || (TARGET_64BIT && ix86_copyrelocs && flag_pie
> +                       && !SYMBOL_REF_FUNCTION_P (op0)))
>                 && ix86_cmodel != CM_LARGE_PIC)
>          return true;
>        break;

This is the wrong place to patch.

You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified
TARGET_BINDS_LOCAL_P.

Note in particular that I believe that you are doing the wrong thing with weak
and COMMON symbols, in that you probably ought not force a copy reloc there.

Note the complexity of default_binds_local_p_1, and the fact that all you
really want to modify is

  /* If PIC, then assume that any global name can be overridden by
     symbols resolved from other modules.  */
  else if (shlib)
    local_p = false;

near the bottom of that function.

r~
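The direction suggested in the review can be illustrated with a small standalone model of a binds-local predicate. This is a hedged sketch, not GCC code: `struct sym_info`, `struct codegen_opts`, and `model_binds_local_p` are hypothetical names, the real `default_binds_local_p_1` operates on tree declarations and handles many more cases, and treating weak and COMMON symbols as never local is a deliberate simplification made here to mirror the review comment.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a symbol.  GCC itself works on
   tree declarations; this struct exists only for the sketch. */
struct sym_info {
    bool is_function;         /* function rather than data */
    bool is_weak;             /* weak symbol */
    bool is_common;           /* COMMON (tentative) definition */
    bool defined_here;        /* defined in the current translation unit */
    bool default_visibility;  /* false for hidden/protected/internal */
};

/* Hypothetical subset of the code generation options involved. */
struct codegen_opts {
    bool shlib;         /* compiling PIC code for a shared library */
    bool pie;           /* -fpie / -fPIE */
    bool copyrelocs;    /* linker can emit copy relocations for PIE */
    bool target_64bit;  /* x86-64 */
};

/* Sketch of a binds-local predicate with the copy-relocation relaxation
   kept in one place.  This is not GCC's default_binds_local_p_1. */
bool model_binds_local_p(const struct sym_info *s,
                         const struct codegen_opts *o)
{
    /* Non-default visibility cannot be preempted by another module. */
    if (!s->default_visibility)
        return true;

    /* Weak and COMMON symbols keep the conservative answer, so no copy
       relocation is forced for them (a simplification in this model). */
    if (s->is_weak || s->is_common)
        return false;

    /* Relaxation: in a 64-bit PIE, a data symbol can be treated as local
       because the linker can satisfy the reference with a copy reloc. */
    if (o->target_64bit && o->pie && o->copyrelocs && !s->is_function)
        return true;

    /* Symbols defined elsewhere are otherwise not local. */
    if (!s->defined_here)
        return false;

    /* In a shared library, any global name may be overridden. */
    if (o->shlib)
        return false;

    return true;
}
```

Keeping the relaxation inside the predicate means every consumer of the "is this symbol local" answer sees the same result, instead of only the one address-legitimacy check touched by the posted patch.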
On 2 September 2014 22:40:50 CEST, Richard Henderson <rth@redhat.com> wrote:
>On 06/20/2014 05:17 PM, Sriraman Tallam wrote:
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c (revision 211826)
>> +++ config/i386/i386.c (working copy)
>> @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
>>              return true;
>>          }
>>        else if (!SYMBOL_REF_FAR_ADDR_P (op0)
>> -               && SYMBOL_REF_LOCAL_P (op0)
>> +               && (SYMBOL_REF_LOCAL_P (op0)
>> +                   || (TARGET_64BIT && ix86_copyrelocs && flag_pie
>> +                       && !SYMBOL_REF_FUNCTION_P (op0)))
>>                 && ix86_cmodel != CM_LARGE_PIC)
>>          return true;
>>        break;
>
>This is the wrong place to patch.
>
>You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified
>TARGET_BINDS_LOCAL_P.
>
>Note in particular that I believe that you are doing the wrong thing with weak
>and COMMON symbols, in that you probably ought not force a copy reloc there.
>
>Note the complexity of default_binds_local_p_1, and the fact that all you
>really want to modify is
>
>  /* If PIC, then assume that any global name can be overridden by
>     symbols resolved from other modules.  */
>  else if (shlib)
>    local_p = false;
>
>near the bottom of that function.

Reminds me of PR32219
https://gcc.gnu.org/ml/gcc-patches/2010-03/msg00665.html
Admittedly that one is not imposed by PIE, but it still fails on current trunk.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 211826)
+++ config/i386/i386.c (working copy)
@@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
             return true;
         }
       else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-               && SYMBOL_REF_LOCAL_P (op0)
+               && (SYMBOL_REF_LOCAL_P (op0)
+                   || (TARGET_64BIT && ix86_copyrelocs && flag_pie
+                       && !SYMBOL_REF_FUNCTION_P (op0)))
                && ix86_cmodel != CM_LARGE_PIC)
         return true;
       break;
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt (revision 211826)
+++ config/i386/i386.opt (working copy)
@@ -108,6 +108,10 @@ int x_ix86_dump_tunes
 TargetSave
 int x_ix86_force_align_arg_pointer

+;; -mcopyrelocs
+TargetSave
+int x_ix86_copyrelocs
+
 ;; -mforce-drap=
 TargetSave
 int x_ix86_force_drap
@@ -291,6 +295,10 @@ mfancy-math-387
 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387) Save
 Generate sin, cos, sqrt for FPU

+mcopyrelocs
+Target Report Var(ix86_copyrelocs) Init(0)
+Use copy relocations for pie when possible
+
 mforce-drap
 Target Report Var(ix86_force_drap)
 Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c (revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c (revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mcopyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mcopyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should never be accessed with a GOTPCREL */
+/* { dg-final { scan-assembler-not "glob_a\\@GOTPCREL" { target { x86_64-*-* } } } } */
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c (revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c (revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mno-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mno-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should always be accessed via GOT */
+/* { dg-final { scan-assembler "glob_a\\@GOT" { target { x86_64-*-* } } } } */
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 211826)
+++ doc/invoke.texi (working copy)
@@ -688,7 +688,8 @@ Objective-C and Objective-C++ Dialects}.
 -m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol
 -msse2avx -mfentry -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
--mstack-protector-guard=@var{guard}}
+-mstack-protector-guard=@var{guard} @gol
+-mcopyrelocs}

 @emph{i386 and x86-64 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -15802,6 +15803,15 @@ locations are @samp{global} for global canary or @
 canary in the TLS block (the default).  This option has effect only when
 @option{-fstack-protector} or @option{-fstack-protector-all} is specified.

+@item -mcopyrelocs
+@itemx -mno-copyrelocs
+@opindex mcopyrelocs
+@opindex mno-copyrelocs
+With @option{-fpie} and @option{-fPIE}, copy relocations support allows the
+compiler to assume that all symbol references are local.  This allows the
+compiler to skip the GOT for global accesses and this applies only to the
+x86-64 architecture.
+
 @end table

 These @samp{-m} switches are supported in addition to the above