Message ID | CAAs8Hmz8_=m1EoX=eQqX2gV+qwVOR_SO5tTCaz=MmCu3vwkpeQ@mail.gmail.com |
---|---|
State | New |
Ping.

On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Optimize access to globals with -fpie, x86_64 only:
>
> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
> using the GOT. This takes two instructions: one to get the address of the global
> from the GOT and the other to get the value. If it turns out that the global
> gets defined in the executable at link-time, it still needs to go through the
> GOT, as it is too late by then to generate a direct access.
>
> Examples:
>
> foo.cc
> ------
> int a_glob;
> int main () {
>   return a_glob; // defined in this file
> }
>
> With -O2 -fpie -pie, the generated code directly accesses the global via a
> PC-relative insn:
>
> 5e0 <main>:
>    mov 0x165a(%rip),%eax # 1c40 <a_glob>
>
> foo.cc
> ------
>
> extern int a_glob;
> int main () {
>   return a_glob; // defined elsewhere
> }
>
> With -O2 -fpie -pie, the generated code accesses the global via the GOT using
> two memory loads:
>
> 6f0 <main>:
>    mov 0x1609(%rip),%rax # 1d00 <_DYNAMIC+0x230>
>    mov (%rax),%eax
>
> This is true even if, in the latter case, the global was defined in the
> executable through a different file.
>
> Some experiments on google benchmarks show that the extra memory loads affect
> performance by 1% to 5%.
>
>
> Solution - Copy Relocations:
>
> When the linker supports copy relocations, GCC can always assume that the
> global will be defined in the executable. For globals that are truly extern
> (come from shared objects), the linker will create copy relocations and have
> them defined in the executable. The result is that no global access needs to go
> through the GOT, which improves performance.
>
> This patch to the gold linker:
> https://sourceware.org/ml/binutils/2014-05/msg00092.html
> submitted recently allows gold to generate copy relocations for -pie mode when
> necessary.
>
> I have added option -mld-pie-copyrelocs which, when combined with -fpie, would do
> this. Note that the BFD linker does not support pie copyrelocs yet and this
> option cannot be used there.
>
> Please review.
>
>
> ChangeLog:
>
> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
> address is still legitimate in the presence of copy relocations
> and -fpie.
> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>
>
>
> Patch attached.
> Thanks
> Sri
Ping.

On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 210437)
+++ config/i386/i386.opt	(working copy)
@@ -108,6 +108,10 @@ int x_ix86_dump_tunes
 TargetSave
 int x_ix86_force_align_arg_pointer
 
+;; -mld-pie-copyrelocs
+TargetSave
+int x_ix86_ld_pie_copyrelocs
+
 ;; -mforce-drap=
 TargetSave
 int x_ix86_force_drap
@@ -291,6 +295,10 @@ mfancy-math-387
 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387) Save
 Generate sin, cos, sqrt for FPU
 
+mld-pie-copyrelocs
+Target Report Var(ix86_ld_pie_copyrelocs) Init(0)
+Use linker copy relocs for pie
+
 mforce-drap
 Target Report Var(ix86_force_drap)
 Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 210437)
+++ config/i386/i386.c	(working copy)
@@ -12684,7 +12684,9 @@ legitimate_pic_address_disp_p (rtx disp)
             return true;
         }
       else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-               && SYMBOL_REF_LOCAL_P (op0)
+               && (SYMBOL_REF_LOCAL_P (op0)
+                   || (TARGET_64BIT && ix86_ld_pie_copyrelocs && flag_pie
+                       && !SYMBOL_REF_FUNCTION_P (op0)))
                && ix86_cmodel != CM_LARGE_PIC)
         return true;
       break;
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mld-pie-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mld-pie-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should never be accessed with a GOTPCREL */
+/* { dg-final { scan-assembler-not "glob_a\\@GOTPCREL" { target { x86_64-*-* } } } } */
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mno-ld-pie-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mno-ld-pie-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should always be accessed via GOT */
+/* { dg-final { scan-assembler "glob_a\\@GOT" { target { x86_64-*-* } } } } */