
[x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations

Message ID CAAs8Hmw+fC9=1VDTGcOnJgPXFmMo3qtC=HqdNUvnyWrHbm51Mw@mail.gmail.com
State New

Commit Message

Sriraman Tallam June 21, 2014, 12:17 a.m. UTC
Patch Updated.

Sri

On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
>
> On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Ping.
>>
>> On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Optimize access to globals with -fpie, x86_64 only:
>>>
>>> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
>>> through the GOT.  This takes two instructions: one to load the address of the
>>> global from the GOT and another to load its value.  Even if the global turns
>>> out to be defined in the executable at link time, the access still goes
>>> through the GOT, because by then it is too late to generate a direct access.
>>>
>>> Examples:
>>>
>>> foo.cc
>>> ------
>>> int a_glob;
>>> int main () {
>>>   return a_glob; // defined in this file
>>> }
>>>
>>> With -O2 -fpie -pie, the generated code accesses the global directly via a
>>> PC-relative instruction:
>>>
>>> 5e0   <main>:
>>>    mov    0x165a(%rip),%eax        # 1c40 <a_glob>
>>>
>>> foo.cc
>>> ------
>>>
>>> extern int a_glob;
>>> int main () {
>>>   return a_glob; // defined in another file
>>> }
>>>
>>> With -O2 -fpie -pie, the generated code accesses the global through the GOT,
>>> using two memory loads:
>>>
>>> 6f0  <main>:
>>>    mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
>>>    mov    (%rax),%eax
>>>
>>> This is true even if, in the latter case, the global is defined in the
>>> executable by a different file.
>>>
>>> Experiments on Google benchmarks show that the extra memory loads degrade
>>> performance by 1% to 5%.
>>>
>>>
>>> Solution - Copy Relocations:
>>>
>>> When the linker supports copy relocations, GCC can always assume that a
>>> global will be defined in the executable.  For globals that are truly extern
>>> (that is, defined in shared objects), the linker creates copy relocations so
>>> that they are defined in the executable.  As a result, no access to a global
>>> needs to go through the GOT, which improves performance.
>>>
>>> This recently submitted patch to the gold linker allows gold to generate
>>> copy relocations in -pie mode when necessary:
>>> https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>
>>> I have added the option -mld-pie-copyrelocs which, combined with -fpie,
>>> enables this.  Note that the BFD linker does not yet support copy
>>> relocations in PIE mode, so this option cannot be used with it.
>>>
>>> Please review.
>>>
>>>
>>> ChangeLog:
>>>
>>> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
>>> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
>>>  address is still legitimate in the presence of copy relocations
>>>  and -fpie.
>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>>>
>>>
>>>
>>> Patch attached.
>>> Thanks
>>> Sri
Optimize access to globals with -fpie, x86_64 only:

Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
through the GOT.  This takes two instructions: one to load the address of the
global from the GOT and another to load its value.  Even if the global turns
out to be defined in the executable at link time, the access still goes
through the GOT, because by then it is too late to generate a direct access.

Examples:

foo.cc
------
int a_glob;
int main () {
  return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code accesses the global directly via a
PC-relative instruction:

5e0   <main>:
   mov    0x165a(%rip),%eax        # 1c40 <a_glob>

foo.cc
------

extern int a_glob;
int main () {
  return a_glob; // defined in another file
}

With -O2 -fpie -pie, the generated code accesses the global through the GOT,
using two memory loads:

6f0  <main>:
   mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
   mov    (%rax),%eax

This is true even if, in the latter case, the global is defined in the
executable by a different file.

Experiments on Google benchmarks show that the extra memory loads degrade
performance by 1% to 5%.


Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that a
global will be defined in the executable.  For globals that are truly extern
(that is, defined in shared objects), the linker creates copy relocations so
that they are defined in the executable.  As a result, no access to a global
needs to go through the GOT, which improves performance.

This recently submitted patch to the gold linker allows gold to generate
copy relocations in -pie mode when necessary:
https://sourceware.org/ml/binutils/2014-05/msg00092.html

I have added the option -mcopyrelocs which, combined with -fpie, enables
this.  Note that the BFD linker does not yet support copy relocations in
PIE mode, so this option cannot be used with it.

Please review.


ChangeLog:

	* config/i386/i386.opt (mcopyrelocs): New option.
	* config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
	address is still legitimate in the presence of copy relocations
	and -fpie.
	* doc/invoke.texi (mcopyrelocs): Document.
	* testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
	* testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.

Comments

Sriraman Tallam June 26, 2014, 5:54 p.m. UTC | #1
Hi Uros,

   Could you please review this patch?

Thanks
Sri

On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Patch Updated.
>
> Sri
Sriraman Tallam July 11, 2014, 5:42 p.m. UTC | #2
Ping.

On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Uros,
>
>    Could you please review this patch?
>
> Thanks
> Sri
Sriraman Tallam Sept. 2, 2014, 6:15 p.m. UTC | #3
Ping.

On Fri, Jul 11, 2014 at 10:42 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
Richard Henderson Sept. 2, 2014, 8:40 p.m. UTC | #4
On 06/20/2014 05:17 PM, Sriraman Tallam wrote:
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 211826)
> +++ config/i386/i386.c	(working copy)
> @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
>  		return true;
>  	    }
>  	  else if (!SYMBOL_REF_FAR_ADDR_P (op0)
> -		   && SYMBOL_REF_LOCAL_P (op0)
> +		   && (SYMBOL_REF_LOCAL_P (op0)
> +		       || (TARGET_64BIT && ix86_copyrelocs && flag_pie
> +			   && !SYMBOL_REF_FUNCTION_P (op0)))
>  		   && ix86_cmodel != CM_LARGE_PIC)
>  	    return true;
>  	  break;

This is the wrong place to patch.

You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified
TARGET_BINDS_LOCAL_P.

Note in particular that I believe that you are doing the wrong thing with weak
and COMMON symbols, in that you probably ought not force a copy reloc there.

Note the complexity of default_binds_local_p_1, and the fact that all you
really want to modify is

  /* If PIC, then assume that any global name can be overridden by
     symbols resolved from other modules.  */
  else if (shlib)
    local_p = false;

near the bottom of that function.


r~
Bernhard Reutner-Fischer Sept. 3, 2014, 7:25 a.m. UTC | #5
On 2 September 2014 22:40:50 CEST, Richard Henderson <rth@redhat.com> wrote:
>On 06/20/2014 05:17 PM, Sriraman Tallam wrote:
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c	(revision 211826)
>> +++ config/i386/i386.c	(working copy)
>> @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
>>  		return true;
>>  	    }
>>  	  else if (!SYMBOL_REF_FAR_ADDR_P (op0)
>> -		   && SYMBOL_REF_LOCAL_P (op0)
>> +		   && (SYMBOL_REF_LOCAL_P (op0)
>> +		       || (TARGET_64BIT && ix86_copyrelocs && flag_pie
>> +			   && !SYMBOL_REF_FUNCTION_P (op0)))
>>  		   && ix86_cmodel != CM_LARGE_PIC)
>>  	    return true;
>>  	  break;
>
>This is the wrong place to patch.
>
>You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified
>TARGET_BINDS_LOCAL_P.
>
>Note in particular that I believe that you are doing the wrong thing with weak
>and COMMON symbols, in that you probably ought not force a copy reloc there.
>
>Note the complexity of default_binds_local_p_1, and the fact that all you
>really want to modify is
>
>  /* If PIC, then assume that any global name can be overridden by
>     symbols resolved from other modules.  */
>  else if (shlib)
>    local_p = false;
>
>near the bottom of that function.

Reminds me of PR32219 https://gcc.gnu.org/ml/gcc-patches/2010-03/msg00665.html
but admittedly that one is not PIE-specific, and it still fails on current trunk.

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 211826)
+++ config/i386/i386.c	(working copy)
@@ -12691,7 +12691,9 @@  legitimate_pic_address_disp_p (rtx disp)
 		return true;
 	    }
 	  else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-		   && SYMBOL_REF_LOCAL_P (op0)
+		   && (SYMBOL_REF_LOCAL_P (op0)
+		       || (TARGET_64BIT && ix86_copyrelocs && flag_pie
+			   && !SYMBOL_REF_FUNCTION_P (op0)))
 		   && ix86_cmodel != CM_LARGE_PIC)
 	    return true;
 	  break;
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 211826)
+++ config/i386/i386.opt	(working copy)
@@ -108,6 +108,10 @@  int x_ix86_dump_tunes
 TargetSave
 int x_ix86_force_align_arg_pointer
 
+;; -mcopyrelocs
+TargetSave
+int x_ix86_copyrelocs
+
 ;; -mforce-drap= 
 TargetSave
 int x_ix86_force_drap
@@ -291,6 +295,10 @@  mfancy-math-387
 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387) Save
 Generate sin, cos, sqrt for FPU
 
+mcopyrelocs
+Target Report Var(ix86_copyrelocs) Init(0)
+Use copy relocations for pie when possible
+
 mforce-drap
 Target Report Var(ix86_force_drap)
 Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* Test if -mcopyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mcopyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should never be accessed with a GOTPCREL  */ 
+/* { dg-final { scan-assembler-not "glob_a\\@GOTPCREL" { target { x86_64-*-* } } } } */
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* Test if -mno-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mno-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should always be accessed via GOT  */ 
+/* { dg-final { scan-assembler "glob_a\\@GOT" { target { x86_64-*-* } } } } */
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 211826)
+++ doc/invoke.texi	(working copy)
@@ -688,7 +688,8 @@  Objective-C and Objective-C++ Dialects}.
 -m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol
 -msse2avx -mfentry -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
--mstack-protector-guard=@var{guard}}
+-mstack-protector-guard=@var{guard} @gol
+-mcopyrelocs}
 
 @emph{i386 and x86-64 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -15802,6 +15803,15 @@  locations are @samp{global} for global canary or @
 canary in the TLS block (the default).  This option has effect only when
 @option{-fstack-protector} or @option{-fstack-protector-all} is specified.
 
+@item -mcopyrelocs
+@itemx -mno-copyrelocs
+@opindex mcopyrelocs
+@opindex mno-copyrelocs
+With @option{-fpie} or @option{-fPIE}, linker support for copy relocations
+allows the compiler to assume that global data symbols are defined in the
+executable, so accesses to them can skip the GOT.  This option applies only
+to the x86-64 architecture.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above