From patchwork Thu May 15 18:34:12 2014
X-Patchwork-Submitter: Sriraman Tallam
X-Patchwork-Id: 349332
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Thu, 15 May 2014 11:34:12 -0700
Subject: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations
From: Sriraman Tallam
To: GCC Patches, David Li, Cary Coutant, Ian Lance Taylor

Optimize access to globals with -fpie, x86_64 only:

Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the
module through the GOT. This takes two instructions: one to load the
address of the global from the GOT and another to load the value. If the
global turns out to be defined in the executable at link time, the access
still goes through the GOT, because by then it is too late to generate a
direct access.

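As a rough illustration only (this is an analogy, not compiler output), the
two access paths can be modeled in plain C. The pointer got_slot_for_a_glob
is a hypothetical stand-in for the GOT entry that the dynamic loader would
fill in, and the function names are made up for this sketch.

```c
#include <assert.h>

/* The global being accessed.  */
int a_glob = 42;

/* Hypothetical stand-in for the GOT entry: in a real PIE the dynamic
   loader stores the final address of a_glob in a GOT slot.  */
int *volatile got_slot_for_a_glob = &a_glob;

/* GOT-style access: load the address from the slot, then the value.  */
int
load_via_got (void)
{
  int *addr = got_slot_for_a_glob;  /* first memory load */
  return *addr;                     /* second memory load */
}

/* Direct access: one load from an address known at link time.  */
int
load_direct (void)
{
  return a_glob;
}

/* Both paths must observe the same object.  */
void
check_paths_agree (void)
{
  a_glob = 7;
  assert (load_via_got () == load_direct ());
  assert (load_via_got () == 7);
}
```

The extra pointer dereference in load_via_got is the cost that the rest of
this message is about.
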
Examples:

foo.cc
------
int a_glob;
int main ()
{
  return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code accesses the global directly with
a single PC-relative instruction:

5e0 <main>:
  mov    0x165a(%rip),%eax        # 1c40

foo.cc
------
extern int a_glob;
int main ()
{
  return a_glob; // declared extern, defined elsewhere
}

With -O2 -fpie -pie, the generated code accesses the global through the
GOT, using two memory loads:

6f0 <main>:
  mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
  mov    (%rax),%eax

This is true even if, in the latter case, the global is defined in the
executable through a different file.

Some experiments on Google benchmarks show that the extra memory loads
affect performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that the
global will be defined in the executable. For globals that are truly
extern (they come from shared objects), the linker creates copy
relocations and has them defined in the executable. As a result, no global
access needs to go through the GOT, which improves performance.

This patch to the gold linker:
https://sourceware.org/ml/binutils/2014-05/msg00092.html
submitted recently, allows gold to generate copy relocations in -pie mode
when necessary.

I have added the option -mld-pie-copyrelocs which, when combined with
-fpie, makes GCC do this. Note that the BFD linker does not support pie
copy relocations yet, so this option cannot be used with it.

Please review.

ChangeLog:

* config/i386/i386.opt (mld-pie-copyrelocs): New option.
* config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
  address is still legitimate in the presence of copy relocations
  and -fpie.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.

Patch attached.

Thanks
Sri

Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 210437)
+++ config/i386/i386.opt	(working copy)
@@ -108,6 +108,10 @@ int x_ix86_dump_tunes
 TargetSave
 int x_ix86_force_align_arg_pointer
 
+;; -mld-pie-copyrelocs
+TargetSave
+int x_ix86_ld_pie_copyrelocs
+
 ;; -mforce-drap=
 TargetSave
 int x_ix86_force_drap
@@ -291,6 +295,10 @@ mfancy-math-387
 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387) Save
 Generate sin, cos, sqrt for FPU
 
+mld-pie-copyrelocs
+Target Report Var(ix86_ld_pie_copyrelocs) Init(0)
+Use linker copy relocs for pie
+
 mforce-drap
 Target Report Var(ix86_force_drap)
 Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 210437)
+++ config/i386/i386.c	(working copy)
@@ -12684,7 +12684,9 @@ legitimate_pic_address_disp_p (rtx disp)
             return true;
         }
       else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-               && SYMBOL_REF_LOCAL_P (op0)
+               && (SYMBOL_REF_LOCAL_P (op0)
+                   || (TARGET_64BIT && ix86_ld_pie_copyrelocs && flag_pie
+                       && !SYMBOL_REF_FUNCTION_P (op0)))
                && ix86_cmodel != CM_LARGE_PIC)
         return true;
       break;
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mld-pie-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mld-pie-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should never be accessed with a GOTPCREL */
+/* { dg-final { scan-assembler-not "glob_a\\@GOTPCREL" { target { x86_64-*-* } } } } */
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mno-ld-pie-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mno-ld-pie-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should always be accessed via GOT */
+/* { dg-final { scan-assembler "glob_a\\@GOT" { target { x86_64-*-* } } } } */
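
Read in isolation, the i386.c hunk above relaxes one condition in
legitimate_pic_address_disp_p. The standalone C model below restates only
that condition, not the rest of the function. struct sym_info,
struct target_opts, and pc_relative_ok are hypothetical names for this
sketch, not GCC internals; the booleans merely stand in for the macros and
variables used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SYMBOL_REF_* predicates.  */
struct sym_info
{
  bool far_addr;     /* SYMBOL_REF_FAR_ADDR_P */
  bool local;        /* SYMBOL_REF_LOCAL_P */
  bool is_function;  /* SYMBOL_REF_FUNCTION_P */
};

/* Hypothetical stand-ins for the target state the patch consults.  */
struct target_opts
{
  bool target_64bit;       /* TARGET_64BIT */
  bool ld_pie_copyrelocs;  /* -mld-pie-copyrelocs */
  bool pie;                /* flag_pie */
  bool large_pic_model;    /* ix86_cmodel == CM_LARGE_PIC */
};

/* True when the patched condition would accept a direct,
   PC-relative reference to the symbol.  */
bool
pc_relative_ok (const struct sym_info *s, const struct target_opts *t)
{
  bool copyreloc_data = t->target_64bit && t->ld_pie_copyrelocs
                        && t->pie && !s->is_function;
  return !s->far_addr
         && (s->local || copyreloc_data)
         && !t->large_pic_model;
}

/* A few cases mirroring the two new tests.  */
void
check_examples (void)
{
  struct sym_info extern_var = { false, false, false };
  struct sym_info extern_fn = { false, false, true };
  struct target_opts with_opt = { true, true, true, false };
  struct target_opts without_opt = { true, false, true, false };

  assert (pc_relative_ok (&extern_var, &with_opt));
  assert (!pc_relative_ok (&extern_var, &without_opt));
  assert (!pc_relative_ok (&extern_fn, &with_opt));
}
```

In this model an extern variable qualifies for direct access only when all
three of 64-bit, -mld-pie-copyrelocs, and -fpie hold. Functions stay
excluded, which matches the !SYMBOL_REF_FUNCTION_P check in the patch.
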