From patchwork Wed Jun 3 20:46:35 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sriraman Tallam
X-Patchwork-Id: 480184
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
In-Reply-To: <556F5F04.80603@redhat.com>
References: <20150529193552.GA52215@kam.mff.cuni.cz> <556C16B1.5080606@arm.com> <556F5F04.80603@redhat.com>
Date: Wed, 3 Jun 2015 13:46:35 -0700
Subject: Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
From: Sriraman Tallam
To: Richard Henderson
Cc: Ramana Radhakrishnan, Jan Hubicka, "H.J.
 Lu", Pedro Alves, Michael Matz, David Li, GCC Patches
X-IsSubscribed: yes

On Wed, Jun 3, 2015 at 1:09 PM, Richard Henderson wrote:
> On 06/03/2015 11:38 AM, Sriraman Tallam wrote:
>> +  { "no_plt",                 0, 0, true,  false, false,
>> +                              handle_no_plt_attribute, false },
>
> Call it noplt. We don't add the underscore for noinline, noclone, etc.

Done.

>
>
>
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c (revision 223720)
>> +++ config/i386/i386.c (working copy)
>> @@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>>        && !TARGET_64BIT
>>        && flag_pic
>>        && flag_plt
>> -      && decl && !targetm.binds_local_p (decl))
>> +      && decl
>> +      && (TREE_CODE (decl) != FUNCTION_DECL
>> +          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
>> +      && !targetm.binds_local_p (decl))
>>      return false;
>>
>>    /* If we need to align the outgoing stack, then sibcalling would
>
> Is this really necessary? I'd expect DECL to be NULL in this case,
> since the non-use of the PLT will mean that the (sib)call is indirect.

Removed.

>
>
>> @@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>>      }
>>    else
>>      {
>> -      /* Static functions and indirect calls don't need the pic register.  */
>> +      /* Static functions and indirect calls don't need the pic register.  Also,
>> +         check if PLT was explicitly avoided via no-plt or "no_plt" attribute, making
>> +         it an indirect call.  */
>>        if (flag_pic
>>            && (!TARGET_64BIT
>>                || (ix86_cmodel == CM_LARGE_PIC
>>                    && DEFAULT_ABI != MS_ABI))
>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>> -          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>> +          && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
>> +          && flag_plt
>> +          && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL
>> +              || !lookup_attribute ("no_plt",
>> +                     DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))))))
>>          {
>>            use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
>>            if (ix86_use_pseudo_pic_reg ())
>
> Why are you testing FUNCTION_DECL? Even if, somehow, the user were producing a
> function call to a data symbol, why do you think that lookup_attribute would
> produce incorrect results?
>
> Similarly in ix86_nopic_no_plt_attribute_p.

Fixed.

Patch attached with those changes.

Thanks
Sri

        * c-family/c-common.c (noplt): New attribute.
        (handle_noplt_attribute): New handler.
        * calls.c (prepare_call_address): Check for noplt attribute.
        * config/i386/i386.c (ix86_function_ok_for_sibcall): Check for noplt
        attribute.
        (ix86_expand_call): Ditto.
        (ix86_nopic_noplt_attribute_p): New function.
        (ix86_output_call_insn): Output indirect call for non-pic no plt calls.
        * doc/extend.texi (noplt): Document new attribute.
        * doc/invoke.texi: Document new attribute.
        * testsuite/gcc.target/i386/noplt-1.c: New test.
        * testsuite/gcc.target/i386/noplt-2.c: New test.
        * testsuite/gcc.target/i386/noplt-3.c: New test.
        * testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds a new generic function attribute "noplt" that is similar in
  functionality to -fno-plt except that it applies only to calls to
  functions that are marked with this attribute.
* For x86_64, it makes -fno-plt (and the attribute) also work for
  non-PIC code by directly generating an indirect call via a GOT entry.

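For illustration only, here is a minimal sketch of how the attribute from the first point might be used from C. The NOPLT macro, the parse_port helper and the choice of atoi as the external callee are made up for this example and are not part of the patch. The __has_attribute guard is an assumption so that the file still builds on compilers that do not know noplt. Where the attribute is honoured, the call is expected to go through the GOT rather than a PLT stub; the exact instruction depends on the compiler version, target and PIC mode.

```c
#include <assert.h>

/* Hypothetical helper macro: expands to the attribute only where the
   compiler reports support for it, and to nothing elsewhere.  */
#if defined(__has_attribute)
# if __has_attribute(noplt)
#  define NOPLT __attribute__ ((noplt))
# endif
#endif
#ifndef NOPLT
# define NOPLT
#endif

/* External libc function, declared by hand so that the attribute is
   attached to the declaration visible at the call site.  */
extern int atoi (const char *text) NOPLT;

/* Hypothetical caller: the call to atoi is the one noplt applies to.  */
int
parse_port (const char *text)
{
  assert (text != 0);
  return atoi (text);
}
```

Calling parse_port ("8080") behaves the same with or without the attribute; only the way the call to atoi is emitted is meant to differ.
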
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
                               handle_aligned_attribute, false },
   { "weak",                   0, 0, true,  false, false,
                               handle_weak_attribute, false },
+  { "noplt",                  0, 0, true,  false, false,
+                              handle_noplt_attribute, false },
   { "ifunc",                  1, 1, true,  false, false,
                               handle_ifunc_attribute, false },
   { "alias",                  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }

+/* Handle a "noplt" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_noplt_attribute (tree *node, tree name,
+                        tree ARG_UNUSED (args),
+                        int ARG_UNUSED (flags),
+                        bool * ARG_UNUSED (no_add_attrs))
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    {
+      warning (OPT_Wattributes,
+               "%qE attribute is only applicable on functions", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
 /* Handle an "alias" or "ifunc" attribute; arguments as in
    struct attribute_spec.handler, except that IS_ALIAS tells us
    whether this is an alias as opposed to ifunc attribute.  */
Index: calls.c
===================================================================
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,10 +226,16 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                    && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
                   ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
                   : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (flag_pic
+           && fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+           && (!flag_plt
+               || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type)))
            && !targetm.binds_local_p (fndecl_or_type))
     {
+      /* This is done only for PIC code.  There is no easy interface to force the
+         function address into GOT for non-PIC case.  non-PIC case needs to be
+         handled specially by the backend.  */
       funexp = force_reg (Pmode, funexp);
     }
   else if (! sibcallp)
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -25497,13 +25497,18 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register.  */
+      /* Static functions and indirect calls don't need the pic register.  Also,
+         check if PLT was explicitly avoided via no-plt or "noplt" attribute, making
+         it an indirect call.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt
+          && !lookup_attribute ("noplt",
+                 DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)))))
         {
           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
@@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }

+/* Return true if the function being called was marked with attribute "noplt"
+   or using -fno-plt and we are compiling for non-PIC and x86_64.  We need to
+   handle the non-PIC case in the backend because there is no easy interface
+   for the front-end to force non-PLT calls to use the GOT.  This is currently
+   used only with 64-bit ELF targets to call the function marked "noplt"
+   indirectly.  */

+static bool
+ix86_nopic_noplt_attribute_p (rtx call_op)
+{
+  if (flag_pic || ix86_cmodel == CM_LARGE
+      || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF
+      || SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (!flag_plt
+      || (symbol_decl != NULL_TREE
+          && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl))))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */

 const char *
@@ -25610,7 +25639,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op

   if (SIBLING_CALL_P (insn))
     {
-      if (direct_p)
+      if (direct_p && ix86_nopic_noplt_attribute_p (call_op))
+        xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+      else if (direct_p)
         xasm = "%!jmp\t%P0";
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W.  */
@@ -25653,7 +25684,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
         seh_nop_p = true;
     }

-  if (direct_p)
+  if (direct_p && ix86_nopic_noplt_attribute_p (call_op))
+    xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+  else if (direct_p)
     xasm = "%!call\t%P0";
   else
     xasm = "%!call\t%A0";
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -2916,6 +2916,35 @@ the standard C library can be guaranteed not to th
 with the notable exceptions of @code{qsort} and @code{bsearch} that
 take function pointer arguments.

+@item noplt
+@cindex @code{noplt} function attribute
+The @code{noplt} attribute is the counterpart to option @option{-fno-plt} and
+does not use PLT for calls to functions marked with this attribute in position
+independent code.
+
+@smallexample
+@group
+/* Externally defined function foo.  */
+int foo () __attribute__ ((noplt));
+
+int
+main (/* @r{@dots{}} */)
+@{
+  /* @r{@dots{}} */
+  foo ();
+  /* @r{@dots{}} */
+@}
+@end group
+@end smallexample
+
+The @code{noplt} attribute on function foo tells the compiler to assume that
+the function foo is externally defined and the call to foo must avoid the PLT
+in position independent code.
+
+Additionally, a few targets also convert calls to those functions that are
+marked to not use the PLT to use the GOT instead for non-position independent
+code.
+
 @item optimize
 @cindex @code{optimize} function attribute
 The @code{optimize} attribute is used to specify that a function is to
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 223720)
+++ doc/invoke.texi (working copy)
@@ -23868,6 +23868,14 @@ PLT stubs expect GOT pointer in a specific registe
 register allocation freedom to the compiler.
 Lazy binding requires PLT: with @option{-fno-plt} all external symbols are
 resolved at load time.

+Alternatively, function attribute @code{noplt} can be used to avoid PLT
+for calls to specific external functions by marking those functions with
+this attribute.
+
+Additionally, a few targets also convert calls to those functions that are
+marked to not use the PLT to use the GOT instead for non-position independent
+code.
+
 @item -fno-jump-tables
 @opindex fno-jump-tables
 Do not use jump tables for switch statements even where it would be
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-fno-pic" } */
+
+__attribute__ ((noplt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2 -fno-pic" } */
+
+
+__attribute__ ((noplt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-3.c
===================================================================
--- testsuite/gcc.target/i386/noplt-3.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-3.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-fno-pic -fno-plt" } */
+
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-4.c
===================================================================
--- testsuite/gcc.target/i386/noplt-4.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-4.c (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
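
The noplt tests in this patch scan for a call or jump that goes indirectly through a GOT slot (*foo@GOTPCREL(%rip)) instead of a direct call to a PLT stub. As a loose C-level analogy, and not what the compiler emits or code from the patch, the GOT slot behaves roughly like a function pointer that is filled in once and then called through. The names got_slot_strlen and length_via_slot are hypothetical, and strlen merely stands in for an external function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a GOT entry: it holds the callee's address,
   much as the dynamic loader would fill in the real slot at load time.  */
static size_t (*got_slot_strlen) (const char *) = strlen;

/* Every call loads the address from the slot and calls through it,
   which is roughly the shape of "call *foo@GOTPCREL(%rip)".  */
size_t
length_via_slot (const char *s)
{
  assert (got_slot_strlen != NULL);
  return got_slot_strlen (s);
}
```

A PLT call would instead look like a direct call to a small stub that performs this load and jump on the caller's behalf, typically with lazy binding on first use.
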