Message ID | 20141218131150.GA32638@intel.com |
---|---|
State | New |
On Thu, Dec 18, 2014 at 2:11 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> The Linux kernel never passes floating point arguments around, vararg
> functions or not. Hence no vector registers are ever used when calling a
> vararg function. But gcc still dutifully emits an "xor %eax,%eax" before
> each and every call of a vararg function. Since no callee uses that for
> anything, these instructions are redundant.
>
> This patch adds the -mskip-rax-setup option to skip setting up RAX
> register when SSE is disabled and there are no variable arguments passed
> in vector registers. Since RAX register is used to avoid unnecessarily
> saving vector registers on stack when passing variable arguments, the
> impacts of this option are callees may waste some stack space, misbehave
> or jump to a random location. GCC 4.4 or newer don't have those issues,
> regardless of the RAX register value, since they don't check the RAX
> register value when SSE is disabled:
>
> https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html
>
> I used it on kernel 3.17.7:
>
>    text    data     bss      dec     hex filename
> 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax
> 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig
>
> It removed 14309 redundant "xor %eax,%eax" instructions and saved about
> 27KB. I am currently running the new kernel without any problem. OK
> for trunk?

How about skipping RAX setup unconditionally for !TARGET_SSE? Please
see ix86_conditional_register_usage, where SSE registers are squashed
for !TARGET_SSE, so it is not possible to use them even in the inline
asm.

Uros.
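The patch description relies on the x86-64 psABI convention that, before a call to a variadic function, %al holds an upper bound (0 to 8) on the number of vector registers used to pass arguments. The sketch below computes that value for a simplified argument list. The enum, macro and function names are hypothetical, and the two-class model deliberately ignores aggregates and the other psABI classes. For an all-integer call, as in the kernel, the result is 0, which is exactly what "xor %eax,%eax" materializes.

```c
#include <stddef.h>

/* Hypothetical, simplified argument classes.  The real psABI
   classification has more classes and also covers aggregates.  */
enum arg_class { ARG_INTEGER, ARG_SSE };

/* xmm0-xmm7 carry arguments, so %al must lie in the range 0..8.  */
#define MAX_SSE_ARG_REGS 8u

/* Number of vector registers a caller uses for the arguments of a
   call; this is the value it would place in %al before calling a
   variadic function.  SSE-class arguments past the eighth go on the
   stack and do not count.  */
static unsigned
al_for_vararg_call (const enum arg_class *args, size_t n)
{
  unsigned sse_regs = 0;

  for (size_t i = 0; i < n; i++)
    if (args[i] == ARG_SSE && sse_regs < MAX_SSE_ARG_REGS)
      sse_regs++;

  return sse_regs;
}
```

A kernel-style call such as printk ("%d", 20) classifies as two INTEGER arguments, so the value is always 0; -mskip-rax-setup drops the instruction that would load it.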
On Thu, Dec 18, 2014 at 2:24 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Dec 18, 2014 at 2:11 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> [...]
>
> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
> see ix86_conditional_register_usage, where SSE registers are squashed
> for !TARGET_SSE, so it is not possible to use them even in the inline
> asm.

... when -ffreestanding is in effect, of course.

Uros.
On Thu, Dec 18, 2014 at 02:24:06PM +0100, Uros Bizjak wrote:
> > It removed 14309 redundant "xor %eax,%eax" instructions and saved about
> > 27KB. I am currently running the new kernel without any problem. OK
> > for trunk?
>
> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
> see ix86_conditional_register_usage, where SSE registers are squashed
> for !TARGET_SSE, so it is not possible to use them even in the inline
> asm.

I'd say a problem is if a -mno-sse TU calls a vararg function (obviously
it can't pass any float/double arguments) defined in a TU compiled
with -msse2 or higher where the stdarg pass can't figure out anything, say

#include <stdarg.h>
extern void bar (int, va_list);
void
foo (int x, ...)
{
  va_list ap;
  va_start (ap, x);
  bar (x, ap);
  va_end (ap);
}

If foo is compiled with gcc 4.4 and earlier, it might crash when called from
a -mno-sse caller that would not xor %eax,%eax. If foo is compiled with gcc
4.5? and higher, then it might just randomly save all the xmm registers to
stack (as the test is %al != 0, I think it will be more likely that it will
save them unnecessarily than not).
So I view H.J.'s new option as a user guarantee that the callee will also be -mno-sse.

Jakub
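Jakub's fragment leaves bar undefined. A runnable version, with a hypothetical bar that reads x further int arguments and records their sum in a hypothetical global, makes the problem case concrete: the va_list escapes from foo, so the translation unit containing foo cannot prove that no floating-point va_arg will follow. This is a sketch for illustration only, not code from the thread.

```c
#include <stdarg.h>

/* Result slot for the hypothetical bar() below.  */
static long bar_result;

/* Hypothetical definition of Jakub's extern bar(): it reads X further
   int arguments from AP and stores their sum in bar_result.  */
void
bar (int x, va_list ap)
{
  long sum = 0;

  for (int i = 0; i < x; i++)
    sum += va_arg (ap, int);
  bar_result = sum;
}

/* Jakub's foo(), unchanged apart from layout: AP escapes to bar(), so
   when compiling foo the stdarg pass cannot prove that no floating-point
   va_arg follows.  Built with SSE enabled, foo's prologue must therefore
   be ready to spill the XMM argument registers, and it consults %al to
   decide whether to do so.  */
void
foo (int x, ...)
{
  va_list ap;

  va_start (ap, x);
  bar (x, ap);
  va_end (ap);
}
```

Calling foo (3, 10, 20, 30) from a -mno-sse -mskip-rax-setup caller passes only integers, but leaves %al holding whatever value was in RAX, which is precisely the situation Jakub describes.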
On Thu, Dec 18, 2014 at 5:51 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 18, 2014 at 02:24:06PM +0100, Uros Bizjak wrote:
>> > It removed 14309 redundant "xor %eax,%eax" instructions and saved about
>> > 27KB. I am currently running the new kernel without any problem. OK
>> > for trunk?
>>
>> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
>> see ix86_conditional_register_usage, where SSE registers are squashed
>> for !TARGET_SSE, so it is not possible to use them even in the inline
>> asm.
>
> I'd say a problem is if a -mno-sse TU calls a vararg function (obviously
> it can't pass any float/double arguments) defined in a TU compiled
> with -msse2 or higher where the stdarg pass can't figure out anything, say
> #include <stdarg.h>
> extern void bar (int, va_list);
> void
> foo (int x, ...)
> {
>   va_list ap;
>   va_start (ap, x);
>   bar (x, ap);
>   va_end (ap);
> }
> If foo is compiled with gcc 4.4 and earlier, it might crash when called from

This was checked into GCC 4.4:

commit d5d9458afdae9056a9624ae9c332dbcdc2f383be
Author: jakub <jakub@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Sep 2 19:49:41 2008 +0000

    * config/i386/i386.c (X86_64_VARARGS_SIZE): Removed.
    (setup_incoming_varargs_64): Assume cum != NULL.  Set/check
    ix86_varargs_gpr_size and ix86_varargs_fpr_size.  Use
    ix86_varargs_gpr_size instead of X86_64_REGPARM_MAX.
    Don't set ix86_save_varrargs_registers.
    (ix86_setup_incoming_varargs): Assume cum != NULL.
    (ix86_va_start): Check ix86_varargs_gpr_size and ix86_varargs_fpr_size
    instead of cfun->va_list_gpr_size and cfun->va_list_fpr_size,
    respectively.  Subtract 8*X86_64_REGPARM_MAX from frame pointer
    if ix86_varargs_gpr_size == 0.
    (ix86_compute_frame_layout): Updated.
    * config/i386/i386.h (ix86_save_varrargs_registers): Removed.
    (ix86_varargs_gpr_size): Define.
    (ix86_varargs_fpr_size): Likewise.
    (machine_function): Remove save_varrargs_registers.
    Add varargs_gpr_size and varargs_fpr_size.
    * gcc.target/i386/amd64-abi-3.c: New test.
    * gcc.target/i386/amd64-abi-4.c: Likewise.
    * gcc.target/i386/amd64-abi-5.c: Likewise.
    * gcc.target/i386/amd64-abi-6.c: Likewise.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@139910 138bc75d-0d04-0410-961f-82ee72b054a4

> a -mno-sse caller that would not xor %eax,%eax. If foo is compiled with gcc
> 4.5? and higher, then it might just randomly save all the xmm registers to

GCC 4.4 is fine.

> stack (as the test is %al != 0, I think it will be more likely that it will
> save them unnecessarily than not).
> So I view H.J.'s new option as a user guarantee that the callee will also be -mno-sse.
>

If foo may be compiled by GCC 4.3 or older, we can't skip setting up RAX in
foo's callers. Such an option is useful.
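The callee-side behaviour discussed here (GCC 4.4 and later) can be summarized as a small toy model, assuming the stdarg pass could not rule out floating-point va_arg uses in the callee. This is not GCC source; the function name is hypothetical, and it only encodes what the thread states: a callee built without SSE never reads RAX, and otherwise the prologue tests %al against zero and spills either all eight XMM argument registers or none.

```c
#include <stdbool.h>
#include <stdint.h>

/* xmm0-xmm7 are the vector argument registers.  */
#define NUM_XMM_ARG_REGS 8u

/* Toy model, not GCC source: number of XMM argument registers a
   variadic callee built by GCC 4.4 or newer would spill, given the
   value left in RAX by the caller.  */
static unsigned
xmm_regs_spilled (bool callee_has_sse, uint64_t rax)
{
  /* A callee compiled without SSE has nothing to spill and never
     looks at RAX.  */
  if (!callee_has_sse)
    return 0;

  /* Only the low byte (%al) is examined, and only against zero.  */
  uint8_t al = (uint8_t) rax;
  return al != 0 ? NUM_XMM_ARG_REGS : 0;
}
```

Under this model a stale RAX whose low byte is nonzero only costs eight unnecessary stores, and a stale low byte of zero skips them, which is harmless because no vector arguments were passed.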
On Thu, Dec 18, 2014 at 2:49 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> [...]
>>
>> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
>> see ix86_conditional_register_usage, where SSE registers are squashed
>> for !TARGET_SSE, so it is not possible to use them even in the inline
>> asm.
>
> ... when -ffreestanding is in effect, of course.

Oops, this is not the unconditional default kernel compile flag. It is
defined only for 32bit builds, where:

# temporary until string.h is fixed
KBUILD_CFLAGS += -ffreestanding

Yes, it looks to me that the new option is the way to go.

Uros.
On Thu, Dec 18, 2014 at 6:03 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Dec 18, 2014 at 2:49 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>>>> [...]
>>>
>>> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
>>> see ix86_conditional_register_usage, where SSE registers are squashed
>>> for !TARGET_SSE, so it is not possible to use them even in the inline
>>> asm.
>>
>> ... when -ffreestanding is in effect, of course.
>
> Oops, this is not the unconditional default kernel compile flag.
> It is defined only for 32bit builds, where:
>
> # temporary until string.h is fixed
> KBUILD_CFLAGS += -ffreestanding
>
> Yes, it looks to me that the new option is the way to go.

Is this an OK?

Some really old gcc versions used an indirect jump based on the eax input
and they didn't zero-extend first. So with those compilers, you could actually
jump to a random location. You can enable it only when you can compile
everything with a newer GCC.

Thanks.
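H.J.'s point about very old compilers can be illustrated with another toy model. The psABI only defines %al, so a conforming caller may leave stale bits in the rest of RAX. The sketch assumes, as H.J. describes, an indirect jump into a run of XMM save instructions indexed by the count taken from RAX, where only counts 0 to 8 correspond to a real entry point. The function names are hypothetical and this is not the code old GCC actually emitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* xmm0-xmm7 are the vector argument registers.  */
#define NUM_XMM_ARG_REGS 8u

/* Toy model, not GCC source: an indirect jump indexed by a register
   count only lands on a real entry point for counts 0..8.  */
static bool
entry_point_valid (uint64_t count)
{
  return count <= NUM_XMM_ARG_REGS;
}

/* Old behaviour: the full 64-bit RAX is used without zero-extending
   %al first, so stale upper bits change the jump target.  */
static bool
old_jump_is_safe (uint64_t rax)
{
  return entry_point_valid (rax);
}

/* For comparison: zero-extending %al first discards stale upper bits,
   but cannot help if %al itself was never set up.  */
static bool
zext_jump_is_safe (uint64_t rax)
{
  return entry_point_valid ((uint8_t) rax);
}
```

The last case is why skipping the setup is only safe when every variadic callee is built with a newer GCC: with -mskip-rax-setup even %al may hold garbage.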
On Thu, Dec 18, 2014 at 3:09 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> [...]
>>>>
>>>> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
>>>> see ix86_conditional_register_usage, where SSE registers are squashed
>>>> for !TARGET_SSE, so it is not possible to use them even in the inline
>>>> asm.
>>>
>>> ... when -ffreestanding is in effect, of course.
>>
>> Oops, this is not the unconditional default kernel compile flag.
>> It is defined only for 32bit builds, where:
>>
>> # temporary until string.h is fixed
>> KBUILD_CFLAGS += -ffreestanding
>>
>> Yes, it looks to me that the new option is the way to go.
>
> Is this an OK?

In principle, I'm OK with the patch approach, but let's wait for
possible comments from Linux people.

> Some really old gcc versions used an indirect jump based on the eax input
> and they didn't zero-extend first. So with those compilers, you could actually
> jump to a random location. You can enable it only when you can compile
> everything with a newer GCC.

Uros.
On 12/18/2014 06:12 AM, Uros Bizjak wrote:
>>>
>>> # temporary until string.h is fixed
>>> KBUILD_CFLAGS += -ffreestanding
>>>
>>> Yes, it looks to me that the new option is the way to go.
>>
>> Is this an OK?
>
> In principle, I'm OK with the patch approach, but let's wait for
> possible comments from Linux people.
>

Acked-by: H. Peter Anvin <hpa@linux.intel.com>

H.J. already coordinated with us; we are more than happy with this approach.

Thank you!

-hpa
On Thu, Dec 18, 2014 at 9:08 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/18/2014 06:12 AM, Uros Bizjak wrote:
>>>>
>>>> # temporary until string.h is fixed
>>>> KBUILD_CFLAGS += -ffreestanding
>>>>
>>>> Yes, it looks to me that the new option is the way to go.
>>>
>>> Is this an OK?
>>
>> In principle, I'm OK with the patch approach, but let's wait for
>> possible comments from Linux people.
>>
>
> Acked-by: H. Peter Anvin <hpa@linux.intel.com>
>
> H.J. already coordinated with us; we are more than happy with this approach.
>
> Thank you!

I am checking it in now.

Thanks.
On Thu, Dec 18, 2014 at 6:08 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/18/2014 06:12 AM, Uros Bizjak wrote:
>>>>
>>>> # temporary until string.h is fixed
>>>> KBUILD_CFLAGS += -ffreestanding
>>>>
>>>> Yes, it looks to me that the new option is the way to go.
>>>
>>> Is this an OK?
>>
>> In principle, I'm OK with the patch approach, but let's wait for
>> possible comments from Linux people.
>>
>
> Acked-by: H. Peter Anvin <hpa@linux.intel.com>
>
> H.J. already coordinated with us; we are more than happy with this approach.

Great, I'm glad that this specialized option has uses.

Patch is OK.

Thanks,
Uros.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 24a252a..de7907a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-12-18  H.J. Lu  <hongjiu.lu@intel.com>
+
+        * config/i386/i386.c (ix86_expand_call): Skip setting up RAX
+        register for -mskip-rax-setup when there are no parameters
+        passed in vector registers.
+        * config/i386/i386.opt (mskip-rax-setup): New option.
+        * doc/invoke.texi: Document -mskip-rax-setup.
+
 2014-12-18  Martin Liska  <mliska@suse.cz>
 
         PR ipa/64146
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 17ef751..122a350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25461,7 +25461,12 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
         }
     }
 
-  if (TARGET_64BIT && INTVAL (callarg2) >= 0)
+  /* Skip setting up RAX register for -mskip-rax-setup when there are no
+     parameters passed in vector registers.  */
+  if (TARGET_64BIT
+      && (INTVAL (callarg2) > 0
+          || (INTVAL (callarg2) == 0
+              && (TARGET_SSE || !flag_skip_rax_setup))))
     {
       rtx al = gen_rtx_REG (QImode, AX_REG);
       emit_move_insn (al, callarg2);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 3d54bfa..6dc4da2 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -831,6 +831,10 @@ Target Report Var(flag_nop_mcount) Init(0)
 Generate mcount/__fentry__ calls as nops. To activate they need to be
 patched in.
 
+mskip-rax-setup
+Target Report Var(flag_skip_rax_setup) Init(0)
+Skip setting up RAX register when passing variable arguments.
+
 m8bit-idiv
 Target Report Mask(USE_8BIT_IDIV) Save
 Expand 32bit/64bit integer divide into 8bit unsigned integer divide with run-time check
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 15068da..33a7ed2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16256,6 +16256,19 @@ the profiling functions as nops. This is useful when they
 should be patched in later dynamically. This is likely only
 useful together with @option{-mrecord-mcount}.
 
+@item -mskip-rax-setup
+@itemx -mno-skip-rax-setup
+@opindex mskip-rax-setup
+When generating code for the x86-64 architecture with SSE extensions
+disabled, @option{-mskip-rax-setup} can be used to skip setting up RAX
+register when there are no variable arguments passed in vector registers.
+
+@strong{Warning:} Since RAX register is used to avoid unnecessarily
+saving vector registers on stack when passing variable arguments, the
+impacts of this option are callees may waste some stack space,
+misbehave or jump to a random location. GCC 4.4 or newer don't have
+those issues, regardless of the RAX register value.
+
 @item -m8bit-idiv
 @itemx -mno-8bit-idiv
 @opindex 8bit-idiv
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 025dfce..6c06503 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-12-18  H.J. Lu  <hongjiu.lu@intel.com>
+
+        * gcc.target/i386/amd64-abi-7.c: New test.
+        * gcc.target/i386/amd64-abi-8.c: Likewise.
+        * gcc.target/i386/amd64-abi-9.c: Likewise.
+
 2014-12-18  Martin Liska  <mliska@suse.cz>
 
         * g++.dg/ipa/pr64146.C: New test.
diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-7.c b/gcc/testsuite/gcc.target/i386/amd64-abi-7.c
new file mode 100644
index 0000000..fcca680
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-7.c
@@ -0,0 +1,46 @@
+/* { dg-do run { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mno-sse" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+int n1 = 30;
+int n2 = 324;
+void *n3 = (void *) &n2;
+int n4 = 407;
+
+int e1;
+int e2;
+void *e3;
+int e4;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e2 = va_arg (va_arglist, int);
+  e3 = va_arg (va_arglist, void *);
+  e4 = va_arg (va_arglist, int);
+}
+
+static void
+__attribute__((noinline))
+test (int a1, ...)
+{
+  va_list va_arglist;
+  e1 = a1;
+  va_start (va_arglist, a1);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+int
+main ()
+{
+  test (n1, n2, n3, n4);
+  assert (n1 == e1);
+  assert (n2 == e2);
+  assert (n3 == e3);
+  assert (n4 == e4);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-8.c b/gcc/testsuite/gcc.target/i386/amd64-abi-8.c
new file mode 100644
index 0000000..b25ceec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mno-sse -mskip-rax-setup" } */
+/* { dg-final { scan-assembler-not "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" } } */
+
+void foo (const char *, ...);
+
+void
+test1 (void)
+{
+  foo ("%d", 20);
+}
+
+int
+test2 (void)
+{
+  foo ("%d", 20);
+  return 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
new file mode 100644
index 0000000..4707eb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mno-sse -mno-skip-rax-setup" } */
+/* { dg-final { scan-assembler-times "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" 2 } } */
+
+void foo (const char *, ...);
+
+void
+test1 (void)
+{
+  foo ("%d", 20);
+}
+
+int
+test2 (void)
+{
+  foo ("%d", 20);
+  return 3;
+}
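The new condition in ix86_expand_call can be lifted into plain C to make its truth table explicit. This is a sketch for illustration: the function name is hypothetical, callarg2 is assumed to be -1 for calls that cannot be variadic and otherwise the number of vector registers used for arguments, and the parameters target_sse and skip_rax_setup stand in for the TARGET_SSE macro and flag_skip_rax_setup. It shows that the option only takes effect when SSE is disabled and no vector registers are used, matching what amd64-abi-8.c and amd64-abi-9.c check.

```c
#include <stdbool.h>

/* Hypothetical stand-alone version of the patched test: returns true
   when the %al setup (an "xorl %eax, %eax" for a count of zero) is
   emitted before the call.  */
static bool
emits_al_setup (bool target_64bit, int callarg2,
                bool target_sse, bool skip_rax_setup)
{
  return target_64bit
         && (callarg2 > 0
             || (callarg2 == 0 && (target_sse || !skip_rax_setup)));
}
```

Keeping the callarg2 > 0 branch unconditional means a call that really passes vector arguments always gets a correct %al, so the option can never under-report the register count.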