Message ID | 20160804043344.GA8391@ibm-tiger.the-meissners.org |
---|---|
State | New |
Headers | show |
Hi Mike, On Thu, Aug 04, 2016 at 12:33:44AM -0400, Michael Meissner wrote: > I built spec 2006 with these patches on a little endian power8 system, and at > least 18 of the benchmarks had vector initializations replaced. Most > benchmarks only used the initialization in a few places, but gamess, dealII, > h264ref, and wrf each had over 100 initializations changed. Did performance change? > I have tried these patches on a big endian power7 system (both 32-bit and > 64-bit targets), on a big endian power8 system (just 64-bit targets), and a > little endian power8 system (just 64-bit targets). There were no regressions > on any of the systems. Can I install these patches to the trunk? Some questions below, okay for trunk with those taken care of. Thanks. > --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 239098) > +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy) > @@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r > return; > } > > + /* Special case initializing vector int if we are on 64-bit systems with > + direct move. This bug tickles a bug in reload for fortran's > + cray_pointers_2 test unless -mvsx-timode is enabled. */ "This bug"? It's not clear to me what this says, could you rephrase? Just say what the code does, not what would happen without the code. Or say both. > +static inline int > +regno_or_subregno (rtx op) > +{ > + if (REG_P (op)) > + return REGNO (op); > + else if (SUBREG_P (op)) > + return subreg_regno (op); > + else > + gcc_unreachable (); > +} Maybe this should check the subreg is lowpart, too? For robustness. > ;; Build a V2DF/V2DI vector from two scalars > (define_insn "vsx_concat_<mode>" > - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>") > + [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") > (vec_concat:VSX_D > - (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>") > - (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))] > + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r") > + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r"))) > + (clobber (match_scratch:DI 3 "=X,X"))] X,X? How is that useful? > + (set_attr "length" "4")]) One insn is the default. Segher
On 08/03/2016 11:33 PM, Michael Meissner wrote: > { > - if (BYTES_BIG_ENDIAN) > - return "xxpermdi %x0,%x1,%x2,0"; > + if (which_alternative == 0) > + return (BYTES_BIG_ENDIAN > + ? "xxpermdi %x0,%x1,%x2,0" > + : "xxpermdi %x0,%x2,%x1,0"); > + > + else if (which_alternative == 1) > + return (BYTES_BIG_ENDIAN > + ? "mtvsrdd %x0,%1,%2" > + : "mtvsrdd %x0,%2,%1"); > + > else > - return "xxpermdi %x0,%x2,%x1,0"; > + gcc_unreachable (); > } > - [(set_attr "type" "vecperm")]) > + [(set_attr "type" "vecperm,mftgpr") > + (set_attr "length" "4")]) mtvsrdd actually behaves like a permute, so vecperm would be best insn type for it. -Pat
On Thu, Aug 04, 2016 at 10:03:36AM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Thu, Aug 04, 2016 at 12:33:44AM -0400, Michael Meissner wrote: > > I built spec 2006 with these patches on a little endian power8 system, and at > > least 18 of the benchmarks had vector initializations replaced. Most > > benchmarks only used the initialization in a few places, but gamess, dealII, > > h264ref, and wrf each had over 100 initializations changed. > > Did performance change? I ran a selected set of spec benchmarks, and I saw one regression (3.5% in tonto). I'll run the full suite, and see if I can see what the slow down is. > > I have tried these patches on a big endian power7 system (both 32-bit and > > 64-bit targets), on a big endian power8 system (just 64-bit targets), and a > > little endian power8 system (just 64-bit targets). There were no regressions > > on any of the systems. Can I install these patches to the trunk? > > Some questions below, okay for trunk with those taken care of. Thanks. > > > > --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 239098) > > +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy) > > @@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r > > return; > > } > > > > + /* Special case initializing vector int if we are on 64-bit systems with > > + direct move. This bug tickles a bug in reload for fortran's > > + cray_pointers_2 test unless -mvsx-timode is enabled. */ > > "This bug"? It's not clear to me what this says, could you rephrase? > Just say what the code does, not what would happen without the code. Or > say both. Here is the comment I tried to reword to be clearer. Since the bug only occurs if you do -mno-lra -mno-vsx-timode, I made the test to be those conditions. I don't think it is worth the time at this point to track down what the reload issue is. /* Special case initializing vector int if we are on 64-bit systems with direct move. This optimization tickles a bug in RELOAD for fortran's cray_pointers_2 test unless -mvsx-timode is enabled (the register allocator is trying to load up a V4SImode vector in GPRs with a TImode address using a SUBREG). Since RELOAD is no longer the default register allocator, just don't do the optimization. */ if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT && (TARGET_LRA || TARGET_VSX_TIMODE)) > > +static inline int > > +regno_or_subregno (rtx op) > > +{ > > + if (REG_P (op)) > > + return REGNO (op); > > + else if (SUBREG_P (op)) > > + return subreg_regno (op); > > + else > > + gcc_unreachable (); > > +} > > Maybe this should check the subreg is lowpart, too? For robustness. No, subreg_regno already does that. Since this is just infrastructure for a potential future change, I can remove this change until the next patch. > > ;; Build a V2DF/V2DI vector from two scalars > > (define_insn "vsx_concat_<mode>" > > - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>") > > + [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") > > (vec_concat:VSX_D > > - (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>") > > - (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))] > > + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r") > > + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r"))) > > + (clobber (match_scratch:DI 3 "=X,X"))] > > X,X? How is that useful? Because I had a more elaborate version for vsx_concat_<mode> that did need a base register for memory addresses. I missed deleting the useless clobber. I deleted it in my source tree that will become the next iteration of the patch. When I add support for vector insert to/from memory, I will probably need the clobbers (and this BTW, was the reason to split off regno_or_subregno). > > + (set_attr "length" "4")]) > > One insn is the default. Yep. I had been working on a larger change, and some of the other alternatives did have other lengths.
On Fri, Aug 05, 2016 at 05:00:39PM -0500, Pat Haugen wrote: > On 08/03/2016 11:33 PM, Michael Meissner wrote: > > { > > - if (BYTES_BIG_ENDIAN) > > - return "xxpermdi %x0,%x1,%x2,0"; > > + if (which_alternative == 0) > > + return (BYTES_BIG_ENDIAN > > + ? "xxpermdi %x0,%x1,%x2,0" > > + : "xxpermdi %x0,%x2,%x1,0"); > > + > > + else if (which_alternative == 1) > > + return (BYTES_BIG_ENDIAN > > + ? "mtvsrdd %x0,%1,%2" > > + : "mtvsrdd %x0,%2,%1"); > > + > > else > > - return "xxpermdi %x0,%x2,%x1,0"; > > + gcc_unreachable (); > > } > > - [(set_attr "type" "vecperm")]) > > + [(set_attr "type" "vecperm,mftgpr") > > + (set_attr "length" "4")]) > > mtvsrdd actually behaves like a permute, so vecperm would be best insn type for it. Ok, when I submit the patch again, I will change the type to "vecperm". I will also change it in "vsx_splat_<mode>" which also generates MTVSRDD. Thanks.
Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 239098) +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy) @@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r return; } + /* Special case initializing vector int if we are on 64-bit systems with + direct move. This bug tickles a bug in reload for fortran's + cray_pointers_2 test unless -mvsx-timode is enabled. */ + if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT && TARGET_VSX_TIMODE) + { + rtx di_hi, di_lo, elements[4], tmp; + size_t i; + + for (i = 0; i < 4; i++) + { + rtx element_si = XVECEXP (vals, 0, VECTOR_ELT_ORDER_BIG ? i : 3 - i); + element_si = copy_to_mode_reg (SImode, element_si); + elements[i] = gen_reg_rtx (DImode); + convert_move (elements[i], element_si, true); + } + + di_hi = gen_reg_rtx (DImode); + tmp = gen_reg_rtx (DImode); + emit_insn (gen_ashldi3 (tmp, elements[0], GEN_INT (32))); + emit_insn (gen_iordi3 (di_hi, tmp, elements[1])); + + di_lo = gen_reg_rtx (DImode); + tmp = gen_reg_rtx (DImode); + emit_insn (gen_ashldi3 (tmp, elements[2], GEN_INT (32))); + emit_insn (gen_iordi3 (di_lo, tmp, elements[3])); + + emit_insn (gen_rtx_CLOBBER (VOIDmode, target)); + emit_move_insn (gen_highpart (DImode, target), di_hi); + emit_move_insn (gen_lowpart (DImode, target), di_lo); + return; + } + /* With single precision floating point on VSX, know that internally single precision is actually represented as a double, and either make 2 V2DF vectors, and convert these vectors to single precision, or do one @@ -7021,6 +7053,18 @@ rs6000_expand_vector_extract (rtx target emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0)); } +/* Helper function to return the register number of a RTX. */ +static inline int +regno_or_subregno (rtx op) +{ + if (REG_P (op)) + return REGNO (op); + else if (SUBREG_P (op)) + return subreg_regno (op); + else + gcc_unreachable (); +} + /* Adjust a memory address (MEM) of a vector type to point to a scalar field within the vector (ELEMENT) with a mode (SCALAR_MODE). Use a base register temporary (BASE_TMP) to fixup the address. Return the new memory address @@ -7136,14 +7180,7 @@ rs6000_adjust_vec_address (rtx scalar_re { rtx op1 = XEXP (new_addr, 1); addr_mask_type addr_mask; - int scalar_regno; - - if (REG_P (scalar_reg)) - scalar_regno = REGNO (scalar_reg); - else if (SUBREG_P (scalar_reg)) - scalar_regno = subreg_regno (scalar_reg); - else - gcc_unreachable (); + int scalar_regno = regno_or_subregno (scalar_reg); gcc_assert (scalar_regno < FIRST_PSEUDO_REGISTER); if (INT_REGNO_P (scalar_regno)) Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 239098) +++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy) @@ -1899,18 +1899,28 @@ (define_insn "*vsx_float_fix_v2df2" ;; Build a V2DF/V2DI vector from two scalars (define_insn "vsx_concat_<mode>" - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>") + [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") (vec_concat:VSX_D - (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>") - (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))] + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r") + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r"))) + (clobber (match_scratch:DI 3 "=X,X"))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - if (BYTES_BIG_ENDIAN) - return "xxpermdi %x0,%x1,%x2,0"; + if (which_alternative == 0) + return (BYTES_BIG_ENDIAN + ? "xxpermdi %x0,%x1,%x2,0" + : "xxpermdi %x0,%x2,%x1,0"); + + else if (which_alternative == 1) + return (BYTES_BIG_ENDIAN + ? "mtvsrdd %x0,%1,%2" + : "mtvsrdd %x0,%2,%1"); + else - return "xxpermdi %x0,%x2,%x1,0"; + gcc_unreachable (); } - [(set_attr "type" "vecperm")]) + [(set_attr "type" "vecperm,mftgpr") + (set_attr "length" "4")]) ;; Special purpose concat using xxpermdi to glue two single precision values ;; together, relying on the fact that internally scalar floats are represented Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-init-1.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c (.../gcc/testsuite/gcc.target/powerpc) (revision 239099) @@ -0,0 +1,36 @@ +/* { dg-do run { target { powerpc*-*-linux* } } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-O2 -mvsx" } */ + +#include <stdlib.h> +#include <stddef.h> +#include <altivec.h> + +extern void check (vector int a) __attribute__((__noinline__)); +extern vector int pack (int a, int b, int c, int d) __attribute__((__noinline__)); + +void +check (vector int a) +{ + static const int expected[] = { -1, 2, 0, -3 }; + size_t i; + + for (i = 0; i < 4; i++) + if (vec_extract (a, i) != expected[i]) + abort (); +} + +vector int +pack (int a, int b, int c, int d) +{ + return (vector int) { a, b, c, d }; +} + +vector int sv = (vector int) { -1, 2, 0, -3 }; + +int main (void) +{ + check (sv); + check (pack (-1, 2, 0, -3)); + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-init-2.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c (.../gcc/testsuite/gcc.target/powerpc) (revision 239099) @@ -0,0 +1,36 @@ +/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-O2 -mvsx" } */ + +#include <stdlib.h> +#include <stddef.h> +#include <altivec.h> + +extern void check (vector long a) __attribute__((__noinline__)); +extern vector long pack (long a, long b) __attribute__((__noinline__)); + +void +check (vector long a) +{ + static const long expected[] = { 2L, -3L }; + size_t i; + + for (i = 0; i < 2; i++) + if (vec_extract (a, i) != expected[i]) + abort (); +} + +vector long +pack (long a, long b) +{ + return (vector long) { a, b }; +} + +vector long sv = (vector long) { 2L, -3L }; + +int main (void) +{ + check (sv); + check (pack (2L, -3L)); + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vec-init-3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-init-3.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-init-3.c (.../gcc/testsuite/gcc.target/powerpc) (revision 239099) @@ -0,0 +1,12 @@ +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */ + +vector long +merge (long a, long b) +{ + return (vector long) { a, b }; +} + +/* { dg-final { scan-assembler "mtvsrdd" } } */