Message ID | 1395257438.17148.9.camel@gnopaine
---|---
State | New
On Wed, 19 Mar 2014, Bill Schmidt wrote:
> Hi,
>
> This patch (diff-le-vector) backports the changes to support vector
> infrastructure on powerpc64le.  Copying Richard and Jakub for the libcpp
> bits.

The libcpp bits are fine.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> [gcc]
>
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         Backport from mainline r205333
>         2013-11-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct
>         for little endian.
>
>         Backport from mainline r205241
>         2013-11-21  Bill Schmidt  <wschmidt@vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous
>         little endian change.
>         (vec_pack_sfix_trunc_v2df): Likewise.
>         (vec_pack_ufix_trunc_v2df): Likewise.
>         * config/rs6000/rs6000.c (rs6000_expand_interleave): Correct
>         double checking of endianness.
>
>         Backport from mainline r205146
>         2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian.
>         (vsx_extract_<mode>): Likewise.
>         (*vsx_extract_<mode>_one_le): New LE variant on
>         *vsx_extract_<mode>_zero.
>         (vsx_extract_v4sf): Adjust for little endian.
>
>         Backport from mainline r205080
>         2013-11-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust
>         V16QI vector splat case for little endian.
>
>         Backport from mainline r205045:
>
>         2013-11-19  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>
>
>         * config/rs6000/vector.md ("mov<mode>"): Do not call
>         rs6000_emit_le_vsx_move to move into or out of GPRs.
>         * config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert
>         source and destination are not GPR hard regs.
>
>         Backport from mainline r204920
>         2013-11-17  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
>         parameter and use it in REG_FRAME_RELATED_EXPR note.
>         (emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
>         parameter.
>         (rs6000_emit_prologue): Likewise, but for little endian VSX
>         stores, pass the source register of the store instead.
>
>         Backport from mainline r204862
>         2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
>         Remove.
>         (altivec_vperm_<mode>): Revert earlier little endian change.
>         (*altivec_vperm_<mode>_internal): Remove.
>         (altivec_vperm_<mode>_uns): Revert earlier little endian change.
>         (*altivec_vperm_<mode>_uns_internal): Remove.
>         * config/rs6000/vector.md (vec_realign_load_<mode>): Revise
>         commentary.
>
>         Backport from mainline r204441
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal):
>         Remove restriction against use of VSX instructions when generating
>         code for little endian mode.
>
>         Backport from mainline r204440
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
>         for both big and little endian.
>         (mulv8hi3): Swap input operands for merge high and merge low
>         instructions for little endian.
>
>         Backport from mainline r204439
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
>         define_insn to define_expand that uses even patterns for big
>         endian and odd patterns for little endian.
>         (vec_widen_smult_even_v16qi): Likewise.
>         (vec_widen_umult_even_v8hi): Likewise.
>         (vec_widen_smult_even_v8hi): Likewise.
>         (vec_widen_umult_odd_v16qi): Likewise.
>         (vec_widen_smult_odd_v16qi): Likewise.
>         (vec_widen_umult_odd_v8hi): Likewise.
>         (vec_widen_smult_odd_v8hi): Likewise.
>         (altivec_vmuleub): New define_insn.
>         (altivec_vmuloub): Likewise.
>         (altivec_vmulesb): Likewise.
>         (altivec_vmulosb): Likewise.
>         (altivec_vmuleuh): Likewise.
>         (altivec_vmulouh): Likewise.
>         (altivec_vmulesh): Likewise.
>         (altivec_vmulosh): Likewise.
>
>         Backport from mainline r204395
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
>         little endian.
>         (vec_pack_ufix_trunc_v2df): Likewise.
>
>         Backport from mainline r204363
>         2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
>         arguments to merge instruction for little endian.
>         (vec_widen_umult_lo_v16qi): Likewise.
>         (vec_widen_smult_hi_v16qi): Likewise.
>         (vec_widen_smult_lo_v16qi): Likewise.
>         (vec_widen_umult_hi_v8hi): Likewise.
>         (vec_widen_umult_lo_v8hi): Likewise.
>         (vec_widen_smult_hi_v8hi): Likewise.
>         (vec_widen_smult_lo_v8hi): Likewise.
>
>         Backport from mainline r204350
>         2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
>         Replace the define_insn_and_split with a define_insn and two
>         define_splits, with the split after reload re-permuting the source
>         register to its original value.
>         (*vsx_le_perm_store_<mode> for VSX_W): Likewise.
>         (*vsx_le_perm_store_v8hi): Likewise.
>         (*vsx_le_perm_store_v16qi): Likewise.
>
>         Backport from mainline r204321
>         2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for
>         little endian.
>
>         Backport from mainline r204321
>         2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for
>         little endian.
>
>         Backport from mainline r203980
>         2013-10-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (mulv8hi3): Adjust for little endian.
>
>         Backport from mainline r203930
>         2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
>         meaning of merge-high and merge-low masks for little endian; avoid
>         use of vector-pack masks for little endian for mismatched modes.
>
>         Backport from mainline r203877
>         2013-10-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for
>         little endian.
>         (vec_unpacku_hi_v8hi): Likewise.
>         (vec_unpacku_lo_v16qi): Likewise.
>         (vec_unpacku_lo_v8hi): Likewise.
>
>         Backport from mainline r203863
>         2013-10-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (vspltis_constant): Make sure we check
>         all elements for both endian flavors.
>
>         Backport from mainline r203714
>         2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc/config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
>         endianness.
>         (vec_unpacks_lo_v4sf): Likewise.
>         (vec_unpacks_float_hi_v4si): Likewise.
>         (vec_unpacks_float_lo_v4si): Likewise.
>         (vec_unpacku_float_hi_v4si): Likewise.
>         (vec_unpacku_float_lo_v4si): Likewise.
>
>         Backport from mainline r203713
>         2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
>         (vsx_concat_v2sf): Likewise.
>
>         Backport from mainline r203458
>         2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
>         handle vector float as well.
>         (*vsx_le_perm_load_v4si): Likewise.
>         (*vsx_le_perm_store_v2di): Likewise.
>         (*vsx_le_perm_store_v4si): Likewise.
>
>         Backport from mainline r203457
>         2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
>         directly to circumvent subtract from splat{31} workaround.
>         * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
>         prototype.
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
>         * config/rs6000/altivec.md (define_c_enum "unspec"): Add
>         UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
>         (altivec_vperm_<mode>): Convert to define_insn_and_split to
>         separate big and little endian logic.
>         (*altivec_vperm_<mode>_internal): New define_insn.
>         (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
>         separate big and little endian logic.
>         (*altivec_vperm_<mode>_uns_internal): New define_insn.
>         (vec_permv16qi): Add little endian logic.
>
>         Backport from mainline r203247
>         2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
>         (altivec_expand_vec_perm_const): Call it.
>
>         Backport from mainline r203246
>         2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (mov<mode>): Emit permuted move
>         sequences for LE VSX loads and stores at expand time.
>         * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
>         prototype.
>         * config/rs6000/rs6000.c (rs6000_const_vec): New.
>         (rs6000_gen_le_vsx_permute): New.
>         (rs6000_gen_le_vsx_load): New.
>         (rs6000_gen_le_vsx_store): New.
>         (rs6000_gen_le_vsx_move): New.
>         * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
>         (*vsx_le_perm_load_v4si): New.
>         (*vsx_le_perm_load_v8hi): New.
>         (*vsx_le_perm_load_v16qi): New.
>         (*vsx_le_perm_store_v2di): New.
>         (*vsx_le_perm_store_v4si): New.
>         (*vsx_le_perm_store_v8hi): New.
>         (*vsx_le_perm_store_v16qi): New.
>         (*vsx_xxpermdi2_le_<mode>): New.
>         (*vsx_xxpermdi4_le_<mode>): New.
>         (*vsx_xxpermdi8_le_V8HI): New.
>         (*vsx_xxpermdi16_le_V16QI): New.
>         (*vsx_lxvd2x2_le_<mode>): New.
>         (*vsx_lxvd2x4_le_<mode>): New.
>         (*vsx_lxvd2x8_le_V8HI): New.
>         (*vsx_lxvd2x16_le_V16QI): New.
>         (*vsx_stxvd2x2_le_<mode>): New.
>         (*vsx_stxvd2x4_le_<mode>): New.
>         (*vsx_stxvd2x8_le_V8HI): New.
>         (*vsx_stxvd2x16_le_V16QI): New.
>
>         Backport from mainline r201235
>         2013-07-24  Bill Schmidt  <wschmidt@linux.ibm.com>
>                     Anton Blanchard  <anton@au1.ibm.com>
>
>         * config/rs6000/altivec.md (altivec_vpkpx): Handle little endian.
>         (altivec_vpks<VI_char>ss): Likewise.
>         (altivec_vpks<VI_char>us): Likewise.
>         (altivec_vpku<VI_char>us): Likewise.
>         (altivec_vpku<VI_char>um): Likewise.
>
>         Backport from mainline r201208
>         2013-07-24  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
>                     Anton Blanchard  <anton@au1.ibm.com>
>
>         * config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input
>         operands to vperm for little endian.
>         * config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead
>         of lvsl to create the control mask for a vperm for little endian.
>
>         Backport from mainline r201195
>         2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>                     Anton Blanchard  <anton@au1.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
>         two operands for little-endian.
> > Backport from mainline r201193 > 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct > selection of field for vector splat in little endian mode. > > Backport from mainline r201149 > 2013-07-22 Bill Schmidt <wschmidt@vnet.linux.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix > endianness when selecting field to splat. > > [gcc/testsuite] > > 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > Backport from mainline r205638 > 2013-12-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little > endian. > > Backport from mainline r205146 > 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.target/powerpc/pr48258-1.c: Skip for little endian. > > Backport from mainline r204862 > 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.dg/vmx/3b-15.c: Revise for little endian. > > Backport from mainline r204321 > 2013-11-02 Bill Schmidt <wschmidt@vnet.linux.ibm.com> > > * gcc.dg/vmx/vec-set.c: New. > > Backport from mainline r204138 > 2013-10-28 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant. > * gcc.dg/vmx/eg-5.c: Likewise. > > Backport from mainline r203930 > 2013-10-22 Bill Schmidt <wschmidt@vnet.ibm.com> > > * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack > tests into... > * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is > restricted to big-endian targets. > > Backport from mainline r203246 > 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian. > * gcc.target/powerpc/fusion.c: Likewise. > > [libcpp] > > 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > Backport from mainline > 2013-11-18 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * lex.c (search_line_fast): Correct for little endian. > > > Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c > +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c > @@ -3216,11 +3216,6 @@ rs6000_option_override_internal (bool gl > } > else if (TARGET_PAIRED_FLOAT) > msg = N_("-mvsx and -mpaired are incompatible"); > - /* The hardware will allow VSX and little endian, but until we make sure > - things like vector select, etc. work don't allow VSX on little endian > - systems at this point. */ > - else if (!BYTES_BIG_ENDIAN) > - msg = N_("-mvsx used with little endian code"); > else if (TARGET_AVOID_XFORM > 0) > msg = N_("-mvsx needs indexed addressing"); > else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit > @@ -4991,15 +4986,16 @@ vspltis_constant (rtx op, unsigned step, > > /* Check if VAL is present in every STEP-th element, and the > other elements are filled with its most significant bit. */ > - for (i = 0; i < nunits - 1; ++i) > + for (i = 1; i < nunits; ++i) > { > HOST_WIDE_INT desired_val; > - if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0) > + unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i; > + if ((i & (step - 1)) == 0) > desired_val = val; > else > desired_val = msb_val; > > - if (desired_val != const_vector_elt_as_int (op, i)) > + if (desired_val != const_vector_elt_as_int (op, elt)) > return false; > } > > @@ -5446,6 +5442,7 @@ rs6000_expand_vector_init (rtx target, r > of 64-bit items is not supported on Altivec. 
*/ > if (all_same && GET_MODE_SIZE (inner_mode) <= 4) > { > + rtx field; > mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode)); > emit_move_insn (adjust_address_nv (mem, inner_mode, 0), > XVECEXP (vals, 0, 0)); > @@ -5456,9 +5453,11 @@ rs6000_expand_vector_init (rtx target, r > gen_rtx_SET (VOIDmode, > target, mem), > x))); > + field = (BYTES_BIG_ENDIAN ? const0_rtx > + : GEN_INT (GET_MODE_NUNITS (mode) - 1)); > x = gen_rtx_VEC_SELECT (inner_mode, target, > gen_rtx_PARALLEL (VOIDmode, > - gen_rtvec (1, const0_rtx))); > + gen_rtvec (1, field))); > emit_insn (gen_rtx_SET (VOIDmode, target, > gen_rtx_VEC_DUPLICATE (mode, x))); > return; > @@ -5531,10 +5530,27 @@ rs6000_expand_vector_set (rtx target, rt > XVECEXP (mask, 0, elt*width + i) > = GEN_INT (i + 0x10); > x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0)); > - x = gen_rtx_UNSPEC (mode, > - gen_rtvec (3, target, reg, > - force_reg (V16QImode, x)), > - UNSPEC_VPERM); > + > + if (BYTES_BIG_ENDIAN) > + x = gen_rtx_UNSPEC (mode, > + gen_rtvec (3, target, reg, > + force_reg (V16QImode, x)), > + UNSPEC_VPERM); > + else > + { > + /* Invert selector. */ > + rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode, > + gen_rtx_CONST_INT (QImode, -1)); > + rtx tmp = gen_reg_rtx (V16QImode); > + emit_move_insn (tmp, splat); > + x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x)); > + emit_move_insn (tmp, x); > + > + /* Permute with operands reversed and adjusted selector. */ > + x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp), > + UNSPEC_VPERM); > + } > + > emit_insn (gen_rtx_SET (VOIDmode, target, x)); > } > > @@ -7830,6 +7846,107 @@ rs6000_eliminate_indexed_memrefs (rtx op > copy_addr_to_reg (XEXP (operands[1], 0))); > } > > +/* Generate a vector of constants to permute MODE for a little-endian > + storage operation by swapping the two halves of a vector. */ > +static rtvec > +rs6000_const_vec (enum machine_mode mode) > +{ > + int i, subparts; > + rtvec v; > + > + switch (mode) > + { > + case V2DFmode: > + case V2DImode: > + subparts = 2; > + break; > + case V4SFmode: > + case V4SImode: > + subparts = 4; > + break; > + case V8HImode: > + subparts = 8; > + break; > + case V16QImode: > + subparts = 16; > + break; > + default: > + gcc_unreachable(); > + } > + > + v = rtvec_alloc (subparts); > + > + for (i = 0; i < subparts / 2; ++i) > + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2); > + for (i = subparts / 2; i < subparts; ++i) > + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2); > + > + return v; > +} > + > +/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi > + for a VSX load or store operation. */ > +rtx > +rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode) > +{ > + rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode)); > + return gen_rtx_VEC_SELECT (mode, source, par); > +} > + > +/* Emit a little-endian load from vector memory location SOURCE to VSX > + register DEST in mode MODE. The load is done with two permuting > + insn's that represent an lxvd2x and xxpermdi. */ > +void > +rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode) > +{ > + rtx tmp = can_create_pseudo_p () ? 
gen_reg_rtx_and_attrs (dest) : dest; > + rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode); > + rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode); > + emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem)); > + emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg)); > +} > + > +/* Emit a little-endian store to vector memory location DEST from VSX > + register SOURCE in mode MODE. The store is done with two permuting > + insn's that represent an xxpermdi and an stxvd2x. */ > +void > +rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode) > +{ > + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source; > + rtx permute_src = rs6000_gen_le_vsx_permute (source, mode); > + rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode); > + emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src)); > + emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp)); > +} > + > +/* Emit a sequence representing a little-endian VSX load or store, > + moving data from SOURCE to DEST in mode MODE. This is done > + separately from rs6000_emit_move to ensure it is called only > + during expand. LE VSX loads and stores introduced later are > + handled with a split. The expand-time RTL generation allows > + us to optimize away redundant pairs of register-permutes. */ > +void > +rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode) > +{ > + gcc_assert (!BYTES_BIG_ENDIAN > + && VECTOR_MEM_VSX_P (mode) > + && mode != TImode > + && !gpr_or_gpr_p (dest, source) > + && (MEM_P (source) ^ MEM_P (dest))); > + > + if (MEM_P (source)) > + { > + gcc_assert (REG_P (dest)); > + rs6000_emit_le_vsx_load (dest, source, mode); > + } > + else > + { > + if (!REG_P (source)) > + source = force_reg (mode, source); > + rs6000_emit_le_vsx_store (dest, source, mode); > + } > +} > + > /* Emit a move from SOURCE to DEST in mode MODE. */ > void > rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode) > @@ -12589,7 +12706,8 @@ rs6000_expand_builtin (tree exp, rtx tar > case ALTIVEC_BUILTIN_MASK_FOR_LOAD: > case ALTIVEC_BUILTIN_MASK_FOR_STORE: > { > - int icode = (int) CODE_FOR_altivec_lvsr; > + int icode = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr > + : (int) CODE_FOR_altivec_lvsl); > enum machine_mode tmode = insn_data[icode].operand[0].mode; > enum machine_mode mode = insn_data[icode].operand[1].mode; > tree arg; > @@ -20880,7 +20998,7 @@ output_probe_stack_range (rtx reg1, rtx > > static rtx > rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val, > - rtx reg2, rtx rreg) > + rtx reg2, rtx rreg, rtx split_reg) > { > rtx real, temp; > > @@ -20971,6 +21089,11 @@ rs6000_frame_related (rtx insn, rtx reg, > } > } > > + /* If a store insn has been split into multiple insns, the > + true source register is given by split_reg. 
*/ > + if (split_reg != NULL_RTX) > + real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg); > + > RTX_FRAME_RELATED_P (insn) = 1; > add_reg_note (insn, REG_FRAME_RELATED_EXPR, real); > > @@ -21078,7 +21201,7 @@ emit_frame_save (rtx frame_reg, enum mac > reg = gen_rtx_REG (mode, regno); > insn = emit_insn (gen_frame_store (reg, frame_reg, offset)); > return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > } > > /* Emit an offset memory reference suitable for a frame store, while > @@ -21599,7 +21722,7 @@ rs6000_emit_prologue (void) > > insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p)); > rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, > - treg, GEN_INT (-info->total_size)); > + treg, GEN_INT (-info->total_size), NULL_RTX); > sp_off = frame_off = info->total_size; > } > > @@ -21684,7 +21807,7 @@ rs6000_emit_prologue (void) > > insn = emit_move_insn (mem, reg); > rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > END_USE (0); > } > } > @@ -21752,7 +21875,7 @@ rs6000_emit_prologue (void) > info->lr_save_offset, > DFmode, sel); > rs6000_frame_related (insn, ptr_reg, sp_off, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > if (lr) > END_USE (0); > } > @@ -21831,7 +21954,7 @@ rs6000_emit_prologue (void) > SAVRES_SAVE | SAVRES_GPR); > > rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > } > > /* Move the static chain pointer back. */ > @@ -21881,7 +22004,7 @@ rs6000_emit_prologue (void) > info->lr_save_offset + ptr_off, > reg_mode, sel); > rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > if (lr) > END_USE (0); > } > @@ -21897,7 +22020,7 @@ rs6000_emit_prologue (void) > info->gp_save_offset + frame_off + reg_size * i); > insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p)); > rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > } > else if (!WORLD_SAVE_P (info)) > { > @@ -22124,7 +22247,7 @@ rs6000_emit_prologue (void) > info->altivec_save_offset + ptr_off, > 0, V4SImode, SAVRES_SAVE | SAVRES_VR); > rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off, > - NULL_RTX, NULL_RTX); > + NULL_RTX, NULL_RTX, NULL_RTX); > if (REGNO (frame_reg_rtx) == REGNO (scratch_reg)) > { > /* The oddity mentioned above clobbered our frame reg. */ > @@ -22140,7 +22263,7 @@ rs6000_emit_prologue (void) > for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i) > if (info->vrsave_mask & ALTIVEC_REG_BIT (i)) > { > - rtx areg, savereg, mem; > + rtx areg, savereg, mem, split_reg; > int offset; > > offset = (info->altivec_save_offset + frame_off > @@ -22158,8 +22281,18 @@ rs6000_emit_prologue (void) > > insn = emit_move_insn (mem, savereg); > > + /* When we split a VSX store into two insns, we need to make > + sure the DWARF info knows which register we are storing. > + Pass it in to be used on the appropriate note. 
*/ > + if (!BYTES_BIG_ENDIAN > + && GET_CODE (PATTERN (insn)) == SET > + && GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT) > + split_reg = savereg; > + else > + split_reg = NULL_RTX; > + > rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, > - areg, GEN_INT (offset)); > + areg, GEN_INT (offset), split_reg); > } > } > > @@ -28813,6 +28946,136 @@ rs6000_emit_parity (rtx dst, rtx src) > } > } > > +/* Expand an Altivec constant permutation for little endian mode. > + There are two issues: First, the two input operands must be > + swapped so that together they form a double-wide array in LE > + order. Second, the vperm instruction has surprising behavior > + in LE mode: it interprets the elements of the source vectors > + in BE mode ("left to right") and interprets the elements of > + the destination vector in LE mode ("right to left"). To > + correct for this, we must subtract each element of the permute > + control vector from 31. > + > + For example, suppose we want to concatenate vr10 = {0, 1, 2, 3} > + with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm. > + We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to > + serve as the permute control vector. Then, in BE mode, > + > + vperm 9,10,11,12 > + > + places the desired result in vr9. However, in LE mode the > + vector contents will be > + > + vr10 = 00000003 00000002 00000001 00000000 > + vr11 = 00000007 00000006 00000005 00000004 > + > + The result of the vperm using the same permute control vector is > + > + vr9 = 05000000 07000000 01000000 03000000 > + > + That is, the leftmost 4 bytes of vr10 are interpreted as the > + source for the rightmost 4 bytes of vr9, and so on. > + > + If we change the permute control vector to > + > + vr12 = {31,20,29,28,23,22,21,20,15,14,13,12,7,6,5,4} > + > + and issue > + > + vperm 9,11,10,12 > + > + we get the desired > + > + vr9 = 00000006 00000004 00000002 00000000. */ > + > +void > +altivec_expand_vec_perm_const_le (rtx operands[4]) > +{ > + unsigned int i; > + rtx perm[16]; > + rtx constv, unspec; > + rtx target = operands[0]; > + rtx op0 = operands[1]; > + rtx op1 = operands[2]; > + rtx sel = operands[3]; > + > + /* Unpack and adjust the constant selector. */ > + for (i = 0; i < 16; ++i) > + { > + rtx e = XVECEXP (sel, 0, i); > + unsigned int elt = 31 - (INTVAL (e) & 31); > + perm[i] = GEN_INT (elt); > + } > + > + /* Expand to a permute, swapping the inputs and using the > + adjusted selector. */ > + if (!REG_P (op0)) > + op0 = force_reg (V16QImode, op0); > + if (!REG_P (op1)) > + op1 = force_reg (V16QImode, op1); > + > + constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm)); > + constv = force_reg (V16QImode, constv); > + unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv), > + UNSPEC_VPERM); > + if (!REG_P (target)) > + { > + rtx tmp = gen_reg_rtx (V16QImode); > + emit_move_insn (tmp, unspec); > + unspec = tmp; > + } > + > + emit_move_insn (target, unspec); > +} > + > +/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the > + permute control vector. But here it's not a constant, so we must > + generate a vector splat/subtract to do the adjustment. */ > + > +void > +altivec_expand_vec_perm_le (rtx operands[4]) > +{ > + rtx splat, unspec; > + rtx target = operands[0]; > + rtx op0 = operands[1]; > + rtx op1 = operands[2]; > + rtx sel = operands[3]; > + rtx tmp = target; > + > + /* Get everything in regs so the pattern matches. 
*/ > + if (!REG_P (op0)) > + op0 = force_reg (V16QImode, op0); > + if (!REG_P (op1)) > + op1 = force_reg (V16QImode, op1); > + if (!REG_P (sel)) > + sel = force_reg (V16QImode, sel); > + if (!REG_P (target)) > + tmp = gen_reg_rtx (V16QImode); > + > + /* SEL = splat(31) - SEL. */ > + /* We want to subtract from 31, but we can't vspltisb 31 since > + it's out of range. -1 works as well because only the low-order > + five bits of the permute control vector elements are used. */ > + splat = gen_rtx_VEC_DUPLICATE (V16QImode, > + gen_rtx_CONST_INT (QImode, -1)); > + emit_move_insn (tmp, splat); > + sel = gen_rtx_MINUS (V16QImode, tmp, sel); > + emit_move_insn (tmp, sel); > + > + /* Permute with operands reversed and adjusted selector. */ > + unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp), > + UNSPEC_VPERM); > + > + /* Copy into target, possibly by way of a register. */ > + if (!REG_P (target)) > + { > + emit_move_insn (tmp, unspec); > + unspec = tmp; > + } > + > + emit_move_insn (target, unspec); > +} > + > /* Expand an Altivec constant permutation. Return true if we match > an efficient implementation; false to fall back to VPERM. */ > > @@ -28829,17 +29092,23 @@ altivec_expand_vec_perm_const (rtx opera > { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, > { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, > { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, > - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, > + { OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, > + { OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, > { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, > - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, > + { OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, > { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, > - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, > + { OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, > { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, > - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, > + { OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, > { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, > - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, > + { OPTION_MASK_ALTIVEC, > + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, > { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, > { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, > { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, > @@ -28901,6 +29170,8 @@ altivec_expand_vec_perm_const (rtx opera > break; > if (i == 16) > { > + if (!BYTES_BIG_ENDIAN) > + elt = 15 - elt; > emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt))); > return true; > } > @@ -28912,9 +29183,10 @@ altivec_expand_vec_perm_const (rtx opera > break; > if (i == 16) > { > + int field = BYTES_BIG_ENDIAN ? 
elt / 2 : 7 - elt / 2; > x = gen_reg_rtx (V8HImode); > emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0), > - GEN_INT (elt / 2))); > + GEN_INT (field))); > emit_move_insn (target, gen_lowpart (V16QImode, x)); > return true; > } > @@ -28930,9 +29202,10 @@ altivec_expand_vec_perm_const (rtx opera > break; > if (i == 16) > { > + int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4; > x = gen_reg_rtx (V4SImode); > emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0), > - GEN_INT (elt / 4))); > + GEN_INT (field))); > emit_move_insn (target, gen_lowpart (V16QImode, x)); > return true; > } > @@ -28970,7 +29243,30 @@ altivec_expand_vec_perm_const (rtx opera > enum machine_mode omode = insn_data[icode].operand[0].mode; > enum machine_mode imode = insn_data[icode].operand[1].mode; > > - if (swapped) > + /* For little-endian, don't use vpkuwum and vpkuhum if the > + underlying vector type is not V4SI and V8HI, respectively. > + For example, using vpkuwum with a V8HI picks up the even > + halfwords (BE numbering) when the even halfwords (LE > + numbering) are what we need. */ > + if (!BYTES_BIG_ENDIAN > + && icode == CODE_FOR_altivec_vpkuwum > + && ((GET_CODE (op0) == REG > + && GET_MODE (op0) != V4SImode) > + || (GET_CODE (op0) == SUBREG > + && GET_MODE (XEXP (op0, 0)) != V4SImode))) > + continue; > + if (!BYTES_BIG_ENDIAN > + && icode == CODE_FOR_altivec_vpkuhum > + && ((GET_CODE (op0) == REG > + && GET_MODE (op0) != V8HImode) > + || (GET_CODE (op0) == SUBREG > + && GET_MODE (XEXP (op0, 0)) != V8HImode))) > + continue; > + > + /* For little-endian, the two input operands must be swapped > + (or swapped back) to ensure proper right-to-left numbering > + from 0 to 2N-1. */ > + if (swapped ^ !BYTES_BIG_ENDIAN) > x = op0, op0 = op1, op1 = x; > if (imode != V16QImode) > { > @@ -28988,6 +29284,12 @@ altivec_expand_vec_perm_const (rtx opera > } > } > > + if (!BYTES_BIG_ENDIAN) > + { > + altivec_expand_vec_perm_const_le (operands); > + return true; > + } > + > return false; > } > > @@ -29037,6 +29339,21 @@ rs6000_expand_vec_perm_const_1 (rtx targ > gcc_assert (GET_MODE_NUNITS (vmode) == 2); > dmode = mode_for_vector (GET_MODE_INNER (vmode), 4); > > + /* For little endian, swap operands and invert/swap selectors > + to get the correct xxpermdi. The operand swap sets up the > + inputs as a little endian array. The selectors are swapped > + because they are defined to use big endian ordering. The > + selectors are inverted to get the correct doublewords for > + little endian ordering. */ > + if (!BYTES_BIG_ENDIAN) > + { > + int n; > + perm0 = 3 - perm0; > + perm1 = 3 - perm1; > + n = perm0, perm0 = perm1, perm1 = n; > + x = op0, op0 = op1, op1 = x; > + } > + > x = gen_rtx_VEC_CONCAT (dmode, op0, op1); > v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1)); > x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v)); > @@ -29132,7 +29449,7 @@ rs6000_expand_interleave (rtx target, rt > unsigned i, high, nelt = GET_MODE_NUNITS (vmode); > rtx perm[16]; > > - high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2); > + high = (highp ? 0 : nelt / 2); > for (i = 0; i < nelt / 2; i++) > { > perm[i * 2] = GEN_INT (i + high); > Index: gcc-4_8-test/gcc/config/rs6000/vector.md > =================================================================== > --- gcc-4_8-test.orig/gcc/config/rs6000/vector.md > +++ gcc-4_8-test/gcc/config/rs6000/vector.md > @@ -88,7 +88,8 @@ > (smax "smax")]) > > > -;; Vector move instructions. > +;; Vector move instructions. 
Little-endian VSX loads and stores require > +;; special handling to circumvent "element endianness." > (define_expand "mov<mode>" > [(set (match_operand:VEC_M 0 "nonimmediate_operand" "") > (match_operand:VEC_M 1 "any_operand" ""))] > @@ -104,6 +105,16 @@ > && !vlogical_operand (operands[1], <MODE>mode)) > operands[1] = force_reg (<MODE>mode, operands[1]); > } > + if (!BYTES_BIG_ENDIAN > + && VECTOR_MEM_VSX_P (<MODE>mode) > + && <MODE>mode != TImode > + && !gpr_or_gpr_p (operands[0], operands[1]) > + && (memory_operand (operands[0], <MODE>mode) > + ^ memory_operand (operands[1], <MODE>mode))) > + { > + rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode); > + DONE; > + } > }) > > ;; Generic vector floating point load/store instructions. These will match > @@ -862,7 +873,7 @@ > { > rtx reg = gen_reg_rtx (V4SFmode); > > - rs6000_expand_interleave (reg, operands[1], operands[1], true); > + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); > emit_insn (gen_vsx_xvcvspdp (operands[0], reg)); > DONE; > }) > @@ -874,7 +885,7 @@ > { > rtx reg = gen_reg_rtx (V4SFmode); > > - rs6000_expand_interleave (reg, operands[1], operands[1], false); > + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); > emit_insn (gen_vsx_xvcvspdp (operands[0], reg)); > DONE; > }) > @@ -886,7 +897,7 @@ > { > rtx reg = gen_reg_rtx (V4SImode); > > - rs6000_expand_interleave (reg, operands[1], operands[1], true); > + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); > emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg)); > DONE; > }) > @@ -898,7 +909,7 @@ > { > rtx reg = gen_reg_rtx (V4SImode); > > - rs6000_expand_interleave (reg, operands[1], operands[1], false); > + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); > emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg)); > DONE; > }) > @@ -910,7 +921,7 @@ > { > rtx reg = gen_reg_rtx (V4SImode); > > - rs6000_expand_interleave (reg, operands[1], operands[1], true); > + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); > emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg)); > DONE; > }) > @@ -922,7 +933,7 @@ > { > rtx reg = gen_reg_rtx (V4SImode); > > - rs6000_expand_interleave (reg, operands[1], operands[1], false); > + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); > emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg)); > DONE; > }) > @@ -936,8 +947,19 @@ > (match_operand:V16QI 3 "vlogical_operand" "")] > "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)" > { > - emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2], > - operands[3])); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], > + operands[2], operands[3])); > + else > + { > + /* We have changed lvsr to lvsl, so to complete the transformation > + of vperm for LE, we must swap the inputs. 
*/ > + rtx unspec = gen_rtx_UNSPEC (<MODE>mode, > + gen_rtvec (3, operands[2], > + operands[1], operands[3]), > + UNSPEC_VPERM); > + emit_move_insn (operands[0], unspec); > + } > DONE; > }) > > Index: gcc-4_8-test/gcc/config/rs6000/altivec.md > =================================================================== > --- gcc-4_8-test.orig/gcc/config/rs6000/altivec.md > +++ gcc-4_8-test/gcc/config/rs6000/altivec.md > @@ -649,7 +649,7 @@ > convert_move (small_swap, swap, 0); > > low_product = gen_reg_rtx (V4SImode); > - emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two)); > + emit_insn (gen_altivec_vmulouh (low_product, one, two)); > > high_product = gen_reg_rtx (V4SImode); > emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero)); > @@ -676,10 +676,18 @@ > emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2])); > emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2])); > > - emit_insn (gen_altivec_vmrghw (high, even, odd)); > - emit_insn (gen_altivec_vmrglw (low, even, odd)); > - > - emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); > + if (BYTES_BIG_ENDIAN) > + { > + emit_insn (gen_altivec_vmrghw (high, even, odd)); > + emit_insn (gen_altivec_vmrglw (low, even, odd)); > + emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); > + } > + else > + { > + emit_insn (gen_altivec_vmrghw (high, odd, even)); > + emit_insn (gen_altivec_vmrglw (low, odd, even)); > + emit_insn (gen_altivec_vpkuwum (operands[0], low, high)); > + } > > DONE; > }") > @@ -967,7 +975,111 @@ > "vmrgow %0,%1,%2" > [(set_attr "type" "vecperm")]) > > -(define_insn "vec_widen_umult_even_v16qi" > +(define_expand "vec_widen_umult_even_v16qi" > + [(use (match_operand:V8HI 0 "register_operand" "")) > + (use (match_operand:V16QI 1 "register_operand" "")) > + (use (match_operand:V16QI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_smult_even_v16qi" > + [(use (match_operand:V8HI 0 "register_operand" "")) > + (use (match_operand:V16QI 1 "register_operand" "")) > + (use (match_operand:V16QI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_umult_even_v8hi" > + [(use (match_operand:V4SI 0 "register_operand" "")) > + (use (match_operand:V8HI 1 "register_operand" "")) > + (use (match_operand:V8HI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_smult_even_v8hi" > + [(use (match_operand:V4SI 0 "register_operand" "")) > + (use (match_operand:V8HI 1 "register_operand" "")) > + (use (match_operand:V8HI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_umult_odd_v16qi" > + [(use (match_operand:V8HI 0 "register_operand" "")) 
> + (use (match_operand:V16QI 1 "register_operand" "")) > + (use (match_operand:V16QI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_smult_odd_v16qi" > + [(use (match_operand:V8HI 0 "register_operand" "")) > + (use (match_operand:V16QI 1 "register_operand" "")) > + (use (match_operand:V16QI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_umult_odd_v8hi" > + [(use (match_operand:V4SI 0 "register_operand" "")) > + (use (match_operand:V8HI 1 "register_operand" "")) > + (use (match_operand:V8HI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_expand "vec_widen_smult_odd_v8hi" > + [(use (match_operand:V4SI 0 "register_operand" "")) > + (use (match_operand:V8HI 1 "register_operand" "")) > + (use (match_operand:V8HI 2 "register_operand" ""))] > + "TARGET_ALTIVEC" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2])); > + else > + emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_insn "altivec_vmuleub" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") > (match_operand:V16QI 2 "register_operand" "v")] > @@ -976,43 +1088,25 @@ > "vmuleub %0,%1,%2" > [(set_attr "type" "veccomplex")]) > > -(define_insn "vec_widen_smult_even_v16qi" > +(define_insn "altivec_vmuloub" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") > (match_operand:V16QI 2 "register_operand" "v")] > - UNSPEC_VMULESB))] > - "TARGET_ALTIVEC" > - "vmulesb %0,%1,%2" > - [(set_attr "type" "veccomplex")]) > - > -(define_insn "vec_widen_umult_even_v8hi" > - [(set (match_operand:V4SI 0 "register_operand" "=v") > - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") > - (match_operand:V8HI 2 "register_operand" "v")] > - UNSPEC_VMULEUH))] > - "TARGET_ALTIVEC" > - "vmuleuh %0,%1,%2" > - [(set_attr "type" "veccomplex")]) > - > -(define_insn "vec_widen_smult_even_v8hi" > - [(set (match_operand:V4SI 0 "register_operand" "=v") > - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") > - (match_operand:V8HI 2 "register_operand" "v")] > - UNSPEC_VMULESH))] > + UNSPEC_VMULOUB))] > "TARGET_ALTIVEC" > - "vmulesh %0,%1,%2" > + "vmuloub %0,%1,%2" > [(set_attr "type" "veccomplex")]) > > -(define_insn "vec_widen_umult_odd_v16qi" > +(define_insn "altivec_vmulesb" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") > (match_operand:V16QI 2 "register_operand" "v")] > - UNSPEC_VMULOUB))] > + UNSPEC_VMULESB))] > "TARGET_ALTIVEC" > - "vmuloub %0,%1,%2" > + "vmulesb %0,%1,%2" > [(set_attr "type" "veccomplex")]) > > -(define_insn "vec_widen_smult_odd_v16qi" > +(define_insn "altivec_vmulosb" > [(set (match_operand:V8HI 0 "register_operand" "=v") > 
(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") > (match_operand:V16QI 2 "register_operand" "v")] > @@ -1021,7 +1115,16 @@ > "vmulosb %0,%1,%2" > [(set_attr "type" "veccomplex")]) > > -(define_insn "vec_widen_umult_odd_v8hi" > +(define_insn "altivec_vmuleuh" > + [(set (match_operand:V4SI 0 "register_operand" "=v") > + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")] > + UNSPEC_VMULEUH))] > + "TARGET_ALTIVEC" > + "vmuleuh %0,%1,%2" > + [(set_attr "type" "veccomplex")]) > + > +(define_insn "altivec_vmulouh" > [(set (match_operand:V4SI 0 "register_operand" "=v") > (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")] > @@ -1030,7 +1133,16 @@ > "vmulouh %0,%1,%2" > [(set_attr "type" "veccomplex")]) > > -(define_insn "vec_widen_smult_odd_v8hi" > +(define_insn "altivec_vmulesh" > + [(set (match_operand:V4SI 0 "register_operand" "=v") > + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")] > + UNSPEC_VMULESH))] > + "TARGET_ALTIVEC" > + "vmulesh %0,%1,%2" > + [(set_attr "type" "veccomplex")]) > + > +(define_insn "altivec_vmulosh" > [(set (match_operand:V4SI 0 "register_operand" "=v") > (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")] > @@ -1047,7 +1159,13 @@ > (match_operand:V4SI 2 "register_operand" "v")] > UNSPEC_VPKPX))] > "TARGET_ALTIVEC" > - "vpkpx %0,%1,%2" > + "* > + { > + if (BYTES_BIG_ENDIAN) > + return \"vpkpx %0,%1,%2\"; > + else > + return \"vpkpx %0,%2,%1\"; > + }" > [(set_attr "type" "vecperm")]) > > (define_insn "altivec_vpks<VI_char>ss" > @@ -1056,7 +1174,13 @@ > (match_operand:VP 2 "register_operand" "v")] > UNSPEC_VPACK_SIGN_SIGN_SAT))] > "<VI_unit>" > - "vpks<VI_char>ss %0,%1,%2" > + "* > + { > + if (BYTES_BIG_ENDIAN) > + return \"vpks<VI_char>ss %0,%1,%2\"; > + else > + return \"vpks<VI_char>ss %0,%2,%1\"; > + }" > [(set_attr "type" "vecperm")]) > > (define_insn "altivec_vpks<VI_char>us" > @@ -1065,7 +1189,13 @@ > (match_operand:VP 2 "register_operand" "v")] > UNSPEC_VPACK_SIGN_UNS_SAT))] > "<VI_unit>" > - "vpks<VI_char>us %0,%1,%2" > + "* > + { > + if (BYTES_BIG_ENDIAN) > + return \"vpks<VI_char>us %0,%1,%2\"; > + else > + return \"vpks<VI_char>us %0,%2,%1\"; > + }" > [(set_attr "type" "vecperm")]) > > (define_insn "altivec_vpku<VI_char>us" > @@ -1074,7 +1204,13 @@ > (match_operand:VP 2 "register_operand" "v")] > UNSPEC_VPACK_UNS_UNS_SAT))] > "<VI_unit>" > - "vpku<VI_char>us %0,%1,%2" > + "* > + { > + if (BYTES_BIG_ENDIAN) > + return \"vpku<VI_char>us %0,%1,%2\"; > + else > + return \"vpku<VI_char>us %0,%2,%1\"; > + }" > [(set_attr "type" "vecperm")]) > > (define_insn "altivec_vpku<VI_char>um" > @@ -1083,7 +1219,13 @@ > (match_operand:VP 2 "register_operand" "v")] > UNSPEC_VPACK_UNS_UNS_MOD))] > "<VI_unit>" > - "vpku<VI_char>um %0,%1,%2" > + "* > + { > + if (BYTES_BIG_ENDIAN) > + return \"vpku<VI_char>um %0,%1,%2\"; > + else > + return \"vpku<VI_char>um %0,%2,%1\"; > + }" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vrl<VI_char>" > @@ -1276,7 +1418,12 @@ > (match_operand:V16QI 3 "register_operand" "")] > UNSPEC_VPERM))] > "TARGET_ALTIVEC" > - "") > +{ > + if (!BYTES_BIG_ENDIAN) { > + altivec_expand_vec_perm_le (operands); > + DONE; > + } > +}) > > (define_expand "vec_perm_constv16qi" > [(match_operand:V16QI 0 "register_operand" "") > @@ -1928,25 +2075,26 @@ > rtx vzero = gen_reg_rtx (V8HImode); > rtx mask = 
gen_reg_rtx (V16QImode); > rtvec v = rtvec_alloc (16); > + bool be = BYTES_BIG_ENDIAN; > > emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); > > - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0); > - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1); > - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2); > - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3); > - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4); > - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5); > - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6); > - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7); > + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); > + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16); > + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6); > + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); > + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); > + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16); > + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4); > + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); > + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); > + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16); > + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2); > + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); > + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); > + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16); > + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0); > + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16); > > emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); > emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); > @@ -1963,25 +2111,26 @@ > rtx vzero = gen_reg_rtx (V4SImode); > rtx mask = gen_reg_rtx (V16QImode); > rtvec v = rtvec_alloc (16); > + bool be = BYTES_BIG_ENDIAN; > > emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); > > - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0); > - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1); > - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2); > - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3); > - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4); > - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5); > - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6); > - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7); > + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); > + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6); > + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17); > + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 
1 : 16); > + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); > + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4); > + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17); > + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); > + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); > + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2); > + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17); > + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); > + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); > + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0); > + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17); > + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16); > > emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); > emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask)); > @@ -1998,25 +2147,26 @@ > rtx vzero = gen_reg_rtx (V8HImode); > rtx mask = gen_reg_rtx (V16QImode); > rtvec v = rtvec_alloc (16); > + bool be = BYTES_BIG_ENDIAN; > > emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); > > - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8); > - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9); > - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10); > - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11); > - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12); > - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13); > - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14); > - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15); > + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15); > + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16); > + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14); > + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16); > + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13); > + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16); > + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12); > + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16); > + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11); > + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16); > + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10); > + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16); > + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9); > + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16); > + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8); > + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 
15 : 16); > > emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); > emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); > @@ -2033,25 +2183,26 @@ > rtx vzero = gen_reg_rtx (V4SImode); > rtx mask = gen_reg_rtx (V16QImode); > rtvec v = rtvec_alloc (16); > + bool be = BYTES_BIG_ENDIAN; > > emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); > > - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8); > - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9); > - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10); > - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11); > - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12); > - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13); > - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); > - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17); > - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14); > - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15); > + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15); > + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14); > + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17); > + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16); > + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13); > + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12); > + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17); > + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16); > + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11); > + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10); > + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17); > + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16); > + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9); > + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8); > + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17); > + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 
15 : 16); > > emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); > emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask)); > @@ -2071,7 +2222,10 @@ > > emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); > DONE; > }") > > @@ -2088,7 +2242,10 @@ > > emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); > DONE; > }") > > @@ -2105,7 +2262,10 @@ > > emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); > DONE; > }") > > @@ -2122,7 +2282,10 @@ > > emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); > DONE; > }") > > @@ -2139,7 +2302,10 @@ > > emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); > DONE; > }") > > @@ -2156,7 +2322,10 @@ > > emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrglw (operands[0], vo, ve)); > DONE; > }") > > @@ -2173,7 +2342,10 @@ > > emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); > DONE; > }") > > @@ -2190,7 +2362,10 @@ > > emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); > emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); > + else > + emit_insn (gen_altivec_vmrglw (operands[0], vo, ve)); > DONE; > }") > > Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h > =================================================================== > --- 
gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h > +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h > @@ -56,6 +56,7 @@ extern void paired_expand_vector_init (r > extern void rs6000_expand_vector_set (rtx, rtx, int); > extern void rs6000_expand_vector_extract (rtx, rtx, int); > extern bool altivec_expand_vec_perm_const (rtx op[4]); > +extern void altivec_expand_vec_perm_le (rtx op[4]); > extern bool rs6000_expand_vec_perm_const (rtx op[4]); > extern void rs6000_expand_extract_even (rtx, rtx, rtx); > extern void rs6000_expand_interleave (rtx, rtx, rtx, bool); > @@ -122,6 +123,7 @@ extern rtx rs6000_longcall_ref (rtx); > extern void rs6000_fatal_bad_address (rtx); > extern rtx create_TOC_reference (rtx, rtx); > extern void rs6000_split_multireg_move (rtx, rtx); > +extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode); > extern void rs6000_emit_move (rtx, rtx, enum machine_mode); > extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode); > extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode, > Index: gcc-4_8-test/gcc/config/rs6000/vsx.md > =================================================================== > --- gcc-4_8-test.orig/gcc/config/rs6000/vsx.md > +++ gcc-4_8-test/gcc/config/rs6000/vsx.md > @@ -216,6 +216,359 @@ > ]) > > ;; VSX moves > + > +;; The patterns for LE permuted loads and stores come before the general > +;; VSX moves so they match first. > +(define_insn_and_split "*vsx_le_perm_load_<mode>" > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > + (match_operand:VSX_D 1 "memory_operand" "Z"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + [(set (match_dup 2) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 1) (const_int 0)]))) > + (set (match_dup 0) > + (vec_select:<MODE> > + (match_dup 2) > + (parallel [(const_int 1) (const_int 0)])))] > + " > +{ > + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0]) > + : operands[0]; > +} > + " > + [(set_attr "type" "vecload") > + (set_attr "length" "8")]) > + > +(define_insn_and_split "*vsx_le_perm_load_<mode>" > + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") > + (match_operand:VSX_W 1 "memory_operand" "Z"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + [(set (match_dup 2) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)]))) > + (set (match_dup 0) > + (vec_select:<MODE> > + (match_dup 2) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)])))] > + " > +{ > + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0]) > + : operands[0]; > +} > + " > + [(set_attr "type" "vecload") > + (set_attr "length" "8")]) > + > +(define_insn_and_split "*vsx_le_perm_load_v8hi" > + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") > + (match_operand:V8HI 1 "memory_operand" "Z"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + [(set (match_dup 2) > + (vec_select:V8HI > + (match_dup 1) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)]))) > + (set (match_dup 0) > + (vec_select:V8HI > + (match_dup 2) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)])))] > + " > +{ > + operands[2] = can_create_pseudo_p () ? 
gen_reg_rtx_and_attrs (operands[0]) > + : operands[0]; > +} > + " > + [(set_attr "type" "vecload") > + (set_attr "length" "8")]) > + > +(define_insn_and_split "*vsx_le_perm_load_v16qi" > + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") > + (match_operand:V16QI 1 "memory_operand" "Z"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + [(set (match_dup 2) > + (vec_select:V16QI > + (match_dup 1) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)]))) > + (set (match_dup 0) > + (vec_select:V16QI > + (match_dup 2) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)])))] > + " > +{ > + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0]) > + : operands[0]; > +} > + " > + [(set_attr "type" "vecload") > + (set_attr "length" "8")]) > + > +(define_insn "*vsx_le_perm_store_<mode>" > + [(set (match_operand:VSX_D 0 "memory_operand" "=Z") > + (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + [(set_attr "type" "vecstore") > + (set_attr "length" "12")]) > + > +(define_split > + [(set (match_operand:VSX_D 0 "memory_operand" "") > + (match_operand:VSX_D 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" > + [(set (match_dup 2) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 1) (const_int 0)]))) > + (set (match_dup 0) > + (vec_select:<MODE> > + (match_dup 2) > + (parallel [(const_int 1) (const_int 0)])))] > +{ > + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) > + : operands[1]; > +}) > + > +;; The post-reload split requires that we re-permute the source > +;; register in case it is still live. > +(define_split > + [(set (match_operand:VSX_D 0 "memory_operand" "") > + (match_operand:VSX_D 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" > + [(set (match_dup 1) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 1) (const_int 0)]))) > + (set (match_dup 0) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 1) (const_int 0)]))) > + (set (match_dup 1) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 1) (const_int 0)])))] > + "") > + > +(define_insn "*vsx_le_perm_store_<mode>" > + [(set (match_operand:VSX_W 0 "memory_operand" "=Z") > + (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + [(set_attr "type" "vecstore") > + (set_attr "length" "12")]) > + > +(define_split > + [(set (match_operand:VSX_W 0 "memory_operand" "") > + (match_operand:VSX_W 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" > + [(set (match_dup 2) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)]))) > + (set (match_dup 0) > + (vec_select:<MODE> > + (match_dup 2) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)])))] > +{ > + operands[2] = can_create_pseudo_p () ? 
gen_reg_rtx_and_attrs (operands[1]) > + : operands[1]; > +}) > + > +;; The post-reload split requires that we re-permute the source > +;; register in case it is still live. > +(define_split > + [(set (match_operand:VSX_W 0 "memory_operand" "") > + (match_operand:VSX_W 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" > + [(set (match_dup 1) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)]))) > + (set (match_dup 0) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)]))) > + (set (match_dup 1) > + (vec_select:<MODE> > + (match_dup 1) > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)])))] > + "") > + > +(define_insn "*vsx_le_perm_store_v8hi" > + [(set (match_operand:V8HI 0 "memory_operand" "=Z") > + (match_operand:V8HI 1 "vsx_register_operand" "+wa"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + [(set_attr "type" "vecstore") > + (set_attr "length" "12")]) > + > +(define_split > + [(set (match_operand:V8HI 0 "memory_operand" "") > + (match_operand:V8HI 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" > + [(set (match_dup 2) > + (vec_select:V8HI > + (match_dup 1) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)]))) > + (set (match_dup 0) > + (vec_select:V8HI > + (match_dup 2) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)])))] > +{ > + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) > + : operands[1]; > +}) > + > +;; The post-reload split requires that we re-permute the source > +;; register in case it is still live. 
> +(define_split > + [(set (match_operand:V8HI 0 "memory_operand" "") > + (match_operand:V8HI 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" > + [(set (match_dup 1) > + (vec_select:V8HI > + (match_dup 1) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)]))) > + (set (match_dup 0) > + (vec_select:V8HI > + (match_dup 1) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)]))) > + (set (match_dup 1) > + (vec_select:V8HI > + (match_dup 1) > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)])))] > + "") > + > +(define_insn "*vsx_le_perm_store_v16qi" > + [(set (match_operand:V16QI 0 "memory_operand" "=Z") > + (match_operand:V16QI 1 "vsx_register_operand" "+wa"))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX" > + "#" > + [(set_attr "type" "vecstore") > + (set_attr "length" "12")]) > + > +(define_split > + [(set (match_operand:V16QI 0 "memory_operand" "") > + (match_operand:V16QI 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" > + [(set (match_dup 2) > + (vec_select:V16QI > + (match_dup 1) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)]))) > + (set (match_dup 0) > + (vec_select:V16QI > + (match_dup 2) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)])))] > +{ > + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) > + : operands[1]; > +}) > + > +;; The post-reload split requires that we re-permute the source > +;; register in case it is still live. 
> +(define_split > + [(set (match_operand:V16QI 0 "memory_operand" "") > + (match_operand:V16QI 1 "vsx_register_operand" ""))] > + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" > + [(set (match_dup 1) > + (vec_select:V16QI > + (match_dup 1) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)]))) > + (set (match_dup 0) > + (vec_select:V16QI > + (match_dup 1) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)]))) > + (set (match_dup 1) > + (vec_select:V16QI > + (match_dup 1) > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)])))] > + "") > + > + > (define_insn "*vsx_mov<mode>" > [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v") > (match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))] > @@ -962,7 +1315,12 @@ > (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa") > (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > - "xxpermdi %x0,%x1,%x2,0" > +{ > + if (BYTES_BIG_ENDIAN) > + return "xxpermdi %x0,%x1,%x2,0"; > + else > + return "xxpermdi %x0,%x2,%x1,0"; > +} > [(set_attr "type" "vecperm")]) > > ;; Special purpose concat using xxpermdi to glue two single precision values > @@ -975,9 +1333,161 @@ > (match_operand:SF 2 "vsx_register_operand" "f,f")] > UNSPEC_VSX_CONCAT))] > "VECTOR_MEM_VSX_P (V2DFmode)" > - "xxpermdi %x0,%x1,%x2,0" > +{ > + if (BYTES_BIG_ENDIAN) > + return "xxpermdi %x0,%x1,%x2,0"; > + else > + return "xxpermdi %x0,%x2,%x1,0"; > +} > + [(set_attr "type" "vecperm")]) > + > +;; xxpermdi for little endian loads and stores. We need several of > +;; these since the form of the PARALLEL differs by mode. 
> +(define_insn "*vsx_xxpermdi2_le_<mode>" > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > + (vec_select:VSX_D > + (match_operand:VSX_D 1 "vsx_register_operand" "wa") > + (parallel [(const_int 1) (const_int 0)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" > + "xxpermdi %x0,%x1,%x1,2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "*vsx_xxpermdi4_le_<mode>" > + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") > + (vec_select:VSX_W > + (match_operand:VSX_W 1 "vsx_register_operand" "wa") > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" > + "xxpermdi %x0,%x1,%x1,2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "*vsx_xxpermdi8_le_V8HI" > + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") > + (vec_select:V8HI > + (match_operand:V8HI 1 "vsx_register_operand" "wa") > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" > + "xxpermdi %x0,%x1,%x1,2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "*vsx_xxpermdi16_le_V16QI" > + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") > + (vec_select:V16QI > + (match_operand:V16QI 1 "vsx_register_operand" "wa") > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" > + "xxpermdi %x0,%x1,%x1,2" > [(set_attr "type" "vecperm")]) > > +;; lxvd2x for little endian loads. We need several of > +;; these since the form of the PARALLEL differs by mode. 
> +(define_insn "*vsx_lxvd2x2_le_<mode>" > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > + (vec_select:VSX_D > + (match_operand:VSX_D 1 "memory_operand" "Z") > + (parallel [(const_int 1) (const_int 0)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" > + "lxvd2x %x0,%y1" > + [(set_attr "type" "vecload")]) > + > +(define_insn "*vsx_lxvd2x4_le_<mode>" > + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") > + (vec_select:VSX_W > + (match_operand:VSX_W 1 "memory_operand" "Z") > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" > + "lxvd2x %x0,%y1" > + [(set_attr "type" "vecload")]) > + > +(define_insn "*vsx_lxvd2x8_le_V8HI" > + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") > + (vec_select:V8HI > + (match_operand:V8HI 1 "memory_operand" "Z") > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" > + "lxvd2x %x0,%y1" > + [(set_attr "type" "vecload")]) > + > +(define_insn "*vsx_lxvd2x16_le_V16QI" > + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") > + (vec_select:V16QI > + (match_operand:V16QI 1 "memory_operand" "Z") > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" > + "lxvd2x %x0,%y1" > + [(set_attr "type" "vecload")]) > + > +;; stxvd2x for little endian stores. We need several of > +;; these since the form of the PARALLEL differs by mode. 
> +(define_insn "*vsx_stxvd2x2_le_<mode>" > + [(set (match_operand:VSX_D 0 "memory_operand" "=Z") > + (vec_select:VSX_D > + (match_operand:VSX_D 1 "vsx_register_operand" "wa") > + (parallel [(const_int 1) (const_int 0)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" > + "stxvd2x %x1,%y0" > + [(set_attr "type" "vecstore")]) > + > +(define_insn "*vsx_stxvd2x4_le_<mode>" > + [(set (match_operand:VSX_W 0 "memory_operand" "=Z") > + (vec_select:VSX_W > + (match_operand:VSX_W 1 "vsx_register_operand" "wa") > + (parallel [(const_int 2) (const_int 3) > + (const_int 0) (const_int 1)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" > + "stxvd2x %x1,%y0" > + [(set_attr "type" "vecstore")]) > + > +(define_insn "*vsx_stxvd2x8_le_V8HI" > + [(set (match_operand:V8HI 0 "memory_operand" "=Z") > + (vec_select:V8HI > + (match_operand:V8HI 1 "vsx_register_operand" "wa") > + (parallel [(const_int 4) (const_int 5) > + (const_int 6) (const_int 7) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" > + "stxvd2x %x1,%y0" > + [(set_attr "type" "vecstore")]) > + > +(define_insn "*vsx_stxvd2x16_le_V16QI" > + [(set (match_operand:V16QI 0 "memory_operand" "=Z") > + (vec_select:V16QI > + (match_operand:V16QI 1 "vsx_register_operand" "wa") > + (parallel [(const_int 8) (const_int 9) > + (const_int 10) (const_int 11) > + (const_int 12) (const_int 13) > + (const_int 14) (const_int 15) > + (const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)])))] > + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" > + "stxvd2x %x1,%y0" > + [(set_attr "type" "vecstore")]) > + > ;; Set the element of a V2DI/VD2F mode > (define_insn "vsx_set_<mode>" > [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa") > @@ -987,9 +1497,10 @@ > UNSPEC_VSX_SET))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - if (INTVAL (operands[3]) == 0) > + int idx_first = BYTES_BIG_ENDIAN ? 
0 : 1; > + if (INTVAL (operands[3]) == idx_first) > return \"xxpermdi %x0,%x2,%x1,1\"; > - else if (INTVAL (operands[3]) == 1) > + else if (INTVAL (operands[3]) == 1 - idx_first) > return \"xxpermdi %x0,%x1,%x2,0\"; > else > gcc_unreachable (); > @@ -1004,8 +1515,12 @@ > [(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > + int fldDM; > gcc_assert (UINTVAL (operands[2]) <= 1); > - operands[3] = GEN_INT (INTVAL (operands[2]) << 1); > + fldDM = INTVAL (operands[2]) << 1; > + if (!BYTES_BIG_ENDIAN) > + fldDM = 3 - fldDM; > + operands[3] = GEN_INT (fldDM); > return \"xxpermdi %x0,%x1,%x1,%3\"; > } > [(set_attr "type" "vecperm")]) > @@ -1025,6 +1540,21 @@ > (const_string "fpload"))) > (set_attr "length" "4")]) > > +;; Optimize extracting element 1 from memory for little endian > +(define_insn "*vsx_extract_<mode>_one_le" > + [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa") > + (vec_select:<VS_scalar> > + (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z") > + (parallel [(const_int 1)])))] > + "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN" > + "lxsd%U1x %x0,%y1" > + [(set (attr "type") > + (if_then_else > + (match_test "update_indexed_address_mem (operands[1], VOIDmode)") > + (const_string "fpload_ux") > + (const_string "fpload"))) > + (set_attr "length" "4")]) > + > ;; Extract a SF element from V4SF > (define_insn_and_split "vsx_extract_v4sf" > [(set (match_operand:SF 0 "vsx_register_operand" "=f,f") > @@ -1045,7 +1575,7 @@ > rtx op2 = operands[2]; > rtx op3 = operands[3]; > rtx tmp; > - HOST_WIDE_INT ele = INTVAL (op2); > + HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? INTVAL (op2) : 3 - INTVAL (op2); > > if (ele == 0) > tmp = op1; > Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/fusion.c > +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c > @@ -1,5 +1,6 @@ > /* { dg-do compile { target { powerpc*-*-* } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ > +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ > /* { dg-require-effective-target powerpc_p8vector_ok } */ > /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */ > > Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr43154.c > +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c > @@ -1,5 +1,6 @@ > /* { dg-do compile { target { powerpc*-*-* } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ > +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ > /* { dg-require-effective-target powerpc_vsx_ok } */ > /* { dg-options "-O2 -mcpu=power7" } */ > > Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c > +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c > @@ -19,19 +19,6 @@ V b4(V x) > return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, }); > } > > -V p2(V x, V y) > -{ > - return __builtin_shuffle(x, y, > - (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); > - > -} > - > -V p4(V x, V y) > -{ > - return __builtin_shuffle(x, y, > - (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); > -} > - > V h1(V 
x, V y) > { > return __builtin_shuffle(x, y, > @@ -72,5 +59,3 @@ V l4(V x, V y) > /* { dg-final { scan-assembler "vspltb" } } */ > /* { dg-final { scan-assembler "vsplth" } } */ > /* { dg-final { scan-assembler "vspltw" } } */ > -/* { dg-final { scan-assembler "vpkuhum" } } */ > -/* { dg-final { scan-assembler "vpkuwum" } } */ > Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c > =================================================================== > --- /dev/null > +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_altivec_ok } */ > +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ > +/* { dg-options "-O -maltivec -mno-vsx" } */ > + > +typedef unsigned char V __attribute__((vector_size(16))); > + > +V p2(V x, V y) > +{ > + return __builtin_shuffle(x, y, > + (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); > + > +} > + > +V p4(V x, V y) > +{ > + return __builtin_shuffle(x, y, > + (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); > +} > + > +/* { dg-final { scan-assembler-not "vperm" } } */ > +/* { dg-final { scan-assembler "vpkuhum" } } */ > +/* { dg-final { scan-assembler "vpkuwum" } } */ > Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/eg-5.c > +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c > @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector floa > /* Set result to a vector of f32 0's */ > vector float result = ((vector float){0.,0.,0.,0.}); > > +#ifdef __LITTLE_ENDIAN__ > + result = vec_madd (c0, vec_splat (v, 3), result); > + result = vec_madd (c1, vec_splat (v, 2), result); > + result = vec_madd (c2, vec_splat (v, 1), result); > + result = vec_madd (c3, vec_splat (v, 0), result); > +#else > result = vec_madd (c0, vec_splat (v, 0), result); > result = vec_madd (c1, vec_splat (v, 1), result); > result = vec_madd (c2, vec_splat (v, 2), result); > result = vec_madd (c3, vec_splat (v, 3), result); > +#endif > > return result; > } > Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c > +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c > @@ -13,12 +13,27 @@ > #define DO_INLINE __attribute__ ((always_inline)) > #define DONT_INLINE __attribute__ ((noinline)) > > +#ifdef __LITTLE_ENDIAN__ > +static inline DO_INLINE int inline_me(vector signed short data) > +{ > + union {vector signed short v; signed short s[8];} u; > + signed short x; > + unsigned char x1, x2; > + > + u.v = data; > + x = u.s[7]; > + x1 = (x >> 8) & 0xff; > + x2 = x & 0xff; > + return ((x2 << 8) | x1); > +} > +#else > static inline DO_INLINE int inline_me(vector signed short data) > { > union {vector signed short v; signed short s[8];} u; > u.v = data; > return u.s[7]; > } > +#endif > > static DONT_INLINE int foo(vector signed short data) > { > Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c > =================================================================== > --- /dev/null > +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c > @@ -0,0 +1,14 @@ > +#include "harness.h" > + > +vector short > +vec_set (short m) > +{ > + return (vector short){m, 0, 0, 0, 0, 0, 0, 0}; > +} > + > +static void test() > +{ > + check (vec_all_eq (vec_set (7), > + ((vector short){7, 0, 0, 0, 0, 0, 0, 0})), > + 
"vec_set"); > +} > Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/3b-15.c > +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c > @@ -3,7 +3,11 @@ > vector unsigned char > f (vector unsigned char a, vector unsigned char b, vector unsigned char c) > { > +#ifdef __BIG_ENDIAN__ > return vec_perm(a,b,c); > +#else > + return vec_perm(b,a,c); > +#endif > } > > static void test() > @@ -12,8 +16,13 @@ static void test() > 8,9,10,11,12,13,14,15}), > ((vector unsigned char){70,71,72,73,74,75,76,77, > 78,79,80,81,82,83,84,85}), > +#ifdef __BIG_ENDIAN__ > ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a, > 0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})), > +#else > + ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5, > + 0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})), > +#endif > ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})), > "f"); > } > Index: gcc-4_8-test/libcpp/lex.c > =================================================================== > --- gcc-4_8-test.orig/libcpp/lex.c > +++ gcc-4_8-test/libcpp/lex.c > @@ -559,8 +559,13 @@ search_line_fast (const uchar *s, const > beginning with all ones and shifting in zeros according to the > mis-alignment. The LVSR instruction pulls the exact shift we > want from the address. */ > +#ifdef __BIG_ENDIAN__ > mask = __builtin_vec_lvsr(0, s); > mask = __builtin_vec_perm(zero, ones, mask); > +#else > + mask = __builtin_vec_lvsl(0, s); > + mask = __builtin_vec_perm(ones, zero, mask); > +#endif > data &= mask; > > /* While altivec loads mask addresses, we still need to align S so > @@ -624,7 +629,11 @@ search_line_fast (const uchar *s, const > /* L now contains 0xff in bytes for which we matched one of the > relevant characters. We can find the byte index by finding > its bit index and dividing by 8. */ > +#ifdef __BIG_ENDIAN__ > l = __builtin_clzl(l) >> 3; > +#else > + l = __builtin_ctzl(l) >> 3; > +#endif > return s + l; > > #undef N > Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr48258-1.c > +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c > @@ -1,5 +1,6 @@ > /* { dg-do compile } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ > +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ > /* { dg-require-effective-target powerpc_vsx_ok } */ > /* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */ > /* { dg-final { scan-assembler-times "xvaddsp" 3 } } */ > Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c > =================================================================== > --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c > +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c > @@ -1,4 +1,5 @@ > /* { dg-require-effective-target vect_int } */ > +/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */ > > #include <stdarg.h> > #include "../../tree-vect.h" > > > > >
On Wed, Mar 19, 2014 at 3:30 PM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > Hi, > > This patch (diff-le-vector) backports the changes to support vector > infrastructure on powerpc64le. Copying Richard and Jakub for the libcpp > bits. > > Thanks, > Bill > > > [gcc] > > 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > Backport from mainline r205333 > 2013-11-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct > for little endian. > > Backport from mainline r205241 > 2013-11-21 Bill Schmidt <wschmidt@vnet.ibm.com> > > * config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous > little endian change. > (vec_pack_sfix_trunc_v2df): Likewise. > (vec_pack_ufix_trunc_v2df): Likewise. > * config/rs6000/rs6000.c (rs6000_expand_interleave): Correct > double checking of endianness. > > Backport from mainline r205146 > 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian. > (vsx_extract_<mode>): Likewise. > (*vsx_extract_<mode>_one_le): New LE variant on > *vsx_extract_<mode>_zero. > (vsx_extract_v4sf): Adjust for little endian. > > Backport from mainline r205080 > 2013-11-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust > V16QI vector splat case for little endian. > > Backport from mainline r205045: > > 2013-11-19 Ulrich Weigand <Ulrich.Weigand@de.ibm.com> > > * config/rs6000/vector.md ("mov<mode>"): Do not call > rs6000_emit_le_vsx_move to move into or out of GPRs. > * config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert > source and destination are not GPR hard regs. > > Backport from mainline r204920 > 2011-11-17 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg > parameter and use it in REG_FRAME_RELATED_EXPR note. > (emit_frame_save): Call rs6000_frame_related with extra NULL_RTX > parameter. > (rs6000_emit_prologue): Likewise, but for little endian VSX > stores, pass the source register of the store instead. > > Backport from mainline r204862 > 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X): > Remove. > (altivec_vperm_<mode>): Revert earlier little endian change. > (*altivec_vperm_<mode>_internal): Remove. > (altivec_vperm_<mode>_uns): Revert earlier little endian change. > (*altivec_vperm_<mode>_uns_internal): Remove. > * config/rs6000/vector.md (vec_realign_load_<mode>): Revise > commentary. > > Backport from mainline r204441 > 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (rs6000_option_override_internal): > Remove restriction against use of VSX instructions when generating > code for little endian mode. > > Backport from mainline r204440 > 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh > for both big and little endian. > (mulv8hi3): Swap input operands for merge high and merge low > instructions for little endian. > > Backport from mainline r204439 > 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change > define_insn to define_expand that uses even patterns for big > endian and odd patterns for little endian. > (vec_widen_smult_even_v16qi): Likewise. > (vec_widen_umult_even_v8hi): Likewise. > (vec_widen_smult_even_v8hi): Likewise. 
> (vec_widen_umult_odd_v16qi): Likewise. > (vec_widen_smult_odd_v16qi): Likewise. > (vec_widen_umult_odd_v8hi): Likewise. > (vec_widen_smult_odd_v8hi): Likewise. > (altivec_vmuleub): New define_insn. > (altivec_vmuloub): Likewise. > (altivec_vmulesb): Likewise. > (altivec_vmulosb): Likewise. > (altivec_vmuleuh): Likewise. > (altivec_vmulouh): Likewise. > (altivec_vmulesh): Likewise. > (altivec_vmulosh): Likewise. > > Backport from mainline r204395 > 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for > little endian. > (vec_pack_ufix_trunc_v2df): Likewise. > > Backport from mainline r204363 > 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap > arguments to merge instruction for little endian. > (vec_widen_umult_lo_v16qi): Likewise. > (vec_widen_smult_hi_v16qi): Likewise. > (vec_widen_smult_lo_v16qi): Likewise. > (vec_widen_umult_hi_v8hi): Likewise. > (vec_widen_umult_lo_v8hi): Likewise. > (vec_widen_smult_hi_v8hi): Likewise. > (vec_widen_smult_lo_v8hi): Likewise. > > Backport from mainline r204350 > 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D): > Replace the define_insn_and_split with a define_insn and two > define_splits, with the split after reload re-permuting the source > register to its original value. > (*vsx_le_perm_store_<mode> for VSX_W): Likewise. > (*vsx_le_perm_store_v8hi): Likewise. > (*vsx_le_perm_store_v16qi): Likewise. > > Backport from mainline r204321 > 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for > little endian. > > Backport from mainline r204321 > 2013-11-02 Bill Schmidt <wschmidt@vnet.linux.ibm.com> > > * config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for > little endian. > > Backport from mainline r203980 > 2013-10-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/altivec.md (mulv8hi3): Adjust for little endian. > > Backport from mainline r203930 > 2013-10-22 Bill Schmidt <wschmidt@vnet.ibm.com> > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse > meaning of merge-high and merge-low masks for little endian; avoid > use of vector-pack masks for little endian for mismatched modes. > > Backport from mainline r203877 > 2013-10-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for > little endian. > (vec_unpacku_hi_v8hi): Likewise. > (vec_unpacku_lo_v16qi): Likewise. > (vec_unpacku_lo_v8hi): Likewise. > > Backport from mainline r203863 > 2013-10-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (vspltis_constant): Make sure we check > all elements for both endian flavors. > > Backport from mainline r203714 > 2013-10-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc/config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for > endianness. > (vec_unpacks_lo_v4sf): Likewise. > (vec_unpacks_float_hi_v4si): Likewise. > (vec_unpacks_float_lo_v4si): Likewise. > (vec_unpacku_float_hi_v4si): Likewise. > (vec_unpacku_float_lo_v4si): Likewise. > > Backport from mainline r203713 > 2013-10-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE. > (vsx_concat_v2sf): Likewise. 
> > Backport from mainline r203458 > 2013-10-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to > handle vector float as well. > (*vsx_le_perm_load_v4si): Likewise. > (*vsx_le_perm_store_v2di): Likewise. > (*vsx_le_perm_store_v4si): Likewise. > > Backport from mainline r203457 > 2013-10-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm > directly to circumvent subtract from splat{31} workaround. > * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New > prototype. > * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New. > * config/rs6000/altivec.md (define_c_enum "unspec"): Add > UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X. > (altivec_vperm_<mode>): Convert to define_insn_and_split to > separate big and little endian logic. > (*altivec_vperm_<mode>_internal): New define_insn. > (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to > separate big and little endian logic. > (*altivec_vperm_<mode>_uns_internal): New define_insn. > (vec_permv16qi): Add little endian logic. > > Backport from mainline r203247 > 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New. > (altivec_expand_vec_perm_const): Call it. > > Backport from mainline r203246 > 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * config/rs6000/vector.md (mov<mode>): Emit permuted move > sequences for LE VSX loads and stores at expand time. > * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New > prototype. > * config/rs6000/rs6000.c (rs6000_const_vec): New. > (rs6000_gen_le_vsx_permute): New. > (rs6000_gen_le_vsx_load): New. > (rs6000_gen_le_vsx_store): New. > (rs6000_gen_le_vsx_move): New. > * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New. > (*vsx_le_perm_load_v4si): New. > (*vsx_le_perm_load_v8hi): New. > (*vsx_le_perm_load_v16qi): New. > (*vsx_le_perm_store_v2di): New. > (*vsx_le_perm_store_v4si): New. > (*vsx_le_perm_store_v8hi): New. > (*vsx_le_perm_store_v16qi): New. > (*vsx_xxpermdi2_le_<mode>): New. > (*vsx_xxpermdi4_le_<mode>): New. > (*vsx_xxpermdi8_le_V8HI): New. > (*vsx_xxpermdi16_le_V16QI): New. > (*vsx_lxvd2x2_le_<mode>): New. > (*vsx_lxvd2x4_le_<mode>): New. > (*vsx_lxvd2x8_le_V8HI): New. > (*vsx_lxvd2x16_le_V16QI): New. > (*vsx_stxvd2x2_le_<mode>): New. > (*vsx_stxvd2x4_le_<mode>): New. > (*vsx_stxvd2x8_le_V8HI): New. > (*vsx_stxvd2x16_le_V16QI): New. > > Backport from mainline r201235 > 2013-07-24 Bill Schmidt <wschmidt@linux.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/altivec.md (altivec_vpkpx): Handle little endian. > (altivec_vpks<VI_char>ss): Likewise. > (altivec_vpks<VI_char>us): Likewise. > (altivec_vpku<VI_char>us): Likewise. > (altivec_vpku<VI_char>um): Likewise. > > Backport from mainline r201208 > 2013-07-24 Bill Schmidt <wschmidt@vnet.linux.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input > operands to vperm for little endian. > * config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead > of lvsl to create the control mask for a vperm for little endian. > > Backport from mainline r201195 > 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse > two operands for little-endian. 
> > Backport from mainline r201193 > 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct > selection of field for vector splat in little endian mode. > > Backport from mainline r201149 > 2013-07-22 Bill Schmidt <wschmidt@vnet.linux.ibm.com> > Anton Blanchard <anton@au1.ibm.com> > > * config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix > endianness when selecting field to splat. > > [gcc/testsuite] > > 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > Backport from mainline r205638 > 2013-12-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little > endian. > > Backport from mainline r205146 > 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.target/powerpc/pr48258-1.c: Skip for little endian. > > Backport from mainline r204862 > 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.dg/vmx/3b-15.c: Revise for little endian. > > Backport from mainline r204321 > 2013-11-02 Bill Schmidt <wschmidt@vnet.linux.ibm.com> > > * gcc.dg/vmx/vec-set.c: New. > > Backport from mainline r204138 > 2013-10-28 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant. > * gcc.dg/vmx/eg-5.c: Likewise. > > Backport from mainline r203930 > 2013-10-22 Bill Schmidt <wschmidt@vnet.ibm.com> > > * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack > tests into... > * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is > restricted to big-endian targets. > > Backport from mainline r203246 > 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian. > * gcc.target/powerpc/fusion.c: Likewise. > > [libcpp] > > 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > Backport from mainline > 2013-11-18 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > * lex.c (search_line_fast): Correct for little endian. PowerPC bits are okay. Thanks, David
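One property worth keeping in mind while reading the *vsx_le_perm_* patterns and the rs6000_emit_le_vsx_load/store helpers in the raw patch below: the doubleword swap performed by lxvd2x/stxvd2x on a little-endian machine, and by xxpermdi %x0,%x1,%x1,2, is an involution. A minimal sanity check of that identity, in plain C rather than compiler code (dw_swap is an illustrative stand-in, assuming a 16-byte vector):

    #include <assert.h>
    #include <string.h>

    /* The permutation used by the LE load/store sequences, modeled on
       16 bytes: swap the two 8-byte doublewords.  */
    static void
    dw_swap (unsigned char *v)
    {
      unsigned char tmp[8];
      memcpy (tmp, v, 8);
      memmove (v, v + 8, 8);
      memcpy (v + 8, tmp, 8);
    }

    int
    main (void)
    {
      unsigned char v[16], orig[16];
      int i;

      for (i = 0; i < 16; i++)
        v[i] = (unsigned char) i;
      memcpy (orig, v, sizeof v);

      dw_swap (v);              /* e.g. the implicit swap of lxvd2x   */
      dw_swap (v);              /* e.g. xxpermdi %x0,%x1,%x1,2        */
      assert (memcmp (v, orig, sizeof v) == 0);
      return 0;
    }

Because the swap is self-inverse, emitting the permuted pairs at expand time lets redundant register permutes cancel, and the post-reload store split can permute the source register in place, store it, and permute it back in case the register is still live.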
Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c =================================================================== --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -3216,11 +3216,6 @@ rs6000_option_override_internal (bool gl } else if (TARGET_PAIRED_FLOAT) msg = N_("-mvsx and -mpaired are incompatible"); - /* The hardware will allow VSX and little endian, but until we make sure - things like vector select, etc. work don't allow VSX on little endian - systems at this point. */ - else if (!BYTES_BIG_ENDIAN) - msg = N_("-mvsx used with little endian code"); else if (TARGET_AVOID_XFORM > 0) msg = N_("-mvsx needs indexed addressing"); else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit @@ -4991,15 +4986,16 @@ vspltis_constant (rtx op, unsigned step, /* Check if VAL is present in every STEP-th element, and the other elements are filled with its most significant bit. */ - for (i = 0; i < nunits - 1; ++i) + for (i = 1; i < nunits; ++i) { HOST_WIDE_INT desired_val; - if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0) + unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i; + if ((i & (step - 1)) == 0) desired_val = val; else desired_val = msb_val; - if (desired_val != const_vector_elt_as_int (op, i)) + if (desired_val != const_vector_elt_as_int (op, elt)) return false; } @@ -5446,6 +5442,7 @@ rs6000_expand_vector_init (rtx target, r of 64-bit items is not supported on Altivec. */ if (all_same && GET_MODE_SIZE (inner_mode) <= 4) { + rtx field; mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode)); emit_move_insn (adjust_address_nv (mem, inner_mode, 0), XVECEXP (vals, 0, 0)); @@ -5456,9 +5453,11 @@ rs6000_expand_vector_init (rtx target, r gen_rtx_SET (VOIDmode, target, mem), x))); + field = (BYTES_BIG_ENDIAN ? const0_rtx + : GEN_INT (GET_MODE_NUNITS (mode) - 1)); x = gen_rtx_VEC_SELECT (inner_mode, target, gen_rtx_PARALLEL (VOIDmode, - gen_rtvec (1, const0_rtx))); + gen_rtvec (1, field))); emit_insn (gen_rtx_SET (VOIDmode, target, gen_rtx_VEC_DUPLICATE (mode, x))); return; @@ -5531,10 +5530,27 @@ rs6000_expand_vector_set (rtx target, rt XVECEXP (mask, 0, elt*width + i) = GEN_INT (i + 0x10); x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0)); - x = gen_rtx_UNSPEC (mode, - gen_rtvec (3, target, reg, - force_reg (V16QImode, x)), - UNSPEC_VPERM); + + if (BYTES_BIG_ENDIAN) + x = gen_rtx_UNSPEC (mode, + gen_rtvec (3, target, reg, + force_reg (V16QImode, x)), + UNSPEC_VPERM); + else + { + /* Invert selector. */ + rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode, + gen_rtx_CONST_INT (QImode, -1)); + rtx tmp = gen_reg_rtx (V16QImode); + emit_move_insn (tmp, splat); + x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x)); + emit_move_insn (tmp, x); + + /* Permute with operands reversed and adjusted selector. */ + x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp), + UNSPEC_VPERM); + } + emit_insn (gen_rtx_SET (VOIDmode, target, x)); } @@ -7830,6 +7846,107 @@ rs6000_eliminate_indexed_memrefs (rtx op copy_addr_to_reg (XEXP (operands[1], 0))); } +/* Generate a vector of constants to permute MODE for a little-endian + storage operation by swapping the two halves of a vector. 
*/ +static rtvec +rs6000_const_vec (enum machine_mode mode) +{ + int i, subparts; + rtvec v; + + switch (mode) + { + case V2DFmode: + case V2DImode: + subparts = 2; + break; + case V4SFmode: + case V4SImode: + subparts = 4; + break; + case V8HImode: + subparts = 8; + break; + case V16QImode: + subparts = 16; + break; + default: + gcc_unreachable(); + } + + v = rtvec_alloc (subparts); + + for (i = 0; i < subparts / 2; ++i) + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2); + for (i = subparts / 2; i < subparts; ++i) + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2); + + return v; +} + +/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi + for a VSX load or store operation. */ +rtx +rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode) +{ + rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode)); + return gen_rtx_VEC_SELECT (mode, source, par); +} + +/* Emit a little-endian load from vector memory location SOURCE to VSX + register DEST in mode MODE. The load is done with two permuting + insn's that represent an lxvd2x and xxpermdi. */ +void +rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode) +{ + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest; + rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode); + rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode); + emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem)); + emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg)); +} + +/* Emit a little-endian store to vector memory location DEST from VSX + register SOURCE in mode MODE. The store is done with two permuting + insn's that represent an xxpermdi and an stxvd2x. */ +void +rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode) +{ + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source; + rtx permute_src = rs6000_gen_le_vsx_permute (source, mode); + rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode); + emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src)); + emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp)); +} + +/* Emit a sequence representing a little-endian VSX load or store, + moving data from SOURCE to DEST in mode MODE. This is done + separately from rs6000_emit_move to ensure it is called only + during expand. LE VSX loads and stores introduced later are + handled with a split. The expand-time RTL generation allows + us to optimize away redundant pairs of register-permutes. */ +void +rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode) +{ + gcc_assert (!BYTES_BIG_ENDIAN + && VECTOR_MEM_VSX_P (mode) + && mode != TImode + && !gpr_or_gpr_p (dest, source) + && (MEM_P (source) ^ MEM_P (dest))); + + if (MEM_P (source)) + { + gcc_assert (REG_P (dest)); + rs6000_emit_le_vsx_load (dest, source, mode); + } + else + { + if (!REG_P (source)) + source = force_reg (mode, source); + rs6000_emit_le_vsx_store (dest, source, mode); + } +} + /* Emit a move from SOURCE to DEST in mode MODE. */ void rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode) @@ -12589,7 +12706,8 @@ rs6000_expand_builtin (tree exp, rtx tar case ALTIVEC_BUILTIN_MASK_FOR_LOAD: case ALTIVEC_BUILTIN_MASK_FOR_STORE: { - int icode = (int) CODE_FOR_altivec_lvsr; + int icode = (BYTES_BIG_ENDIAN ? 
(int) CODE_FOR_altivec_lvsr + : (int) CODE_FOR_altivec_lvsl); enum machine_mode tmode = insn_data[icode].operand[0].mode; enum machine_mode mode = insn_data[icode].operand[1].mode; tree arg; @@ -20880,7 +20998,7 @@ output_probe_stack_range (rtx reg1, rtx static rtx rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val, - rtx reg2, rtx rreg) + rtx reg2, rtx rreg, rtx split_reg) { rtx real, temp; @@ -20971,6 +21089,11 @@ rs6000_frame_related (rtx insn, rtx reg, } } + /* If a store insn has been split into multiple insns, the + true source register is given by split_reg. */ + if (split_reg != NULL_RTX) + real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg); + RTX_FRAME_RELATED_P (insn) = 1; add_reg_note (insn, REG_FRAME_RELATED_EXPR, real); @@ -21078,7 +21201,7 @@ emit_frame_save (rtx frame_reg, enum mac reg = gen_rtx_REG (mode, regno); insn = emit_insn (gen_frame_store (reg, frame_reg, offset)); return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); } /* Emit an offset memory reference suitable for a frame store, while @@ -21599,7 +21722,7 @@ rs6000_emit_prologue (void) insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p)); rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, - treg, GEN_INT (-info->total_size)); + treg, GEN_INT (-info->total_size), NULL_RTX); sp_off = frame_off = info->total_size; } @@ -21684,7 +21807,7 @@ rs6000_emit_prologue (void) insn = emit_move_insn (mem, reg); rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); END_USE (0); } } @@ -21752,7 +21875,7 @@ rs6000_emit_prologue (void) info->lr_save_offset, DFmode, sel); rs6000_frame_related (insn, ptr_reg, sp_off, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); if (lr) END_USE (0); } @@ -21831,7 +21954,7 @@ rs6000_emit_prologue (void) SAVRES_SAVE | SAVRES_GPR); rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); } /* Move the static chain pointer back. */ @@ -21881,7 +22004,7 @@ rs6000_emit_prologue (void) info->lr_save_offset + ptr_off, reg_mode, sel); rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); if (lr) END_USE (0); } @@ -21897,7 +22020,7 @@ rs6000_emit_prologue (void) info->gp_save_offset + frame_off + reg_size * i); insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p)); rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); } else if (!WORLD_SAVE_P (info)) { @@ -22124,7 +22247,7 @@ rs6000_emit_prologue (void) info->altivec_save_offset + ptr_off, 0, V4SImode, SAVRES_SAVE | SAVRES_VR); rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off, - NULL_RTX, NULL_RTX); + NULL_RTX, NULL_RTX, NULL_RTX); if (REGNO (frame_reg_rtx) == REGNO (scratch_reg)) { /* The oddity mentioned above clobbered our frame reg. */ @@ -22140,7 +22263,7 @@ rs6000_emit_prologue (void) for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i) if (info->vrsave_mask & ALTIVEC_REG_BIT (i)) { - rtx areg, savereg, mem; + rtx areg, savereg, mem, split_reg; int offset; offset = (info->altivec_save_offset + frame_off @@ -22158,8 +22281,18 @@ rs6000_emit_prologue (void) insn = emit_move_insn (mem, savereg); + /* When we split a VSX store into two insns, we need to make + sure the DWARF info knows which register we are storing. + Pass it in to be used on the appropriate note. 
*/ + if (!BYTES_BIG_ENDIAN + && GET_CODE (PATTERN (insn)) == SET + && GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT) + split_reg = savereg; + else + split_reg = NULL_RTX; + + rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, - areg, GEN_INT (offset)); + areg, GEN_INT (offset), split_reg); } } @@ -28813,6 +28946,136 @@ rs6000_emit_parity (rtx dst, rtx src) } } +/* Expand an Altivec constant permutation for little endian mode. + There are two issues: First, the two input operands must be + swapped so that together they form a double-wide array in LE + order. Second, the vperm instruction has surprising behavior + in LE mode: it interprets the elements of the source vectors + in BE mode ("left to right") and interprets the elements of + the destination vector in LE mode ("right to left"). To + correct for this, we must subtract each element of the permute + control vector from 31. + + For example, suppose we want to concatenate vr10 = {0, 1, 2, 3} + with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm. + We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to + serve as the permute control vector. Then, in BE mode, + + vperm 9,10,11,12 + + places the desired result in vr9. However, in LE mode the + vector contents will be + + vr10 = 00000003 00000002 00000001 00000000 + vr11 = 00000007 00000006 00000005 00000004 + + The result of the vperm using the same permute control vector is + + vr9 = 05000000 07000000 01000000 03000000 + + That is, the leftmost 4 bytes of vr10 are interpreted as the + source for the rightmost 4 bytes of vr9, and so on. + + If we change the permute control vector to + + vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4} + + and issue + + vperm 9,11,10,12 + + we get the desired + + vr9 = 00000006 00000004 00000002 00000000. */ + +void +altivec_expand_vec_perm_const_le (rtx operands[4]) +{ + unsigned int i; + rtx perm[16]; + rtx constv, unspec; + rtx target = operands[0]; + rtx op0 = operands[1]; + rtx op1 = operands[2]; + rtx sel = operands[3]; + + /* Unpack and adjust the constant selector. */ + for (i = 0; i < 16; ++i) + { + rtx e = XVECEXP (sel, 0, i); + unsigned int elt = 31 - (INTVAL (e) & 31); + perm[i] = GEN_INT (elt); + } + + /* Expand to a permute, swapping the inputs and using the + adjusted selector. */ + if (!REG_P (op0)) + op0 = force_reg (V16QImode, op0); + if (!REG_P (op1)) + op1 = force_reg (V16QImode, op1); + + constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm)); + constv = force_reg (V16QImode, constv); + unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv), + UNSPEC_VPERM); + if (!REG_P (target)) + { + rtx tmp = gen_reg_rtx (V16QImode); + emit_move_insn (tmp, unspec); + unspec = tmp; + } + + emit_move_insn (target, unspec); +} + +/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the + permute control vector. But here it's not a constant, so we must + generate a vector splat/subtract to do the adjustment. */ + +void +altivec_expand_vec_perm_le (rtx operands[4]) +{ + rtx splat, unspec; + rtx target = operands[0]; + rtx op0 = operands[1]; + rtx op1 = operands[2]; + rtx sel = operands[3]; + rtx tmp = target; + + /* Get everything in regs so the pattern matches. */ + if (!REG_P (op0)) + op0 = force_reg (V16QImode, op0); + if (!REG_P (op1)) + op1 = force_reg (V16QImode, op1); + if (!REG_P (sel)) + sel = force_reg (V16QImode, sel); + if (!REG_P (target)) + tmp = gen_reg_rtx (V16QImode); + + /* SEL = splat(31) - SEL. 
*/ + /* We want to subtract from 31, but we can't vspltisb 31 since + it's out of range. -1 works as well because only the low-order + five bits of the permute control vector elements are used. */ + splat = gen_rtx_VEC_DUPLICATE (V16QImode, + gen_rtx_CONST_INT (QImode, -1)); + emit_move_insn (tmp, splat); + sel = gen_rtx_MINUS (V16QImode, tmp, sel); + emit_move_insn (tmp, sel); + + /* Permute with operands reversed and adjusted selector. */ + unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp), + UNSPEC_VPERM); + + /* Copy into target, possibly by way of a register. */ + if (!REG_P (target)) + { + emit_move_insn (tmp, unspec); + unspec = tmp; + } + + emit_move_insn (target, unspec); +} + /* Expand an Altivec constant permutation. Return true if we match an efficient implementation; false to fall back to VPERM. */ @@ -28829,17 +29092,23 @@ altivec_expand_vec_perm_const (rtx opera { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, + { OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28901,6 +29170,8 @@ altivec_expand_vec_perm_const (rtx opera break; if (i == 16) { + if (!BYTES_BIG_ENDIAN) + elt = 15 - elt; emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt))); return true; } @@ -28912,9 +29183,10 @@ altivec_expand_vec_perm_const (rtx opera break; if (i == 16) { + int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2; x = gen_reg_rtx (V8HImode); emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0), - GEN_INT (elt / 2))); + GEN_INT (field))); emit_move_insn (target, gen_lowpart (V16QImode, x)); return true; } @@ -28930,9 +29202,10 @@ altivec_expand_vec_perm_const (rtx opera break; if (i == 16) { + int field = BYTES_BIG_ENDIAN ? 
elt / 4 : 3 - elt / 4; x = gen_reg_rtx (V4SImode); emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0), - GEN_INT (elt / 4))); + GEN_INT (field))); emit_move_insn (target, gen_lowpart (V16QImode, x)); return true; } @@ -28970,7 +29243,30 @@ altivec_expand_vec_perm_const (rtx opera enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; - if (swapped) + /* For little-endian, don't use vpkuwum and vpkuhum if the + underlying vector type is not V4SI and V8HI, respectively. + For example, using vpkuwum with a V8HI picks up the even + halfwords (BE numbering) when the even halfwords (LE + numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuwum + && ((GET_CODE (op0) == REG + && GET_MODE (op0) != V4SImode) + || (GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V4SImode))) + continue; + if (!BYTES_BIG_ENDIAN + && icode == CODE_FOR_altivec_vpkuhum + && ((GET_CODE (op0) == REG + && GET_MODE (op0) != V8HImode) + || (GET_CODE (op0) == SUBREG + && GET_MODE (XEXP (op0, 0)) != V8HImode))) + continue; + + /* For little-endian, the two input operands must be swapped + (or swapped back) to ensure proper right-to-left numbering + from 0 to 2N-1. */ + if (swapped ^ !BYTES_BIG_ENDIAN) x = op0, op0 = op1, op1 = x; if (imode != V16QImode) { @@ -28988,6 +29284,12 @@ altivec_expand_vec_perm_const (rtx opera } } + if (!BYTES_BIG_ENDIAN) + { + altivec_expand_vec_perm_const_le (operands); + return true; + } + return false; } @@ -29037,6 +29339,21 @@ rs6000_expand_vec_perm_const_1 (rtx targ gcc_assert (GET_MODE_NUNITS (vmode) == 2); dmode = mode_for_vector (GET_MODE_INNER (vmode), 4); + /* For little endian, swap operands and invert/swap selectors + to get the correct xxpermdi. The operand swap sets up the + inputs as a little endian array. The selectors are swapped + because they are defined to use big endian ordering. The + selectors are inverted to get the correct doublewords for + little endian ordering. */ + if (!BYTES_BIG_ENDIAN) + { + int n; + perm0 = 3 - perm0; + perm1 = 3 - perm1; + n = perm0, perm0 = perm1, perm1 = n; + x = op0, op0 = op1, op1 = x; + } + x = gen_rtx_VEC_CONCAT (dmode, op0, op1); v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1)); x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v)); @@ -29132,7 +29449,7 @@ rs6000_expand_interleave (rtx target, rt unsigned i, high, nelt = GET_MODE_NUNITS (vmode); rtx perm[16]; - high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2); + high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++) { perm[i * 2] = GEN_INT (i + high); Index: gcc-4_8-test/gcc/config/rs6000/vector.md =================================================================== --- gcc-4_8-test.orig/gcc/config/rs6000/vector.md +++ gcc-4_8-test/gcc/config/rs6000/vector.md @@ -88,7 +88,8 @@ (smax "smax")]) -;; Vector move instructions. +;; Vector move instructions. Little-endian VSX loads and stores require +;; special handling to circumvent "element endianness." 
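+;; On a little-endian target, lxvd2x and stxvd2x access the two 64-bit +;; halves of a vector in the opposite order from the one the rest of +;; the compiler expects, so a VSX move between register and memory must +;; pair the access with a doubleword swap. The mov<mode> expander below +;; routes such moves through rs6000_emit_le_vsx_move, which generates +;; the swapped-access RTL matched by the *vsx_le_perm_* patterns in +;; vsx.md.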
(define_expand "mov<mode>" [(set (match_operand:VEC_M 0 "nonimmediate_operand" "") (match_operand:VEC_M 1 "any_operand" ""))] @@ -104,6 +105,16 @@ && !vlogical_operand (operands[1], <MODE>mode)) operands[1] = force_reg (<MODE>mode, operands[1]); } + if (!BYTES_BIG_ENDIAN + && VECTOR_MEM_VSX_P (<MODE>mode) + && <MODE>mode != TImode + && !gpr_or_gpr_p (operands[0], operands[1]) + && (memory_operand (operands[0], <MODE>mode) + ^ memory_operand (operands[1], <MODE>mode))) + { + rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode); + DONE; + } }) ;; Generic vector floating point load/store instructions. These will match @@ -862,7 +873,7 @@ { rtx reg = gen_reg_rtx (V4SFmode); - rs6000_expand_interleave (reg, operands[1], operands[1], true); + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvspdp (operands[0], reg)); DONE; }) @@ -874,7 +885,7 @@ { rtx reg = gen_reg_rtx (V4SFmode); - rs6000_expand_interleave (reg, operands[1], operands[1], false); + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvspdp (operands[0], reg)); DONE; }) @@ -886,7 +897,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], true); + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg)); DONE; }) @@ -898,7 +909,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], false); + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg)); DONE; }) @@ -910,7 +921,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], true); + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg)); DONE; }) @@ -922,7 +933,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], false); + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg)); DONE; }) @@ -936,8 +947,19 @@ (match_operand:V16QI 3 "vlogical_operand" "")] "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)" { - emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2], - operands[3])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], + operands[2], operands[3])); + else + { + /* We have changed lvsr to lvsl, so to complete the transformation + of vperm for LE, we must swap the inputs. 
*/ + rtx unspec = gen_rtx_UNSPEC (<MODE>mode, + gen_rtvec (3, operands[2], + operands[1], operands[3]), + UNSPEC_VPERM); + emit_move_insn (operands[0], unspec); + } DONE; }) Index: gcc-4_8-test/gcc/config/rs6000/altivec.md =================================================================== --- gcc-4_8-test.orig/gcc/config/rs6000/altivec.md +++ gcc-4_8-test/gcc/config/rs6000/altivec.md @@ -649,7 +649,7 @@ convert_move (small_swap, swap, 0); low_product = gen_reg_rtx (V4SImode); - emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two)); + emit_insn (gen_altivec_vmulouh (low_product, one, two)); high_product = gen_reg_rtx (V4SImode); emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero)); @@ -676,10 +676,18 @@ emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (high, even, odd)); - emit_insn (gen_altivec_vmrglw (low, even, odd)); - - emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); + if (BYTES_BIG_ENDIAN) + { + emit_insn (gen_altivec_vmrghw (high, even, odd)); + emit_insn (gen_altivec_vmrglw (low, even, odd)); + emit_insn (gen_altivec_vpkuwum (operands[0], high, low)); + } + else + { + emit_insn (gen_altivec_vmrghw (high, odd, even)); + emit_insn (gen_altivec_vmrglw (low, odd, even)); + emit_insn (gen_altivec_vpkuwum (operands[0], low, high)); + } DONE; }") @@ -967,7 +975,111 @@ "vmrgow %0,%1,%2" [(set_attr "type" "vecperm")]) -(define_insn "vec_widen_umult_even_v16qi" +(define_expand "vec_widen_umult_even_v16qi" + [(use (match_operand:V8HI 0 "register_operand" "")) + (use (match_operand:V16QI 1 "register_operand" "")) + (use (match_operand:V16QI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_smult_even_v16qi" + [(use (match_operand:V8HI 0 "register_operand" "")) + (use (match_operand:V16QI 1 "register_operand" "")) + (use (match_operand:V16QI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_umult_even_v8hi" + [(use (match_operand:V4SI 0 "register_operand" "")) + (use (match_operand:V8HI 1 "register_operand" "")) + (use (match_operand:V8HI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_smult_even_v8hi" + [(use (match_operand:V4SI 0 "register_operand" "")) + (use (match_operand:V8HI 1 "register_operand" "")) + (use (match_operand:V8HI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_umult_odd_v16qi" + [(use (match_operand:V8HI 0 "register_operand" "")) + (use (match_operand:V16QI 1 "register_operand" "")) + (use (match_operand:V16QI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmuloub (operands[0], 
operands[1], operands[2])); + else + emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_smult_odd_v16qi" + [(use (match_operand:V8HI 0 "register_operand" "")) + (use (match_operand:V16QI 1 "register_operand" "")) + (use (match_operand:V16QI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_umult_odd_v8hi" + [(use (match_operand:V4SI 0 "register_operand" "")) + (use (match_operand:V8HI 1 "register_operand" "")) + (use (match_operand:V8HI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "vec_widen_smult_odd_v8hi" + [(use (match_operand:V4SI 0 "register_operand" "")) + (use (match_operand:V8HI 1 "register_operand" "")) + (use (match_operand:V8HI 2 "register_operand" ""))] + "TARGET_ALTIVEC" +{ + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2])); + else + emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "altivec_vmuleub" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") (match_operand:V16QI 2 "register_operand" "v")] @@ -976,43 +1088,25 @@ "vmuleub %0,%1,%2" [(set_attr "type" "veccomplex")]) -(define_insn "vec_widen_smult_even_v16qi" +(define_insn "altivec_vmuloub" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") (match_operand:V16QI 2 "register_operand" "v")] - UNSPEC_VMULESB))] - "TARGET_ALTIVEC" - "vmulesb %0,%1,%2" - [(set_attr "type" "veccomplex")]) - -(define_insn "vec_widen_umult_even_v8hi" - [(set (match_operand:V4SI 0 "register_operand" "=v") - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") - (match_operand:V8HI 2 "register_operand" "v")] - UNSPEC_VMULEUH))] - "TARGET_ALTIVEC" - "vmuleuh %0,%1,%2" - [(set_attr "type" "veccomplex")]) - -(define_insn "vec_widen_smult_even_v8hi" - [(set (match_operand:V4SI 0 "register_operand" "=v") - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") - (match_operand:V8HI 2 "register_operand" "v")] - UNSPEC_VMULESH))] + UNSPEC_VMULOUB))] "TARGET_ALTIVEC" - "vmulesh %0,%1,%2" + "vmuloub %0,%1,%2" [(set_attr "type" "veccomplex")]) -(define_insn "vec_widen_umult_odd_v16qi" +(define_insn "altivec_vmulesb" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") (match_operand:V16QI 2 "register_operand" "v")] - UNSPEC_VMULOUB))] + UNSPEC_VMULESB))] "TARGET_ALTIVEC" - "vmuloub %0,%1,%2" + "vmulesb %0,%1,%2" [(set_attr "type" "veccomplex")]) -(define_insn "vec_widen_smult_odd_v16qi" +(define_insn "altivec_vmulosb" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") (match_operand:V16QI 2 "register_operand" "v")] @@ -1021,7 +1115,16 @@ "vmulosb %0,%1,%2" [(set_attr "type" "veccomplex")]) -(define_insn "vec_widen_umult_odd_v8hi" +(define_insn "altivec_vmuleuh" + [(set (match_operand:V4SI 0 "register_operand" "=v") + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 
"register_operand" "v")] + UNSPEC_VMULEUH))] + "TARGET_ALTIVEC" + "vmuleuh %0,%1,%2" + [(set_attr "type" "veccomplex")]) + +(define_insn "altivec_vmulouh" [(set (match_operand:V4SI 0 "register_operand" "=v") (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")] @@ -1030,7 +1133,16 @@ "vmulouh %0,%1,%2" [(set_attr "type" "veccomplex")]) -(define_insn "vec_widen_smult_odd_v8hi" +(define_insn "altivec_vmulesh" + [(set (match_operand:V4SI 0 "register_operand" "=v") + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")] + UNSPEC_VMULESH))] + "TARGET_ALTIVEC" + "vmulesh %0,%1,%2" + [(set_attr "type" "veccomplex")]) + +(define_insn "altivec_vmulosh" [(set (match_operand:V4SI 0 "register_operand" "=v") (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")] @@ -1047,7 +1159,13 @@ (match_operand:V4SI 2 "register_operand" "v")] UNSPEC_VPKPX))] "TARGET_ALTIVEC" - "vpkpx %0,%1,%2" + "* + { + if (BYTES_BIG_ENDIAN) + return \"vpkpx %0,%1,%2\"; + else + return \"vpkpx %0,%2,%1\"; + }" [(set_attr "type" "vecperm")]) (define_insn "altivec_vpks<VI_char>ss" @@ -1056,7 +1174,13 @@ (match_operand:VP 2 "register_operand" "v")] UNSPEC_VPACK_SIGN_SIGN_SAT))] "<VI_unit>" - "vpks<VI_char>ss %0,%1,%2" + "* + { + if (BYTES_BIG_ENDIAN) + return \"vpks<VI_char>ss %0,%1,%2\"; + else + return \"vpks<VI_char>ss %0,%2,%1\"; + }" [(set_attr "type" "vecperm")]) (define_insn "altivec_vpks<VI_char>us" @@ -1065,7 +1189,13 @@ (match_operand:VP 2 "register_operand" "v")] UNSPEC_VPACK_SIGN_UNS_SAT))] "<VI_unit>" - "vpks<VI_char>us %0,%1,%2" + "* + { + if (BYTES_BIG_ENDIAN) + return \"vpks<VI_char>us %0,%1,%2\"; + else + return \"vpks<VI_char>us %0,%2,%1\"; + }" [(set_attr "type" "vecperm")]) (define_insn "altivec_vpku<VI_char>us" @@ -1074,7 +1204,13 @@ (match_operand:VP 2 "register_operand" "v")] UNSPEC_VPACK_UNS_UNS_SAT))] "<VI_unit>" - "vpku<VI_char>us %0,%1,%2" + "* + { + if (BYTES_BIG_ENDIAN) + return \"vpku<VI_char>us %0,%1,%2\"; + else + return \"vpku<VI_char>us %0,%2,%1\"; + }" [(set_attr "type" "vecperm")]) (define_insn "altivec_vpku<VI_char>um" @@ -1083,7 +1219,13 @@ (match_operand:VP 2 "register_operand" "v")] UNSPEC_VPACK_UNS_UNS_MOD))] "<VI_unit>" - "vpku<VI_char>um %0,%1,%2" + "* + { + if (BYTES_BIG_ENDIAN) + return \"vpku<VI_char>um %0,%1,%2\"; + else + return \"vpku<VI_char>um %0,%2,%1\"; + }" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vrl<VI_char>" @@ -1276,7 +1418,12 @@ (match_operand:V16QI 3 "register_operand" "")] UNSPEC_VPERM))] "TARGET_ALTIVEC" - "") +{ + if (!BYTES_BIG_ENDIAN) { + altivec_expand_vec_perm_le (operands); + DONE; + } +}) (define_expand "vec_perm_constv16qi" [(match_operand:V16QI 0 "register_operand" "") @@ -1928,25 +2075,26 @@ rtx vzero = gen_reg_rtx (V8HImode); rtx mask = gen_reg_rtx (V16QImode); rtvec v = rtvec_alloc (16); + bool be = BYTES_BIG_ENDIAN; emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4); - 
RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7); + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16); emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); @@ -1963,25 +2111,26 @@ rtx vzero = gen_reg_rtx (V4SImode); rtx mask = gen_reg_rtx (V16QImode); rtvec v = rtvec_alloc (16); + bool be = BYTES_BIG_ENDIAN; emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7); + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 
7 : 16); emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask)); @@ -1998,25 +2147,26 @@ rtx vzero = gen_reg_rtx (V8HImode); rtx mask = gen_reg_rtx (V16QImode); rtvec v = rtvec_alloc (16); + bool be = BYTES_BIG_ENDIAN; emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15); + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16); emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); @@ -2033,25 +2183,26 @@ rtx vzero = gen_reg_rtx (V4SImode); rtx mask = gen_reg_rtx (V16QImode); rtvec v = rtvec_alloc (16); + bool be = BYTES_BIG_ENDIAN; emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15); + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 
17 : 14); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16); emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask)); @@ -2071,7 +2222,10 @@ emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); DONE; }") @@ -2088,7 +2242,10 @@ emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); DONE; }") @@ -2105,7 +2262,10 @@ emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghh (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrghh (operands[0], vo, ve)); DONE; }") @@ -2122,7 +2282,10 @@ emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglh (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrglh (operands[0], vo, ve)); DONE; }") @@ -2139,7 +2302,10 @@ emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); DONE; }") @@ -2156,7 +2322,10 @@ emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrglw (operands[0], vo, ve)); DONE; }") @@ -2173,7 +2342,10 @@ emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + if 
(BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrghw (operands[0], vo, ve)); DONE; }") @@ -2190,7 +2362,10 @@ emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2])); emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw (operands[0], ve, vo)); + else + emit_insn (gen_altivec_vmrglw (operands[0], vo, ve)); DONE; }") Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h @@ -56,6 +56,7 @@ extern void paired_expand_vector_init (r extern void rs6000_expand_vector_set (rtx, rtx, int); extern void rs6000_expand_vector_extract (rtx, rtx, int); extern bool altivec_expand_vec_perm_const (rtx op[4]); +extern void altivec_expand_vec_perm_le (rtx op[4]); extern bool rs6000_expand_vec_perm_const (rtx op[4]); extern void rs6000_expand_extract_even (rtx, rtx, rtx); extern void rs6000_expand_interleave (rtx, rtx, rtx, bool); @@ -122,6 +123,7 @@ extern rtx rs6000_longcall_ref (rtx); extern void rs6000_fatal_bad_address (rtx); extern rtx create_TOC_reference (rtx, rtx); extern void rs6000_split_multireg_move (rtx, rtx); +extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode); extern void rs6000_emit_move (rtx, rtx, enum machine_mode); extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode); extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode, Index: gcc-4_8-test/gcc/config/rs6000/vsx.md =================================================================== --- gcc-4_8-test.orig/gcc/config/rs6000/vsx.md +++ gcc-4_8-test/gcc/config/rs6000/vsx.md @@ -216,6 +216,359 @@ ]) ;; VSX moves + +;; The patterns for LE permuted loads and stores come before the general +;; VSX moves so they match first. +(define_insn_and_split "*vsx_le_perm_load_<mode>" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (match_operand:VSX_D 1 "memory_operand" "Z"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + "!BYTES_BIG_ENDIAN && TARGET_VSX" + [(set (match_dup 2) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 0) + (vec_select:<MODE> + (match_dup 2) + (parallel [(const_int 1) (const_int 0)])))] + " +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0]) + : operands[0]; +} + " + [(set_attr "type" "vecload") + (set_attr "length" "8")]) + +(define_insn_and_split "*vsx_le_perm_load_<mode>" + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") + (match_operand:VSX_W 1 "memory_operand" "Z"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + "!BYTES_BIG_ENDIAN && TARGET_VSX" + [(set (match_dup 2) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)]))) + (set (match_dup 0) + (vec_select:<MODE> + (match_dup 2) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)])))] + " +{ + operands[2] = can_create_pseudo_p () ? 
gen_reg_rtx_and_attrs (operands[0]) + : operands[0]; +} + " + [(set_attr "type" "vecload") + (set_attr "length" "8")]) + +(define_insn_and_split "*vsx_le_perm_load_v8hi" + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") + (match_operand:V8HI 1 "memory_operand" "Z"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + "!BYTES_BIG_ENDIAN && TARGET_VSX" + [(set (match_dup 2) + (vec_select:V8HI + (match_dup 1) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))) + (set (match_dup 0) + (vec_select:V8HI + (match_dup 2) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] + " +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0]) + : operands[0]; +} + " + [(set_attr "type" "vecload") + (set_attr "length" "8")]) + +(define_insn_and_split "*vsx_le_perm_load_v16qi" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (match_operand:V16QI 1 "memory_operand" "Z"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + "!BYTES_BIG_ENDIAN && TARGET_VSX" + [(set (match_dup 2) + (vec_select:V16QI + (match_dup 1) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))) + (set (match_dup 0) + (vec_select:V16QI + (match_dup 2) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] + " +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0]) + : operands[0]; +} + " + [(set_attr "type" "vecload") + (set_attr "length" "8")]) + +(define_insn "*vsx_le_perm_store_<mode>" + [(set (match_operand:VSX_D 0 "memory_operand" "=Z") + (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + [(set_attr "type" "vecstore") + (set_attr "length" "12")]) + +(define_split + [(set (match_operand:VSX_D 0 "memory_operand" "") + (match_operand:VSX_D 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" + [(set (match_dup 2) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 0) + (vec_select:<MODE> + (match_dup 2) + (parallel [(const_int 1) (const_int 0)])))] +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) + : operands[1]; +}) + +;; The post-reload split requires that we re-permute the source +;; register in case it is still live. 
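+;; That is, the split below swaps the doublewords of the source +;; register in place, performs the (swapping) stxvd2x store, and then +;; swaps the doublewords back so that the register still holds its +;; original value if it is read afterwards.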
+(define_split + [(set (match_operand:VSX_D 0 "memory_operand" "") + (match_operand:VSX_D 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" + [(set (match_dup 1) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 0) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 1) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 1) (const_int 0)])))] + "") + +(define_insn "*vsx_le_perm_store_<mode>" + [(set (match_operand:VSX_W 0 "memory_operand" "=Z") + (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + [(set_attr "type" "vecstore") + (set_attr "length" "12")]) + +(define_split + [(set (match_operand:VSX_W 0 "memory_operand" "") + (match_operand:VSX_W 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" + [(set (match_dup 2) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)]))) + (set (match_dup 0) + (vec_select:<MODE> + (match_dup 2) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)])))] +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) + : operands[1]; +}) + +;; The post-reload split requires that we re-permute the source +;; register in case it is still live. +(define_split + [(set (match_operand:VSX_W 0 "memory_operand" "") + (match_operand:VSX_W 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" + [(set (match_dup 1) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)]))) + (set (match_dup 0) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)]))) + (set (match_dup 1) + (vec_select:<MODE> + (match_dup 1) + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)])))] + "") + +(define_insn "*vsx_le_perm_store_v8hi" + [(set (match_operand:V8HI 0 "memory_operand" "=Z") + (match_operand:V8HI 1 "vsx_register_operand" "+wa"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + [(set_attr "type" "vecstore") + (set_attr "length" "12")]) + +(define_split + [(set (match_operand:V8HI 0 "memory_operand" "") + (match_operand:V8HI 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" + [(set (match_dup 2) + (vec_select:V8HI + (match_dup 1) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))) + (set (match_dup 0) + (vec_select:V8HI + (match_dup 2) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) + : operands[1]; +}) + +;; The post-reload split requires that we re-permute the source +;; register in case it is still live. 
+(define_split + [(set (match_operand:V8HI 0 "memory_operand" "") + (match_operand:V8HI 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" + [(set (match_dup 1) + (vec_select:V8HI + (match_dup 1) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))) + (set (match_dup 0) + (vec_select:V8HI + (match_dup 1) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)]))) + (set (match_dup 1) + (vec_select:V8HI + (match_dup 1) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] + "") + +(define_insn "*vsx_le_perm_store_v16qi" + [(set (match_operand:V16QI 0 "memory_operand" "=Z") + (match_operand:V16QI 1 "vsx_register_operand" "+wa"))] + "!BYTES_BIG_ENDIAN && TARGET_VSX" + "#" + [(set_attr "type" "vecstore") + (set_attr "length" "12")]) + +(define_split + [(set (match_operand:V16QI 0 "memory_operand" "") + (match_operand:V16QI 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed" + [(set (match_dup 2) + (vec_select:V16QI + (match_dup 1) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))) + (set (match_dup 0) + (vec_select:V16QI + (match_dup 2) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] +{ + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) + : operands[1]; +}) + +;; The post-reload split requires that we re-permute the source +;; register in case it is still live. 
+(define_split + [(set (match_operand:V16QI 0 "memory_operand" "") + (match_operand:V16QI 1 "vsx_register_operand" ""))] + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed" + [(set (match_dup 1) + (vec_select:V16QI + (match_dup 1) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))) + (set (match_dup 0) + (vec_select:V16QI + (match_dup 1) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))) + (set (match_dup 1) + (vec_select:V16QI + (match_dup 1) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] + "") + + (define_insn "*vsx_mov<mode>" [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v") (match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))] @@ -962,7 +1315,12 @@ (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa") (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))] "VECTOR_MEM_VSX_P (<MODE>mode)" - "xxpermdi %x0,%x1,%x2,0" +{ + if (BYTES_BIG_ENDIAN) + return "xxpermdi %x0,%x1,%x2,0"; + else + return "xxpermdi %x0,%x2,%x1,0"; +} [(set_attr "type" "vecperm")]) ;; Special purpose concat using xxpermdi to glue two single precision values @@ -975,9 +1333,161 @@ (match_operand:SF 2 "vsx_register_operand" "f,f")] UNSPEC_VSX_CONCAT))] "VECTOR_MEM_VSX_P (V2DFmode)" - "xxpermdi %x0,%x1,%x2,0" +{ + if (BYTES_BIG_ENDIAN) + return "xxpermdi %x0,%x1,%x2,0"; + else + return "xxpermdi %x0,%x2,%x1,0"; +} + [(set_attr "type" "vecperm")]) + +;; xxpermdi for little endian loads and stores. We need several of +;; these since the form of the PARALLEL differs by mode. 
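+;; Although the PARALLEL in each pattern spells out the swap in that +;; mode's element numbering, the machine operation is the same in every +;; case: xxpermdi %x0,%x1,%x1,2 exchanges the two 64-bit halves of the +;; register, independent of element size.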
+(define_insn "*vsx_xxpermdi2_le_<mode>" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_select:VSX_D + (match_operand:VSX_D 1 "vsx_register_operand" "wa") + (parallel [(const_int 1) (const_int 0)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" + "xxpermdi %x0,%x1,%x1,2" + [(set_attr "type" "vecperm")]) + +(define_insn "*vsx_xxpermdi4_le_<mode>" + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") + (vec_select:VSX_W + (match_operand:VSX_W 1 "vsx_register_operand" "wa") + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" + "xxpermdi %x0,%x1,%x1,2" + [(set_attr "type" "vecperm")]) + +(define_insn "*vsx_xxpermdi8_le_V8HI" + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") + (vec_select:V8HI + (match_operand:V8HI 1 "vsx_register_operand" "wa") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" + "xxpermdi %x0,%x1,%x1,2" + [(set_attr "type" "vecperm")]) + +(define_insn "*vsx_xxpermdi16_le_V16QI" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (vec_select:V16QI + (match_operand:V16QI 1 "vsx_register_operand" "wa") + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" + "xxpermdi %x0,%x1,%x1,2" [(set_attr "type" "vecperm")]) +;; lxvd2x for little endian loads. We need several of +;; these since the form of the PARALLEL differs by mode. +(define_insn "*vsx_lxvd2x2_le_<mode>" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_select:VSX_D + (match_operand:VSX_D 1 "memory_operand" "Z") + (parallel [(const_int 1) (const_int 0)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" + "lxvd2x %x0,%y1" + [(set_attr "type" "vecload")]) + +(define_insn "*vsx_lxvd2x4_le_<mode>" + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") + (vec_select:VSX_W + (match_operand:VSX_W 1 "memory_operand" "Z") + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" + "lxvd2x %x0,%y1" + [(set_attr "type" "vecload")]) + +(define_insn "*vsx_lxvd2x8_le_V8HI" + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") + (vec_select:V8HI + (match_operand:V8HI 1 "memory_operand" "Z") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" + "lxvd2x %x0,%y1" + [(set_attr "type" "vecload")]) + +(define_insn "*vsx_lxvd2x16_le_V16QI" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (vec_select:V16QI + (match_operand:V16QI 1 "memory_operand" "Z") + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" + "lxvd2x %x0,%y1" + [(set_attr "type" "vecload")]) + +;; stxvd2x for little endian stores. We need several of +;; these since the form of the PARALLEL differs by mode. 
+(define_insn "*vsx_stxvd2x2_le_<mode>" + [(set (match_operand:VSX_D 0 "memory_operand" "=Z") + (vec_select:VSX_D + (match_operand:VSX_D 1 "vsx_register_operand" "wa") + (parallel [(const_int 1) (const_int 0)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" + "stxvd2x %x1,%y0" + [(set_attr "type" "vecstore")]) + +(define_insn "*vsx_stxvd2x4_le_<mode>" + [(set (match_operand:VSX_W 0 "memory_operand" "=Z") + (vec_select:VSX_W + (match_operand:VSX_W 1 "vsx_register_operand" "wa") + (parallel [(const_int 2) (const_int 3) + (const_int 0) (const_int 1)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)" + "stxvd2x %x1,%y0" + [(set_attr "type" "vecstore")]) + +(define_insn "*vsx_stxvd2x8_le_V8HI" + [(set (match_operand:V8HI 0 "memory_operand" "=Z") + (vec_select:V8HI + (match_operand:V8HI 1 "vsx_register_operand" "wa") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" + "stxvd2x %x1,%y0" + [(set_attr "type" "vecstore")]) + +(define_insn "*vsx_stxvd2x16_le_V16QI" + [(set (match_operand:V16QI 0 "memory_operand" "=Z") + (vec_select:V16QI + (match_operand:V16QI 1 "vsx_register_operand" "wa") + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15) + (const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" + "stxvd2x %x1,%y0" + [(set_attr "type" "vecstore")]) + ;; Set the element of a V2DI/VD2F mode (define_insn "vsx_set_<mode>" [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa") @@ -987,9 +1497,10 @@ UNSPEC_VSX_SET))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - if (INTVAL (operands[3]) == 0) + int idx_first = BYTES_BIG_ENDIAN ? 0 : 1; + if (INTVAL (operands[3]) == idx_first) return \"xxpermdi %x0,%x2,%x1,1\"; - else if (INTVAL (operands[3]) == 1) + else if (INTVAL (operands[3]) == 1 - idx_first) return \"xxpermdi %x0,%x1,%x2,0\"; else gcc_unreachable (); @@ -1004,8 +1515,12 @@ [(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { + int fldDM; gcc_assert (UINTVAL (operands[2]) <= 1); - operands[3] = GEN_INT (INTVAL (operands[2]) << 1); + fldDM = INTVAL (operands[2]) << 1; + if (!BYTES_BIG_ENDIAN) + fldDM = 3 - fldDM; + operands[3] = GEN_INT (fldDM); return \"xxpermdi %x0,%x1,%x1,%3\"; } [(set_attr "type" "vecperm")]) @@ -1025,6 +1540,21 @@ (const_string "fpload"))) (set_attr "length" "4")]) +;; Optimize extracting element 1 from memory for little endian +(define_insn "*vsx_extract_<mode>_one_le" + [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa") + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z") + (parallel [(const_int 1)])))] + "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN" + "lxsd%U1x %x0,%y1" + [(set (attr "type") + (if_then_else + (match_test "update_indexed_address_mem (operands[1], VOIDmode)") + (const_string "fpload_ux") + (const_string "fpload"))) + (set_attr "length" "4")]) + ;; Extract a SF element from V4SF (define_insn_and_split "vsx_extract_v4sf" [(set (match_operand:SF 0 "vsx_register_operand" "=f,f") @@ -1045,7 +1575,7 @@ rtx op2 = operands[2]; rtx op3 = operands[3]; rtx tmp; - HOST_WIDE_INT ele = INTVAL (op2); + HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? 
INTVAL (op2) : 3 - INTVAL (op2); if (ele == 0) tmp = op1; Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/fusion.c +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */ Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr43154.c +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ /* { dg-options "-O2 -mcpu=power7" } */ Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c @@ -19,19 +19,6 @@ V b4(V x) return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, }); } -V p2(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); - -} - -V p4(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); -} - V h1(V x, V y) { return __builtin_shuffle(x, y, @@ -72,5 +59,3 @@ V l4(V x, V y) /* { dg-final { scan-assembler "vspltb" } } */ /* { dg-final { scan-assembler "vsplth" } } */ /* { dg-final { scan-assembler "vspltw" } } */ -/* { dg-final { scan-assembler "vpkuhum" } } */ -/* { dg-final { scan-assembler "vpkuwum" } } */ Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c =================================================================== --- /dev/null +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ +/* { dg-options "-O -maltivec -mno-vsx" } */ + +typedef unsigned char V __attribute__((vector_size(16))); + +V p2(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); + +} + +V p4(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); +} + +/* { dg-final { scan-assembler-not "vperm" } } */ +/* { dg-final { scan-assembler "vpkuhum" } } */ +/* { dg-final { scan-assembler "vpkuwum" } } */ Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/eg-5.c +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector floa /* Set result to a vector of f32 0's */ vector float result = ((vector float){0.,0.,0.,0.}); +#ifdef __LITTLE_ENDIAN__ + result = vec_madd (c0, vec_splat (v, 3), result); + result = vec_madd (c1, vec_splat (v, 2), result); + result = vec_madd (c2, vec_splat (v, 1), result); 
+ result = vec_madd (c3, vec_splat (v, 0), result); +#else result = vec_madd (c0, vec_splat (v, 0), result); result = vec_madd (c1, vec_splat (v, 1), result); result = vec_madd (c2, vec_splat (v, 2), result); result = vec_madd (c3, vec_splat (v, 3), result); +#endif return result; } Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c @@ -13,12 +13,27 @@ #define DO_INLINE __attribute__ ((always_inline)) #define DONT_INLINE __attribute__ ((noinline)) +#ifdef __LITTLE_ENDIAN__ +static inline DO_INLINE int inline_me(vector signed short data) +{ + union {vector signed short v; signed short s[8];} u; + signed short x; + unsigned char x1, x2; + + u.v = data; + x = u.s[7]; + x1 = (x >> 8) & 0xff; + x2 = x & 0xff; + return ((x2 << 8) | x1); +} +#else static inline DO_INLINE int inline_me(vector signed short data) { union {vector signed short v; signed short s[8];} u; u.v = data; return u.s[7]; } +#endif static DONT_INLINE int foo(vector signed short data) { Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c =================================================================== --- /dev/null +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c @@ -0,0 +1,14 @@ +#include "harness.h" + +vector short +vec_set (short m) +{ + return (vector short){m, 0, 0, 0, 0, 0, 0, 0}; +} + +static void test() +{ + check (vec_all_eq (vec_set (7), + ((vector short){7, 0, 0, 0, 0, 0, 0, 0})), + "vec_set"); +} Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/3b-15.c +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c @@ -3,7 +3,11 @@ vector unsigned char f (vector unsigned char a, vector unsigned char b, vector unsigned char c) { +#ifdef __BIG_ENDIAN__ return vec_perm(a,b,c); +#else + return vec_perm(b,a,c); +#endif } static void test() @@ -12,8 +16,13 @@ static void test() 8,9,10,11,12,13,14,15}), ((vector unsigned char){70,71,72,73,74,75,76,77, 78,79,80,81,82,83,84,85}), +#ifdef __BIG_ENDIAN__ ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a, 0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})), +#else + ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5, + 0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})), +#endif ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})), "f"); } Index: gcc-4_8-test/libcpp/lex.c =================================================================== --- gcc-4_8-test.orig/libcpp/lex.c +++ gcc-4_8-test/libcpp/lex.c @@ -559,8 +559,13 @@ search_line_fast (const uchar *s, const beginning with all ones and shifting in zeros according to the mis-alignment. The LVSR instruction pulls the exact shift we want from the address. */ +#ifdef __BIG_ENDIAN__ mask = __builtin_vec_lvsr(0, s); mask = __builtin_vec_perm(zero, ones, mask); +#else + mask = __builtin_vec_lvsl(0, s); + mask = __builtin_vec_perm(ones, zero, mask); +#endif data &= mask; /* While altivec loads mask addresses, we still need to align S so @@ -624,7 +629,11 @@ search_line_fast (const uchar *s, const /* L now contains 0xff in bytes for which we matched one of the relevant characters. We can find the byte index by finding its bit index and dividing by 8. 
*/ +#ifdef __BIG_ENDIAN__ l = __builtin_clzl(l) >> 3; +#else + l = __builtin_ctzl(l) >> 3; +#endif return s + l; #undef N Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr48258-1.c +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ /* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */ /* { dg-final { scan-assembler-times "xvaddsp" 3 } } */ Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c =================================================================== --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c @@ -1,4 +1,5 @@ /* { dg-require-effective-target vect_int } */ +/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */ #include <stdarg.h> #include "../../tree-vect.h"
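
For anyone who wants to convince themselves of the selector adjustment
performed by altivec_expand_vec_perm_const_le, here is a stand-alone C
program (not part of the patch; vperm_hw and the LE-ordered array
convention are just modeling choices of mine) that mimics the
hardware's big-endian reading of the permute control vector and shows
that subtracting each selector element from 31 and swapping the two
inputs reproduces the example from the function's comment:

  #include <stdio.h>

  /* Model of the hardware vperm.  Arrays are indexed in LE element
     order (index 0 is the least significant byte); the machine numbers
     bytes big-endian, so hardware byte j is array index 15-j.  */
  static void
  vperm_hw (unsigned char r[16], const unsigned char a[16],
            const unsigned char b[16], const unsigned char c[16])
  {
    int j;
    for (j = 0; j < 16; j++)
      {
        unsigned int k = c[15 - j] & 31;  /* control byte, read BE-wise */
        r[15 - j] = (k < 16) ? a[15 - k] : b[31 - k];
      }
  }

  int
  main (void)
  {
    unsigned char op0[16] = {0}, op1[16] = {0}, ctrl[16], res[16];
    /* Selector with natural LE semantics: picks bytes 0-3, 8-11,
       16-19, 24-27 of the LE-ordered concatenation {op0, op1},
       i.e. elements {0, 2, 4, 6}.  */
    unsigned char sel[16] = { 0, 1, 2, 3, 8, 9, 10, 11,
                              16, 17, 18, 19, 24, 25, 26, 27 };
    int i;

    /* op0 = {0,1,2,3} and op1 = {4,5,6,7} as V4SI in LE byte order.  */
    for (i = 0; i < 4; i++)
      {
        op0[4 * i] = i;
        op1[4 * i] = i + 4;
      }

    /* The adjustment: subtract each selector element from 31...  */
    for (i = 0; i < 16; i++)
      ctrl[i] = 31 - (sel[i] & 31);

    /* ...and swap the two input operands.  */
    vperm_hw (res, op1, op0, ctrl);

    for (i = 0; i < 4; i++)
      printf ("element %d = %d\n", i, res[4 * i]);  /* 0 2 4 6 */
    return 0;
  }

Compiled with any host C compiler, this prints elements 0, 2, 4, 6,
matching the vr9 result shown in the comment.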