| Message ID | 4E96158E.5030106@redhat.com |
|---|---|
| State | New |
From: Richard Henderson <rth@redhat.com>
Date: Wed, 12 Oct 2011 15:32:46 -0700

> I suppose technically the middle-end could be improved to implement
> ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec
> is the only extant SIMD ISA that would make use of this.  All of
> the others can arrange for constant shifts to be encoded into the
> insn, and so implement the ashl<mode> named pattern.

I'm pretty sure Sparc's VIS3 can do this too, see the
'<vis3_shift_insn><vbits>_vis' patterns in sparc.md
On 10/12/2011 03:37 PM, David Miller wrote:
> From: Richard Henderson <rth@redhat.com>
> Date: Wed, 12 Oct 2011 15:32:46 -0700
>
>> I suppose technically the middle-end could be improved to implement
>> ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec
>> is the only extant SIMD ISA that would make use of this.  All of
>> the others can arrange for constant shifts to be encoded into the
>> insn, and so implement the ashl<mode> named pattern.
>
> I'm pretty sure Sparc's VIS3 can do this too, see the
> '<vis3_shift_insn><vbits>_vis' patterns in sparc.md

Ok, if I read the rtl correctly, you can perform a vector shift,
where each shift count comes from the corresponding element of op2.
But VIS has no vector shift where the shift count comes from a
single scalar (immediate or register)?

If so, please rename this pattern to the "v<shift_pat_name><mode>3"
form and I'll work on more middle-end support for re-use of the
v<shift_pat_name> optab.

r~
From: Richard Henderson <rth@redhat.com>
Date: Wed, 12 Oct 2011 15:49:28 -0700

> Ok, if I read the rtl correctly, you can perform a vector shift,
> where each shift count comes from the corresponding element of op2.
> But VIS has no vector shift where the shift count comes from a
> single scalar (immediate or register)?

That's correct.

> If so, please rename this pattern to the "v<shift_pat_name><mode>3"
> form and I'll work on more middle-end support for re-use of the
> v<shift_pat_name> optab.

Will do, thanks Richard.
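The distinction under discussion is between the vashl-style optab, where each lane carries its own shift count, and the ashl-style named pattern, where one scalar count applies to every lane. A minimal sketch in GNU C vector extensions (the typedef and function names are illustrative, not taken from the thread):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
shift_by_vector (v4si x, v4si counts)
{
  /* vashl-style: lane i of x is shifted left by lane i of counts.  */
  return x << counts;
}

v4si
shift_by_scalar (v4si x, int n)
{
  /* ashl-style: every lane is shifted by the same scalar count.  On an
     ISA that only has a vector-by-vector shift, the count must first be
     splat into a vector, written out explicitly here.  */
  v4si counts = { n, n, n, n };
  return x << counts;
}

On Altivec or VIS the hardware only offers the first form, which is why the second form needs the explicit broadcast.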
On Wed, Oct 12, 2011 at 6:32 PM, Richard Henderson <rth@redhat.com> wrote:
> I suppose technically the middle-end could be improved to implement
> ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec
> is the only extant SIMD ISA that would make use of this.  All of
> the others can arrange for constant shifts to be encoded into the
> insn, and so implement the ashl<mode> named pattern.
>
> Tested on ppc64-linux, --with-cpu=G5.

Richard,

Are there testcases in the GCC testsuite that exercise these patterns?

Thanks, David
On 10/13/2011 11:36 AM, David Edelsohn wrote:
> Are there testcases in the GCC testsuite that exercise these patterns?
I thought the vectorizer would use them. E.g. gcc.dg/vect/vect-shift-3.c.
I see that I should have added ppc to check_effective_target_vect_shift_scalar,
though, to enable even more testing.
r~
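For context, the gcc.dg/vect shift tests are built around loops of roughly the following shape. This is only an illustrative sketch (not the actual vect-shift-3.c source); every element is shifted by the same loop-invariant count, so the vectorizer can splat the count once and shift whole vectors at a time:

#define N 64

unsigned short in[N], out[N];

void
shift_right_loop (int count)
{
  int i;

  /* The shift count is loop invariant, so the vectorized loop can use
     a single vector shift per group of elements.  */
  for (i = 0; i < N; i++)
    out[i] = in[i] >> count;
}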
On Wed, Oct 12, 2011 at 6:32 PM, Richard Henderson <rth@redhat.com> wrote:
> I suppose technically the middle-end could be improved to implement
> ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec
> is the only extant SIMD ISA that would make use of this.  All of
> the others can arrange for constant shifts to be encoded into the
> insn, and so implement the ashl<mode> named pattern.
>
> Tested on ppc64-linux, --with-cpu=G5.
>
> Ok?
>
> r~
>
>	* config/rs6000/rs6000.c (rs6000_expand_vector_broadcast): New.
>	* config/rs6000/rs6000-protos.h: Update.
>	* config/rs6000/vector.md (ashl<VEC_I>3): New.
>	(lshr<VEC_I>3, ashr<VEC_I>3): New.

The patch is fine.

Thanks, David
On Thu, Oct 13, 2011 at 11:43:35AM -0700, Richard Henderson wrote:
> On 10/13/2011 11:36 AM, David Edelsohn wrote:
> > Are there testcases in the GCC testsuite that exercise these patterns?
>
> I thought the vectorizer would use them.  E.g. gcc.dg/vect/vect-shift-3.c.
>
> I see that I should have added ppc to check_effective_target_vect_shift_scalar,
> though, to enable even more testing.

I tried this patch on trunk, and I'm not seeing any changes in the code.
I'll include the test case and asm as attachments.  This is due to the
code I put into tree-vect-generic.c (in expand_vector_operations_1) that
converts between vector shift by vector and vector shift by scalar.
Note that AMD's XOP shifts are also vector/vector shifts.

The code shifting by a scalar is pretty bad in that it recalculates the
splat of the shift element every time in the loop, rather than doing the
splat once before the loop.  We also have the problem we've had for a
couple of years that if the type is signed char or signed short, the
compiler wants to promote the items to int and does this by several
unpacks and repacks.
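The attachments mentioned above are not reproduced here. As a rough illustration of the signed char/short problem (a hypothetical example, not the attached test case): C promotes the narrow shift operand to int, so a loop like the following pushes the vectorizer toward unpacking the elements, shifting them as wider values, and repacking them.

#define M 64

signed char buf[M];

void
narrow_shift (int count)
{
  int i;

  /* buf[i] is promoted to int before the shift; as described above, the
     compiler handles the narrow type with unpack and repack steps.  */
  for (i = 0; i < M; i++)
    buf[i] = buf[i] >> count;
}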
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 73da0f6..4dee23f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -55,6 +55,7 @@ extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, int);
+extern rtx rs6000_expand_vector_broadcast (enum machine_mode, rtx);
 extern void build_mask64_2_operands (rtx, rtx *);
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 63c0f0c..786736d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4890,6 +4890,35 @@ rs6000_expand_vector_extract (rtx target, rtx vec, int elt)
   emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0));
 }
 
+/* Broadcast an element to all parts of a vector, loaded into a register.
+   Used to turn vector shifts by a scalar into vector shifts by a vector.  */
+
+rtx
+rs6000_expand_vector_broadcast (enum machine_mode mode, rtx elt)
+{
+  rtx repl, vec[16];
+  int i, n;
+
+  n = GET_MODE_NUNITS (mode);
+  for (i = 0; i < n; ++i)
+    vec[i] = elt;
+
+  if (CONSTANT_P (elt))
+    {
+      repl = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (n, vec));
+      repl = force_reg (mode, repl);
+    }
+  else
+    {
+      rtx par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (n, vec));
+      repl = gen_reg_rtx (mode);
+      rs6000_expand_vector_init (repl, par);
+    }
+
+  return repl;
+}
+
+
 /* Generates shifts and masks for a pair of rldicl or rldicr insns to
    implement ANDing by the mask IN.  */
 void
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0179cd9..24b473e 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -987,6 +987,16 @@
   "TARGET_ALTIVEC"
   "")
 
+(define_expand "ashl<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand" "")
+	(ashift:VEC_I
+	 (match_operand:VEC_I 1 "vint_operand" "")
+	 (match_operand:<VEC_base> 2 "nonmemory_operand" "")))]
+  "TARGET_ALTIVEC"
+{
+  operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]);
+})
+
 ;; Expanders for logical shift right on each vector element
 (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
@@ -995,6 +1005,16 @@
   "TARGET_ALTIVEC"
   "")
 
+(define_expand "lshr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand" "")
+	(lshiftrt:VEC_I
+	 (match_operand:VEC_I 1 "vint_operand" "")
+	 (match_operand:<VEC_base> 2 "nonmemory_operand" "")))]
+  "TARGET_ALTIVEC"
+{
+  operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]);
+})
+
 ;; Expanders for arithmetic shift right on each vector element
 (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
@@ -1002,6 +1022,16 @@
 	  (match_operand:VEC_I 2 "vint_operand" "")))]
   "TARGET_ALTIVEC"
   "")
+
+(define_expand "ashr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand" "")
+	(ashiftrt:VEC_I
+	 (match_operand:VEC_I 1 "vint_operand" "")
+	 (match_operand:<VEC_base> 2 "nonmemory_operand" "")))]
+  "TARGET_ALTIVEC"
+{
+  operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]);
+})
 
 ;; Vector reduction expanders for VSX
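A minimal usage sketch, assuming the patch above is applied and the loops vectorize for Altivec (the array and function names are invented for illustration): with a literal count the broadcast routine can take the CONSTANT_P branch and force a CONST_VECTOR into a register, while with a count held in a register it splats the element via rs6000_expand_vector_init.

#define N 128

unsigned int a[N];

/* Literal shift count: if the new ashl<mode>3 expander is used, the
   broadcast routine takes the CONSTANT_P path.  */
void
shl_const (void)
{
  int i;
  for (i = 0; i < N; i++)
    a[i] = a[i] << 5;
}

/* Variable shift count: the scalar is splat with
   rs6000_expand_vector_init before the vector shift.  */
void
shl_var (int n)
{
  int i;
  for (i = 0; i < N; i++)
    a[i] = a[i] << n;
}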