Message ID | B71DF1153024A14EABB94E39368E44A604268D3E@SJEXCHMB13.corp.ad.broadcom.com |
---|---|
State | New |

On Tue, Jan 28, 2014 at 4:17 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
> I checked vectorization code, it seems that only relevant place vec_widen_mult_even/odd & vec_widen_mult_lo/hi are generated is in supportable_widening_operation. One of these pairs is selected, with priority given to vec_widen_mult_even/odd if it is a reduction loop. However, lo/hi pair seems to have wider usage than even/odd pair (non-loop? Non-reduction?). Maybe that's why AltiVec and x86 still implement both pairs. Is following patch OK?

Ok.

Thanks,
Richard.

> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog (revision 207183)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,9 @@
> +2014-01-28 Bingfeng Mei <bmei@broadcom.com>
> +
> + * doc/md.texi: Mention that a target shouldn't implement
> + vec_widen_(s|u)mul_even/odd pair if it is less efficient
> + than hi/lo pair.
> +
> 2014-01-28 Richard Biener <rguenther@suse.de>
>
> Revert
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi (revision 207183)
> +++ gcc/doc/md.texi (working copy)
> @@ -4918,7 +4918,8 @@ the output vector (operand 0).
> Signed/Unsigned widening multiplication. The two inputs (operands 1 and 2)
> are vectors with N signed/unsigned elements of size S@. Multiply the high/low
> or even/odd elements of the two vectors, and put the N/2 products of size 2*S
> -in the output vector (operand 0).
> +in the output vector (operand 0). A target shouldn't implement even/odd pattern
> +pair if it is less efficient than lo/hi one.
>
> @cindex @code{vec_widen_ushiftl_hi_@var{m}} instruction pattern
> @cindex @code{vec_widen_ushiftl_lo_@var{m}} instruction pattern
>
>
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: 28 January 2014 12:56
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: VEC_WIDEN_MULT_(LO|HI)_EXPR vs. VEC_WIDEN_MULT_(EVEN|ODD)_EXPR in vectorization.
>
> On Tue, Jan 28, 2014 at 12:08 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
>> Thanks, Richard. It is not very clear from documents.
>>
>> "Signed/Unsigned widening multiplication. The two inputs (operands 1 and 2)
>> are vectors with N signed/unsigned elements of size S. Multiply the high/low
>> or even/odd elements of the two vectors, and put the N/2 products of size 2*S
>> in the output vector (operand 0)."
>>
>> So I thought that implementing both can help vectorizer to optimize more loops.
>> Maybe we should improve documents.
>
> Maybe. But my answer was from the top of my head - so better double-check
> in the vectorizer sources.
>
> Richard.
>
>> Bingfeng
>>
>>
>>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: 28 January 2014 11:02
>> To: Bingfeng Mei
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: VEC_WIDEN_MULT_(LO|HI)_EXPR vs. VEC_WIDEN_MULT_(EVEN|ODD)_EXPR in vectorization.
>>
>> On Wed, Jan 22, 2014 at 1:20 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
>>> Hi,
>>> I noticed there is a regression of 4.8 against ancient 4.5 in vectorization on our port. After a bit investigation, I found following code that prefer even|odd version instead of lo|hi one. This is obviously the case for AltiVec and maybe some other targets. But even|odd (expanding to a series of instructions) versions are less efficient on our target than lo|hi ones. Shouldn't there be a target-specific hook to do the choice instead of hard-coded one here, or utilizing some cost-estimating technique to compare two alternatives?
>>
>> Hmm, what's the reason for a target to support both? I think the idea
>> was that a target only supports either (the more efficient case).
>>
>> Richard.
>>
>>> /* The result of a vectorized widening operation usually requires
>>>    two vectors (because the widened results do not fit into one vector).
>>>    The generated vector results would normally be expected to be
>>>    generated in the same order as in the original scalar computation,
>>>    i.e. if 8 results are generated in each vector iteration, they are
>>>    to be organized as follows:
>>>      vect1: [res1,res2,res3,res4],
>>>      vect2: [res5,res6,res7,res8].
>>>
>>>    However, in the special case that the result of the widening
>>>    operation is used in a reduction computation only, the order doesn't
>>>    matter (because when vectorizing a reduction we change the order of
>>>    the computation). Some targets can take advantage of this and
>>>    generate more efficient code. For example, targets like Altivec,
>>>    that support widen_mult using a sequence of {mult_even,mult_odd}
>>>    generate the following vectors:
>>>      vect1: [res1,res3,res5,res7],
>>>      vect2: [res2,res4,res6,res8].
>>>
>>>    When vectorizing outer-loops, we execute the inner-loop sequentially
>>>    (each vectorized inner-loop iteration contributes to VF outer-loop
>>>    iterations in parallel). We therefore don't allow to change the
>>>    order of the computation in the inner-loop during outer-loop
>>>    vectorization. */
>>> /* TODO: Another case in which order doesn't *really* matter is when we
>>>    widen and then contract again, e.g. (short)((int)x * y >> 8).
>>>    Normally, pack_trunc performs an even/odd permute, whereas the
>>>    repack from an even/odd expansion would be an interleave, which
>>>    would be significantly simpler for e.g. AVX2. */
>>> /* In any case, in order to avoid duplicating the code below, recurse
>>>    on VEC_WIDEN_MULT_EVEN_EXPR. If it succeeds, all the return values
>>>    are properly set up for the caller. If we fail, we'll continue with
>>>    a VEC_WIDEN_MULT_LO/HI_EXPR check. */
>>> if (vect_loop
>>>     && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
>>>     && !nested_in_vect_loop_p (vect_loop, stmt)
>>>     && supportable_widening_operation (VEC_WIDEN_MULT_EVEN_EXPR,
>>>                                        stmt, vectype_out, vectype_in,
>>>                                        code1, code2, multi_step_cvt,
>>>                                        interm_types))
>>>   return true;
>>>
>>>
>>> Thanks,
>>> Bingfeng Mei
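
The ordering described in the quoted vectorizer comment can be illustrated with a scalar model. The following is only a sketch, not GCC code: the function names, the fixed width of 8 input elements, and the choice to treat "lo" as the first half of the vector (on real targets this depends on endianness) are all assumptions made for illustration. It shows that the lo/hi pair keeps the products in scalar order, that the even/odd pair interleaves them, and that a sum reduction over either pair gives the same value.

```c
#include <assert.h>
#include <stdint.h>

#define N 8 /* input elements per vector; an assumed width for illustration */

/* Model of the lo/hi pair: products stay in scalar order.
   lo = [res1,res2,res3,res4], hi = [res5,res6,res7,res8].
   Treating "lo" as the first half is a simplification.  */
void widen_mult_lo_hi(const int16_t a[N], const int16_t b[N],
                      int32_t lo[N / 2], int32_t hi[N / 2])
{
    for (int i = 0; i < N / 2; i++) {
        lo[i] = (int32_t)a[i] * b[i];
        hi[i] = (int32_t)a[i + N / 2] * b[i + N / 2];
    }
}

/* Model of the even/odd pair: products are split by lane parity.
   even = [res1,res3,res5,res7], odd = [res2,res4,res6,res8].  */
void widen_mult_even_odd(const int16_t a[N], const int16_t b[N],
                         int32_t even[N / 2], int32_t odd[N / 2])
{
    for (int i = 0; i < N / 2; i++) {
        even[i] = (int32_t)a[2 * i] * b[2 * i];
        odd[i] = (int32_t)a[2 * i + 1] * b[2 * i + 1];
    }
}

/* Sum reduction over both result vectors.  */
int64_t sum_pair(const int32_t x[N / 2], const int32_t y[N / 2])
{
    int64_t s = 0;
    for (int i = 0; i < N / 2; i++)
        s += (int64_t)x[i] + y[i];
    return s;
}

/* Usage: lane contents differ between the pairs, the reduction does not.  */
void check_reduction_is_order_independent(void)
{
    const int16_t a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[N] = {10, 20, 30, 40, 50, 60, 70, 80};
    int32_t lo[N / 2], hi[N / 2], even[N / 2], odd[N / 2];

    widen_mult_lo_hi(a, b, lo, hi);
    widen_mult_even_odd(a, b, even, odd);

    assert(lo[1] != even[1]);                            /* res2 vs. res3 */
    assert(sum_pair(lo, hi) == sum_pair(even, odd));     /* same reduction */
}
```

Because only the reduction result is order-independent, a consumer that needs the products in scalar order would have to permute the even/odd results, which is consistent with the lo/hi pair having the wider range of uses noted in the thread.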

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 207183)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2014-01-28 Bingfeng Mei <bmei@broadcom.com>
+
+ * doc/md.texi: Mention that a target shouldn't implement
+ vec_widen_(s|u)mul_even/odd pair if it is less efficient
+ than hi/lo pair.
+
 2014-01-28 Richard Biener <rguenther@suse.de>
 
 Revert
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi (revision 207183)
+++ gcc/doc/md.texi (working copy)
@@ -4918,7 +4918,8 @@ the output vector (operand 0).
 Signed/Unsigned widening multiplication. The two inputs (operands 1 and 2)
 are vectors with N signed/unsigned elements of size S@. Multiply the high/low
 or even/odd elements of the two vectors, and put the N/2 products of size 2*S
-in the output vector (operand 0).
+in the output vector (operand 0). A target shouldn't implement even/odd pattern
+pair if it is less efficient than lo/hi one.
 
 @cindex @code{vec_widen_ushiftl_hi_@var{m}} instruction pattern
 @cindex @code{vec_widen_ushiftl_lo_@var{m}} instruction pattern
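
To see why the documentation note matters, the selection order described in the thread can be modeled roughly as below. This is a simplified sketch, not the actual `supportable_widening_operation` implementation: the types `target_caps` and `stmt_ctx`, the enum, and the function `choose_widen_mult_pair` are hypothetical names introduced here. The model only captures the priority discussed above: even/odd is tried first for a reduction-only use in a non-nested loop, and lo/hi is the fallback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these exist in GCC.  */
enum widen_pair { PAIR_NONE, PAIR_EVEN_ODD, PAIR_LO_HI };

struct target_caps {
    bool has_even_odd; /* target defines the even/odd patterns */
    bool has_lo_hi;    /* target defines the lo/hi patterns */
};

struct stmt_ctx {
    bool in_loop;             /* statement is inside a vectorized loop */
    bool used_by_reduction;   /* result feeds a reduction only */
    bool nested_in_vect_loop; /* inner loop of outer-loop vectorization */
};

/* Even/odd takes priority when result order does not matter and the
   target provides it; efficiency is not compared.  */
enum widen_pair choose_widen_mult_pair(const struct target_caps *t,
                                       const struct stmt_ctx *s)
{
    if (s->in_loop && s->used_by_reduction && !s->nested_in_vect_loop
        && t->has_even_odd)
        return PAIR_EVEN_ODD;
    if (t->has_lo_hi)
        return PAIR_LO_HI;
    return PAIR_NONE;
}

/* Usage: a target that defines both pairs gets even/odd for reductions,
   even if that pair is the slower one on that target.  */
void check_selection_priority(void)
{
    const struct target_caps both = {true, true};
    const struct target_caps lo_hi_only = {false, true};
    const struct stmt_ctx reduction = {true, true, false};

    assert(choose_widen_mult_pair(&both, &reduction) == PAIR_EVEN_ODD);
    assert(choose_widen_mult_pair(&lo_hi_only, &reduction) == PAIR_LO_HI);
}
```

Under this model the choice is driven purely by which patterns exist, so leaving out a slow even/odd pair is the way for a port to steer reductions to lo/hi, which is what the md.texi sentence recommends.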