Message ID | alpine.DEB.2.02.1209300954110.4012@stedding.saclay.inria.fr |
---|---|
State | New |
Headers | show |
> 2012-09-09 Marc Glisse <marc.glisse@inria.fr> > > gcc/ > * simplify-rtx.c (simplify_binary_operation_1) <VEC_SELECT>: > Detect the identity. > <VEC_CONCAT>: Handle VEC_SELECTs from the same vector. > > gcc/testsuite/ > * gcc.target/i386/vect-rebuild.c: New testcase. OK if you adjust the above date and add the missing space at the end of: /* Try to merge 2 VEC_SELECTs from the same vector into a single one. */
On Mon, 1 Oct 2012, Eric Botcazou wrote: >> 2012-09-09 Marc Glisse <marc.glisse@inria.fr> >> >> gcc/ >> * simplify-rtx.c (simplify_binary_operation_1) <VEC_SELECT>: >> Detect the identity. >> <VEC_CONCAT>: Handle VEC_SELECTs from the same vector. >> >> gcc/testsuite/ >> * gcc.target/i386/vect-rebuild.c: New testcase. > > OK if you adjust the above date and add the missing space at the end of: > > /* Try to merge 2 VEC_SELECTs from the same vector into a single one. */ I was trying to avoid splitting in 2 lines, but ok I'll split. Thank you for the quick reply,
> > /* Try to merge 2 VEC_SELECTs from the same vector into a single one. */ > > I was trying to avoid splitting in 2 lines, but ok I'll split. Indeed. Then you can remove the '2' above, it doesn't add much.
On Sun, 30 Sep 2012, Marc Glisse wrote: > On Sat, 29 Sep 2012, Eric Botcazou wrote: > >>> this patch lets the compiler try to rewrite: >>> >>> (vec_concat (vec_select x [a]) (vec_select x [b])) >>> >>> as: >>> >>> vec_select x [a b] >>> >>> or even just "x" if appropriate. [...] >> Why not generalizing to all kinds of VEC_SELECTs instead of just scalar >> ones? > > Ok, I changed the patch a bit to handle arbitrary VEC_SELECTs, and moved the > identity recognition to VEC_SELECT handling (where it belonged). Testing with > non-scalar VEC_SELECTs was limited though, because they are not that easy to > generate. Also, the identity case is the only one where it actually > optimized. To handle more cases, I'd have to look through several layers of > VEC_SELECTs, which gets a bit complicated (for instance, the permutation > 0,1,3,2 will appear as a vec_concat of a vec_select(v,[0,1]) and a > vec_select(vec_select(v,[2,3]),[1,0]), or worse with a vec_concat in the > middle). It also didn't optimize 3,2,3,2, possibly because that meant > substituting the same rtx twice (I didn't go that far in gdb). Then there is > also the vec_duplicate case (I should try to replace vec_duplicate with > vec_concat in simplify-rtx to see what happens...). Still, the identity case > is nice to have. I think I may have been a bit too hasty removing the restriction to the scalar case (and even that version was possibly wrong for some targets). For instance, if I have a vec_concat of 2 permutations of a vector, this will generate a vec_select with a result twice the size of its input, which will likely fail to be recognized. For now I have only managed to reach this situation in combine, where the unrecognizable expression is simply ignored. But I think simplify-rtx is used in less forgiving places (and even in combine it could cause optimizations to be missed). My current understanding of simplify-rtx is that we should only do "safe" optimizations in it (make sure we only create expressions that every target will recognize), and if I want more advanced optimizations, I should do them elsewhere (not sure where). So I should probably at least restrict this one to the case where the result and XEXP (trueop0, 0) have the same mode. Does that make sense? Or is the current code ok and I am worrying for nothing? For reference, the conversation started here: http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00540.html and I include a copy of the relevant part of the patch that was committed: > 2012-09-09 Marc Glisse <marc.glisse@inria.fr> > > gcc/ > * simplify-rtx.c (simplify_binary_operation_1) > <VEC_CONCAT>: Handle VEC_SELECTs from the same vector. + /* Try to merge VEC_SELECTs from the same vector into a single one. */ + if (GET_CODE (trueop0) == VEC_SELECT + && GET_CODE (trueop1) == VEC_SELECT + && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))) + { + rtx par0 = XEXP (trueop0, 1); + rtx par1 = XEXP (trueop1, 1); + int len0 = XVECLEN (par0, 0); + int len1 = XVECLEN (par1, 0); + rtvec vec = rtvec_alloc (len0 + len1); + for (int i = 0; i < len0; i++) + RTVEC_ELT (vec, i) = XVECEXP (par0, 0, i); + for (int i = 0; i < len1; i++) + RTVEC_ELT (vec, len0 + i) = XVECEXP (par1, 0, i); + return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0), + gen_rtx_PARALLEL (VOIDmode, vec)); + }
> My current understanding of simplify-rtx is that we should only do "safe" > optimizations in it (make sure we only create expressions that every > target will recognize), and if I want more advanced optimizations, I > should do them elsewhere (not sure where). So I should probably at least > restrict this one to the case where the result and XEXP (trueop0, 0) have > the same mode. Yes, simplify-rtx should only canonicalize or simplify expressions. Merging 2 VEC_SELECTs into a single one seems to fall into the second category, but if no target can take advantage of it, that's probably a bit questionable indeed.
Index: gcc/testsuite/gcc.target/i386/vect-rebuild.c =================================================================== --- gcc/testsuite/gcc.target/i386/vect-rebuild.c (revision 0) +++ gcc/testsuite/gcc.target/i386/vect-rebuild.c (revision 0) @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O -mavx -fno-tree-forwprop" } */ + +typedef double v2df __attribute__ ((__vector_size__ (16))); +typedef double v4df __attribute__ ((__vector_size__ (32))); + +v2df f1 (v2df x) +{ + v2df xx = { x[0], x[1] }; + return xx; +} + +v4df f2 (v4df x) +{ + v4df xx = { x[0], x[1], x[2], x[3] }; + return xx; +} + +v2df g (v2df x) +{ + v2df xx = { x[1], x[0] }; + return xx; +} + +v2df h (v4df x) +{ + v2df xx = { x[2], x[3] }; + return xx; +} + +/* { dg-final { scan-assembler-not "unpck" } } */ +/* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ +/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ Property changes on: gcc/testsuite/gcc.target/i386/vect-rebuild.c ___________________________________________________________________ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c (revision 191865) +++ gcc/simplify-rtx.c (working copy) @@ -3239,20 +3239,37 @@ simplify_binary_operation_1 (enum rtx_co rtx x = XVECEXP (trueop1, 0, i); gcc_assert (CONST_INT_P (x)); RTVEC_ELT (v, i) = CONST_VECTOR_ELT (trueop0, INTVAL (x)); } return gen_rtx_CONST_VECTOR (mode, v); } + /* Recognize the identity. */ + if (GET_MODE (trueop0) == mode) + { + bool maybe_ident = true; + for (int i = 0; i < XVECLEN (trueop1, 0); i++) + { + rtx j = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (j) || INTVAL (j) != i) + { + maybe_ident = false; + break; + } + } + if (maybe_ident) + return trueop0; + } + /* If we build {a,b} then permute it, build the result directly. */ if (XVECLEN (trueop1, 0) == 2 && CONST_INT_P (XVECEXP (trueop1, 0, 0)) && CONST_INT_P (XVECEXP (trueop1, 0, 1)) && GET_CODE (trueop0) == VEC_CONCAT && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT && GET_MODE (XEXP (trueop0, 0)) == mode && GET_CODE (XEXP (trueop0, 1)) == VEC_CONCAT && GET_MODE (XEXP (trueop0, 1)) == mode) { @@ -3364,20 +3381,38 @@ simplify_binary_operation_1 (enum rtx_co if (!VECTOR_MODE_P (op1_mode)) RTVEC_ELT (v, i) = trueop1; else RTVEC_ELT (v, i) = CONST_VECTOR_ELT (trueop1, i - in_n_elts); } } return gen_rtx_CONST_VECTOR (mode, v); } + + /* Try to merge 2 VEC_SELECTs from the same vector into a single one. */ + if (GET_CODE (trueop0) == VEC_SELECT + && GET_CODE (trueop1) == VEC_SELECT + && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))) + { + rtx par0 = XEXP (trueop0, 1); + rtx par1 = XEXP (trueop1, 1); + int len0 = XVECLEN (par0, 0); + int len1 = XVECLEN (par1, 0); + rtvec vec = rtvec_alloc (len0 + len1); + for (int i = 0; i < len0; i++) + RTVEC_ELT (vec, i) = XVECEXP (par0, 0, i); + for (int i = 0; i < len1; i++) + RTVEC_ELT (vec, len0 + i) = XVECEXP (par1, 0, i); + return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0), + gen_rtx_PARALLEL (VOIDmode, vec)); + } } return 0; default: gcc_unreachable (); } return 0; }